SUMMARY - Dump Problem

From: Burch Seymour RTPS <bseymour_at_encore.com>
Date: Thu, 25 Jun 1998 09:20:50 -0400 (EDT)

I had asked about a problem with a dump that was run after hours, in the
background and unattended. It had partially failed, and the first clue
was:

dump: Cannot open device file /dev/nrmt0h

followed later by scores of:

dump: NEEDS ATTENTION: Do you want to retry the open?: ("yes" or "no")


No one came up with an answer that solved the mystery, though some had seen
similar problems from time to time.

Alan from Digital suggested:

        Dump was never meant to run unattended.

That's fine in an ideal world, but it isn't a good idea to run it during
prime use hours either, and I've seen it bring a system to its knees.
Still, that's a philosophical issue, and it doesn't address my problem.

He also suggests:

        For some reason, perhaps as the result of drive error or
        the need for a tape change, dump was unable to open the
        tape device. Use DECevent or uerf(8) to check the error
        log for errors on the tape drive.

I checked uerf. There were no tape errors; in fact, there were no errors at
all in the last two weeks. As for running out of tape, many files were
dumped AFTER the error occurred, and the error was in opening the device,
not a lack of space.

Someone suggested checking the write protect tab on the tape. Again, other
dumps worked. I verified the save with restore just to be certain.

Lastly, someone suggested that I might be fooled by the mixing of stderr
and stdout messages. I suppose that's possible, but I wouldn't think they
could be scrambled all that much, and with plenty of success messages
following the failure, I think something else is going on.
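For anyone curious about the stderr/stdout suggestion, the general effect is easy to reproduce. When both streams go to one log (as with `cmd > log 2>&1`), stdout is usually block-buffered while stderr is written out promptly, so an error can land in the log ahead of "success" lines that were actually produced before it. The sketch below is a minimal illustration in Python, not dump itself; the child program, its messages, and the function name `run_with_merged_log` are all hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical child program: it writes a success line to stdout, then an
# error to stderr (flushed immediately), then another success line.
CHILD = (
    "import sys\n"
    "print('success: file 1')\n"
    "sys.stderr.write('error: cannot open device\\n')\n"
    "sys.stderr.flush()\n"
    "print('success: file 2')\n"
)


def run_with_merged_log():
    """Run the child with stdout and stderr sharing one pipe, like 'cmd > log 2>&1'."""
    env = dict(os.environ)
    # Make sure stdout stays block-buffered for the demonstration.
    env.pop("PYTHONUNBUFFERED", None)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        env=env,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # The buffered stdout lines are only flushed at exit, so the error
    # appears first in the merged output even though it was written second.
    for line in run_with_merged_log():
        print(line)
```

How far the lines can drift depends on the buffer size, which is typically several kilobytes, not on how many messages were printed.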

I have a theory, maybe someone can tell me if this even makes sense.

It seems that dump forks off copies of itself when it runs. I don't know
how the various copies synchronize access to the tape drive, which (I hope)
is a non-shared device. If one of the copies of dump got the idea that
the tape drive was available when in fact it was not, it could very
well get an open error. If this condition was not anticipated in the
code, then recovery might be a problem. Just a guess. Has anyone out there
ever looked at the source for dump?

So for me at least, the mystery remains. If I ever solve it I will
post another followup.

Thanks for all the messages.

-- 
This message from,		Encore Computer Corporation   MS/108
Burch Seymour                   6901 W Sunrise Boulevard
Senior Software Engineer	Fort Lauderdale, Fl 33313 
email: bseymour_at_encore.com      Vox: (954)316-4480   Fax: (954)316-4454
Received on Thu Jun 25 1998 - 15:18:09 NZST
