SUMMARY: Corrupt or missing disk label on system disk

From: Remsing, Steven (SR)
Date: Tue, 15 Jan 2002 10:25:04 -0500

SUMMARY:

First, I would like to thank everyone who replied to my request. A list
follows the summary. Several people gave the same suggestions, and there
was no single winning answer.

To start with, many suggested I verify that the system was booting from the
correct disk, pointing out that if the CMOS battery was dead the default
boot disk setting could be incorrect. Good suggestion, but my system only
had one disk. I'm sure most already know this, but you can list the devices
with "show dev" from the >>> prompt. On my system, show dev reported:

dkb0 - the CD-ROM
dkc0 - the RZ2CC
dva0 - the floppy

Next, several people suggested using dd to make a copy of the disk (either
to tape or to another disk) before attempting any type of recovery. A very
good idea. Normally I would do this, but in this case I did not (explained
later).
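
For anyone who does want to take that copy first, the following is a
minimal sketch of the idea, not a tested recovery procedure. The function
name image_disk, the 64k block size and the destination path are
illustrative assumptions; /dev/rrz16c is the raw whole-disk partition
named in the original question. With conv=noerror,sync, dd carries on past
read errors and pads short or unreadable blocks with zeros, so the image
may be rounded up to a whole number of blocks.

```shell
#!/bin/sh
# image_disk SRC DST: copy SRC to DST block by block before any recovery work.
# conv=noerror,sync continues after read errors and zero-pads short or
# unreadable blocks, so offsets in the image still line up with the source.
image_disk() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=64k conv=noerror,sync
}

# Example (hypothetical destination; it needs at least as much free space
# as the source disk):
#   image_disk /dev/rrz16c /mnt/spare/rz16.img
```

The same function works with a tape device as the destination, assuming
the tape has room for the whole disk.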

Many people pointed out that /etc/disktab is obsolete and can be ignored.

Lastly, several people warned that scu -f format would indeed damage any
data that remained on the disk. The general suggestion was to use disklabel
-wR with the label from one of the other systems. Originally I had been
told the partition information was different between the systems, but
further investigation proved that incorrect. During that investigation we
also confirmed that the filesystems were AdvFS.
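
The label-copy suggestion can be sketched roughly as follows. This is a
hedged illustration, not the exact commands I ran: it assumes the
BSD-style disklabel usage in which "disklabel -r disk" prints the on-disk
label as text and "disklabel -R -r disk protofile" writes a label back
from such a text file, so check the disklabel man page on your release
before relying on it. The helper names run and restore_label, the DRY_RUN
switch and the file name rz16.proto are made up for the example. Because
the label carries the partition table, this is only safe when the donor
disk really has the same partition layout.

```shell
#!/bin/sh
# Sketch: copy a disk label from a healthy system to the damaged disk.
# DRY_RUN=1 (the default here) only prints the command; nothing is written.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1, on the healthy system: save its label as a text prototype file
# (assumed usage, file name is hypothetical):
#   disklabel -r rz16 > rz16.proto

# Step 2, on the damaged system: write the saved prototype back as the label.
# restore_label DISK PROTOFILE
restore_label() {
    run disklabel -R -r "$1" "$2"
}

# Example:
#   restore_label rz16 rz16.proto             # prints the command only
#   DRY_RUN=0 restore_label rz16 rz16.proto   # actually writes the label
```

Keeping the write behind a dry-run switch is a deliberate choice here: a
wrong label makes further recovery harder, so it is worth reading the
command back before running it.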

Using disklabel I was able to write a valid label on the disk. The advscan
and verify commands failed when I attempted to use them to recover the
filesystems. On an attempted reboot the system now saw a valid boot block
but could not open osf_boot. After a little more investigation it was
determined the filesystems had been seriously damaged and a re-install was
the best solution.

I did not make a dd image, and settled on the re-install as the final
solution, because during the investigation it was determined that the
amount of data lost would be minimal, as the data had recently been copied
to another system. The cost of the downtime was exceeding the cost of
regenerating the data.

I have suggested the system owner make regular backups of the systems in the
future. Also, for anyone facing a similar problem, I recommend the Disaster
Recovery for Digital UNIX guide I found at
http://www.siscom.net/~welter/professional/dunix-quickref/disk-recovery.html

Thanks again to the following people (apologies to anyone I missed):
Franz Fischer - franz.fischer_at_franz-fischer.de
Joe Fletcher - joe.fletcher_at_metapack.com
Kevin Partin - kevin.partin_at_compaq.com
James Sainsbury - j.sainsbury_at_chem.usyd.edu.au
Selden E Ball Jr. - seb_at_lns62.lns.cornell.edu
David J. DeWolfe - sxdjd_at_java.sois.alaska.edu
Tom Linden - tom_at_kednos.com
Brenden Phillips - b.c.phillips_at_massey.ac.nz
Pat O'Brien - pobrien_at_mitidata.com
Alan - alan_at_nabeth.cxo.cpqcorp.net
Dr. Thomas Blinn - tbp_at_doctor.zk3.dec.com


ORIGINAL QUESTION:

Fellow Alpha OSF Managers,

I have been asked to help recover the system disk in an Alpha 433au system
that has a missing or corrupt disk label. Unfortunately I do not manage
this system and have limited Digital UNIX experience and no manuals (limited
man pages). To make matters worse, the owner of the system does not have
any backups of this disk.

The history of the problem is this:

The battery for the CMOS is dead. Not knowing this, someone turned the
system off. When they turned it back on, it tried to boot in Windows NT.
That failed since this is a UNIX system. I don't know what happened after
that. Later the system owner found directions to set the System Console to
UNIX and attempted to boot the system. The boot fails with the following
error message:

block 0 of dkc0.0.0.1004.0 is not a valid boot block

This is where I came in. I booted from the Digital UNIX 4.0E CD and went to
a shell prompt to try to examine the disk. I don't know if the filesystems
on the disk are AdvFS or UFS. The system owner claims AdvFS but I can't be
sure. I tried advscan but could not find anything because the disk label is
corrupt. I tried reading it with disklabel -r rz16 but all I get is:

Disk is unlabeled or, /dev/rrz16a is not in block 0 of the disk.

If I just use disklabel rz16 I get:

Invalid disk label (label is corrupt or disk is unlabeled)

Searching the mailing lists and newsgroups I found a few people who had
similar problems. The suggested fix was to run: scu -f /dev/rrz16c format.
My problem is that I don't want to damage the data on the disk. I assume
this command will erase the disk, is that correct? I didn't see anything
in the man page that said exactly what 'format' does.

Is there any way I can rebuild a disklabel without damage to the filesystems
on the disk? I have access to two other 433au systems with the same disk
drive but with different partitions. Does the disk label contain the
partition information? Can I copy a disk label from one system to another?
The disk in question appears to be a RZ2CC-KA, but I can't find that in the
/etc/disktab file.

Thanks in advance. Please reply to me and I will summarize to the mailing
list.

Steve Remsing
The Dow Chemical Company
UNIX System Administrator
1776 Bldg / C-211
Phone: 989-636-2949
Fax: 989-638-9707
Email: sremsing_at_dow.com
Received on Tue Jan 15 2002 - 15:25:25 NZDT
