SUMMARY: disk failure or hack?

From: Scott Taylor <smt_at_gamma.physics.uiowa.edu>
Date: Mon, 27 Sep 1999 17:12:47 -0500

I know this was a weird one, and the jury is still out. Only four replies --
they are listed below.
Thanks to: John J. Francini
           Oisin McGuinness
           Thomas M. Payerle
           alan_at_nabeth.cxo.dec.com
           
-------------------------------------------------------------------
I'd be thinking in terms of some sort of truly bizarre file system
screwup. Note that whenever you set your working directory (via cd),
that directory file is held OPEN by the shell until you cd out of it.
(This is why a umount for a user disk will fail if anyone is still
cd'ed there.)
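
A minimal sketch of the point above, in Python rather than anything from
the original thread: a process holds its working directory by reference
(the inode), not by path name, and that held reference is what keeps a
filesystem busy at umount time. The function name and the directory names
"before" and "after" are made up for illustration, and it assumes a system
that allows renaming a directory while a process is cd'ed into it.

```python
import os
import tempfile


def cwd_follows_rename():
    """Rename our own working directory and see what the process still holds.

    Returns (same_inode, current_name).
    """
    start = os.getcwd()
    base = tempfile.mkdtemp()
    old = os.path.join(base, "before")   # hypothetical names, for illustration
    new = os.path.join(base, "after")
    os.mkdir(old)
    try:
        os.chdir(old)                          # like `cd` in a shell
        inode_before = os.stat(".").st_ino
        os.rename(old, new)                    # rename it out from under ourselves
        inode_after = os.stat(".").st_ino      # "." still resolves: reference is held
        current_name = os.path.basename(os.getcwd())
        return inode_before == inode_after, current_name
    finally:
        # Step back out and clean up whichever directory exists.
        os.chdir(start)
        for leftover in (old, new):
            if os.path.isdir(leftover):
                os.rmdir(leftover)
        os.rmdir(base)


if __name__ == "__main__":
    print(cwd_follows_rename())
```

On Linux the inode should be unchanged and the reported name should be
the new one, which suggests the kernel tracks the directory itself rather
than the string that was typed to cd.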

So, if you had a bunch of windows whose shells were cd'ed to
different places in your directory tree, and if the filesystem
somehow burped and started using the handles into those directory
files instead of whatever it was supposed to be doing, it could
create the mayhem you saw.

I have seen something very similar to this on other OSes, so it's not
beyond the realm of possibility.

p.s. You didn't specify what version of Digital UNIX/Tru64 UNIX you
were running...

Just my $0.02.

-----------------------------------------------------------------------
I once saw a case on a brand-new AS1000 where we inadvertently ran CDE for
a while (we generally take it off servers, and run them with serial consoles
multiplexed), and the /var/dt directory got thoroughly hosed on an AdvFS volume,
so much so we had to reinstall.
I would suspect some part of the CDE stuff did evil things to your files.
Of course, proving that to DEC's (oops, Compaq's) satisfaction may be hard.

----------------------------------------------------------------------------
> it is unlikely that a sector failure would affect only one person's files. Or
> am I wrong about this?

DEC uses 512-byte sectors, so it doesn't seem unreasonable that an entire sector
was filled with one user's files. If that sector had contained a directory
in addition to normal files, the failure could easily have damaged a lot of data.

---------------------------------------------------------------------------
        Disk failure. In addition to caching data, the kernel caches
        name/inode translation information (the namei cache). For a
        sufficiently small working directory, the information that
        ls(1) needs is easily cached. However, once it had to go
        outside that cache, it (presumably) started getting I/O errors,
        which it passed on as "file not found". I'd guess that
        you were using UFS. AdvFS would probably have panicked at
        the first I/O error.
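
        The ls versus ls -l symptom can be checked for directly. The
        following is only a rough sketch in Python, not a tool from
        the original thread: it reads the names in a directory (all
        that plain ls needs) and then tries to stat each one (what
        ls -l needs), reporting the names whose inode lookup fails.
        The function name check_directory is made up for this
        example.

```python
import os
import sys


def check_directory(path):
    """Split directory entries into those that can be stat'ed and those that cannot.

    Returns (ok, failed): ok is a list of names, failed is a list of
    (name, error message) pairs.
    """
    ok, failed = [], []
    for name in sorted(os.listdir(path)):        # names only, like plain `ls`
        try:
            os.lstat(os.path.join(path, name))   # inode lookup, like `ls -l`
            ok.append(name)
        except OSError as exc:
            failed.append((name, exc.strerror))
    return ok, failed


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    good, bad = check_directory(target)
    print(f"{len(good)} entries ok, {len(bad)} could not be stat'ed")
    for name, reason in bad:
        print(f"  {name}: {reason}")
```

        On a healthy filesystem the second list should be empty.
        Entries that show up by name but cannot be stat'ed match the
        behaviour described in the original post, though the check
        cannot say whether the cause is a failing disk or a damaged
        filesystem.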
--------------------------------------------------------------------------

Original Post:

I had a peculiar event on my AlphaStation 500 last week and I'm not quite sure
what to make of it. I had been logged in overnight but my screen was locked,
and after I unlocked it in the morning I found that typical unix commands would
not work or worked erroneously and that many of my directories were empty.
Oddly enough an ls command would list the files but ls -l would indicate that
the files could not be found.

I logged out meaning to log back on as root but before the logout was completed
the blue screen came up with a continuous stream of errors reported. I shut
down and restarted; fsck ran fine on all but the /usr/users/ partition. There
it required a manual fsck. I had to run that twice because on the first attempt
I was trying to be somewhat judicious as to what I took out. All the problems
were in my personal directory. When the machine came up what was left of my
home directory was mangled. Other users on the system apparently were not
affected. This is what bothers me. If the disk had failed, wouldn't more than
my directory have suffered? Files are physically spread across the disk so that
it is unlikely that a sector failure would affect only one person's files. Or
am I wrong about this? Fortunately I had a recent backup, and since restoring
my files all seems well -- well almost. The system and binary logs didn't
record anything strange. Also, there weren't any jobs or applications running
on the system when the trouble occurred. My directories were being used for
code development -- nothing there that would affect the system.

My question is: Are these symptoms of disk failure or does this look like a
malicious hack? I had considered myself lucky in that I hadn't suffered either
before, but at the moment I'm a bit bewildered. What should I be looking for?
Any commentary will be greatly appreciated, and I'll summarize, of course.

Thanks,

Scott Taylor
smt_at_gamma.physics.uiowa.edu
Received on Mon Sep 27 1999 - 22:17:27 NZST
