Hello Managers,
I'd like to thank David, J. DeWolfe, alan_at_nabeth, Brandon Addison,
Selden E Ball Jr., Bryan Lavelle, and Chris Ruhnke for their responses.
Nearly everyone believes that the AdvFS directory or domain has been
corrupted. The only way to check and fix it is to run
/sbin/advfs/verify, then try /sbin/advfs/salvage as a second-to-last
resort. If that doesn't work, I will need to recreate the domain and
restore from backup.
After running verify, I now know I do have a bad directory. I'm getting
the message:
Cannot read fileset frag page [number] to cross check frag information
for file <MOUNT_POINT>/[dir name] or the fileset frag file may be
corrupted.
I am running salvage right now, but I'm sure I'll still need to recreate
the domain.
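
In case it helps anyone wading through a long verify log, here is a
minimal Python sketch that pulls the page numbers and file names out of
messages like the one above. The function name find_frag_errors is my
own, and it assumes the message text matches what I quoted apart from
line wrapping; real verify output may differ.

```python
import re

# Pattern built from the message text quoted above. The exact layout of
# real verify output (wrapping, spacing) is an assumption.
FRAG_ERROR = re.compile(
    r"Cannot read fileset frag page (?P<page>\S+) to cross check "
    r"frag information for file (?P<path>.+?) or the fileset frag file "
    r"may be corrupted"
)


def find_frag_errors(log_text):
    """Return (page, path) pairs for every frag-page error in the log."""
    # Long messages are wrapped across lines, so collapse every run of
    # whitespace to a single space before matching.
    flat = " ".join(log_text.split())
    return [(m.group("page"), m.group("path"))
            for m in FRAG_ERROR.finditer(flat)]
```

To use it, save the verify output to a file (any name will do), read it
into a string, and pass that string to find_frag_errors.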
Here is Alan's response:
Seems very clear. The directory is corrupt. The AdvFS
management documentation may have a recommendation, but
it usually amounts to recreating the domain and restoring from
the last good backup. You might want to see if verify
can fix it, but it isn't much for real repairs.
If you don't have a very recent backup, salvage may be
able to recover much of the data.
The AdvFS management documentation (last I knew) was in
the documentation directory of the AdvFS utilities on
one of the Associated Product CDROMs.
You might also want to consider a physical backup of
the relevant disks, and logging a call with services.
If the corruption was due to a bug, it may be something
that can be tracked down and corrected.
You might also want to check the error log for I/O
errors or other hints of I/O subsystem corruption.
---Original question---
> I'm having no end of trouble with our ES40, which is running 4.0F with
> the latest patches. We've isolated our rebooting problem to trying to
> delete a certain directory. It's on an HSZ70 RAID, mounted as a
> 178GB AdvFS partition and connected locally to our ES40.
>
> That directory contains a few other large directories, and it seems
> that any attempt to remove one of them causes the machine to reboot.
> I can touch a file and remove it in that directory, but I can't
> remove already existing files within that directory.
>
> In the crash dump file, I see the panic_string is "bs_frag_dealloc:
> pinpg (1) failed\n N1 = -1035"
>
> I also see ADVFS EXCEPTION
>
> Here's the /etc/sysconfigtab
>
> proc:
> max-per-proc-address-space = 38441425310
> max-per-proc-data-size = 38441425310
> max-per-proc-stack-size = 134217728
> maxusers = 2048
> per-proc-address-space = 38441425310
>
> vm:
> ubc-maxpercent = 90
> vm-maxvas = 38441425310
>
> vfs:
> bufcache = 1
>
> This machine has 16 GB of main memory and about 20 GB more of swap,
> which is why those numbers are so high.
>
> The crash dump file is available upon request. Email me if something
> is not clear.
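
For reference, as far as I know the large proc and vm values quoted
above are byte counts. A quick Python sketch of the conversion (the
helper name bytes_to_gib is just illustrative) shows that 38441425310
bytes is roughly 35.8 GiB, in line with 16 GB of memory plus about
20 GB of swap, and that the stack limit is 128 MiB.

```python
GIB = 2 ** 30  # bytes per GiB


def bytes_to_gib(n):
    """Convert a byte count to GiB."""
    return n / GIB


# Values copied from the sysconfigtab stanzas quoted above.
settings = {
    "max-per-proc-address-space": 38441425310,
    "max-per-proc-data-size": 38441425310,
    "max-per-proc-stack-size": 134217728,
    "vm-maxvas": 38441425310,
}

for name, value in settings.items():
    print(f"{name}: {bytes_to_gib(value):.2f} GiB")
```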
--
Kevin Dea
UNIX System Administrator
Alpine Electronics Research of America
Received on Mon Jul 30 2001 - 18:05:15 NZST