Original question (shortened a bit) :
>>>Greetings !
>>>
>>>Scenario : Alphaserver 8400, HSZ70, RAID5, DU 4.0B Patch #7
>>>
>>>A vdump is taken on a cloned filesystem (base7, a database location),
>>>while there is a fair amount of activity on the database (don't know
>>>specifically about the activity on this location). And then, suddenly...
>>>
>>>From DECevent :
>>>
>>>SWI Minor class 9. ASCII Message
>>>SWI Minor sub class 1. Panic
>>>
>>>ASCII Message panic (cpu 0): can't pin page
>>> N1 = 5
[snip snip snip ]
Thanks to :
Tony McGovern (Tony.J.McGovern_at_aib.ie)
Alan Rollow (alan_at_nabeth.cxo.dec.com)
Summary:
Tony reminded me to point out that the file system
was at about 50% capacity. Disks at 95% are known to cause
severe discomfort to cloned filesystems.
Thus spake Alan :
>>>Part of the problem may be that between the RAID and AdvFS
>>>somebody isn't waiting long enough for the RAID to recover
>>>from an error. AdvFS doesn't handle I/O errors well. It
>>>shouldn't have panicked the whole system, which may be one
>>>of the leftover bugs of trying to convert as many system
>>>panics to domain panics as possible.
>>>It is also possible for RAID-5 to have real errors. The
>>>HS family of controllers keeps track of which blocks of
>>>data should be readable. If you try to read one that it
>>>knows doesn't have the correct data and it can't regenerate
>>>the correct data, it will produce an I/O error. There was
>>>an old bug in the HSZ40 or HSZ50 that caused it to do this rather
>>>more than it was supposed to, but I'm pretty sure that got
>>>fixed by the time the HSZ70 was released.
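To make Alan's second point a bit more concrete, here is a minimal
sketch of how a RAID-5 read can end in an I/O error. It is only an
illustration in Python, not the HSZ70 firmware: the names xor_blocks,
read_block and ReadError, and the per-block "valid" list, are all
assumptions made up for this example. A block marked bad is rebuilt
by XOR-ing the rest of the stripe; if another block in the stripe is
also bad, nothing can be regenerated and the read fails.

```python
from functools import reduce


class ReadError(IOError):
    """Hypothetical error: the block can be neither read nor regenerated."""


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_block(stripe, valid, index):
    """Return stripe[index], regenerating it from the rest of the stripe if needed.

    stripe: list of equal-length bytes objects (data blocks plus one parity block)
    valid:  list of bools, True where the block is believed to hold correct data
            (an assumed stand-in for the controller's own bookkeeping)
    """
    if valid[index]:
        return stripe[index]
    others = [i for i in range(len(stripe)) if i != index]
    if not all(valid[i] for i in others):
        # Two or more bad blocks in one stripe: RAID-5 cannot rebuild the data.
        raise ReadError("block %d cannot be regenerated" % index)
    return xor_blocks([stripe[i] for i in others])


# Build a stripe of three data blocks plus their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stripe = data + [xor_blocks(data)]

# One bad block: it is rebuilt from the other three.
rebuilt = read_block(stripe, [True, False, True, True], 1)

# Two bad blocks: the read fails with an I/O error.
try:
    read_block(stripe, [True, False, False, True], 1)
    double_failure_raised = False
except ReadError:
    double_failure_raised = True
```

In this sketch the error is raised to the caller; what the file system
does with such an error (retry, domain panic, system panic) is a
separate question, which is Alan's first point.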
We're still checking for hardware problems. Maybe this is a
4.0B-only issue. And I understand there are no patches
higher than #7 for this version... Oh well.
Thanks again !
Miguel Fliguer - MINIPHONE S.A.
Buenos Aires, Argentina
Received on Mon Nov 16 1998 - 14:27:05 NZDT