All...
Digital UNIX V4.0E + patch kit 2, GS140 system.
We recently added two ESA10000 cabinets, with a pair of dual-redundant
HSZ70 controllers running the whole cabinet. Our support groups are
frequently seeing a plex drop out of several of the volumes on this
cabinet. They have resynchronised the volumes and all has been OK.
However, the error appears to have 'gone hard' during the evening. An
attempt to resynch one of the two volumes which lost plexes overnight fails
within a minute or two with the following error:
fsgen/volplex: volume busxyd102, plex cyanleftd102, block 178240 plex write:
error: write failure
fsgen/volplex: i/o error on plex cyanleftd102, not attached to volume
busxyd102
Looking at the HSZ, there are a large number of errors on port 1 target 10
(p1/t10) and p1/t11. Unfortunately, the controller time is not set, so we
can't identify when
they occurred. All we do know is that they have definitely occurred in the
last month. All the disks themselves look ok though and no orange lights
are on.
The question is, how can I identify which physical disk (within the stripe
set) block 178240 is on? The 'disk' which Unix uses as its plex is a
six-disk HSZ70-based stripe set where the chunk size is 128 blocks. I am
assuming that the block the o/s is referring to is not a 512 byte block but
something larger, maybe 8k?
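For reference, the arithmetic I have in mind is sketched below in Python.
It is only a rough sketch and rests on assumptions I have not confirmed:
that the stripe set lays chunks out round-robin starting at the first
member, that the reported block is relative to the start of the stripe set
(any LSM subdisk or private-region offset would have to be added first),
and that member 0 is the first disk in the order the controller lists the
members. The function name and the unit_sectors parameter are mine, purely
for illustration; unit_sectors=16 covers the case where the o/s block turns
out to be 8k rather than 512 bytes.

```python
def locate_block(block, chunk_blocks=128, members=6, unit_sectors=1):
    """Map a logical block on a stripe set to (member index, block on member).

    Assumes plain round-robin striping with no per-member metadata offset.
    unit_sectors converts the reported block into 512-byte sectors
    (1 if it already is a sector number, 16 if it is an 8k block).
    """
    sector = block * unit_sectors
    # Which chunk the sector falls in, and how far into that chunk.
    chunk, offset = divmod(sector, chunk_blocks)
    # Chunks are dealt out to the members in turn, one full row at a time.
    row, member = divmod(chunk, members)
    return member, row * chunk_blocks + offset


if __name__ == "__main__":
    for unit in (1, 16):
        member, member_block = locate_block(178240, unit_sectors=unit)
        print(f"unit={unit} sectors: member {member}, block {member_block}")
```

If those assumptions hold, a 512-byte block 178240 would land on the first
member of the set, and an 8k block 178240 on the third, so the answer
depends entirely on the block size question above.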
p1/t10 is one of the disks in the offending stripe set. Although I am happy
to get this changed, I would like to know with a greater degree of certainty
if this is the problem device.
DECevent shows a lot of bus resets relating to the appropriate device
(bus, target & LUN), but not much else that I can interpret.
Your help is anticipated.
Thanks - Tony
Received on Wed Nov 17 1999 - 13:55:53 NZDT