Hi,
I have an AlphaServer 800 4/400 with a Mylex DAC960 KZPSC RAID
controller (SWXCR) and 4 disks: one JBOD and three in a RAID5 set. The
machine has LSM installed and most of the filesystems are AdvFS on LSM
except for root which is ufs (not LSM encapsulated). This config
seemed ok with DU4.0D.
I upgraded to T64v5.1, applied the T64V51AS0003-20010521 Aggregate
Patch Kit and rebuilt the kernel, and all looked ok. I also needed to
make some changes to the LSM config, and I think I may have made a
mistake there: I certainly get an error at boot time, though nothing
shows up in any of the vol* commands I've tried, and the system seems
to work ok _EXCEPT_ that disk accesses seem very slow. During a normal
boot, I see the error:
lsm:volio: Cannot open disk dsk0f: kernel error 16
then it carries on as normal. Similarly, if I boot single user and
mount the /usr and /var partitions (as instructed in the patch install
docs) I get:
lsm: volio: Illegal vminor encountered
Error: /dev/vol/rootdg/vol_var is an invalid device
or cannot be opened
and again, running /sbin/bcheckrc gives:
starting LSM
lsm: volio: Cannot open disk re0f: kernel error 16
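For reference, the vol* checks I've been running look roughly like
this (syntax from memory of the LSM docs, so treat it as a sketch
rather than exact command lines):

    voldctl mode            # is vold enabled?
    voldg list              # is rootdg imported?
    voldisk list            # which disks does LSM see, and under which names (re* vs dsk*)?
    volprint -g rootdg -ht  # volumes, plexes and subdisks in rootdg
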
The re0f partition is marked as LSMsimp in the disklabel and contains
the /usr and /var volumes, one plex each. Using the vol* commands seems to
give the results I would expect, but I'm not really very knowledgeable
about LSM. Certainly vold is running and there are two voliod threads,
and the other checks in the LSM manual's troubleshooting section all
seem ok. The only other weirdness I can see is that the
/dev/vol/rootdg/vol_var device has an unusual timestamp, i.e.:
brw-r--r-- 1 root daemon 40, 7 Jul 12 21:43 vol_misc
brw-r--r-- 1 root daemon 40, 5 Jan 1 1996 vol_var
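If it matters, this is roughly how I've been comparing that device
node against what LSM itself thinks (again a sketch; the command
options are from the manual and not double-checked):

    ls -l /dev/vol/rootdg/vol_var   # major/minor recorded in the node (40,5 above)
    volprint -g rootdg -l vol_var   # LSM's own record for the volume
    disklabel -r dsk0               # partition types, including the LSMsimp one

the idea being to check whether the node in /dev still matches LSM's
idea of the volume.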
This may not be related, but using the (unsupported) Compaq monitor
tool I see the system spending what looks like a lot of cycles in
'wait', and the queue on the disks often seems quite big.
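Alongside the monitor tool, the standard tools should presumably show
the same thing; something like:

    iostat 5    # per-disk throughput; does one RAID member stand out?
    vmstat 5    # run queue, paging and CPU split over the same interval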
This can be a real problem for disk-intensive operations. For example,
I was using a simple application that uses SleepyCat DB to create an
on-disk DB of about 1.8GB. I ran the process niced down, but the whole
system ground to a near halt. I took the same code to another
AlphaServer and it ran with minimal impact on the system, and the same
was true on my own Linux PC, so it looks to me like there is a problem
on the main machine.
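As a cruder cross-check (nothing to do with SleepyCat itself, just raw
sequential I/O, and the path is only an example), something like:

    time dd if=/dev/zero of=/usr/tmp/ddtest bs=64k count=16384   # write ~1GB
    time dd if=/usr/tmp/ddtest of=/dev/null bs=64k               # read it back
    rm /usr/tmp/ddtest

run on this machine and on the other AlphaServer ought to show whether
plain writes to the RAID set are also much slower here.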
I ran swxcrmgr and checked the disks for errors. All but one had zero
errors; the last one (part of the RAID set) had 127 misc errors.
Should I get Compaq to swap it out? I did notice that the RAID set
sometimes seems to spend more time accessing one disk (the same one, I
think) than the others, but then again, a VMS system next to it does
the same.
Any suggestions for how to further diagnose this would be much
appreciated; much more info is available on request :-)
Thanks,
Simon
--
Simon Greaves Voice: +679 212114
Systems & Networks Fax: +679 304089
ITS, USP, Suva Email: Simon.Greaves_at_usp.ac.fj
Fiji