SUMMARY Hot-swapping in BA350

From: <pat_at_krl.caltech.edu>
Date: Thu, 8 Feb 1996 09:46:01 -0800 (PST)

        I posted the following question regarding the hot-swap capabilities
        of a BA350. As usual I was impressed with the helpful nature of so
        many Alpha managers. In this case opinions were not unanimous; I
        have included all the answers I received and extend my thanks. At
        the end I relate my personal experience.

| When I bought my 3000/400 Alpha 2 1/2 years ago, a big selling
| point was the BA350's hot-swap capability. Then, the first time
| I needed a swap (DAT tape, TLZ06), the DEC service guy asked me to
| shut down and power down the BA350. My question: does anyone out there
| have experience hot-swapping devices in the BA350? If so, do you
| shut down the OS? I would think that pulling the drive would be
| equivalent to powering off a device (one that is not the provider
| of TERMPOWER) on a SCSI bus. That is, OK if the OS wasn't doing
| anything with it at the time.

        REPLIES:

--
From: "James T. McDuffie" <jt_at_mcduffie.net>
    I've had a BA350 on my 3000/800 ever since there was a model 800.  We
    use it for lots of extra disks.  We've also done hot swaps of disk
    drives when needed.  At worst, a device reset via scu has been
    required, and that was an extreme case.  Only once did I have to
    re-boot the system.
    I do believe that most service techs are conservative by nature.  And
    this is not a bad thing, generally.  But you have to make the final
    call: can you afford the downtime of the reset?  Is whatever risk
    there may be acceptable?  Etc., etc., etc.
    Yours,
      JT McDuffie
-- 
From: Tim Llewellyn <tjl_at_siva.bris.ac.uk>
Reply-To: tjl_at_siva.bris.ac.uk
What I heard is that the StorageWorks cabinets
do have hot-swap capability but the SCSI controllers
in the 3000/400s do not. That's a bit vague, but all my
attempts to hot-swap disks have eventually required
me to reboot.
Hope this helps.
--
From: Arrigo Triulzi <arrigo_at_lpac.ac.uk>
Reply-To: arrigo_at_lpac.ac.uk
It was marketing hot air. You can "warm" swap, which means that you
don't need to power down but the bus needs to be quiescent. I
discovered that "quiescent" basically means turn everything
off. The SCSI controller in my 3000/500, even with no accesses on the
bus, hung the system the moment I tried swapping an HD.
Ciao,
	Arrigo
--
From: aidan_at_cse.unsw.edu.au (Aidan Williams)
Yep.  I have ripped out tape drives just like that.
Seems to work OK, just as long as you aren't talking to it at the time.
I've even put in new disks and mounted them, fiddled, umounted and pulled
them out.
Hot swapping with RAID-5 also seems to be fine.
regards
	aidan
--
From: "Dr. Tom Blinn, 603-881-0646" <tpb_at_zk3.dec.com>
Sounds to me like the server person was being overly cautious.  As far as I
know, the on-board SCSI controller on the 3000/400 is compatible with the
hot swap features of the BA350.
Was the service person wearing both a belt and suspenders?
Tom
--
From: Kurt Carlson <SXKAC_at_orca.alaska.edu>
Subject: Re: Hot swapping in BA350
Pulling or plugging devices is primarily an issue for the
rest of the bus.  In a BA350 connected to an HSZ40 you
quiesce the bus at the controller (there's a procedure for
it) and you can safely pull or push a brick.  They don't
call it "hot swap" because you do first quiesce the bus,
and the HSZ40 has enough smarts to keep the host happy
for the 30 seconds you have to deal with it.  As for
pulling a disk in a BA350 connected to a host SCSI controller...
sometimes it will work, sometimes it won't... it will likely
work over half the time (particularly on a non-busy system).
What can go wrong?  I once saw a new disk that was DOA.
Plugged into an HSZ40-backed BA350, it masked the
entire bus, causing two RAIDsets to fail over to the spareset;
plugged into a powered-off 2100 internal shelf, no disks
were recognized on power-up.
If you have the choice, shut it down.  If it's not going
to seriously impact others, you may be willing to risk it
on the fly.   Kurt Carlson, U of Alaska
--
From: alan_at_nabeth.cxo.dec.com (Alan Rollow - Dr. File System's Home for Wayward Inodes.)
I've hot swapped a lot.  Doing it while a device is active is
probably bad.  Doing it while the bus is active may be dicey,
but the BA350 and the carriers were designed to reduce the
chances of things going wrong.  Some versions of Digital UNIX
remember the SCSI device type that was at a target and will
continue to insist that the device there is whatever they first
found (e.g. after swapping a disk for a tape).  Within a device
class it seems to go well.
--
From: David Gempton <ttcdg_at_cyberspace.co.nz>
One of the things I used to do at DEC was argue the point that the BA350 does NOT
support hot swap!
It seems that all sales staff and a lot of field service engineers have been
convinced that it does support hot swap.
In tests that I have done, where I kept the SCSI bus busy writing to one disk
drive while I pulled out another, I found that pulling the inactive device
out of the BA350 corrupted the SCSI transactions going to the active device
about 20% of the time. This is not what should happen with hot swap support.
In further tests I did with an SWXCR RAID controller configured for RAID 5 and
also RAID 1, connected to the storage array inside a 2100, pulling and inserting
drives did not corrupt the SCSI transactions. This is what you would expect
from hot swap support.
David Gempton
TTC New Zealand
--
	WHAT HAPPENED:
	I was in a bit of a hurry and didn't want to take down the server if
	possible. After receiving a few responses which seemed to indicate that
	pulling a tape drive would be harmless, I did so (syncing the disks first).
	I should mention that the BA350 also houses (but never hoses) two very
	busy RZ74s. No ill effects were observed.
	Unfortunately my service guy (non-DEC) failed to reinstall the drive properly
	in the StorageWorks carrier. We stuck the tape drive back in (again syncing
	the disks first; I guess this is a kind of religious observance on my part)
	and the system didn't see it (of course).
	Eventually, with the help of a rather bright graduate student, all connections
	were made. We figured out how to set switches etc. by taking the system
	down and doing ">>>SHOW DEVICE" commands at the console. At one point we
	had the SCSI address set to 7 (normally the host adapter's own ID), which
	caused the tape to occupy every unused address on the bus. I assume that
	if we had plugged this configuration into a running system, it might have
	been bad. We did conclude that a 120 reset on the BA350 (with the OS down)
	had no effect insofar as recognition of the devices was concerned.
	CONCLUSIONS:
	
	(1) "Warm" swapping (OS down) is clearly OK.
	(2) I definitely yanked a tape drive from a running OS with no ill effects.
	(3) Sometime when I don't have 30 people depending on the server I'll
	perform the experiment of removing and reinstalling a tape drive on a
	running system. Unfortunately, we didn't get to do that here.
	(4) Be really careful to reset EVERYTHING on the device before putting
	it back in.
		-pat huber (pat_at_krl.caltech.edu)
Received on Thu Feb 08 1996 - 19:07:58 NZDT
