SUMMARY: what does TruCluster give me?

From: Monjar, Daniel <Monjar_at_orgtek.com>
Date: Wed, 21 Oct 1998 10:08:34 -0400

I'm not in Kansas anymore. The current version of TruCluster (1.5) doesn't
support VMS-style clustering. V2.0 with DUnix 5.0 is supposed to have this
capability. Right now I can set up an NFS service that will fail over to
another box, but that is about as far as it goes.

Thanks to:

alan_at_nabeth.cxo.dec.com
Balaji Chandrasekar [digital_at_astro.ocis.temple.edu]
Dr. Tom Blinn, 603-884-0646 [tpb_at_doctor.zk3.dec.com]
Vipin Gokhale, Compaq SBU, Oracle Corporation [VGOKHALE_at_us.oracle.com]
Randall R. Cable [randy.cable_at_mci.com]
Bruce Hines [Bruce.Hines_at_mci.com]
Lars Bro [lbr_at_dksin.dk]
C.Ruhnke [i769646_at_smrs013a.mdc.com]
Randy Rodgers [randy.rodgers_at_ci.ft-wayne.in.us]
K.McManus_at_greenwich.ac.uk
Ryan Ziegler [Zieglerr_at_novachem.com]


All of the respondents said basically the same thing. For completeness and
the archives I'm going to insert a couple of the replies I got.

First from Chris Ruhnke:
--------------------------------------------
In 25 words (more or less)...

TruCluster doesn't (yet) give you True Clustering in the VMS sense.
Primarily it gives you fault-tolerant disk and application serving. A
UNIX filesystem can be physically mounted on only one DU system as of
DU 4.0D and TCR V1.5; the other members of the cluster must access the
filesystem via NFS. Of course, with Memory Channel available for
TruCluster, these cluster NFS accesses can be done more quickly than
over Ethernet. The fault tolerance comes into play when a member of the
cluster dies: the disk service (or application service, e.g. a database
server) can be restarted on a surviving member and access to the
disk/application can be resumed -- something a VMScluster did not
automatically provide.

Hope that made things a little clearer.

----------------------------------------------


and some good history from Lars Bro:

----------------------------------------------

You cannot do this kind of mount. DU 5.0 should be able to.

History:
        DECsafe Available Server was the first product to let
        DU systems communicate over a shared SCSI bus and the
        Ethernet to decide which members were alive. All systems
        could see all disks via the shared SCSI, but if two or more
        systems mounted the same filesystem, a crash was inevitable.

        It is possible to do reservations on the shared bus, i.e.
        a member reserves a disk and it is then impossible for other
        members to access that disk. Unfortunately, when a system
        boots, it resets all devices on the shared buses, meaning
        that all reservations disappear. There is a menu in the
        management program 'asemgr' to re-reserve the disks according
        to ASE's understanding of the state. Unfortunately, it is
        exactly when systems crash or reboot that problems occur, so
        this reset behaviour is a great risk. I have personally seen
        clusters go down because more than one machine mounted the
        same filesystem (and duly logged this), just because of some
        misunderstanding among the machines.

        This is the worst problem, since 'failover' cannot occur
        before all filesystems have been properly unmounted. And if
        just one process has its current directory on a filesystem
        that is owned by a service, the service will not be able to
        fail over unless the system is rebooted.

        This is normally dealt with by using the fuser(8) command,
        which is able to provide a list of processes that have files
        open on a given filesystem. You can then kill those processes.
        (However, a strange property of DU is that a zombie process
        that, while it was alive, executed a file located on the
        filesystem still holds the lock, although it is completely
        deallocated and therefore not visible to fuser(8). It is thus
        possible to have a situation where a service cannot fail over
        and the reason cannot be detected.) Digital claims that this
        is not an error but merely a dispute over the design; the
        standards do not specify exactly what shall be deallocated
        upon exit(). I have, though, tried the same on Solaris, and
        that one does free the lock on the executable.
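        A minimal sketch of the drain-then-unmount step described
        above, assuming a SysV-style fuser (option letters vary
        between Digital UNIX, Solaris, and Linux); the helper name
        drain_fs is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (sketch): before ASE can relocate a disk
# service, every process holding a file open -- or a current
# directory -- on the service's filesystem must be gone.
# fuser lists the holders; with -k it kills them.
drain_fs() {
    fs=$1
    # -c: treat the argument as a mount point and report all
    # processes using any file on that filesystem.
    fuser -c "$fs" 2>/dev/null
    # Kill the remaining holders, then attempt the unmount.
    fuser -c -k "$fs" 2>/dev/null
    umount "$fs"
}
```

        Note that, per the zombie-process caveat above, the unmount
        can still fail even when fuser reports no holders.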

        With TruCluster came the Memory Channel, a wire that could
        connect the systems, and the lock manager. The idea was to
        have processes on different members sharing disks. Today,
        RDBMSs like Oracle can do this. You will then be forced to
        have Oracle place its tablespaces on 'raw devices' so that
        it can manage its own locking through the lock manager.

        In DU 5.0 (so I am told) the filesystems may also take
        advantage of the lock manager. This will enable you to have
        the same filesystem mounted on more than one member, and you
        may put Oracle on such a filesystem instead of having Oracle
        manage the locks itself.
---------------------------------------------------

Daniel Monjar
Manager, Systems
Organon Teknika
Mailto:Daniel.Monjar_at_orgtek.com


> -----Original Message-----
> From: Monjar, Daniel [mailto:Monjar_at_orgtek.com]
> Sent: Tuesday, October 20, 1998 4:02 PM
> To: 'alpha-osf-managers_at_ornl.gov'
> Subject: what does TruCluster give me?
>
>
> I have three 4100s and an RA7000. I'm using TruCluster 1.5
> and Unix 4.0D.
> I need some pointers to some docs that will tell me what I can do with
> TruCluster. I have a lot of experience with clustering on
> VMS but I have a
> feeling I am in a different world with TruCluster.
>
> To give you something specific to answer: I have created a
> disk set on my
> HSZ and formatted it as an AdvFS file system. I want each of the three
> 4100s to mount this file system and see what the others see,
> just like VMS
> makes possible. Can I? The TruCluster stuff talks about a
> distributed lock
> manager which sounds like a VMSish thing. Is it the same?
>
> Daniel Monjar
> Manager, Systems
> Organon Teknika
> Mailto:Daniel.Monjar_at_orgtek.com
>
Received on Wed Oct 21 1998 - 14:09:27 NZDT

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:38 NZDT