Hello!
Bad result for my question:
Udo de Boer and Arnold Sutter (thanks for your help, although it's
so disenchanting...) both sent the same bad news: it does not work
the way we expected it to! That's pretty bad for us, as we spent
a lot of money on the Memory Channel equipment to circumvent NFS
and waited for 5.1 to get what the documents claim we would get.
Section 9.4 on direct-access I/O in the cluster administration doc
http://tru64unix.compaq.com/docs/cluster_doc/cluster_51/HTML/ARHGYCTE/TITLE.HTM
completely misled us into wrong assumptions (especially 9.4.1 and the sentence:
"Because dsk3 is a direct-access I/O device on the shared bus, all three
systems on the bus serve it. This means that, when any member on the shared
bus accesses the disk, the access is directly from the member to the device.")
Our main mistake was to misinterpret the fact that the device request
dispatcher (DRD) is a lower-level, built-in piece of functionality below the
CFS driver. What we derived from all the docs was the following scenario:
1. The accessing host sends a message to the nominal disk server's CFS
   layer.
2. The server's CFS driver manages the locking on the file and sends a
   request to the DRD layer.
3. The DRD layer is smart enough to find out that the accessor has a
   direct connection to the disk, initiates the I/O on the accessor
   (maybe backwards via the CFS layer) and informs CFS of what is
   happening below.
4. The CFS manages the necessary caching and unlocks after the I/O.
The misunderstanding occurred in step 3. What actually happens there probably
looks more like this:
3. The DRD layer gets a request from the CFS driver and checks whether the
   machine the CFS request comes from (which is always the current SERVER,
   as that is the requestor to the DRD layer in the case of a CFS request!)
   has a direct connection to the disk. If not, it simply notifies CFS of
   that. It does not even know where the originating CFS access came from!
   (A small sketch of this logic follows below.)
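
To make this concrete, here is a tiny, purely hypothetical C sketch of the
decision as we now understand it from the observed behaviour. The function
names (member_has_direct_path, drd_handle_cfs_request) are invented for
illustration and do not correspond to any real TruCluster interface; the
point is only that the member that originally issued the file access never
appears in the call at all.

/* Hypothetical model of the corrected step 3 -- NOT real TruCluster code.
 * The DRD layer only looks at the member it runs on (the CFS server),
 * never at the member that started the file access. */
#include <stdbool.h>
#include <stdio.h>

/* Stub standing in for cluster state we cannot see from user space. */
static bool member_has_direct_path(const char *member, const char *disk)
{
    (void)member;
    (void)disk;
    return true;   /* in our cluster every member has a path to dsk4 */
}

static void drd_handle_cfs_request(const char *cfs_server, const char *disk)
{
    /* The DRD request always arrives from the local CFS server; the
     * identity of the original accessor is unknown at this level. */
    if (member_has_direct_path(cfs_server, disk))
        printf("I/O issued from %s straight to %s\n", cfs_server, disk);
    else
        printf("%s has no path to %s, tell CFS to route the data itself\n",
               cfs_server, disk);
}

int main(void)
{
    /* imkdec47 is the current CFS server of /Incoming in our cluster;
     * the member that actually called read()/write() never shows up. */
    drd_handle_cfs_request("imkdec47", "dsk4");
    return 0;
}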
This seems to make the DRD layer totally unnecessary. But the DRD layer
can also be contacted directly from the accessing host (though only via
direct I/O or raw disk access), which means CFS is simply bypassed and
cache synchronization between the cluster members does not work
automatically (see the sketch after this paragraph). So the attribute of
being a direct-access device only means that it is POSSIBLE to access the
device directly, NOT that it is always accessed that way!
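
For illustration, here is a minimal sketch of what such a direct or raw open
looks like from an accessing member. It assumes the Tru64 UNIX open(2)
O_DIRECTIO flag for AdvFS direct I/O and the 5.x raw device naming
(/dev/rdisk/dsk4c); the file name, device name and transfer size are only
examples from our setup and should be checked against the local man pages
before relying on them.

/* Two ways to reach the DRD layer directly from the accessing member,
 * bypassing the CFS cache, so coherency with the other members becomes
 * the application's problem. O_DIRECTIO and the device path are
 * assumptions based on our 5.1 environment. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Direct and raw I/O usually want sector-aligned buffers and sizes,
     * so use a page-aligned buffer. */
    char *buf = valloc(8192);
    if (buf == NULL) {
        perror("valloc");
        return 1;
    }

    /* 1. AdvFS direct I/O on a file in the cluster file system:
     *    data moves member -> DRD -> disk without the buffer cache. */
    int fd = open("/Incoming/testfile", O_RDWR | O_DIRECTIO);
    if (fd >= 0) {
        if (read(fd, buf, 8192) < 0)
            perror("read (direct I/O)");
        close(fd);
    } else {
        perror("open with O_DIRECTIO");
    }

    /* 2. Raw access to the disk's character device: CFS is not
     *    involved at all (partition letter is only an example). */
    int rfd = open("/dev/rdisk/dsk4c", O_RDONLY);
    if (rfd >= 0) {
        if (read(rfd, buf, 8192) < 0)
            perror("read (raw device)");
        close(rfd);
    } else {
        perror("open raw device");
    }

    free(buf);
    return 0;
}

In both cases nothing keeps the other members' caches in sync for these
blocks, which is exactly the missing piece we now have to handle ourselves.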
The main reason why it works the way it does seems to be that CFS coherent
caching can currently only be done on one machine. Our assumed step 3
would imply that the piece of data to be cached lives on the accessor,
reintroducing the synchronization problem throughout the cluster.
A distributed cluster cache would be the solution to this problem,
but I would not like to be the person who has to work out the complicated
logic to manage such a monster over the Memory Channel. Or is there
already work in progress to make such things possible?
I would propose that Compaq add some clarifying notes at the top of
sections 9.3 and 9.4 to prevent others from misunderstanding this again.
As we now need to mimic the missing functionality, I will post another
question separately from this summary.
====================================================================
Original question:
>> After killing advfsd, I finally was able to do some performance
>> tests on our TruCluster 5.1/EMA 12000 (HSG80) 8-member system.
>> Monitoring throughput using portperfshow on the switch, I noticed
>> that access to a disk on the HSG80 always goes through the current
>> server of the disk, although that disk is a direct-access disk
>> (not to be confused with direct I/O!) on a shared bus (Fibre Channel
>> SCSI-2), as drdmgr shows:
>>
>> drdmgr dsk4:
>>
>> View of Data from member imkdec43 as of 2001-08-07:18:10:13
>>
>> Device Name: dsk4
>> Device Type: Direct Access IO Disk
>> Device Status: OK
>> Number of Servers: 7
>> Server Name: imkdec41
>> Server State: Server
>> Server Name: imkdec42
>> Server State: Server
>> Server Name: imkdec43
>> Server State: Server
>> Server Name: imkdec44
>> Server State: Server
>> Server Name: imkdec45
>> Server State: Server
>> Server Name: imkdec46
>> Server State: Server
>> Server Name: imkdec47
>> Server State: Server
>> Access Member Name: imkdec43
>> Open Partition Mask: 0x1 < a >
>> Statistics for Client Member: imkdec43
>> Number of Read Operations: 203871
>> Number of Write Operations: 11568
>> Number of Bytes Read: 14998511616
>> Number of Bytes Written: 94765056
>>----
>> cfsmgr /Incoming (is on dsk4)
>>
>> Domain or filesystem name = /Incoming
>> Server Name = imkdec47
>> Server Status : OK
>>
>> The disk has a domain#fileset which is mounted on /Incoming; the server
>> for / is currently imkdec41. The documents state explicitly:
>>
>> "Because dsk3 is a direct-access I/O device on the shared bus, all three
>> systems on the bus serve it. This means that, when any member on the shared
>> bus accesses the disk, the access is directly from the member to the
>> device."
>>
>> Which it definitely is not in our case! Even when I force high throughput
>> with several large files written to that directory from all servers, the
>> access still goes through the Memory Channel and imkdec47.
>>
>> The only thing that might influence this could be that the current server
>> and imkdec46 have an additional hop over a second FC switch to the
>> target controller; all others are directly connected over the main switch
>> (16-port EL). Access paths are explicitly set for all servers on the HSG80.
>>
>> Did I miss something in the setup for the disks? Is this a bug?
--
Dr. Udo Grabowski email: udo.grabowski_at_imk.fzk.de
Institut f. Meteorologie und Klimaforschung II, Forschungszentrum Karlsruhe
Postfach 3640, D-76021 Karlsruhe, Germany Tel: (+49) 7247 82-6026
http://www.fzk.de/imk/imk2/ame/grabowski/ Fax: " -6141