OSF/1 server + SunOS client + special files ==> grief [ LONG ]

From: Mark Bartelt <sysmark_at_chipmunk.cita.utoronto.ca>
Date: Wed, 12 Apr 95 10:48:55 EDT

Apologies in advance for the spam-like crossposting, but it's hard to tell
whether this is an OSF/1 bug, a SunOS bug, a combination of the two, or an
ambiguity in (or omission from) the NFS spec (in which case, I suppose one
could say that neither vendor is at fault, but I would nonetheless have a
major problem). Anyway ...

We have several diskless Sparcs and (don't laugh! ;-) Sun3 systems, which
for many years have been served by Sun and SGI systems. But since we've
acquired several DEC AXP boxes, which are by far our fastest machines, I'd
hoped to move the diskless systems' root, /usr, /var, and swap areas onto
the AXPs.

The first thing I did was to move the swap areas. No problem. Then, when
I tried to move the /var partitions, things broke. This turned out to be
a bug in OSF/1 V2: attempts to create named pipes on an OSF/1 NFS server
would fail (what got created instead was a character-type special file with
major and minor device numbers == 0, if I recall correctly), so creation of
/var/spool/cron/FIFO would fail if /var was NFS-mounted from an OSF server.
The bug was reported to be fixed in V3. Sure enough, when V3 came out and we
upgraded our AXPs to that release, we were able to put the diskless systems'
/var on an OSF/1 server.

Then I decided to move the root partitions. No joy. To skip the (possibly
interesting, but ultimately irrelevant) intermediate details, the executive
summary is this:

        Attempts by a SunOS 4 client to use a special file which
        lives on an OSF/1 NFS server simply don't work.

My first tendency would be to suspect an OSF/1 bug, but surprisingly an IRIX
client can use NFS-mounted special files from an OSF server without problem
(details below). So now I'm not sure where the problem is.

One probably-relevant fact, of course, is that the major and minor device
numbers are different sizes on the AXP than they are in SunOS (twenty
bits and eight, respectively). However, I'd expect (or at least hope) that
the client and server would work this out between them. I've appended some
examples of things not working with SunOS (followed by an example of things
working correctly with an IRIX client and an OSF server), with commentary
interspersed.

For the NFS-spec-savvy, what *are* the semantics of special files on an NFS
server? Particularly in cases where the client and server have different
concepts of how many bits the major and minor device numbers take up? Are
things supposed to work? Or is it one of those "sorry, not guaranteed to
work anywhere; if it worked for you elsewhere, you were just lucky" kinds
of things?

Since it's rather critical to move our diskless systems' stuff off the IRIX
server (it's old, and no longer on maintenance, and is probably near the end
of its days), I'd like to know whether the problems are being caused by:

-- An OSF/1 V3 bug (and, if so, whether it's fixed in 3.2, or will be fixed
    in 3.3 (or 4.0, or whatever the next release will be called)).

-- A SunOS bug. (If so, I guess I'm out of luck, unless there are patches
    that I don't know about. We have no plans to switch to Solaris 2 in the
    immediate (or even distant) future, so a "fixed in Solaris 2" comment is
    something I'd find interesting, but ultimately not useful.)

-- A combination of the two, requiring bugs at *both* ends for the problems
    to be seen. (That might explain why an IRIX client doesn't suffer from
    the same problem as the SunOS client, even when both use the same OSF/1
    server.)

Hints, suggestions, workarounds, wild theories, or whatever are appreciated.

---------------

# Commands prefixed by "DEC>", "SUN>", and "SGI>" are issued
# on OSF/1, SunOS, and IRIX systems, respectively.

# System "decbox" exports /sun_root to the Sun and SGI systems,
# with "-root=0".

DEC> mkdir /sun_root/dev

# Here we see things not working from a SunOS client.

SUN> mount decbox:/sun_root /mnt
SUN> mknod /mnt/dev/null c 3 2
SUN> mknod /mnt/dev/tty c 2 0
SUN>
SUN> ls -l /dev/null /dev/tty /mnt/dev/null /mnt/dev/tty
crw-rw-rw- 1 root 3, 2 Apr 11 14:23 /dev/null
crw-rw-rw- 1 root 2, 0 Apr 11 14:21 /dev/tty
crw-rw-rw- 1 root 3, 2 Apr 11 14:30 /mnt/dev/null
crw-rw-rw- 1 root 2, 0 Apr 11 14:30 /mnt/dev/tty
SUN>
SUN> echo foo >/dev/null
SUN> echo foo >/dev/tty
foo
SUN>
SUN> echo foo >/mnt/dev/null
/mnt/dev/null: File exists
SUN> echo foo >/mnt/dev/tty
/mnt/dev/tty: File exists
SUN>
SUN> rm /mnt/dev/*
SUN> umount /mnt

# Here we see that things *do* work from an IRIX client.

SGI> mount decbox:/sun_root /mnt
SGI> mknod /mnt/dev/null c 1 2
SGI> mknod /mnt/dev/tty c 2 0
SGI>
SGI> ls -l /dev/null /dev/tty /mnt/dev/null /mnt/dev/tty
crw-rw-rw- 1 root sys 1, 2 Apr 11 14:26 /dev/null
crw-rw-rw- 1 root sys 2, 0 Apr 11 14:26 /dev/tty
crw-rw-rw- 1 root daemon 1, 2 Apr 11 14:35 /mnt/dev/null
crw-rw-rw- 1 root daemon 2, 0 Apr 11 14:35 /mnt/dev/tty
SGI>
SGI> echo foo >/dev/null
SGI> echo foo >/dev/tty
foo
SGI>
SGI> echo foo >/mnt/dev/null
SGI> echo foo >/mnt/dev/tty
foo
SGI>
SGI> rm /mnt/dev/*
SGI> umount /mnt

---------------

Additional comments:

Of course, when viewed from the server side, major and minor device numbers
for special files created by the client are wrong: Major device numbers are
always zero, and the minor device number shows up as (major<<8)|minor, which
is what you'd expect, given the different (eight-bit vs twenty-bit) sizes of
these things on SunOS and OSF/1 ...

DEC> ls -l /sun3_root/dev/null /sun3_root/dev/tty
crw-rw-rw- 1 root daemon 0,770 Apr 11 14:30 /sun3_root/dev/null
crw-rw-rw- 1 root daemon 0,512 Apr 11 14:30 /sun3_root/dev/tty
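
The arithmetic behind the listing above can be sketched as follows. This is
only an illustration, assuming a SunOS device number packed as 8-bit major
and 8-bit minor, and an OSF/1 device number whose low twenty bits are the
minor number. The field widths and the helper names pack() and unpack() are
assumptions made for the example, not taken from either vendor's headers.

```python
# Hypothetical helpers; the field widths below are assumptions for
# illustration, not values read from SunOS or OSF/1 header files.
SUNOS_MINOR_BITS = 8    # assumed: SunOS 4 minor number is 8 bits wide
OSF1_MINOR_BITS = 20    # assumed: OSF/1 minor number is 20 bits wide


def pack(major, minor, minor_bits):
    """Pack (major, minor) into one integer with the given minor width."""
    return (major << minor_bits) | minor


def unpack(dev, minor_bits):
    """Split a packed device number into (major, minor)."""
    return dev >> minor_bits, dev & ((1 << minor_bits) - 1)


for name, major, minor in (("null", 3, 2), ("tty", 2, 0)):
    raw = pack(major, minor, SUNOS_MINOR_BITS)            # client's packing
    srv_major, srv_minor = unpack(raw, OSF1_MINOR_BITS)   # server's reading
    print(f"{name}: client {major},{minor} -> server {srv_major},{srv_minor}")
```

Under those assumptions the client's 3,2 and 2,0 come out as 0,770 and 0,512
on the server, which matches the listing.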

Just in the off chance (unlikely, but hey, worth a try) that the major and
minor device numbers should be correct when viewed from the server end (but
with values matching what the client expects them to be), I also tried the
following:

DEC> mknod /sun3_root/dev/NULL c 3 2
DEC> mknod /sun3_root/dev/TTY c 2 0
DEC> ls -l /sun3_root/dev/*
crw-rw-rw- 1 root system 3, 2 Apr 11 14:42 /sun3_root/dev/NULL
crw-rw-rw- 1 root system 2, 0 Apr 11 14:42 /sun3_root/dev/TTY
crw-rw-rw- 1 root daemon 0,770 Apr 11 14:30 /sun3_root/dev/null
crw-rw-rw- 1 root daemon 0,512 Apr 11 14:30 /sun3_root/dev/tty

Of course, the major device number for these new files always appears to be
zero when viewed by the client (again, it's what you'd expect):

SUN> ls -l /mnt/dev/*
crw-rw-rw- 1 root 0, 2 Apr 11 14:42 /mnt/dev/NULL
crw-rw-rw- 1 root 0, 0 Apr 11 14:42 /mnt/dev/TTY
crw-rw-rw- 1 root 3, 2 Apr 11 14:30 /mnt/dev/null
crw-rw-rw- 1 root 2, 0 Apr 11 14:30 /mnt/dev/tty
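
The client's view of the server-created files is consistent with the same
kind of reinterpretation running in the other direction. A rough sketch,
assuming the server packs the major number above a twenty-bit minor field
and the SunOS client keeps only the low sixteen bits, split 8/8. Both the
widths and the function name server_to_client() are illustrative
assumptions, not a description of either kernel's actual code.

```python
# Hypothetical model of a 16-bit client reading a wider server value.
# All widths here are assumptions made for illustration.
SERVER_MINOR_BITS = 20   # assumed: OSF/1 minor field width
CLIENT_DEV_BITS = 16     # assumed: SunOS 4 dev_t width
CLIENT_MINOR_BITS = 8    # assumed: SunOS 4 minor field width


def server_to_client(major, minor):
    """Return (major, minor) as a 16-bit, 8/8-split client would see them."""
    wide = (major << SERVER_MINOR_BITS) | minor        # server's packing
    narrow = wide & ((1 << CLIENT_DEV_BITS) - 1)       # high bits are lost
    return narrow >> CLIENT_MINOR_BITS, narrow & ((1 << CLIENT_MINOR_BITS) - 1)


for name, major, minor in (("NULL", 3, 2), ("TTY", 2, 0)):
    seen = server_to_client(major, minor)
    print(f"{name}: server {major},{minor} -> client {seen[0]},{seen[1]}")
```

With these assumptions the server's major number sits entirely in the bits
the client discards, so 3,2 and 2,0 are seen as 0,2 and 0,0, as in the
listing above.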

Whatever, it doesn't matter. Attempts to write to /mnt/dev/{NULL,TTY} give
the same problems as writing to /mnt/dev/{null,tty}. (Also, trying it with
an IRIX client gives a result which makes it clear that the major and minor
device numbers should look correct when viewed by the client, and not when
viewed by the server (unless, of course, the client and server agree on such
things, in which case things will look the same both places).)

---------------

Mark Bartelt                          416/978-5619
Canadian Institute for                mark_at_cita.toronto.edu
Theoretical Astrophysics              mark_at_cita.utoronto.ca

"Clothes not busy being worn are busy drying." - Dylan, on laundry day
            [ singing "It's all right, ma (I'm only bleaching)" ]
Received on Wed Apr 12 1995 - 10:49:38 NZST
