rpc.rquotad problem with TCRv1.4 and DU4.0b

From: Todd V. Minnella <minnella_at_husc.harvard.edu>
Date: Sun, 9 Mar 1997 16:47:54 -0500 (EST)

Problem Summary:
rpc.rquotad on a pair of AlphaServer 4100s, each running TruCluster
Available Server Software v1.4 and DU4.0b, fails to respond to
quota requests for local or ASE filesystems that are NFS mounted on other
UNIX systems (DU and non-DU). DEC Engineering has replicated the problem
and is working on a fix.

Question(s):
Has anyone else seen this? Is anyone else running ASE v1.4 on DU4.0b? Does
someone know a solution?

Detailed Problem Info:
We're building two core servers to handle NFS service for our other
systems. These core servers are AlphaServer 4100s, with multiple SCSI
cards and network interfaces, front-ending a SW800 cabinet with a pair of
dual-redundant HSZ50s.

We brought in a DEC Consultant, and with his assistance, installed Digital
UNIX v4.0b and ASE v1.4, along with the Dec. and Jan. patch kits. With the
exception of rpc.rquotad, everything is working as we expected; the ASE
software nicely adds much-needed redundancy and reliability to our NFS
services. Unfortunately, we can't go live on these systems until
rpc.rquotad is working.

The problem with rpc.rquotad is very simple - it doesn't respond to quota
requests on these two systems (and ONLY on these two systems - we have
other DU4.0b NFS servers on which rpc.rquotad works fine). The request can
be for an ASE NFS service, or for a local filesystem (not an ASE service);
it makes no difference. When rpc.rquotad is probed with "rpcinfo -u
server_name 100011", the request times out. If rpc.rquotad is killed,
rpcinfo requests are processed normally by the newly restarted daemon, but
once rpc.rquotad receives a quota request from another server, the daemon
refuses to respond to further quota queries, or to rpcinfo probes. It
makes no difference if quotas are enabled or disabled on each AdvFS
filesystem; no rpc.rquotad request is ever fulfilled. The username also
makes no difference.
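
For anyone who wants to watch for the hang, the probe above can be wrapped in a small script. This is only a rough sketch, not something we run in production: it assumes a Python 3 interpreter and an rpcinfo binary on the probing host, and the function names and the 30-second timeout are illustrative choices of mine.

```python
import subprocess

# RPC program number for rquotad, as used in the rpcinfo probe above.
RQUOTA_PROG = "100011"


def rpcinfo_cmd(host):
    """Return the argv for the UDP probe: rpcinfo -u <host> 100011."""
    return ["rpcinfo", "-u", host, RQUOTA_PROG]


def probe_rquotad(host, timeout=30.0, run=subprocess.run):
    """Return True if the probe exits 0 within `timeout` seconds.

    `run` is injectable (hypothetical convenience) so the logic can be
    exercised without a real rpcinfo binary or network.
    """
    try:
        result = run(rpcinfo_cmd(host), capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The wrapper gave up waiting, which is how a hung daemon shows up.
        return False
    return result.returncode == 0
```

Calling probe_rquotad("server_name") before and after a client issues a quota request should show the transition from responding to not responding, if the behavior matches what we see.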

Running quota on a client system produces a 40-second delay for each NFS
filesystem in /etc/fstab that is served by these two systems. Quota
information for these filesystems is never returned, although quota
information for all NFS filesystems _not_ served by this ASE cluster is
returned normally. Commenting out the filesystems served by these two
boxes in /etc/fstab causes quota to return almost instantly with correct
information (well, correct except for the fact that it doesn't include the
ASE filesystems).
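
To estimate how long quota will stall on a given client, one can count the fstab entries that point at the two servers. The sketch below is an illustration only: the hostnames core1 and core2 are made up, and it assumes NFS entries are written either as host:/path or as /path@host with "nfs" in the third field.

```python
def nfs_entries(fstab_text, servers):
    """Return (server, remote_path, mount_point) for NFS lines served by `servers`."""
    wanted = set(servers)
    found = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3 or fields[2] != "nfs":
            continue
        spec, mount_point = fields[0], fields[1]
        if ":" in spec:                        # host:/path form
            host, path = spec.split(":", 1)
        elif "@" in spec:                      # /path@host form
            path, host = spec.rsplit("@", 1)
        else:
            continue
        if host in wanted:
            found.append((host, path, mount_point))
    return found


# Hypothetical fstab contents; core1/core2 stand in for the two servers.
SAMPLE = """\
/dev/rz0a        /          ufs  rw     1 1
/users1@core1    /users1    nfs  rw,bg  0 0
core2:/users2    /users2    nfs  rw,bg  0 0
/pub@other       /pub       nfs  ro,bg  0 0
"""

hits = nfs_entries(SAMPLE, ["core1", "core2"])
# At roughly 40 seconds per affected filesystem, this is the expected stall.
print(len(hits), "affected filesystems, about", 40 * len(hits), "seconds")
```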

At this point, the problem is in the hands of a DEC Engineering Team. When
we last heard, they _had_ reproduced the problem and are working on a
solution. We're curious to know if anyone else is running a system similar
to ours, and whether this problem has surfaced elsewhere. Ideally, we'd
find that someone has experienced this problem, and knows a simple
solution. :-)

Any and all suggestions will be welcomed; our consultant and my group here
have tried many things (except deinstalling the ASE software), but we're
always open to other approaches.

Todd V. Minnella
UNIX Systems Analyst
Faculty of Arts and Sciences Computer Services, Harvard University
Received on Sun Mar 09 1997 - 23:01:12 NZDT
