SUMMARY: DU4.0B and TCR1.4, NFS, lpd and arp. from Christophe DIARRA on 1997-03-03 (tru64-unix-managers)

From: Christophe DIARRA <diarra_at_ipno.in2p3.fr>
Date: Mon, 3 Mar 1997 11:09:53 +0100 (MET)

Hello,

I received 2 replies. Thanks to:

Gunther Feuereisen <gunther_at_ibm.net>
John Kohl <jtk_at_atria.com>

You will find my original request at the end of this mail.

1) SUMMARY concerning quotas
----------------------------

According to Gunther Feuereisen, it is preferable to avoid AdvFS quotas under
DECsafe. I agree with him, and we have temporarily disabled
quotas until we get assurance from Digital that they work. If they do not,
we will be obliged to create many AdvFS domains and restrict each domain to
a limited number of users. That way, since we will not have quotas, a user
who creates a huge file (accidentally, for example) will fill only the
AdvFS domain to which he belongs. I know this is not the best solution,
because we lose one of the main benefits of AdvFS: the ability to create
very large multi-disk volumes.
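
While quotas are disabled, one stopgap is to watch how full each domain's
mount point is. The following is only a minimal sketch in Python, not
something taken from the replies: the mount points /users1 and /users2 and the
90% threshold are hypothetical, and it relies only on the standard
shutil.disk_usage call.

```python
import shutil


def fill_ratio(path):
    """Return the used fraction (0.0 to 1.0) of the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def overfull(paths, threshold=0.90):
    """Return (path, ratio) pairs whose file system is at or above threshold."""
    hits = []
    for path in paths:
        try:
            ratio = fill_ratio(path)
        except (OSError, ZeroDivisionError):
            # Mount point missing, unreadable, or reporting zero size:
            # this sketch simply skips it.
            continue
        if ratio >= threshold:
            hits.append((path, ratio))
    return hits


if __name__ == "__main__":
    # Hypothetical mount points, one per AdvFS domain.
    for path, ratio in overfull(["/users1", "/users2"]):
        print(f"{path}: {ratio:.0%} full")
```

Run from cron, a check like this only warns that a domain is filling up; it
does not stop a single user from filling it, which is what quotas are for.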

Following is Gunther Feuereisen's answer:

> I've heard of similar problems with several people.

> In ASE 1.2 DEC said No, you can't.
> In ASE 1.3 DEC said Yes, you can, and it _should_ work.
> In ASE 1.4 I gather they have said yes (I run 1.3 at this stage).
>
> However, I have never run into a site which uses user or disk quotas
> successfully.
> I know of one in particular, an ISP, who have problems with AdvFS filesets
> corrupting themselves.

> Sorry I can't be of more help. From experience, if you can avoid it, do so.
> However, there are cases when you need them..in that case all I can suggest
> is grouping users into managable filesets..it gets kinda messy..

> kind regards,
> Gunther Feuereisen

2) SUMMARY concerning NFS lock problems
---------------------------------------

John Kohl found the origin of the problem.

In fact, we have 2 NFS services under TCR and one of the lockd daemons was
dead. I found a message about it in daemon.log:

Feb 21 17:25:22 xxx lockd[30611]: rpc.lockd bind (tcp) Address already in use

lockd didn't restart after an NFS service modification at 17:25:22 (Feb 21)
but the tool we use (asemgr) didn't print any error message.

We have now restarted lockd and things are working.
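
Since asemgr reported nothing, the only trace of the failure was in
daemon.log. As a rough sketch (not part of John's answer), a log scan like the
one below could flag it. The pattern is modelled on the single line quoted
above; the "udp" alternative and the function name are assumptions.

```python
import re
import sys

# Modelled on the one daemon.log line quoted above; the "udp" alternative
# is an assumption, not something seen in the original log.
BIND_FAILURE = re.compile(
    r"lockd\[(\d+)\]: rpc\.lockd bind \((tcp|udp)\) Address already in use"
)


def lockd_bind_failures(lines):
    """Return (pid, transport) for each line reporting a failed lockd bind."""
    found = []
    for line in lines:
        match = BIND_FAILURE.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the path of the log file to scan as the first argument.
    with open(sys.argv[1]) as log:
        for pid, transport in lockd_bind_failures(log):
            print(f"lockd pid {pid} could not bind ({transport})")
```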

Following is John Kohl's answer:

CD> Question 2: the NFS server under DU4.0B
CD> ----------------------------------------

CD> After upgrading DECUNIX from 3.2D to 3.2G/4.0A/4.0B our Sun/Solaris2.5 clients
CD> have problems:

CD> - the file manager (filemgr) takes a long time to start and prints the
CD> following message: nfs_server: RPC: timed out. Once started, filemgr
CD> has a very long response time and hangs the X display when one
CD> clicks, for instance, on a directory icon.

CD> - mailx and pine are unable to start (only a 1% success rate).
CD> Running truss on the mailx or pine process, I get the following messages:

CD> # truss -p 22268
CD> fcntl(5, F_SETLK, 0xDFFFDF08) (sleeping...)

CD> On the NFS server, ps prints:

CD> root 899 1 0.0 Feb 21 ?? 0:00.18 /usr/sbin/nfsd -t32 -u32

> This smells like an NFS locking issue--have you enabled NFS locking on
> your Digital UNIX host? (rerun nfssetup to check or to do so)

> ==John
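
The truss output shows mailx sleeping in fcntl(F_SETLK), so a small lock probe
run on a client can help confirm whether NFS locking works. This is a minimal
sketch under stated assumptions, not a tool from the thread: the function name
and the probe file are hypothetical, and Python's fcntl.lockf with LOCK_NB is
used because it issues the same kind of non-blocking POSIX lock request.

```python
import fcntl
import os
import sys


def can_lock(path):
    """Try a non-blocking exclusive POSIX lock on path; True if granted."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes lockf use a non-blocking lock request.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Lock held elsewhere, or locking not available on this mount.
        return False
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass a (hypothetical) probe file on the NFS mount to test.
    print("lock granted" if can_lock(sys.argv[1]) else "lock refused")
```

If the server's lock daemon is dead, the probe may simply hang the way mailx
did, which is itself the symptom being checked for.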

3) SUMMARY lpd and hosts.equiv + hosts.lpd
-------------------------------------------

The lpd problem is known to Digital and we are waiting for a
patch. It seems that the presence of a netgroup in /etc/hosts.equiv
causes trouble for lpd.

4) SUMMARY concerning arp
-------------------------

After disabling TCP (for NFS) with nfssetup, we no longer have messages like:

> Feb 24 09:40:51 the_host vmunix: arp: local IP address 0.0.0.0 in use by
> hardware address a_variable_hardware_address.
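
To judge how much of kern.log these messages account for, and which hardware
addresses appear, a count per address can help. The sketch below is an
illustration only: the pattern follows the message quoted above, the real
format of the hardware address is not shown in this mail, so any
non-whitespace token is accepted, and the function name is hypothetical.

```python
import re
from collections import Counter

# Follows the wording of the kernel message quoted above. The hardware
# address is matched as any non-whitespace token (its real format is an
# unknown here), minus the trailing period of the message.
ARP_CONFLICT = re.compile(
    r"arp: local IP address (\S+) in use by\s+hardware address (\S+?)\.?\s*$"
)


def count_arp_conflicts(lines):
    """Count conflict messages per (ip, hardware address) pair."""
    counts = Counter()
    for line in lines:
        match = ARP_CONFLICT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```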

5) Problems still remaining:
-----------------------------

        o filemgr not working from Solaris
        o unable to access NT files via NFS
        o console messages: chk_bf_quota: user/group underflow
        o console messages: tu1: transmit FIFO underflow: threshold raised to
          256 bytes.

Christophe.

***
Christophe DIARRA
Institut de Physique Nucleaire
Bat 100 - S2I
91406 ORSAY Cedex
Tel: (33) 01 69 15 65 60
Fax: (33) 01 69 15 64 70
E-mail: diarra_at_ipno.in2p3.fr
***

On Mon, 24 Feb 1997, Christophe DIARRA wrote:

>
> Hello,
>
> I have some questions to ask about TruCluster-1.4, NFS, lpd, arp under DU4.0B.
>
> Question 1: TruCluster Available Server 1.4
> --------------------------------------------
>
> Is it possible to use 'user quotas' and 'group quotas' under ASE 1.4?
>
> Here, we are unable to do it and DEC is working on the problem.
> On our NFS server, we have to disable quotas to have ASE working even
> with ASE1.4.
>
> In the previous version of ASE (ASE 1.3/DU3.2D), we had a lot of problems
> relocating a service or shutting down one member. All attempts at this kind of
> operation failed with a crash. Each time we disabled quotas before a
> relocation or a shutdown, things worked fine.
>
> Now, with ASE 1.4/DU4.0B, there are some enhancements. Relocations and
> the shutdown of a member work even when quotas are enabled and we always have
> a continuous NFS service. THE PROBLEM IS THAT ON THE FIRST AND THE SECOND NIGHT
> AFTER INSTALLING THE NEW VERSION (1.4), THE PRIMARY SERVER (the favored member
> for all services) HUNG, AND ONLY A POWER CYCLE SOLVED THE PROBLEM.
> After disabling quotas, things seem to work, but we can accept this
> solution only for a short period; otherwise within a few days we will get a
> 'file system is full' message.
>
> We really need quotas to control disk usage and would welcome any
> suggestions.
>
> Question 2: the NFS server under DU4.0B
> ----------------------------------------
>
> After upgrading DECUNIX from 3.2D to 3.2G/4.0A/4.0B our Sun/Solaris2.5 clients
> have problems:
>
> - the file manager (filemgr) takes a long time to start and prints the
> following message: nfs_server: RPC: timed out. Once started, filemgr
> has a very long response time and hangs the X display when one
> clicks, for instance, on a directory icon.
>
> - mailx and pine are unable to start (only a 1% success rate).
> Running truss on the mailx or pine process, I get the following messages:
>
> # truss -p 22268
> fcntl(5, F_SETLK, 0xDFFFDF08) (sleeping...)
>
> On the NFS server, ps prints:
>
> root 899 1 0.0 Feb 21 ?? 0:00.18 /usr/sbin/nfsd -t32 -u32
>
> Question 3: the NFS client under DU4.0B and WNT4.0 NFS server
> -------------------------------------------------------------
>
> It is impossible to create or to view a file which is in an NFS imported
> directory: DU4.0B is the client, WNT4.0 is the server. All the commands
> hang, and even 'kill -9' will not destroy the processes.
>
> Sometimes the ls command works (but only once). The second time, the command
> hangs with the message "NFS3 server 'the_server' not responding still trying".
>
> From Sun/Solaris2.X clients, everything works well.
>
> Question 4: the /etc/hosts.lpd file
> -----------------------------------
>
> In the previous version of DECUNIX (3.2D) we had both /etc/hosts.equiv to
> enable r-commands and /etc/hosts.lpd to enable remote print requests.
>
> With DU4.0B, lpd doesn't accept requests from remote clients if the file
> /etc/hosts.equiv exists, even if /etc/hosts.lpd lists the correct client names.
>
> How can these two files coexist? We need both. In our current
> configuration, we have renamed /etc/hosts.equiv to /etc/hosts.equiv.old to
> get remote printing working.
>
> Question 5: arp
> ---------------
>
> On our ASE members, we continuously get the following message:
>
> Feb 24 09:40:51 the_host vmunix: arp: local IP address 0.0.0.0 in use by
> hardware address a_variable_hardware_address.
>
> Why do we get these messages? They generate big kern.log files.
>
> Thanks in advance for your answers.
>
> Christophe.
Received on Mon Mar 03 1997 - 11:44:15 NZDT
