I got lots of responses to this one (too many to enumerate; if you
responded, thank you!).
Most of the responses involved some use of fuser, du, or lsof. I probably
didn't do as good a job as I could have of listing what we'd already tried;
we had already tried all of these.
fuser -d /usr returns nothing.
All the files reported by lsof /usr do appear in directories (I wrote a little
script that uses awk to pull the names out of the output and tests whether
each file exists). lsof also reports many instances of /usr, whose
significance I don't know.
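The awk-and-test script mentioned above isn't included in the message. A minimal reconstruction might look like the following; it is a sketch, not the original script. It assumes lsof's field output (`lsof -Fn /usr`, where each file name arrives on a line beginning with `n`) instead of the default columns, so that names containing spaces parse cleanly. The function name `missing_names` is hypothetical.

```shell
# Sketch (not the original script): report file names from lsof field
# output that cannot be found in the directory tree.
# Assumes input from `lsof -Fn /usr`: name lines start with "n"; the
# "p" (PID) and "f" (descriptor) lines lsof also emits are ignored.
missing_names() {
    # Keep only name lines, strip the leading "n", de-duplicate.
    sed -n 's/^n//p' | sort -u |
    while IFS= read -r name; do
        # Print any name that no longer resolves to a directory entry.
        [ -e "$name" ] || printf '%s\n' "$name"
    done
}

# Usage on the affected host:
#   lsof -Fn /usr | missing_names
```

Entries whose name is just the mount point `/usr` always pass this test, because `/usr` itself exists, so a check like this can't say anything about them.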
du is not helpful, since it reports only the ~3 GB that we can see is in use,
not the missing 2+ GB.
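The size of the gap can be worked out from the `df -k`, `showfdmn -k`, and `du -ks` figures in the quoted message. A short shell sketch, with values in 1K blocks copied from that output (the variable names are illustrative only):

```shell
# Figures copied from the quoted df -k, showfdmn -k and du -ks output
# (all values in 1K blocks).
total=6526976        # df "1024-blocks" / showfdmn "1K-Blks" for usr_domain
df_used=3455289      # df "Used"
avail=798720         # df "Available" / showfdmn "Free"
du_used=3800152      # du -ks /usr

# Blocks that df counts neither as used nor as available.
df_gap=$((total - df_used - avail))

# Blocks the domain reports as allocated, minus what du can find.
domain_used=$((total - avail))
du_gap=$((domain_used - du_used))

echo "df gap: ${df_gap} KB"
echo "du gap: ${du_gap} KB"
```

Run as-is, it prints `df gap: 2272967 KB` and `du gap: 1928104 KB`, so both ways of measuring put the missing space at roughly 2 GB.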
There are no AdvFS clone filesets.
 
The application is a mix of home-grown code (for which we have the source)
and middleware products (for which we don't).
I'll take any further ideas. Thanks!
                                                - Bluejay Adametz
A good listener is not only popular everywhere,
but after a while, he knows something.          -Wilson Mizner
> We have this application, made up of a couple hundred processes, running
> on a V4.0G PK3 PS cluster. Over time, the /usr file system runs out of
> space, but we are unable to find where the space is going. du reports
> only ~3 GB (out of ~6 GB) used, but df and showfdmn show the (AdvFS) file
> system filling up, and eventually writes fail for lack of space.
> 
> # du -ks /usr
> 3800152 /usr
> # df -k
> Filesystem       1024-blocks        Used   Available Capacity  Mounted on
> root_domain#root      262144       93312      161592    37%    /
> /proc                      0           0           0   100%    /proc
> usr_domain#usr       6526976     3455289      798720    82%    /usr
> home_domain#home    17778192    14696676     2940192    84%    /home
> # showfdmn -k usr_domain
>                Id              Date Created  LogPgs  Domain Name
> 387ac2f0.00090660  Tue Jan 11 00:43:12 2000     512  usr_domain
> 
>   Vol    1K-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
>    1L    6526976      798720     88%     on    128    128  /dev/re0g
> K5MESAP3#
> 
> If we stop the application, all the space comes back.
> 
> On HP's advice, we tried enabling quotas on this file system and
> periodically running quotacheck, but that just produces the inconsistent
> numbers shown by df above. We tried using trace to track down the
> offending file(s), but that led nowhere.
> 
> I've suggested killing the application one process at a time to narrow
> down the culprit, but the application administrator doesn't want to do that.
> 
> Any ideas on how we can track this down?
> 
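As a lower-impact first step than stopping processes one at a time, one could rank processes by how many of the unexplained /usr entries lsof attributes to each. This sketch assumes (1) those entries are ones whose lsof name is exactly the mount point, and (2) lsof's field output with PID, command, and name fields (`lsof -Fpcn /usr`). Whether these entries have anything to do with the missing space is not established in this thread, and `count_bare_mount` is a hypothetical helper name.

```shell
# Sketch: count, per process, open-file entries whose name is exactly
# the given mount point (the "instances of /usr" seen in lsof output).
# Assumes `lsof -Fpcn <mount>` input: "p" lines carry the PID, "c" lines
# the command name, "n" lines the file name; other lines are ignored.
count_bare_mount() {
    awk -v mp="$1" '
        /^p/ { pid = substr($0, 2) }   # start of a new process set
        /^c/ { cmd = substr($0, 2) }   # command name for that PID
        /^n/ && substr($0, 2) == mp { n[pid " " cmd]++ }
        END { for (k in n) print n[k], k }
    ' | sort -rn                       # highest count first
}

# Usage on the affected host:
#   lsof -Fpcn /usr | count_bare_mount /usr
```

Each output line is `count PID command`, so the processes holding the most such entries appear first and could be examined individually.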
Received on Wed Jul 23 2003 - 19:52:55 NZST