I got lots of responses to this one (too many to enumerate; if you
responded, thank you!).
Most of the responses involved some use of fuser, du, or lsof. I probably
didn't do as good a job as I might have of listing what we've already tried,
since we had tried all of these before I posted.
fuser -d /usr returns nothing.
All the files returned by lsof /usr do appear in directories (I wrote a little
script to awk the output and test for each file). lsof also reports a lot of
instances of /usr itself, and I don't know the significance of those.
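
For anyone wanting to repeat that check, here is a minimal sketch of the idea, in Python rather than awk and not the script we actually ran. It assumes the input is the text produced by lsof's field output mode (`lsof -F pn /usr`), where each line starts with a one-letter tag: `p` for a process ID and `n` for a file name. The function name `missing_open_files` and the injectable `exists` parameter are hypothetical, chosen only for this illustration.

```python
import os


def missing_open_files(lsof_field_output, exists=os.path.exists):
    """Return (pid, path) pairs for open files whose path is not on disk.

    lsof_field_output is assumed to be the text of `lsof -F pn <fs>`.
    exists is injectable so the logic can be tried without a live system.
    """
    missing = []
    pid = None
    for line in lsof_field_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":
            # Start of a new process set: remember which PID owns the files.
            pid = int(value)
        elif tag == "n" and value.startswith("/") and not exists(value):
            # A named file that no longer resolves through any directory.
            missing.append((pid, value))
    return missing


# Demo with made-up data: PID 123 holds "/" and a file that has been removed.
sample = "p123\nfcwd\nn/\nf3\nn/usr/tmp/gone\n"
print(missing_open_files(sample, exists=lambda path: path == "/"))
# prints [(123, '/usr/tmp/gone')]
```

An empty result corresponds to what we saw: every file lsof listed was still reachable through a directory.
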
du is not helpful since it just reports the 3gb that we can see is used, not
the missing 2gb+.
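
To put a number on the gap, the figures in the original post quoted below can be subtracted directly. This is only a rough sketch: it assumes the du and showfdmn snapshots were taken at about the same time, and the function name `unaccounted_kb` is made up for the illustration.

```python
def unaccounted_kb(total_kb, free_kb, du_kb):
    """Space the domain counts as used that du cannot find, in 1K blocks."""
    used_kb = total_kb - free_kb  # what showfdmn considers allocated
    return used_kb - du_kb        # minus what du can reach by walking /usr


# Figures from the quoted post: showfdmn -k usr_domain and du -ks /usr.
gap = unaccounted_kb(total_kb=6526976, free_kb=798720, du_kb=3800152)
print(gap)                        # 1928104
print(round(gap / 1024 ** 2, 2))  # 1.84 (GB, taking 1 GB = 1024 * 1024 KB)
```

The gap keeps growing while the application runs, so at any given moment it may be larger or smaller than this snapshot suggests.
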
There are no clones.
The application is a mix of some home-grown code (which we have the sources
for) and some middleware products (which we don't have the sources for).
I'll take any further ideas. Thanks!
- Bluejay Adametz
A good listener is not only popular everywhere,
but after a while, he knows something. -Wilson Mizner
> We have this application running on a V4.0G PK3 PS cluster that consists of
> a couple hundred processes. Over the course of time, the /usr file system
> runs out of space, but we are unable to locate where it's going. du reports
> only ~3gb (out of ~6gb) used, but df and showfdmn show the (advfs) file
> system filling up, and eventually writes fail because of lack of space.
>
> # du -ks /usr
> 3800152 /usr
> # df -k
> Filesystem        1024-blocks      Used  Available  Capacity  Mounted on
> root_domain#root       262144     93312     161592       37%  /
> /proc                       0         0          0      100%  /proc
> usr_domain#usr        6526976   3455289     798720       82%  /usr
> home_domain#home     17778192  14696676    2940192       84%  /home
> # showfdmn -k usr_domain
>                Id              Date Created  LogPgs  Domain Name
> 387ac2f0.00090660  Tue Jan 11 00:43:12 2000     512  usr_domain
>
>   Vol   1K-Blks      Free  % Used  Cmode  Rblks  Wblks  Vol Name
>    1L   6526976    798720     88%     on    128    128  /dev/re0g
> K5MESAP3#
>
> If we stop the application, all the space comes back.
>
> At the advice of HP, we tried enabling quotas on this file system and
> periodically running quotacheck, but that just results in the inconsistent
> numbers shown by df above. We tried using trace to track down the offending
> file(s), but that led nowhere.
>
> I've suggested killing the application one process at a time to narrow down
> the culprit, but the application administrator doesn't want to do that.
>
> Any ideas on how we can track this down?
>
Received on Wed Jul 23 2003 - 19:52:55 NZST