TRU64 and optimizing Disk Performance

From: Loucks Guy <Guy.Loucks_at_det.nsw.edu.au>
Date: Wed, 12 Apr 2000 13:30:14 +1000

People,

Sorry for the delay in posting this. I received a fair bit of information;
probably the most relevant was from Alan. I did, however, receive some
contradicting information on AdvFS, some of which concurs with our
experience of it with the Squid proxy (a similar situation to our WEB DNS:
lots of small files, and many UFS partitions handle this better than one AdvFS).

I think the final solution will be a set of solid-state DASD. MFS may have
been an option, but state persistence is desirable.

A few people mentioned the use of a database; however, I do not see this as
being a benefit. Long table scans over a data table with 2.2 million+
rows do not scale well, whatever Oracle and the other vendors may
believe, and we would end up holding replicated data, snapshots, and so
on; not manageable.

Alan's note about using a few larger files might be an option, and a
C-ISAM / VSAM / HSAM style database may be appropriate.
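As a sketch of that keyed-file idea, here is a small example using Python's standard dbm module as a stand-in for a C-ISAM/VSAM-style store; the store path and record contents are made up for illustration:

```python
# Instead of thousands of small files, keep every record in one keyed
# store. dbm.dumb is the pure-Python backend, available everywhere.
import dbm.dumb

def build_store(path, records):
    """Write {key: record-text} pairs into a single indexed file."""
    with dbm.dumb.open(path, "n") as db:
        for key, text in records.items():
            db[key.encode()] = text.encode()

def lookup(path, key):
    """One open plus one keyed read, instead of a directory walk."""
    with dbm.dumb.open(path, "r") as db:
        try:
            return db[key.encode()].decode()
        except KeyError:
            return None

if __name__ == "__main__":
    # Hypothetical DNS records, keyed by name.
    records = {"www.example.edu.au": "A 10.0.0.1",
               "mail.example.edu.au": "MX 10 mx1"}
    build_store("/tmp/dns_store", records)
    print(lookup("/tmp/dns_store", "www.example.edu.au"))
```

A real C-ISAM file adds secondary indexes and record locking, but the access pattern (keyed read against one large file) is the same.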

The answers and the original question are below, along with the URLs for WEB
DNS, as some people were interested in it:
Web: http://www.cc.utah.edu/~keide/Software/UofU_DNS_Tools/
Author: http://www.utah.edu/~keide/Kirk_Eide.html

Original Post:
People,

I am looking for peoples feedback to the following situation:

We have a database structure consisting of some 3280 directories containing
about 6560 files. The directory structure is about 65 MB in size. The average
file size is quite small.

The change rate is fairly slow, say a couple of hundred transactions per hour
(this is the absolute top rate).

This is a web interface to our DNS management system. We need to be able to
search and locate entries in this structure rapidly (let's say sub-10
seconds). I am interested in people's experience and options.

The ideas on the table so far:
* RAM disk: it is small enough; keep a snapshot in RAM and access it at
memory speed. (What RAM disk options are available in Tru64?)
* Hardware: this is always an option, but I would like to keep it as a last
resort and use existing resources to their maximum. We could always throw in
an EMC with a few gigs and pre-emptive frame access...
* Alternate FS: currently using AdvFS; go to UFS, muck around with UBC...
* Other options???

If people can forward their thoughts I will summarise.

Cheers,

Guy

Responses:

Nikola Milutinovic [Nikola.Milutinovic_at_ev.co.yu]
UBC is supposed to handle caching vs. virtual memory issues quite well. Of
course, you can create a RAM disk (see the man pages for mfs and newfs).
That is like locking a part of a filesystem in memory.

What web software are you using for DNS management? I was planning to write
my own, but if there is something free, I wouldn't mind using it until I get
my own.

Nix.

alan_at_nabeth.cxo.dec.com

        First, a general comment. The time to open and close a file
        will always dominate the access time to the data in the file
        when the file is sufficiently small. I don't know where the
        crossover is, but it is probably above the 10 KB average
        file size you have. If better performance is the goal you
        might want to look at something that uses fewer, larger files.
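Alan's "fewer, larger files" point can be sketched by packing all the small records into one flat file with an in-memory offset index, so a lookup costs one seek rather than a path lookup plus an open/close per record. The file name and record format below are illustrative only:

```python
# Pack {key: text} records into one flat file and remember where each
# one starts, so reads become a single seek into an already-known file.
def pack(path, records):
    """Write all records contiguously; return {key: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for key, text in records.items():
            data = text.encode()
            index[key] = (f.tell(), len(data))
            f.write(data)
    return index

def fetch(path, index, key):
    """Read one record back via its stored offset and length."""
    off, length = index[key]
    with open(path, "rb") as f:
        f.seek(off)
        return f.read(length).decode()
```

The per-record cost here is dominated by one seek, which is exactly the regime where a solid-state device or a warm cache helps most.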

        I think there are kernel parameters that can adjust the size
        of the hash tables used by the namei code (file name lookup).
        Sizing those tables correctly can reduce the amount of disk
        access needed to lookup a file name when it is opened, which
        may improve performance. I think modern versions of sys_check
        can look at namei cache hit statistics and will make
        recommendations.

        I don't think the directory is large enough to make search
        times a significant issue. A switch to UFS might have an
        adverse effect since the directories will probably be
        further apart on disk, increasing seek time when you have
        to go to the disk for data. AdvFS in V5 also improves the
        internal search structure, which may help.

        Finally, there are two ways to interpret "ramdisk"; a memory
        based file system or a solid state SCSI device. You can
        use MFS, but it is volatile, making it unsuitable for persistent
        read-write data. A 65 MB MFS isn't particularly hard to
        create, if you have the memory to back it. Such memory
        is a candidate for paging, and there isn't an easy way to
        lock it down.

        SCSI based solid state disks generally provide exceptional
        seek performance. Data rates may not be much better than
        rotating disks simply because the SCSI bus data rate limits
        transfer speed as often as not. Still, they generally support
        a backend disk to allow the data to be persistent and it
        may offer better performance in some applications.

        You don't seem to have said what version of Tru64 UNIX
        you're using, so check the SPD for that version to see
        what solid state disks are supported (EZxx model names).
        I think the SPD is kept in the DOCUMENTATION directory
        of the base operating system installation CDROM.

Andrew Leahy [alf_at_cit.nepean.uws.edu.au]
On Mon, 10 Apr 2000, Loucks Guy wrote:

> The ideas on the table so far:
> * RAMDISK, it is small enough, keep a snapshot in RAM access at memory
> speed (What RAM DISK OPTIONS USED IN TRU64?)

man mfs - The mfs command builds a memory file system (mfs), which is a
UFS file system in virtual memory, and mounts it on the specified
mount-node. When the file system is unmounted, mfs exits and the contents
of the file system are lost.

> * Hard Ware this is always an option, I would like to keep this as a
> last resort, use existing resources to their maximum. We could always throw
> an EMC with a few gig's and pre-emptive frame access...
> * Alternate FS: currently using ADVFS, go to UFS. muck around with
> UBC....

I'd certainly try a small UFS partition. I moved our Squid proxy servers
from AdvFS to UFS because of poor AdvFS performance when handling hundreds
of thousands of small files.

Jim Belonis [belonis_at_dirac.phys.washington.edu]


I still say a database is the way to go. High numbers of records
just make it all the more important to do it right.
And rapidly increasing size means you can't afford to screw around
with solutions that don't scale.

I would (if I were a database guru) set up a database that is easy and
fast to search (a few million records should give reasonable speed if
hashed properly). [I'm not sure a simple Perl hash scales that large with
speed, since I've never had to use one with more than a few thousand
records.] And I would use that database to generate the text files used by
the DNS service periodically, or just keep the database and your DNS
service files in sync by modifying them together.

Depending on the complexity of the records and whether you want to search
on sub-parts of them (like individual words in a TXT or HINFO record),
you might want a fully indexed full-text search database.
I've never used one, but I understand they can be incredibly fast.
My only practical knowledge about this is that the original
AltaVista search engine was essentially this, and it searched billions of
records in a few seconds (but the whole index was in RAM).
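A toy inverted index shows the idea Jim is describing: map each word to the set of record keys containing it, so a content search becomes a dictionary lookup plus a set intersection instead of a scan over millions of records. The record data here is made up for illustration:

```python
# Build a word -> {record keys} map, then answer AND-queries with
# set intersections. This is the core of a full-text index.
from collections import defaultdict

def build_index(records):
    """records: {key: text}. Returns {word: set_of_keys}."""
    index = defaultdict(set)
    for key, text in records.items():
        for word in text.lower().split():
            index[word].add(key)
    return index

def search(index, *words):
    """Return keys whose records contain every given word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

Real engines add stemming, positional data, and on-disk index formats, but the lookup cost is similarly independent of the total record count.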

Come to think of it, it might be neat to consider using a web search engine
even though your 'pages' are not out on the web.
If your files are ordinary text files, I believe they can be treated as web
pages even though they are not written in HTML.

Alternatively, you can throw hardware at the problem as you suggested in
your original message and get a RAM disk, which should be findable.
I remember one came with VMS that was used for standalone backup
or booting off CD-ROM or something, and I used it for some other purpose.
But I don't remember one for Digital Unix.

Good luck.

Jim Belonis

> Thanks Jim,
>
> Even using a real database would be a problem. There are currently a
> little over 2.2 million records (we are searching the content), and doing
> a text search, or even a hash search, could be problematic.
>
> I am now considering setting up the likes of a data store to search for
> the required info. Essentially we want to ensure that when an A or PTR
> record is removed (or someone tries to remove it, more to the point) it
> does not leave any CNAME or MX entries behind. Our managed zones are going
> to double in the next 6 months as we bring on-line the 2850 schools we
> connected to our network last year.
>
> The objective is to push out / delegate the DNS management, without
> maintaining 3,500 DNS servers across the state. (NB: NSW is about the size
> of, or a little larger than, Texas.)
>
> Any suggestions on databases would be entertained; the store management is
> held within a few routines, and using Perl DBI is always an option...
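The integrity rule Guy describes above (an A or PTR record must not be removed while CNAME or MX records still reference its name) can be sketched as a check over the zone data. The record tuples and names below are hypothetical:

```python
# Zone modelled as a list of (owner, rtype, target) tuples. Before
# deleting an A/PTR record, verify nothing still points at its name.
def dangling_refs(zone, name):
    """Return CNAME/MX records that reference `name`."""
    return [(owner, rtype, target) for owner, rtype, target in zone
            if rtype in ("CNAME", "MX") and target == name]

def safe_delete(zone, name, rtype):
    """Delete a record, refusing if an A/PTR is still referenced."""
    if rtype in ("A", "PTR"):
        refs = dangling_refs(zone, name)
        if refs:
            raise ValueError(f"{name} still referenced by {refs}")
    return [r for r in zone if not (r[0] == name and r[1] == rtype)]
```

The same check drops in regardless of whether the store behind it is flat files, a keyed file, or a relational database.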
>
> Cheers,
>
> Guy
>
> Guy R. Loucks
> Senior Unix Systems Administrator
> Networks Branch
> NSW Department of Education & Training
> Information Technology Bureau
> Direct +61 2 9942 9887
> Fax +61 2 9942 9600
> Mobile +61 (0)429 041 186
> Email guy.loucks_at_det.nsw.edu.au
>
>
>
>
> -----Original Message-----
> From: Jim Belonis [mailto:belonis_at_dirac.phys.washington.edu]
> Sent: Monday, April 10, 2000 8:37 PM
> To: Loucks Guy
> Subject: Re: TRU64 and optimizing Disk Performance
>
>
>
> If you are searching by filename, even 'find' should be able to search
> 65MB in 6500 files in under 10 seconds.
>
> If you are searching the actual content of the files, I have nothing much
> to say, except "why not use a real database ?" You need not answer,
> I assume you have your reasons.
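Jim's point about filename search holds: a walk over ~6,500 files is cheap even without a database. A plain directory walk (Python's os.walk here, analogous to find(1)) illustrates it; the root path and file names are made up:

```python
# Walk a directory tree and collect files whose name contains a
# fragment, the same job find(1) -name does.
import os

def find_by_name(root, fragment):
    """Return paths under `root` whose file name contains `fragment`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if fragment in fname:
                hits.append(os.path.join(dirpath, fname))
    return hits
```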
>


-- 
J.James(Jim)Belonis II, U of Washington Physics Computer Cost Center Manager
belonis_at_phys.washington.edu, University of Washington Physics Dept.
http://www.phys.washington.edu/~belonis, r. B234 Physics Astronomy Building
1pm to midnite 7 days, (206) 685-8695, Box 351560, Seattle, WA 98195-1560
Received on Wed Apr 12 2000 - 03:31:26 NZST

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:40 NZDT