SUMMARY: 1 to 2 Tb disk space

From: <maurerf_at_post.ch>
Date: Wed, 18 Aug 1999 15:58:29 +0200

Thanks a lot to the following persons who replied to my enquiry:

Loucks Guy
Arne Steinkamm
Jim Fitzmaurice
Stanley Horwitz
Alan
Don Rye

The original posting was:

We will soon have to plan a system with very large disk space, at least
very large for me! That's why I'd like to hear about your experiences:

What are the possibilities to have good uptime, online backup?
How much time would a restore take?
What do I have to know to manage such a configuration?

The most complete answer came from Alan:

>What are the possibilities to have good uptime, online backup?

Ignoring software and system problems, uptime is dependent on
the frequency that devices fail and how many of them you have.
Using 36 GB disks you only need 57 of them to get 2 TB. Using
9 GB disks you'll need nearly 230 of them. The more things you
have, the higher the chance that one of them will fail. I don't
know what the current MTBF for drives is, nor how that translates
to a useful probability that one will fail in a given time period.
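
As a rough illustration of how an MTBF figure translates into aggregate
failure odds for the 57-disk and 230-disk cases above (the 500,000-hour
MTBF below is an assumed example, not a quoted drive spec):

```python
import math

def p_any_failure(n_drives, mtbf_hours, period_hours):
    """Probability that at least one of n_drives fails within the period,
    assuming independent failures at a constant (exponential) rate."""
    p_one = 1.0 - math.exp(-period_hours / mtbf_hours)  # one given drive fails
    return 1.0 - (1.0 - p_one) ** n_drives              # at least one fails

# Assumed 500,000-hour MTBF, over one year (8,760 hours):
for n in (57, 230):
    print(n, round(p_any_failure(n, 500_000, 8_760), 3))
```

Even with a generous MTBF, a year-long window makes at least one failure
among dozens of drives more likely than not, which is why redundancy
matters at this scale.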

60 devices is getting into the neighborhood where you'll want to
think about online redundancy: RAID-5 for read-heavy devices and
RAID-1 for write-heavy ones. Then, if the subsystem supports it,
get a couple of extra devices as spares.
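
A quick sketch of the capacity cost of the two redundancy schemes,
using the 57 x 36 GB configuration from above (the RAID-5 group size
of 7 is an arbitrary example, not a recommendation):

```python
def raid5_usable(n_disks, disk_gb, group_size):
    """Usable space with disks arranged in RAID-5 groups:
    each group of group_size disks yields group_size - 1 disks of data."""
    groups = n_disks // group_size
    return groups * (group_size - 1) * disk_gb

def raid1_usable(n_disks, disk_gb):
    """Usable space with RAID-1 mirror pairs: half the raw capacity."""
    return (n_disks // 2) * disk_gb

# 57 x 36 GB disks, RAID-5 in groups of 7 (one disk left as a spare):
print(raid5_usable(57, 36, 7))   # 8 groups * 6 data disks * 36 GB
print(raid1_usable(57, 36))      # 28 mirror pairs * 36 GB
```

RAID-5 gives up one disk per group to parity; RAID-1 gives up half of
everything, which is the price paid for better write behavior.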

Online backup is easy. For file systems you use the appropriate
file system backup tool: vdump for AdvFS or dump for UFS. For raw
devices you can use dd(1), though a tool that understands the
underlying data is better, since it can skip blocks not being used.

The problem is whether you can get a consistent online backup.
A simple multi-user system with lots of user file space
typically doesn't have any particular requirements for consistency.
Each file is separate and most files never change. The odds of a
file changing during a backup are relatively small.

Database systems on the other hand are all about consistency.
Two separate files may have dependencies on each other's
contents. Making a backup of one without a consistent backup
of the other is nearly as bad as having no backups at all.

You haven't described the application for your data, so it is hard
to advise on a workable backup strategy. In general the best
advice is:

o Don't backup blocks that aren't being used.

o Don't backup data that hasn't changed since the last backup.

There are applications where it makes sense to do infrequent full
backups and regular incrementals. There are also applications
where large files change with great frequency, and file-by-file
incremental backup is next to worthless compared to a full backup.
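
To see why file-by-file incrementals lose their value when files
change constantly, compare the data volumes for two illustrative
workloads (the change rates below are made-up examples):

```python
def weekly_backup_gb(total_gb, changed_fraction_per_day):
    """Data written by one weekly full backup plus six daily file-level
    incrementals, assuming a changed file is recopied whole."""
    full = total_gb
    incrementals = 6 * total_gb * changed_fraction_per_day
    return full + incrementals

# User file space: roughly 1% of the data touched per day.
print(weekly_backup_gb(2000, 0.01))   # incrementals add little
# Database-like workload: half the data rewritten every day.
print(weekly_backup_gb(2000, 0.5))    # incrementals cost 3 extra fulls
```

In the second case the incrementals alone amount to three full backups'
worth of data, so a schedule of fulls is simpler and no more expensive.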

>How much time would a restore take?

Longer than it takes to backup.

Much of the work of creating a file involves synchronous writes
that are relatively slow. As files get smaller, the extra work
of creating and closing them dominates the time needed to simply
write the data. Restoring large files is usually dominated by the
time needed to write the data, which is likely to be as fast as
reading if large transfers are used.
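
A back-of-envelope model of that effect, restoring the same 2 TB as
many small files versus a few large ones (the 20 ms per-file overhead
and 10 MB/s write rate are assumed figures for the sketch):

```python
def restore_hours(n_files, total_gb, create_ms_per_file=20.0,
                  write_mb_per_s=10.0):
    """Rough restore time: per-file create/close overhead plus the
    sequential write time for the data itself (all figures assumed)."""
    create_s = n_files * create_ms_per_file / 1000.0
    write_s = total_gb * 1024 / write_mb_per_s
    return (create_s + write_s) / 3600.0

# 2 TB as two million small files vs. two thousand large ones:
print(round(restore_hours(2_000_000, 2048), 1))  # small files
print(round(restore_hours(2_000, 2048), 1))      # large files
```

With millions of files the create/close overhead adds hours on top of
the raw write time; with thousands of large files it disappears into
the noise.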

Spreading a large amount of space over multiple file systems and/or
domains can also help reduce the restore time. With one big file
system, if one contributing device breaks, the whole file system
is lost. If you spread the devices out among multiple file systems
(and AdvFS domains), a broken device only affects the one domain
or file system.

>What do I have to know to manage such a configuration?

The advantages and disadvantages of the different file systems
and storage management tools for particular tasks. From the
software side, the different components of interest are:

LSM - A volume manager. It simply takes bare devices, allows
        combining their space and then carving it up as simple
        volumes, stripe sets and mirror sets. It allows expansion
        of a volume, but neither UFS nor AdvFS has utilities
        in the base system to take advantage of the new space.
        I tend to only use it for I/O tracing, mirroring and
        striping.

AdvFS - A file system with extensive space management features.
        It allows building space domains of multiple devices and
        then having multiple file systems share that space. New
        devices can be added and removed online. Utilities are
        available to defragment domains and balance the space
        among multiple volumes. It even has the ability to do
        per-file striping.

        However, it is sensitive to I/O errors, and the tools to
        repair a broken file system are either lacking or very
        poorly documented. Previous versions had trouble with
        very dynamic file systems, but current versions are
        better behaved.

UFS - Your basic old Berkeley Fast File System implementation.
        It has seen very little change or functional improvement
        since DEC OSF/1 was first released. That has the advantage
        of making it very stable. It is a hair more tolerant of
        transient I/O errors than AdvFS, but becomes as badly
        broken in the face of hard I/O errors. However, fsck(8)
        can fix a lot of problems. It is relatively untested with
        very large file systems. If fsck actually has to work
        its way through a file system it can use a lot of memory
        on a very large one (about one byte per file system fragment
        I think).
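
Taking the posting's estimate of about one byte per file system
fragment at face value, the fsck memory footprint for a large UFS
file system is easy to ballpark (the 1 KB fragment size is an assumed
typical value, not a measured one):

```python
def fsck_memory_mb(fs_bytes, frag_size=1024):
    """Rough fsck memory need at about one byte per fragment,
    per the estimate in the posting above (all figures assumed)."""
    return fs_bytes // frag_size // (1024 * 1024)

# A single 2 TB file system with 1 KB fragments:
print(fsck_memory_mb(2 * 1024**4))   # megabytes of fsck bookkeeping
```

At that rate a single 2 TB file system implies on the order of 2 GB of
fsck working memory, which is another argument for splitting the space
across several smaller file systems.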

Other suggestions were to use HSZ70 controllers, or EMC and StorageTek storage.
__________________________________________________

Felix Maurer
Die Schweizerische Post
Informatik POST
Messaging Management
Webergutstrasse 12
CH-3030 Bern
Tel: +41-31-338 98 49
Fax: +41-31-338 98 80
Mailto:maurerf_at_post.ch
http://www.post.ch
__________________________________________________
Received on Wed Aug 18 1999 - 14:00:07 NZST