Summary: Networker failing to change tape

From: <johanl_at_basys.svt.se>
Date: Wed, 10 May 95 16:28:56 +0100

Hi!

In my original question, I mentioned some problems we experienced with
NetWorker Save and Restore in combination with a TZ877 jukebox, running on an
Alpha 3000/600 with OSF/1 3.0. The problems included abandoned backups and
processes hanging when trying to change tapes.

A few people suggested that I restart NSR with nsr_shutdown -a. I tried
that, but nsr_shutdown doesn't kill off an nsrjb process in the U
(uninterruptible wait) state. Gary Rosenblum wondered what version of nsr we're
using, since he believed there are a lot of problems with nsr and jukeboxes in
versions older than 4.1. Version 3.1 of nsr should be much improved, though.
Well, we bought nsr 6 weeks ago and got version 3.0.

But there is hope, at least in the form of patches from DEC, as Olaf Grossman
in Dresden, Germany, pointed out to me. Here are the descriptions from the
README files:

nsrv3.0-001.tar

This patch is for those systems that have TZ87 tape devices (TZ877, TZ87,
TL820, DLT2000) and NetWorker V3.0 *only* on OSF (V1.3-V3.0)
 Symptoms:
        - system hangs with NSR processes running but not doing anything
        - tape device hangs requiring power cycle to clear
        - tapes being marked full arbitrarily by NSR
          * COUPLED WITH THE ABOVE SYMPTOMS *

 Description:
        NetWorker V3.0 may hang or report errors during saves
        while writing end-of-files. This problem is known to
        occur with TZ87, TZ877 and DLT2000 type devices.

        The patched nsrmmd avoids the system call that creates this problem
        and should eliminate these failures.

        This image is usable *only* on systems running NetWorker V3.0 on
        OSF V1.3 or higher.
          DO NOT APPLY TO ANY OTHER VERSION OF NSR

nsrv3.0-002.tar

This patch is for those systems that have high-capacity tape devices
(TZ87 and the like) and NetWorker V3.0, *only*, using pools to segregate full
vs. incremental saves on OSF (V1.3-V3.0)
 Symptoms:
        - Tape has *lots* of savesets on it (>2500), and is typically
          used exclusively for incremental saves via pools ("NonFull",
          etc.)
        - During saves, the message
          "MEDIA EMERGENCY: update_volume failed" is seen in the
          message window, but saves continue.
        - Savesets for which this message appears do not show up in
          the mminfo display for that volume.
        - Switching to a new tape causes the problem to go away.

 Description:
        NetWorker V3.0 may fail to update the media database for a volume
        with more than approximately 2500 savesets on it. The exact point
        of failure varies with the values in several other data fields, but
        is within +/- 150 or so.

        The patched nsrmmdbd raises the limit from 2500 to 25,300, which
        should be more than sufficient for current tape devices.

        The patched nsrmmd detects that the volume is close to this threshold
        and marks the volume full regardless of the volume of data on it. This
        is necessary since data put on the tape without proper index entries
        is essentially unrecoverable.

        These images are usable *only* on systems running NetWorker V3.0 on
        OSF V1.3 or higher.
          DO NOT APPLY TO ANY OTHER VERSION OF NSR

The first patch looks like pretty much the fix for our problem. I got the
patches from DEC Sweden.

Many thanks to:
rosenblg_at_nyu.edu Gary J. Rosenblum
gachamb_at_milp.jsc.nasa.gov
Knut.Hellebo_at_nho.hydro.com
st_at_hp735c.csc.cuhk.hk S. T. Wong
grossm.Rcs1.urz.tu-dresden.de Olaf Grossman

Keep rocking,
Johan Larsson
Swedish Television
Stockholm SWEDEN
Received on Wed May 10 1995 - 10:29:49 NZST