[SUMM] csq_cleanup kernel panic from Karen Byrd on 1997-10-23 (tru64-unix-managers)

From: Karen Byrd <BYRD_at_mscf.med.upenn.edu>
Date: Wed, 22 Oct 1997 09:05:08 -0400 (EDT)

The original question:

> Anyone know what a csq_cleanup kernel panic is about? This is
> on a 2100 4/200 running DU 4.0a.
>
> ___________________________________________________________________
> Karen Y. Byrd C511 Richards Bldg.
> Systems Manager 3700 Hamilton Wlk.
> Univ. of Pa. Philadelphia, PA 19104-6062
> School of Medicine Voice: 215/898-6865
> Computing and Info. Tech. Fax: 215/573-2277

I received two replies, from Gary Jarrell (Jarrel_at_mail.dec.com) and Dr. Thomas
P. Blinn (tpb_at_zk3.dec.com).

The most useful answer came from Dr. Thomas Blinn, and follows:

___________________________________________________________________________
Your chances of getting an accurate answer from anyone who doesn't have the
kernel sources are really slim.

I looked at the kernel sources; in the source module streams/str_synch.c in
the kernel, which is part of the streams subsystem, there is a routine
called csq_cleanup. This is the description (from the 4.0D sources, but it
is not the sort of thing that changes):

/*
* csq_cleanup - discard SQ elements which have been queued for a SQH
*
* This routine is used during shutdown of a queue, prior to deallocation
* of the queue. While "anonymous jobs" (put and service procedures) may
* sneak in harmlessly, we choose to panic if others are found. Examples
* are timeouts and bufcalls, or other processes acquiring the SQH. The
* problem lies elsewhere, but this gives us a chance to detect it now,
* instead of hanging a thread or allowing it to dereference freed memory.
* Of course, this may still happen! An uncancelled timeout might still be
* lurking, for example.
*
* The caller must be in control of the SQH, and must somehow be sure that
* nothing gets queued for this SQH anymore. We cannot assure that on this
* level.
*
* If we should be disassembling the parent queue itself, we give it a
* wildcard target, so that in this case, we actually remove all SQ's.
* However, since there really SHOULDN'T be other requests queued, we
* insert a consistency check here, and send a warning message in that
* case. The problem would not be here, but at some other place.
*/

You can see the "we choose to panic" comment; sounds to me like there is
potential for a "race" and sometimes you lose.

Here's where it does the test that leads to the panic:

                        /*
                         * Get rid of the SQ and associated messages:
                         * - message-related SQ's are contained in the
                         * message header, to which their sq_arg1 points.
                         * - others are unexpected and cause a panic.
                         */

                        if (sq->sq_flags & (SQ_IS_HEAD|SQ_HOLD|SQ_IS_TIMEOUT))
                                panic("csq_cleanup");

otherwise, it just pitches the entry and keeps going.
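
For readers without the kernel sources, the behaviour described above can be
sketched as a small stand-alone C fragment. This is only an illustration, not
the code from streams/str_synch.c: the flag values, the SQ_IS_MESSAGE flag, the
struct layout, and the function names sq_would_panic and cleanup_sketch are all
assumptions made up for the example, and a return value of -1 stands in for the
real panic("csq_cleanup") call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real ones are defined in the kernel headers. */
enum {
    SQ_IS_HEAD    = 0x01,
    SQ_HOLD       = 0x02,
    SQ_IS_TIMEOUT = 0x04,
    SQ_IS_MESSAGE = 0x08    /* invented for this sketch */
};

/* Simplified stand-in for a queued SQ element. */
struct sq {
    struct sq *sq_next;     /* next element queued for the same SQH */
    unsigned   sq_flags;
};

/* Return 1 if this element is one of the "unexpected" kinds. */
static int sq_would_panic(const struct sq *sq)
{
    assert(sq != NULL);
    return (sq->sq_flags & (SQ_IS_HEAD | SQ_HOLD | SQ_IS_TIMEOUT)) != 0;
}

/*
 * Walk the queued elements. Return how many would be discarded, or -1 at
 * the first element for which the kernel routine would panic instead.
 */
static int cleanup_sketch(const struct sq *head)
{
    int discarded = 0;
    const struct sq *sq;

    for (sq = head; sq != NULL; sq = sq->sq_next) {
        if (sq_would_panic(sq))
            return -1;      /* the kernel calls panic("csq_cleanup") here */
        discarded++;        /* the kernel discards the entry and moves on */
    }
    return discarded;
}
```

With a list holding only message-related elements, cleanup_sketch returns the
element count; as soon as one element carries a flag such as SQ_IS_TIMEOUT
(an uncancelled timeout, say), it returns -1, which corresponds to the panic.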

So you're getting into a situation where this happens. What changed on your
system recently, if anything? Some new application? A new network device?

If *nothing* changed (you're running the same kernel you were running all
along, check the date on /vmunix, and so on), then if it doesn't happen
again, you're probably not going to see it ever again. But if it happens
repeatedly, something may be badly broken.

Tom

Dr. Thomas P. Blinn, UNIX Software Group, Digital Equipment Corporation
  110 Spit Brook Road, MS ZKO3-2/U20 Nashua, New Hampshire 03062-2698
   Technology Partnership Engineering Phone: (603) 884-0646
    Internet: tpb_at_zk3.dec.com Digital's Easynet: alpha::tpb
     ACM Member: tpblinn_at_acm.org PC_at_Home: tom_at_felines.mv.net

  Worry kills more people than work because more people worry than work.

      Keep your stick on the ice. -- Steve Smith ("Red Green")

     My favorite palindrome is: Satan, oscillate my metallic sonatas.
                                         -- Phil Agre, pagre_at_ucsd.edu

  Opinions expressed herein are my own, and do not necessarily represent
  those of my employer or anyone else, living or dead, real or imagined.

___________________________________________________________________
Karen Y. Byrd C511 Richards Bldg.
Systems Manager 3700 Hamilton Wlk.
Univ. of Pa. Philadelphia, PA 19104-6062
School of Medicine Voice: 215/898-6865
Computing and Info. Tech. Fax: 215/573-2277
Received on Wed Oct 22 1997 - 15:44:10 NZDT
