Summary RE: Legato timing out after patch kit 3 from Cohen.Jessica_at_ic.gc.ca on 2004-05-28 (tru64-unix-managers)

From: <Cohen.Jessica_at_ic.gc.ca>
Date: Thu, 27 May 2004 17:04:31 -0400

There is a patch kit: T64KIT0021340-V51BB24-E-20040120 that supposedly
fixes this problem, which is a TCP timeout issue and seems to match the
symptoms I noticed.

Thanks to David J. DeWolfe, Iain Barker, and mcaplin_at_miami.edu for letting
me know about this patch kit.

-----Original Message-----
From: tru64-unix-managers-owner_at_ornl.gov
[mailto:tru64-unix-managers-owner_at_ornl.gov]On Behalf Of Cohen, Jessica:
DGRB
Sent: Thursday, May 27, 2004 1:48 PM
To: tru64-unix-managers_at_ornl.gov
Subject: Legato timing out after patch kit 3

We have two ES40s that were recently upgraded to 5.1b, and then to patch
kit 3. The 5.1b upgrade went fine, and our backups that evening went
smoothly, but our backups since we applied patch kit 3 (which we needed for
some of our Oracle products) have been quite erratic. When I tried testing
them during the day they went fine, and it isn't consistent which file
systems save successfully and which don't, although the smaller ones are
more likely to.

An example of the problem is shown below from the savegroup completion
report:

--- Unsuccessful Save Sets ---

* dino:/clones/home 1 retry attempted
* dino:/clones 1 retry attempted

* enzo:/clones/home/dba/oracle/data/temp 1 retry attempted
* enzo:/clones/home/dba/oracle/data/index 1 retry attempted

--- Successful Save Sets ---

  dino: /clones/home/dba/oracle/data/index level=incr, 14 GB 00:26:19 53 files
  dino: /clones/usr level=incr, 399 KB 00:00:25 11 files
* dino:/clones/var 1 retry attempted
  dino: /clones/var level=incr, 14 MB 00:00:46 59 files
  dino: /clones/home/dba/oracle/data/temp level=incr, 7166 MB 00:13:23 22 files
  dino: /clones/home/dba/oracle/product level=incr, 2404 MB 00:05:31 310 files
  rye: index:dino level=9, 966 KB 00:00:03 17 files

  enzo: /clones level=incr, 0 KB 00:00:05 0 files
  enzo: /clones/usr level=incr, 12 MB 00:00:38 81 files
  enzo: /clones/var level=incr, 33 MB 00:00:14 72 files
  enzo: /clones/home level=incr, 0 KB 00:00:30 0 files
  enzo: /clones/home/dba/oracle/product level=incr, 2872 MB 00:05:21 345 files
  rye: index:enzo level=9, 363 KB 00:00:03 5 files
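
Since a save set that was retried can still end up under "Successful Save Sets" (as dino:/clones/var does above), it can help to separate "retried and eventually saved" from "retried and never saved". The following is a minimal Python sketch, not part of the original post. It assumes each report entry sits on a single line (with the wrapped "files" word joined back on), and the function name `summarize` and both regular expressions are illustrative guesses at the report layout, not anything defined by Legato.

```python
import re
from collections import defaultdict

# A few lines copied from the report above, one entry per line.
REPORT = """\
* dino:/clones/home 1 retry attempted
* dino:/clones 1 retry attempted
* enzo:/clones/home/dba/oracle/data/temp 1 retry attempted
  dino: /clones/usr level=incr, 399 KB 00:00:25 11 files
* dino:/clones/var 1 retry attempted
  dino: /clones/var level=incr, 14 MB 00:00:46 59 files
  enzo: /clones/home level=incr, 0 KB 00:00:30 0 files
"""

# Assumed line shapes, inferred only from the sample report.
RETRY = re.compile(r"^\*\s*(\w+):(\S+)\s+\d+ retry attempted")
SAVED = re.compile(r"^\s*(\w+):\s+(\S+)\s+level=")


def summarize(report):
    """Return {client: [save sets that were retried but never saved]}."""
    retried = defaultdict(set)
    saved = defaultdict(set)
    for line in report.splitlines():
        m = RETRY.match(line)
        if m:
            retried[m.group(1)].add(m.group(2))
            continue
        m = SAVED.match(line)
        if m:
            saved[m.group(1)].add(m.group(2))
    # A save set counts as failed only if no completion line exists for it.
    return {
        client: sorted(paths - saved.get(client, set()))
        for client, paths in retried.items()
    }


if __name__ == "__main__":
    for client, paths in summarize(REPORT).items():
        print(client, paths)
```

Run over several nights of reports, a tally like this would show whether the same file systems fail repeatedly or whether the failures move around, which is the inconsistency described above.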

And then the cron outputs these types of errors:

* enzo:/clones/home/dba/oracle/data/temp ! no output
05/26/04 01:37:55 savegrp: enzo:/clones/home/dba/oracle/data/temp will retry 1 more time(s)
* dino:/clones/home ! no output
05/26/04 01:46:52 savegrp: dino:/clones/home will retry 1 more time(s)
* dino:/clones/var ! no output

or in the case of this morning these types:

* enzo:/clones/home/dba/oracle/data/index lost connection to server, exiting
05/27/04 00:41:10 savegrp: enzo:/clones/home/dba/oracle/data/index will retry 1 more time(s)
* dino:/clones/home/dba/oracle/data/index lost connection to server, exiting
05/27/04 00:47:25 savegrp: dino:/clones/home/dba/oracle/data/index will retry 1 more time(s)
* enzo:/clones/home/dba/oracle/data/temp lost connection to server, exiting
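
The savegrp retry lines carry timestamps, so one way to check whether the failures cluster at a particular time of night is to pull those timestamps out. This is a rough Python sketch, not from the original post. It assumes each retry message is on one line and that the date is in MM/DD/YY form, and the names `RETRY_LINE` and `retries` are made up for illustration.

```python
import re
from datetime import datetime

# Retry lines copied from the cron output above, one message per line.
LOG = """\
05/26/04 01:37:55 savegrp: enzo:/clones/home/dba/oracle/data/temp will retry 1 more time(s)
05/26/04 01:46:52 savegrp: dino:/clones/home will retry 1 more time(s)
05/27/04 00:41:10 savegrp: enzo:/clones/home/dba/oracle/data/index will retry 1 more time(s)
05/27/04 00:47:25 savegrp: dino:/clones/home/dba/oracle/data/index will retry 1 more time(s)
"""

# Assumed message shape, inferred only from the lines above.
RETRY_LINE = re.compile(
    r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) savegrp: (\w+):(\S+) will retry"
)


def retries(log):
    """Return a list of (timestamp, client, save set) for each retry line."""
    found = []
    for line in log.splitlines():
        m = RETRY_LINE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            found.append((when, m.group(2), m.group(3)))
    return found


if __name__ == "__main__":
    for when, client, path in retries(LOG):
        print(when.isoformat(), client, path)
```

Comparing these times against what else runs on the network or the hosts overnight is one way to test the "works during the day" observation.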

I was running Legato 6.1.3 and upgraded yesterday to 7.1, but this did not
solve the problem.
I haven't ruled out a network issue, but the only discrete changes between
"when it last worked perfectly" and "when we had issues" are the move from
unpatched 5.1b to patch kit 3, and the addition of the following
Oracle-requested changes to sysconfigtab:

ipc:
        shm_mni=256
        shm_seg=256
        ssm_threshold=0
rdg:
        msg_size=32768
        max_objs=5120
        max_async_req=256
        max_sessions=5000
        rdg_max_auto_msg_wires=0
        rdg_auto_msg_wires=0
rt:
        aio_task_max_num=8193
vm:
        new_wire_method=0

We aren't clustered, so I don't believe the rdg does anything other than
keep the Oracle installation program from squawking.
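
To keep track of exactly which attributes changed alongside the patch kit, the stanza list above can be loaded into a dictionary and compared with whatever was in place before. The following is only a sketch in Python, not part of the original post. It assumes the simple "subsystem:" / "name=value" layout quoted above, and `parse_stanzas` is a hypothetical helper, not a Tru64 tool.

```python
# The Oracle-requested stanzas, copied from the message above.
FRAGMENT = """\
ipc:
shm_mni=256
shm_seg=256
ssm_threshold=0
rdg:
msg_size=32768
max_objs=5120
max_async_req=256
max_sessions=5000
rdg_max_auto_msg_wires=0
rdg_auto_msg_wires=0
rt:
aio_task_max_num=8193
vm:
new_wire_method=0
"""


def parse_stanzas(text):
    """Parse 'subsystem:' headers and 'name=value' lines into nested dicts.

    Values are kept as strings; indentation and blank lines are ignored.
    """
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            current = line[:-1]
            stanzas[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            stanzas[current][name.strip()] = value.strip()
    return stanzas


if __name__ == "__main__":
    for subsystem, attrs in parse_stanzas(FRAGMENT).items():
        for name, value in attrs.items():
            print(f"{subsystem}.{name} = {value}")
```

A flat list like this makes it easier to back the settings out one subsystem at a time when trying to isolate which change, if any, affects the backups.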

Anyone seen any similar issues, either networky or legato-y after patch 3?
Received on Thu May 27 2004 - 21:06:41 NZST
