OpenVMS User's Manual
9.8.2 Omitting Records and Fields
In a specification file, you can improve Sort efficiency by using the
/CONDITION, /INCLUDE, and /OMIT qualifiers to process only those
records needed in the output file. (The high-performance Sort/Merge
utility does not support specification files. Implementation of this
feature is deferred to a future OpenVMS Alpha release.) You can also
use specification file qualifiers to reformat records, omitting
unnecessary fields from the output file. These qualifiers are not
available as command line qualifiers.
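As a rough sketch of how these qualifiers fit together, a specification file might name a field, define a condition on that field, and omit the matching records. The example is written as lines of a command procedure. The file names OMIT_AGENT.SRT, CUSTOMER.DAT, and NOAGENCY.LIS, the field name AGENT, its position and size, the condition name AGENCY, and the test string are all illustrative assumptions.

```dcl
$ ! Create a hypothetical specification file; CREATE reads the lines
$ ! that follow, up to the next line that begins with a dollar sign.
$ CREATE OMIT_AGENT.SRT
/FIELD=(NAME=AGENT,POSITION:20,SIZE:15)
/CONDITION=(NAME=AGENCY,TEST=(AGENT EQ "Real-T Trust"))
/OMIT=(CONDITION=AGENCY)
$ ! Sort the input file, leaving out records that satisfy AGENCY.
$ SORT/SPECIFICATION=OMIT_AGENT.SRT CUSTOMER.DAT NOAGENCY.LIS
```

Replacing /OMIT with /INCLUDE in the same file would keep only the matching records instead.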
9.8.3 Assigning Work Files
During a Sort operation, records from the input file are read into
memory. If the allocated memory cannot hold all the records, Sort
transfers the sorted data to one or more temporary work files. Merge
does not use work files.
You can increase sort efficiency by changing the number of work files
and by assigning them to specific devices:
- The Sort command line qualifier /WORK_FILES=n overrides
the number of work files allocated.
- Normally, Sort places work files on the device SYS$SCRATCH and
accesses them in an arbitrary order. You can assign work files to
specific devices in two ways:
  - In a specification file, the /WORK_FILES=(device,...)
  qualifier places the work files on the specified devices. See
  Section 9.9.3 for more information about using the /WORK_FILES
  qualifier in a specification file.
  - If you are not using a specification file, you can use the DCL
  command ASSIGN to assign the work files to specific devices.
  Sort uses the SORTWORKn logical names to identify user-specified
  device names for the work files, where n is a value from 0
  through 9. (For the high-performance Sort/Merge utility, n is
  a value from 0 through 254.) For example, define the SORTWORKn
  logical names as follows:
$ ASSIGN WORK$2: SORTWORK1
$ ASSIGN WORK$3: SORTWORK2
This example defines SORTWORK1 as the device WORK$2: and SORTWORK2
as the device WORK$3:. For more information on logical names, see
Chapter 11.
Consider the following when you assign work files to devices:
- Assign work files to the fastest devices available, for example,
random-access mass storage devices such as disks.
- Choose devices with the least activity and the most space available.
- Assign each work file to a different physical device to maximize
overlapping input and output.
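To check or undo such assignments, the standard DCL commands SHOW LOGICAL and DEASSIGN can be used. The following is a minimal sketch that reuses the device names WORK$2: and WORK$3: from the earlier example; they are placeholders for devices on your own system.

```dcl
$ ASSIGN WORK$2: SORTWORK0      ! first work file on one device
$ ASSIGN WORK$3: SORTWORK1      ! second work file on a different device
$ SHOW LOGICAL SORTWORK*        ! confirm the assignments
$ DEASSIGN SORTWORK0            ! remove the assignments when finished
$ DEASSIGN SORTWORK1
```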
9.8.4 Modifying the Working Set Extent
If Sort requires work files (for example, if you are sorting a large
file), a larger working set can increase sort efficiency. However, if
your system is used heavily, it might be unable to allocate all the
pages in the working set extent to your process. This can result in
paging, which occurs when the operating system transfers parts of a
process between physical memory and memory on a paging device; only the
active part of the process remains in the physical memory. To avoid
excessive paging, you can decrease the working set extent for your
process. (Use the SET WORKING_SET command to decrease the working set
extent.)
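A minimal sketch of that adjustment follows. The extent value of 1000 pages and the file names are illustrative assumptions only; a suitable value depends on the load on your system and on the limits authorized for your account.

```dcl
$ SHOW WORKING_SET                  ! display the current limit, quota, and extent
$ SET WORKING_SET/EXTENT=1000       ! hypothetical value; choose one suited to your system
$ SORT/STATISTICS BIGFILE.DAT BIGFILE.LIS   ! hypothetical file names
```

Comparing the page faults reported by /STATISTICS before and after the change is one way to judge whether the new extent helped.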
9.9 Summary of Sort/Merge Qualifiers
The following list describes command qualifiers used with the SORT and
MERGE commands. To use a command qualifier, include the qualifier
immediately after the SORT or MERGE command.
/[NO]CHECK_SEQUENCE
Applies to the MERGE command only. Verifies the sequence of the records
in MERGE input files. Merge checks the sequence of records by default.
The /CHECK_SEQUENCE qualifier checks whether the records of one or
more files (up to 10; the high-performance Sort/Merge utility supports
up to 12) have been sorted. (The records will still be directed to an
output file, which you must specify.) If you are checking whether
records are sorted on a key field other than the entire record, you
must specify key information, along with the requesting sequence.
Use the /NOCHECK_SEQUENCE qualifier to prevent Merge from checking
the sequence of records.
Example
$ MERGE/KEY=(SIZE:4,POSITION:3)/NOCHECK_SEQUENCE -
_$ PRICE1.DAT,PRICE2.DAT PRICE.LIS
In this example, the /NOCHECK_SEQUENCE qualifier specifies that the
sequence of the input files, PRICE1.DAT and PRICE2.DAT, is not to be
checked.
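For contrast, the following sketch requests the check explicitly on the same key and files. Because checking is the default, /CHECK_SEQUENCE is redundant here and is shown only to make the intent visible.

```dcl
$ MERGE/KEY=(SIZE:4,POSITION:3)/CHECK_SEQUENCE -
_$ PRICE1.DAT,PRICE2.DAT PRICE.LIS
```

The key information is included because the check is made on a 4-byte field starting at position 3 rather than on the entire record.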
/COLLATING_SEQUENCE=sequence
Selects one of three predefined collating orders for character key
fields, or specifies the name of a National Character Set (NCS)
collating sequence to be used in comparing character keys. (The
high-performance Sort/Merge utility does not support the NCS collating
sequences. Support for NCS collating sequences is deferred to a future
OpenVMS Alpha release.) Sort can arrange characters in ASCII (default),
EBCDIC, or Multinational sequences.
Example
$ SORT/COLLATING_SEQUENCE=MULTINATIONAL -
_$ NAMES.DAT,NOM.DAT LIST.LIS
This SORT command arranges the input files NAMES.DAT and NOM.DAT
according to the Multinational collating sequence to create the output
file LIST.LIS.
/[NO]DUPLICATES
By default, Sort retains all multiple records with duplicate keys. The
/NODUPLICATES qualifier eliminates all but one of multiple records with
duplicate keys. The retained records may not appear in the same order
as they appeared in the input file. If you want to specify which
duplicate record to keep, invoke Sort at the program level and specify
an equal-key routine. The /STABLE and the /NODUPLICATES qualifiers
are mutually exclusive.
Example
$ SORT/KEY=(POSITION:3,SIZE:5,DECIMAL)/NODUPLICATES -
_$ ACCT1,ACCT2 ACCT.LIS
This SORT command arranges the two input files according to the key
supplied and eliminates all but one of multiple records with equal keys.
/KEY=(POSITION:n,SIZE:n[,field,...])
Describes key fields, including the position, size, sorting order
(ASCENDING or DESCENDING), priority (NUMBER:n), and data type (such as
character, binary, h_floating). By default, Sort reorders a file by
sorting entire records with character data in ascending order. See
Section 9.2.1 for detailed information about the /KEY qualifier.
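As an illustration of several key fields in one command, the sketch below sorts first on a 5-byte decimal field and then, within equal values of that field, on a 20-byte character field in descending order. The file names, positions, and sizes are assumptions made for the example; NUMBER:n states the priority of each key explicitly.

```dcl
$ SORT/KEY=(POSITION:1,SIZE:5,DECIMAL,NUMBER:1) -
_$ /KEY=(POSITION:10,SIZE:20,DESCENDING,NUMBER:2) -
_$ ORDERS.DAT ORDERS.LIS
```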
/PROCESS=type
(Applies to the SORT command only.) Defines the internal sorting
process. The /PROCESS qualifier allows you to choose one of four
processes: record, tag, address, or index. (The high-performance
Sort/Merge utility supports only the record process. Implementation of
tag, address, and index processes is deferred to a future OpenVMS Alpha
release.) See Section 9.2.6 for detailed information about the
/PROCESS qualifier.
Example
$ SORT/KEY=(POS:40,SIZ:2,DESC)/PROCESS=TAG YRENDAVG.DAT -
_$ DESCYRAVG.LIS
This Sort operation uses a tag sorting process to create the output
file DESCYRAVG.LIS.
/SPECIFICATION=filespec
(The high-performance Sort/Merge utility does not support this
qualifier. Implementation of this feature is deferred to a future
OpenVMS Alpha release.)
Identifies a Sort or Merge specification file to be used in a Sort or
Merge operation. The default specification file type is .SRT. See
Section 9.7 and Section 9.9.3 for information about using
specification files.
/[NO]STABLE
By default, records with equal keys are not guaranteed to be placed in
the output file in the order they appear in the input file. The /STABLE
qualifier maintains the records in that order. The /STABLE and
/NODUPLICATES qualifiers are mutually exclusive.
Example
$ SORT/KEY=(POS:1,SIZ:5,DECIMAL)/STABLE PRICESA.DAT, -
_$ PRICESB.DAT,PRICESC.DAT SUMMARY.LIS
In this Sort operation, records with equal keys from PRICESA.DAT
will be listed first, followed by those from PRICESB.DAT, followed by
those from PRICESC.DAT.
/[NO]STATISTICS
Displays a statistical summary to SYS$OUTPUT that can be used for
optimization. To save these statistics in a file, use the following
command:
$ DEFINE/USER SYS$ERROR output-file
The statistical summary contains the following information:
Statistic | Description
Records read | The number of records read by Sort or Merge.
Records sorted | The number of records that have been processed using Sort. This number could be less than the number of records read if a specification file is used to select only certain records for the Sort or Merge operation.
Records output | The number of records written to the output file. This number could be less than the number of records sorted if /NODUPLICATES was selected or if I/O errors occurred when the output records were being written.
Working set extent | The number of pages in the process working set extent. This value is used as an upper limit on the size of the sort data structure. Adjusting this value is one way to improve the efficiency of a Sort operation.
Virtual memory | The number of pages of virtual memory added to the Sort image to hold the data.
Direct I/O + buffered I/O | This total is the number of I/O movements needed to read and write data. The lower this total value is, the more efficient the ordering operation.
Page faults | Indicates how well the data fits into memory: the higher the number of page faults, the less efficient the ordering operation.
Elapsed time | The total wall clock time used by the Sort or Merge operation in hours, minutes, seconds, and hundredths of seconds.
Input record length | This value is obtained from the Record Management Services (OpenVMS RMS) unless the user supplies it.
Internal length | The size in bytes of an internal format node. This includes any keys, data, a word to store the length, record file addresses (RFAs), and converted keys.
Output record length | The length of the output record. The length is computed from the input record length, the sort process, and the record reformatting requested.
Sort tree size | The number of records that fit in the Sort internal data structure.
Number of initial runs | One indication of how well the data fits into memory.
Maximum merge order | The maximum number of sorted strings that are merged at one time.
Number of merge passes | The number of times the Sort utility merges strings until one sorted output string is produced. The number of initial runs and the number of merge passes indicate how well the data fits into memory. The higher these numbers, the further the working set size is from containing the data and the longer the sorting takes.
Work file allocation | The number of blocks used for the work files. When more than one merge pass is needed, this size is approximately twice the size of the input file allocation.
Elapsed CPU | The CPU time used by the ordering operation; it does not include time spent waiting for I/O operations to complete or time spent waiting while another process executes.
Example
$ SORT/STATISTICS PRICE1.DAT,PRICE2.DAT PRICE.LIS
This SORT/STATISTICS command results in the following statistical
display:
                  OpenVMS Sort/Merge Statistics
Records read:               793    Input record length:             80
Records sorted:             793    Internal length:                 80
Records output:             793    Output record length:            80
Working set extent:         100    Sort tree size:                 412
Virtual memory:             433    Number of initial runs:           2
Direct I/O:                  22    Maximum merge order:              2
Buffered I/O:                 9    Number of merge passes:           1
Page faults:               3418    Work file allocation:           114
Elapsed time:       00:00:05.98    Elapsed CPU:            00:00:03.63
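To keep a summary like this one for later comparison, the DEFINE/USER command given earlier in this description can precede the SORT command. The following is a minimal sketch; the file name SORT_STATS.LIS is an assumption.

```dcl
$ DEFINE/USER SYS$ERROR SORT_STATS.LIS   ! user-mode name, in effect for the next image only
$ SORT/STATISTICS PRICE1.DAT,PRICE2.DAT PRICE.LIS
$ TYPE SORT_STATS.LIS                    ! review the saved summary
```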
/WORK_FILES[=n]
(Applies to the SORT command only.) Specifies the number of Sort work
files, from 1 to 10 inclusive (the high-performance Sort/Merge utility
supports up to 255). Using more work files makes each work file
smaller. If the available disks are too small or too full for work
files, increasing the number of files can improve the efficiency of the
Sort operation. Sort does not create work files until it needs
them. If Sort needs work files, it creates two by default (SORTWORK0,
SORTWORK1), which are placed in the SYS$SCRATCH directory.
Example
$ ASSIGN DRA5: SORTWORK0
$ ASSIGN DB0: SORTWORK1
$ ASSIGN DB1: SORTWORK2
$ SORT/KEY=(POS:1,SIZ:80)/WORK_FILES=3 -
_$ STATS1,STATS2,STATS3,STATS4 SUMMARY.LIS
Because the input files in this Sort operation are large files,
specifying three work files improves the efficiency of the sort
operation. Note that you can also assign the work files to a
specific directory on a device by including the directory name. For
example, to assign SORTWORK0 to the [WORKSPACE] directory on DRA5,
enter the following command:
$ ASSIGN DRA5:[WORKSPACE] SORTWORK0
9.9.1 Input File Qualifier
The following input qualifier should be included immediately after the
input file specification in the SORT or MERGE command line:
/FORMAT=(RECORD_SIZE:n,FILE_SIZE:n)
Defines input file characteristics; allows you to specify or override
record or file size. It must be specified immediately after the input
file specification in the Sort or Merge command line. Sort uses
input file size information to determine the amount of memory needed,
as well as the size of the work files for the Sort operation. If the
file size is unknown (for example, you are sorting files that do not
reside on disk or standard ANSI magnetic tape), Sort assumes a fairly
large file size. Specify the following qualifier values:
RECORD_SIZE:n
Specifies the input file's longest record length (LRL) in bytes. The
maximum longest record length that can be specified depends on the file
organization:
Sequential | 32,767
Relative | 16,383
Indexed-sequential | 16,362
These values include control bytes for variable records with
fixed-length control (VFC) format.
FILE_SIZE:n
Specifies input file size in blocks. The maximum file size accepted is
4,294,967,295 blocks.
You can also use /FORMAT as an output file qualifier. See
Section 9.9.2 for more information.
Example
$ SORT/KEY=(POS:40,SIZ:2,DESC) -
_$ CRA0:YRENDAVG.DAT/FORMAT=(RECORD_SIZE:41,FILE_SIZE:3) -
_$ DESCYRAVG.LIS
Because the input file YRENDAVG.DAT does not reside on a disk
device or ANSI magnetic tape, its file characteristics (record size
and file size) must be described by the /FORMAT qualifier.
9.9.2 Output File Qualifiers
The following output qualifiers can be used with the SORT and MERGE
commands. To use an output file qualifier, include the qualifier
immediately after the output file specification in the SORT or MERGE
command line.
/ALLOCATION=n
Specifies the number of blocks, from 1 through 4,294,967,295, to be
preallocated to the output file for optimization. Use this qualifier
when you know that the output file allocation will differ substantially
from the total input file allocation (for example, when reformatting
data or omitting records). The /ALLOCATION qualifier is required if
the /CONTIGUOUS qualifier is used.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1000 contiguous blocks for the output
file SUMMARY.LIS.
/BUCKET_SIZE=n
Specifies OpenVMS RMS bucket size (the number of 512-byte blocks per
bucket) to be used by relative and indexed sequential output disk files
for optimization. A value of 1 through 32 is allowed. If the output
file organization is the same as for the input files, the default value
is the same as the bucket size of the first input file. If output file
organization is different, the default value is 1.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS1.DAT,STATS2.DAT -
_$ SUMMARY.LIS/BUCKET_SIZE=16/RELATIVE
This SORT command results in the output file SUMMARY.LIS that has a
bucket size of 16 with relative organization.
/CONTIGUOUS
Requests that the output file be stored in contiguous disk blocks to
decrease access time. Must be used with the /ALLOCATION qualifier. By
default, Sort/Merge does not allocate contiguous disk blocks for the
output file.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1,000 contiguous blocks for the output
file SUMMARY.LIS.
/FORMAT=(type:n[,...])
Specifies the output file record format (FIXED:n, VARIABLE:n, or
CONTROLLED:n) if it differs from the input file format. You can also
specify the size (SIZE:n) or the block size (BLOCK_SIZE:n) of the file
records. If the Sort operation is a record or tag sort, the default
output record format is the same as the first input file record format.
If the Sort operation is an address or index sort, the default output
record format is fixed record format. If the input files have different
record formats, Sort provides an output record size that is large
enough to contain the largest record in the input files. You can
specify the following qualifier values.
BLOCK_SIZE:n
Specifies the output file's block size, in bytes, if you have directed
the file to magnetic tape. If the input file is a tape file, the block
size of the output file defaults to that of the input file. Otherwise,
the output file block size defaults to the size used when the tape was
mounted.
Acceptable values for n range from 20 to 65,532. To ensure correct
data interchange with other Compaq systems, however, specify a block
size of not more than 512 bytes. For compatibility with systems that
are not made by Compaq, the block size should not exceed 2,048 bytes.
CONTROLLED:n
Specifies variable with fixed-length control (VFC) records in the
output file.
FIXED:n
Specifies fixed-length records in the output file.
SIZE:n
Specifies the size, in bytes, of the fixed portion of VFC (CONTROLLED)
records, up to a maximum of 255 bytes. If you do not specify SIZE, the
default is the size of the fixed portion of the first input file. If
you specify this size as 0, OpenVMS RMS defaults the value to 2 bytes.
VARIABLE:n
Specifies variable-length records in the output file.
For any qualifier value, you can optionally specify n as
the maximum record size (in bytes) of the output records. The maximum
record size allowed depends on the file organization:
Sequential files | 32,767
Relative files | 16,383
Indexed-sequential files | 16,362
These maximum record size values include control bytes for variable
records with fixed-length control (VFC) format.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/FORMAT=FIXED:80
The input file STATS.DAT consists of variable-length records that
are 80 bytes in length. The /FORMAT qualifier specifies that the output
file, SUMMARY.LIS, consists of fixed-length records.
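The other record format values follow the same pattern. As a hedged sketch built only from the syntax described above, the following command requests VFC output records with a 2-byte fixed control portion. The maximum record size of 82 is an illustrative assumption and is not derived from any particular input file.

```dcl
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/FORMAT=(CONTROLLED:82,SIZE:2)
```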
/INDEXED_SEQUENTIAL
Defines the file organization for the output file as indexed
sequential. Note that the output file must already exist and must be
empty. In addition, you must specify that the empty file is to be
overlaid with the sorted records by using the /OVERLAY qualifier.
Example
$ CREATE/FDL=NEW.FDL AVERAGE.DAT
$ SORT/KEY=(POS:1,SIZ:80) DATA.DAT,STATS.DAT -
_$ AVERAGE.DAT/INDEXED_SEQUENTIAL/OVERLAY
The CREATE/FDL command creates the empty file AVERAGE.DAT. The SORT
command specifies that the output file have an indexed-sequential
organization and be written to the empty file AVERAGE.DAT.
/OVERLAY
Specifies an existing empty file that the output file is to be overlaid
on, or written to. The /OVERLAY qualifier is required when you use the
/INDEXED_SEQUENTIAL qualifier. If the input file organization is
indexed-sequential, the output file must already exist and be empty. If
the output file is not empty, /OVERLAY does not write over the file.
Instead, it appends the result of the sort to the existing output file.
You can use the CREATE/FDL command to create an empty data file.
Any attributes that you specify when creating the empty file then
become attributes of the Sort output file.
Example
$ CREATE/FDL=NEW.FDL AVERAGE.DAT
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT AVERAGE.DAT/OVERLAY
The FDL file NEW.FDL specifies special attributes for the file
AVERAGE.DAT. When Sort writes output to that file, the resulting Sort
output file has the attributes specified by the FDL file.
/RELATIVE
Defines the file organization for the output file as relative.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/RELATIVE
Because the input file STATS.DAT is not a relative file and the
output file SUMMARY.LIS will be, /RELATIVE qualifies the output file
specification.
/SEQUENTIAL
Defines the file organization for the output file as sequential. This
is the default for address and index sorting operations. The default
for record and tag sorting operations is the organization of the first
input file.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/SEQUENTIAL
Because the input file STATS.DAT is not a sequential file and the
output file SUMMARY.LIS will be, /SEQUENTIAL qualifies the output file
specification.