HP OpenVMS Systems Documentation

OpenVMS User's Manual

9.8.2 Omitting Records and Fields

From a specification file, you can improve Sort efficiency by using the /CONDITION, /INCLUDE, and /OMIT qualifiers to process only those records needed in the output file. (The high-performance Sort/Merge utility does not support specification files. Implementation of this feature is deferred to a future OpenVMS Alpha release.) You can also use specification file qualifiers to reformat records, omitting unnecessary fields from the output file. These qualifiers are not available as command line qualifiers.

9.8.3 Assigning Work Files

During a Sort operation, records from the input file are read into memory. If the allocated memory cannot hold all the records, Sort transfers the sorted data to one or more temporary work files. Merge does not use work files.

You can increase sort efficiency by changing the number of work files and by assigning them to specific devices:

The Sort command line qualifier /WORK_FILES=n overrides the number of work files allocated.
Normally, Sort places work files on the device SYS$SCRATCH and accesses them in an arbitrary order. You can assign work files to specific devices in two ways:
- In a specification file, the /WORK_FILES=(device,...) qualifier places the work files on the specified devices. See Section 9.9.3 for more information about using the /WORK_FILES qualifier in a specification file.
- If you are not using a specification file, you can use the DCL command ASSIGN to assign the work files to specific devices.
  Sort uses the SORTWORKn logical names to identify user-specified device names for the work files, where n is a value from 0 through 9. (For the high-performance Sort/Merge utility, n is a value from 0 through 254.) Define a SORTWORKn logical name as follows:
  
  ASSIGN device: SORTWORKn
  
  For example:
  $ ASSIGN WORK$2: SORTWORK1
  $ ASSIGN WORK$3: SORTWORK2
  This example defines SORTWORK1 as the device WORK$2: and SORTWORK2 as the device WORK$3:. (For more information on logical names, see Chapter 11.)

Consider the following when you assign work files to devices:

Assign work files to the fastest devices available, for example, random-access mass storage devices such as disks.
Choose devices with the least activity and the most space available.
Assign each work file to a different physical device to maximize overlapping input and output.
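
The role of work files can be pictured with a small Python sketch of an external merge sort. This is an illustration of the general technique only, not the algorithm used by the OpenVMS Sort utility. The function name external_sort, the max_in_memory limit, and the use of ordinary directories in place of SORTWORKn devices are all assumptions made for the example.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory, work_dirs):
    """Sort text records, spilling sorted runs to temporary work files.

    records        -- iterable of strings without newlines
    max_in_memory  -- hypothetical cap on records held in memory at once
    work_dirs      -- directories standing in for the SORTWORKn devices
    """
    run_paths = []
    buffer = []

    def flush():
        # Write one sorted run to a work file, rotating across directories.
        directory = work_dirs[len(run_paths) % len(work_dirs)]
        fd, path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            handle.writelines(line + "\n" for line in sorted(buffer))
        run_paths.append(path)
        buffer.clear()

    for record in records:
        if len(buffer) == max_in_memory:
            flush()
        buffer.append(record)

    if not run_paths:
        # Everything fit in memory, so no work files are created.
        return sorted(buffer)
    flush()  # write the final partial run

    handles = [open(path) for path in run_paths]
    try:
        # Merge the sorted runs, comparing records without their newlines.
        merged = [
            line.rstrip("\n")
            for line in heapq.merge(*handles, key=lambda s: s.rstrip("\n"))
        ]
    finally:
        for handle in handles:
            handle.close()
        for path in run_paths:
            os.remove(path)
    return merged
```

In this sketch, passing several directories spreads the runs across them in rotation, which loosely mirrors the advice to place each work file on a different physical device.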

9.8.4 Modifying the Working Set Extent

If Sort requires work files (for example, if you are sorting a large file), a larger working set can increase sort efficiency. However, if your system is used heavily, it might be unable to allocate all the pages in the working set extent to your process. This can result in paging, which occurs when the operating system transfers parts of a process between physical memory and memory on a paging device; only the active part of the process remains in the physical memory. To avoid excessive paging, you can decrease the working set extent for your process. (Use the SET WORKING_SET command to decrease the working set extent.)

9.9 Summary of Sort/Merge Qualifiers

The following list describes command qualifiers used with the SORT and MERGE commands. To use a command qualifier, include the qualifier immediately after the SORT or MERGE command.

/[NO]CHECK_SEQUENCE

Applies to the MERGE command only. Verifies the sequence of the records in MERGE input files. Merge checks the sequence of records by default.
The /CHECK_SEQUENCE qualifier checks whether the records of one or more files (up to 10; the high-performance Sort/Merge utility supports up to 12) have been sorted. (The records will still be directed to an output file, which you must specify.) If you are checking whether records are sorted on a key field other than the entire record, you must specify key information, along with the requesting sequence.
Use the /NOCHECK_SEQUENCE qualifier to prevent Merge from checking the sequence of records.
Example
$ MERGE/KEY=(SIZE:4,POSITION:3)/NOCHECK_SEQUENCE -
_$ PRICE1.DAT,PRICE2.DAT PRICE.LIS
In this example, the /NOCHECK_SEQUENCE qualifier specifies that the sequence of the input files, PRICE1.DAT and PRICE2.DAT, is not to be checked.
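
The effect of sequence checking can be sketched in Python as follows. This is a simplified model, not the Merge implementation: the helper names key_field, checked, and merge_files are hypothetical, records are plain strings, and POSITION is assumed to count from 1.

```python
import heapq


def key_field(record, position, size):
    # POSITION is taken to be 1-based, so position 3 starts at index 2.
    return record[position - 1:position - 1 + size]


def checked(records, position, size, name):
    """Yield records, raising ValueError if a key is lower than the previous one."""
    previous = None
    for record in records:
        current = key_field(record, position, size)
        if previous is not None and current < previous:
            raise ValueError(f"{name}: record out of sequence: {record!r}")
        previous = current
        yield record


def merge_files(inputs, position, size, check_sequence=True):
    """Merge already-sorted record lists; inputs maps a file name to its records."""
    streams = [
        checked(recs, position, size, name) if check_sequence else iter(recs)
        for name, recs in inputs.items()
    ]
    return list(heapq.merge(*streams, key=lambda r: key_field(r, position, size)))
```

With check_sequence=False the sketch merges whatever it is given, so an unsorted input produces unsorted output without any error, which is the risk of skipping the check.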

/COLLATING_SEQUENCE=sequence

Selects one of three predefined collating orders for character key fields, or specifies the name of a National Character Set (NCS) collating sequence to be used in comparing character keys. (The high-performance Sort/Merge utility does not support the NCS collating sequences. Support for NCS collating sequences is deferred to a future OpenVMS Alpha release.) Sort can arrange characters in ASCII (default), EBCDIC, or Multinational sequences.
Example
$ SORT/COLLATING_SEQUENCE=MULTINATIONAL -
_$ NAMES.DAT,NOM.DAT LIST.LIS
This SORT command arranges the input files NAMES.DAT and NOM.DAT according to the Multinational collating sequence to create the output file LIST.LIS.

/[NO]DUPLICATES

By default, Sort retains all multiple records with duplicate keys. The /NODUPLICATES qualifier eliminates all but one of multiple records with duplicate keys. The retained records may not appear in the same order as they appeared in the input file. If you want to specify which duplicate record to keep, invoke Sort at the program level and specify an equal-key routine.
The /STABLE and the /NODUPLICATES qualifiers are mutually exclusive.
Example
$ SORT/KEY=(POSITION:3,SIZE:5,DECIMAL)/NODUPLICATES -
_$ ACCT1,ACCT2 ACCT.LIS
This SORT command arranges the two input files according to the key supplied and eliminates all but one of multiple records with equal keys.
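
As a rough illustration of duplicate elimination, the following Python sketch sorts records on a decimal key field and keeps one record for each distinct key value. It is a model of the behavior described above, not of the Sort utility itself. The function name sort_no_duplicates is hypothetical, POSITION is assumed to count from 1, and int() stands in for the DECIMAL data type. The sketch happens to keep the first record of each group; Sort does not promise which one is retained.

```python
from itertools import groupby


def sort_no_duplicates(records, position, size):
    """Sort by a decimal key field and keep one record per distinct key value."""
    def key(record):
        start = position - 1          # assume POSITION is 1-based
        return int(record[start:start + size])

    ordered = sorted(records, key=key)
    # groupby clusters adjacent records with equal keys; keep one from each.
    return [next(group) for _, group in groupby(ordered, key=key)]
```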

/KEY=(POSITION:n,SIZE:n[,field,...])

Describes key fields, including the position, size, sorting order (ASCENDING or DESCENDING), priority (NUMBER:n), and data type (such as character, binary, h_floating). By default, Sort reorders a file by sorting entire records with character data in ascending order.
See Section 9.2.1 for detailed information about the /KEY qualifier.
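
To show how several key fields with different sorting orders combine, here is a minimal Python sketch. It assumes character keys, a 1-based POSITION, and a hypothetical sort_records helper that takes keys in priority order; it does not model the other data types or the NUMBER:n syntax.

```python
def sort_records(records, keys):
    """Sort records on several key fields.

    keys is a list of (position, size, descending) tuples, highest priority
    first; position is assumed to be 1-based.
    """
    ordered = list(records)
    # Apply stable sorts from the lowest-priority key to the highest, so that
    # earlier keys dominate and later keys only break ties.
    for position, size, descending in reversed(keys):
        start = position - 1
        ordered.sort(key=lambda r, s=start, n=size: r[s:s + n], reverse=descending)
    return ordered
```

For example, sort_records(records, [(1, 2, False), (3, 4, True)]) orders records by the first two characters in ascending order and, within equal values, by the next four characters in descending order.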

/PROCESS=type

(Applies to the SORT command only.) Defines the internal sorting process. The /PROCESS qualifier allows you to choose one of four processes: record, tag, address, or index. (The high-performance Sort/Merge utility supports only the record process. Implementation of tag, address, and index processes is deferred to a future OpenVMS Alpha release.)
See Section 9.2.6 for detailed information about the /PROCESS qualifier.
Example
$ SORT/KEY=(POS:40,SIZ:2,DESC)/PROCESS=TAG YRENDAVG.DAT -
_$ DESCYRAVG.LIS
This Sort operation uses a tag sorting process to create the output file DESCYRAVG.LIS.

/SPECIFICATION=filespec

(The high-performance Sort/Merge utility does not support this qualifier. Implementation of this feature is deferred to a future OpenVMS Alpha release.)

Identifies a Sort or Merge specification file to be used in a Sort or Merge operation. The default specification file type is .SRT.
See Section 9.7 and Section 9.9.3 for information about using specification files.

/[NO]STABLE

By default, records with equal keys are not guaranteed to be placed in the output file in the order they appear in the input file. The /STABLE qualifier maintains the records in that order.
The /STABLE and /NODUPLICATES qualifiers are mutually exclusive.
Example
$ SORT/KEY=(POS:1,SIZ:5,DECIMAL)/STABLE PRICESA.DAT, -
_$ PRICESB.DAT,PRICESC.DAT SUMMARY.LIS
In this Sort operation, records with equal keys from PRICESA.DAT will be listed first, followed by those from PRICESB.DAT, followed by those from PRICESC.DAT.
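
The ordering that /STABLE provides can be pictured with Python's built-in sorted(), which is itself a stable sort. In this sketch the function name stable_sort is hypothetical, each input file is a list of strings, POSITION is assumed to count from 1, and int() stands in for the DECIMAL data type.

```python
from itertools import chain


def stable_sort(files, position, size):
    """Concatenate input files in command-line order, then sort stably by key."""
    start = position - 1              # assume POSITION is 1-based
    combined = chain.from_iterable(files)
    # sorted() is guaranteed stable: records with equal keys keep input order.
    return sorted(combined, key=lambda r: int(r[start:start + size]))
```

Because the files are concatenated in the order given, records with equal keys come out with those from the first file ahead of those from the second, and so on.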

/[NO]STATISTICS

Displays a statistical summary to SYS$OUTPUT that can be used for optimization. To save these statistics in a file, use the following command:
$ DEFINE/USER SYS$ERROR output-file
The statistical summary contains the following information:

- Records read: The number of records read by Sort or Merge.
- Records sorted: The number of records that have been processed using Sort. This number could be less than the number of records read if a specification file is used to select only certain records for the Sort or Merge operation.
- Records output: The number of records written to the output file. This number could be less than the number of records sorted if /NODUPLICATES was selected or if I/O errors occurred when the output records were being written.
- Working set extent: The number of pages in the process working set extent. This value is used as an upper limit on the size of the sort data structure. Adjusting this value is one way to improve the efficiency of a Sort operation.
- Virtual memory: The number of pages of virtual memory added to the Sort image to hold the data.
- Direct I/O + buffered I/O: This total is the number of I/O movements needed to read and write data. The lower this total value is, the more efficient the ordering operation.
- Page faults: Indicates how well the data fits into memory: the higher the number of page faults, the less efficient the ordering operation.
- Elapsed time: The total wall clock time used by the Sort or Merge operation in hours, minutes, seconds, and hundredths of seconds.
- Input record length: This value is obtained from the Record Management Services (OpenVMS RMS) unless the user supplies it.
- Internal length: The size in bytes of an internal format node. This includes any keys, data, a word to store the length, record file addresses (RFAs), and converted keys.
- Output record length: The length of the output record. The length is computed from the input record length, the sort process, and the record reformatting requested.
- Sort tree size: The number of records that fit in the Sort internal data structure.
- Number of initial runs: One indication of how well the data fits into memory.
- Maximum merge order: The maximum number of sorted strings that are merged at one time.
- Number of merge passes: The number of times the Sort utility merges strings until one sorted output string is produced. The number of initial runs and the number of merge passes indicate how well the data fits into memory. The higher these numbers, the further the working set size is from containing the data and the longer the sorting takes.
- Work file allocation: The number of blocks used for the work files. When more than one merge pass is needed, this size is approximately twice the size of the input file allocation.
- Elapsed CPU: The CPU time used by the ordering operation; it does not include time spent waiting for I/O operations to complete or time spent waiting while another process executes.

Example
$ SORT/STATISTICS PRICE1.DAT,PRICE2.DAT PRICE.LIS
This SORT /STATISTICS command results in the following statistical display:
                  OpenVMS Sort/Merge Statistics

Records read:               793    Input record length:          80
Records sorted:             793    Internal length:              80
Records output:             793    Output record length:         80
Working set extent:         100    Sort tree size:              412
Virtual memory:             433    Number of initial runs:        2
Direct I/O:                  22    Maximum merge order:           2
Buffered I/O:                 9    Number of merge passes:        1
Page faults:               3418    Work file allocation:        114
Elapsed time:       00:00:05.98    Elapsed CPU:         00:00:03.63
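
If you save the display to a file, a short script can pull the values out for comparison between runs. The following Python sketch is one possible approach. It assumes the title appears on its own line and that each value follows its label and a colon, as in a two-column display; the function name parse_statistics and the regular expression are illustrative and are not part of Sort/Merge.

```python
import re

# A label (letters, spaces, and "/"), a colon, then a time or an integer.
FIELD = re.compile(r"([A-Z][A-Za-z/ ]*?):\s+(\d\d:\d\d:\d\d\.\d\d|\d+)")


def parse_statistics(text):
    """Return a dict mapping each statistic name to an int or a time string."""
    stats = {}
    for name, value in FIELD.findall(text):
        # Times such as 00:00:05.98 stay as strings; counts become integers.
        stats[name] = value if ":" in value else int(value)
    return stats
```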

/WORK_FILES[=n]

(Applies to the SORT command only.) Increases the number of Sort work files to any number from 1 through 10 (the high-performance Sort/Merge utility supports up to 255), which makes each work file smaller. If the available disks are too small or too full for work files, increasing the number of files can improve the efficiency of the Sort operation.
Sort does not create work files until it needs them. If Sort needs work files, it creates two by default (SORTWORK0, SORTWORK1), which are placed in the SYS$SCRATCH directory.
Example
$ ASSIGN DRA5: SORTWORK0
$ ASSIGN DB0: SORTWORK1
$ ASSIGN DB1: SORTWORK2
$ SORT/KEY=(POS:1,SIZ:80)/WORK_FILES=3 -
_$ STATS1,STATS2,STATS3,STATS4 SUMMARY.LIS
Because the input files in this Sort operation are large files, specifying three work files improves the efficiency of the sort operation.
Note that you can also assign the work files to a specific directory on a device by including the directory name. For example, to assign SORTWORK0 to the [WORKSPACE] directory on DRA5, enter the following command:
$ ASSIGN DRA5:[WORKSPACE] SORTWORK0

9.9.1 Input File Qualifier

The following input qualifier should be included immediately after the input file specification in the SORT or MERGE command line:

/FORMAT=(RECORD_SIZE:n,FILE_SIZE:n)

Defines input file characteristics; allows you to specify or override record or file size. It must be specified immediately after the input file specification in the Sort or Merge command line.
Sort uses input file size information to determine the amount of memory needed, as well as the size of the work files for the Sort operation. If the file size is unknown (for example, you are sorting files that do not reside on disk or standard ANSI magnetic tape), Sort assumes a fairly large file size.
Specify the following qualifier values:

- RECORD_SIZE:n
  Specifies the input file's longest record length (LRL) in bytes. The maximum longest record length that can be specified depends on the file organization:
    Sequential            32,767
    Relative              16,383
    Indexed-sequential    16,362
  These values include control bytes for variable records with fixed-length control (VFC) format.
- FILE_SIZE:n
  Specifies input file size in blocks. The maximum file size accepted is 4,294,967,295 blocks.

You can also use /FORMAT as an output file qualifier. See Section 9.9.2 for more information.
Example
$ SORT/KEY=(POS:40,SIZ:2,DESC) -
_$ CRA0:YRENDAVG.DAT/FORMAT=(RECORD_SIZE:41,FILE_SIZE:3) -
_$ DESCYRAVG.LIS
Because the input file YRENDAVG.DAT does not reside on a disk device or ANSI magnetic tape, file organization must be described by the /FORMAT qualifier.

9.9.2 Output File Qualifiers

The following output qualifiers can be used with the SORT and MERGE commands. To use an output file qualifier, include the qualifier immediately after the output file specification in the SORT or MERGE command line.

/ALLOCATION=n

Specifies the number of blocks, from 1 through 4,294,967,295, to be preallocated to the output file for optimization. Use this qualifier when you know that the output file allocation will differ substantially from the total input file allocation (for example, when reformatting data or omitting records).
The /ALLOCATION qualifier is required if the /CONTIGUOUS qualifier is used.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1,000 contiguous blocks for the output file SUMMARY.LIS.

/BUCKET_SIZE=n

Specifies OpenVMS RMS bucket size (the number of 512-byte blocks per bucket) to be used by relative and indexed sequential output disk files for optimization. A value of 1 through 32 is allowed.
If the output file organization is the same as for the input files, the default value is the same as the bucket size of the first input file. If output file organization is different, the default value is 1.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS1.DAT,STATS2.DAT -
_$ SUMMARY.LIS/BUCKET_SIZE=16/RELATIVE
This SORT command results in the output file SUMMARY.LIS that has a bucket size of 16 with relative organization.

/CONTIGUOUS

Requests that the output file be stored in contiguous disk blocks to decrease access time. Must be used with the /ALLOCATION qualifier. By default, Sort/Merge does not allocate contiguous disk blocks for the output file.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT -
_$ SUMMARY.LIS/ALLOCATION=1000/CONTIGUOUS
This SORT command allocates 1,000 contiguous blocks for the output file SUMMARY.LIS.

/FORMAT=(type:n[,...])

Specifies the output file record format (FIXED:n, VARIABLE:n, or CONTROLLED:n) if it differs from the input file format. You can also specify the size (SIZE:n) or the block size (BLOCK_SIZE:n) of the file records.
If the Sort operation is a record or tag sort, the default output record format is the same as the first input file record format. If the Sort operation is an address or index sort, the default output record format is fixed record format. If the input files have different record formats, Sort provides an output record size that is large enough to contain the largest record in the input files.
You can specify the following qualifier values.

- BLOCK_SIZE:n
  Specifies the output file's block size, in bytes, if you have directed the file to magnetic tape. If the input file is a tape file, the block size of the output file defaults to that of the input file. Otherwise, the output file block size defaults to the size used when the tape was mounted.
  Acceptable values for n range from 20 to 65,532. To ensure correct data interchange with other Compaq systems, however, specify a block size of not more than 512 bytes. For compatibility with systems that are not made by Compaq, the block size should not exceed 2,048 bytes.
- CONTROLLED:n
  Specifies variable with fixed-length control (VFC) records in the output file.
- FIXED:n
  Specifies fixed-length records in the output file.
- SIZE:n
  Specifies the size, in bytes, of the fixed portion of VFC (CONTROLLED) records, up to a maximum of 255 bytes. If you do not specify SIZE, the default is the size of the fixed portion of the first input file. If you specify this size as 0, OpenVMS RMS defaults the value to 2 bytes.
- VARIABLE:n
  Specifies variable-length records in the output file.

For any qualifier value, you can optionally specify n as the maximum record size (in bytes) of the output records. The maximum record size allowed depends on the file organization:

  Sequential files            32,767
  Relative files              16,383
  Indexed-sequential files    16,362

These maximum record size values include control bytes for variable records with fixed-length control (VFC) format.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/FORMAT=FIXED:80
The input file STATS.DAT consists of variable-length records that are 80 bytes in length. The /FORMAT qualifier specifies that the output file, SUMMARY.LIS, consists of fixed-length records.

/INDEXED_SEQUENTIAL

Defines the file organization for the output file as indexed sequential. Note that the output file must already exist and must be empty. In addition, you must specify that the empty file is to be overlaid with the sorted records by using the /OVERLAY qualifier.
Example
$ CREATE/FDL=NEW.FDL AVERAGE.DAT
$ SORT/KEY=(POS:1,SIZ:80) DATA.DAT,STATS.DAT -
_$ AVERAGE.DAT/INDEXED_SEQUENTIAL/OVERLAY
The CREATE/FDL command creates the empty file AVERAGE.DAT. The SORT command specifies that the output file have an indexed-sequential organization and be written to the empty file AVERAGE.DAT.

/OVERLAY

Specifies an existing empty file that the output file is to be overlaid on, or written to. The /OVERLAY qualifier is required when you use the /INDEXED_SEQUENTIAL qualifier.
If the input file organization is indexed-sequential, the output file must already exist and be empty. If the output file is not empty, /OVERLAY does not write over the file. Instead, it appends the result of the sort to the existing output file.
You can use the CREATE/FDL command to create an empty data file. Any attributes that you specify when creating the empty file then become attributes of the Sort output file.
Example
$ CREATE/FDL=NEW.FDL AVERAGE.DAT
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT AVERAGE.DAT/OVERLAY
The FDL file NEW.FDL specifies special attributes for the file AVERAGE.DAT. When Sort writes output to that file, the resulting Sort output file has the attributes specified by the FDL file.

/RELATIVE

Defines the file organization for the output file as relative.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/RELATIVE
Because the input file STATS.DAT is not a relative file and the output file SUMMARY.LIS will be, /RELATIVE qualifies the output file specification.

/SEQUENTIAL

Defines the file organization for the output file as sequential. This is the default for address and index sorting operations. The default for record and tag sorting operations is the organization of the first input file.
Example
$ SORT/KEY=(POS:1,SIZ:80) STATS.DAT SUMMARY.LIS/SEQUENTIAL
Because the input file STATS.DAT is not a sequential file and the output file SUMMARY.LIS will be, /SEQUENTIAL qualifies the output file specification.
