HP OpenVMS Systems Documentation

Guide to OpenVMS File Applications

10.3.2 Optimizing a Data File

To improve the performance of a data file, use a three-step procedure: analysis, FDL optimization, and conversion. If you repeat this procedure periodically during the life of a data file, the file stays tuned to the way it is actually being used.

For the analysis, use the ANALYZE/RMS_FILE/FDL command to create an output file (analysis-fdl-file) that reflects the current state of the data file. The command syntax for creating the analysis-fdl-file follows:

ANALYZE/RMS_FILE/FDL/OUTPUT=analysis-fdl-file original-data-file

The output file analysis-fdl-file contains all of the information and statistics about the data file, including create-time attributes and information that reflects changes made to the structure and contents of the data file over its life.

For FDL optimization, use the Edit/FDL utility to produce an optimized output file (optimized-fdl-file). You can do this by modifying either the original FDL file (original-fdl-file), if it is available, or the FDL file produced by the analysis (analysis-fdl-file).

Modification of an FDL file can be performed either interactively using a terminal dialogue or noninteractively by allowing the Edit/FDL utility to calculate optimal values based on analysis information.

To optimize the file interactively using an OPTIMIZE script, use a command with the following format:

EDIT/FDL/ANALYSIS=analysis-fdl-file/SCRIPT=OPTIMIZE-
/OUTPUT=optimized-fdl-file original-fdl-file

To optimize the file noninteractively, use a command with the following format:

EDIT/FDL/ANALYSIS=analysis-fdl-file/NOINTERACTIVE-
/OUTPUT=optimized-fdl-file original-fdl-file

The optimized-fdl-file parameter is the optimized version of the original FDL file.

Conversion is the process of applying the optimized FDL file to the original data file. Use the Convert utility for this step, with a command in the following format:

CONVERT/FDL=optimized-fdl-file original-data-file new-data-file
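If you run this cycle regularly, you may want to generate the three commands from a script. The following Python sketch is a hypothetical helper, not part of OpenVMS. It only builds command strings in the formats shown above, with a `$` prompt prefixed and the continuation lines joined. When no original FDL file is available, it edits the analysis FDL file instead, as described earlier. The function name and file names are illustrative.

```python
def optimization_commands(original_data, new_data, analysis_fdl, optimized_fdl,
                          original_fdl=None, interactive=False):
    """Return the analyze, optimize, and convert commands in the order they run.

    Hypothetical helper: it only builds DCL command strings; it does not run them.
    """
    # Without an original FDL file, the analysis FDL file is the one that is edited.
    source_fdl = original_fdl if original_fdl is not None else analysis_fdl
    mode = "/SCRIPT=OPTIMIZE" if interactive else "/NOINTERACTIVE"
    return [
        # Step 1: analysis of the current data file
        f"$ ANALYZE/RMS_FILE/FDL/OUTPUT={analysis_fdl} {original_data}",
        # Step 2: FDL optimization (terminal dialogue or calculated values)
        f"$ EDIT/FDL/ANALYSIS={analysis_fdl}{mode}/OUTPUT={optimized_fdl} {source_fdl}",
        # Step 3: conversion of the data file using the optimized FDL file
        f"$ CONVERT/FDL={optimized_fdl} {original_data} {new_data}",
    ]


for command in optimization_commands("CUSTOMER.DAT", "CUSTOMER.NEW",
                                     "CUSTOMER_ANL.FDL", "CUSTOMER_OPT.FDL"):
    print(command)
```

The printed lines could be saved as a command procedure and submitted as a batch job. Choose `interactive=True` only when someone will be at a terminal to answer the OPTIMIZE script.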

10.4 Making a File Contiguous

If your file has been used for some time or if it is extremely volatile, the numerous deletions and insertions of records may have caused the optimal design of the file to deteriorate. For example, numerous extensions will degrade performance by causing window-turn operations. In indexed files, deletions can cause empty but unusable buckets to accumulate.

If additions or insertions to a file cause too many extensions, the file's performance will also deteriorate. To improve performance, you could increase the file's window size, but this uses an expensive system resource and at some point may itself hurt performance. A better method is to make the file contiguous again.

This section presents techniques for cleaning up your files. These techniques include using the Copy utility, the Convert utility, and the Convert/Reclaim utility.

10.4.1 Using the Copy Utility

You can use the COPY command with the /CONTIGUOUS qualifier to copy the file, creating a new contiguous version. The /CONTIGUOUS qualifier can be used only on an output file.

To use the COPY command with the /CONTIGUOUS qualifier, use the following command syntax:

COPY input-filespec output-filespec/CONTIGUOUS

If you do not want to rename the file, use the same name for input-filespec and output-filespec.

By default, if the input file is contiguous, COPY likewise tries to create a contiguous output file. By using the /CONTIGUOUS qualifier, you ensure that the output file is copied to consecutive physical disk blocks.

The /CONTIGUOUS qualifier can only be used when you copy disk files; it does not apply to tape files. For more information, see the COPY command in the OpenVMS DCL Dictionary.

10.4.2 Using the Convert Utility

The Convert utility can also make a file contiguous if contiguity is an original attribute of the file.

To use the Convert utility to make a file contiguous, use the following command syntax:

CONVERT input-filespec output-filespec

If you do not want to rename the file, use the same name for input-filespec and output-filespec.

10.4.3 Reclaiming Buckets in Prolog 3 Files

If you delete a number of records from a Prolog 3 indexed file, it is possible that you deleted all of the data entries in a particular bucket. RMS generally cannot use such empty buckets to write new records.

With Prolog 3 indexed files, you can reclaim such buckets by using the Convert/Reclaim utility. This utility allows you to reclaim the buckets without incurring the overhead of reorganizing the file with CONVERT.

As the data buckets are reclaimed, the pointers to them in the index buckets are deleted. If as a result any of the index buckets become empty, they too are reclaimed.

Note that RFA access is retained after bucket reclamation. The only effect that CONVERT/RECLAIM has on a Prolog 3 indexed file is that empty buckets are reclaimed.

To use CONVERT/RECLAIM, use the following command syntax, in which filespec specifies a Prolog 3 indexed file:

CONVERT/RECLAIM filespec

Please note that the file cannot be open for shared access at the time that you give the CONVERT/RECLAIM command.
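The reclamation cascade described in this section can be pictured with a toy model. This Python sketch does not reflect RMS's on-disk structures. It assumes a single index level, represented as a list of index buckets, each holding the IDs of the data buckets it points to, plus a dictionary of data buckets. Empty data buckets are dropped, the pointers to them are deleted, and any index bucket left with no pointers is dropped as well. Records in the surviving buckets are not moved, which matches the statement that RFA access is retained. The function name is hypothetical.

```python
def reclaim(index_buckets, data_buckets):
    """Toy model of bucket reclamation with one index level.

    index_buckets: list of index buckets, each a list of data-bucket IDs
    data_buckets:  dict mapping data-bucket ID -> list of records
    Returns (remaining index buckets, remaining data buckets, reclaimed data-bucket IDs).
    """
    # Data buckets whose data entries have all been deleted.
    empty = {bucket_id for bucket_id, records in data_buckets.items() if not records}
    # Surviving data buckets keep their records exactly where they were.
    kept_data = {bucket_id: records for bucket_id, records in data_buckets.items()
                 if bucket_id not in empty}
    kept_index = []
    for pointers in index_buckets:
        # Delete the index pointers to reclaimed data buckets.
        remaining = [bucket_id for bucket_id in pointers if bucket_id not in empty]
        # An index bucket with no pointers left is reclaimed too.
        if remaining:
            kept_index.append(remaining)
    return kept_index, kept_data, sorted(empty)


index = [[1, 2], [3]]
data = {1: ["record A"], 2: [], 3: []}
print(reclaim(index, data))  # ([[1]], {1: ['record A']}, [2, 3])
```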

10.5 Reorganizing a File

Using the Convert utility is the easiest way to reorganize a file. In addition, CONVERT cleans up split buckets in indexed files. Also, because the file is completely reorganized, buckets in which all the records were deleted will disappear. (Note that this is not the same as bucket reclamation. With CONVERT, the file becomes a new file and records receive new RFAs.)

To use the Convert utility to reorganize a file, use the following command syntax:

CONVERT input-filespec output-filespec

If you do not want to rename the file, use the same name for input-filespec and output-filespec.
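By contrast with bucket reclamation, a reorganization reloads every surviving record into new buckets. In the toy Python model below, a record's position is a (bucket, slot) pair that stands in for its RFA, and a fixed number of records per bucket stands in for the bucket size and fill factor. Both simplifications, and the function name, are assumptions made for illustration. Empty buckets simply vanish, and records can land in new positions, which is why records receive new RFAs.

```python
def reorganize(data_buckets, records_per_bucket):
    """Toy model of reloading the data level of an indexed file.

    data_buckets: list of buckets, each a list of (key, payload) records
    records_per_bucket: records loaded into each new bucket
    Returns (new buckets, dict mapping key -> new (bucket, slot) position).
    """
    # Gather the surviving records in key order; empty buckets contribute nothing.
    records = sorted(record for bucket in data_buckets for record in bucket)
    # Load them into fresh buckets.
    new_buckets = [records[i:i + records_per_bucket]
                   for i in range(0, len(records), records_per_bucket)]
    # Each record's new position stands in for its new RFA.
    positions = {key: (b, s) for b, bucket in enumerate(new_buckets)
                 for s, (key, _) in enumerate(bucket)}
    return new_buckets, positions


old = [[(10, "a")], [], [(5, "b"), (20, "c")]]
print(reorganize(old, 2))
```

Here the record with key 10 moves from position (0, 0) to (0, 1), and the empty middle bucket is gone.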

10.6 Making Archive Copies

Another part of maintaining files is making sure that you protect the data in them. You should keep duplicates of your files in another place in case something happens to the originals. In other words, you need to back up your files. Then, if something does happen to your original data, you can restore the duplicate files.

The Backup utility (BACKUP) allows you to create backup copies of files and directories, and to restore them as well. These backup copies are called save sets, and they can reside on either disk or magnetic tape. Save sets are written in BACKUP format, which only BACKUP can interpret.

Unlike the DCL command COPY, which makes new copies of files (updating the revision dates and assigning protection from the defaults that apply), BACKUP makes copies that are identical in all respects to the originals, including dates and protection.

To use the Backup utility to create a save set of your file, use the following command syntax:

BACKUP input-filespec output-filespec[/SAVE_SET]

The /SAVE_SET qualifier is required only when the save set is written to disk; you can omit it for magnetic tape.

For more information about BACKUP, see the description of the Backup utility in the OpenVMS System Management Utilities Reference Manual.

Appendix A
Edit/FDL Utility Optimization Algorithms

This appendix lists the algorithms used by the Edit/FDL utility to determine the optimum values for file attributes.

A.1 Allocation

For sequential files with block spanning, the Edit/FDL utility allocates enough blocks to hold the specified number of records of mean size. If you do not allow block spanning, the Edit/FDL utility factors in the potential wasted space at the end of each block.
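A minimal Python sketch of this rule follows. It assumes 512-byte blocks and ignores per-record overhead such as count fields, so it is a simplification rather than the utility's actual code, and the function names are hypothetical. Without spanning, each block holds only as many whole records of mean size as fit, and the rest of the block counts as wasted space.

```python
BLOCK_BYTES = 512  # one disk block


def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def sequential_allocation(record_count, mean_record_size, block_spanning=True):
    """Blocks needed for record_count records of mean_record_size bytes (simplified)."""
    if block_spanning:
        # Records may cross block boundaries, so only the total byte count matters.
        return ceil_div(record_count * mean_record_size, BLOCK_BYTES)
    records_per_block = BLOCK_BYTES // mean_record_size
    if records_per_block == 0:
        raise ValueError("records larger than a block require block spanning")
    # Space left at the end of each block cannot hold another whole record.
    return ceil_div(record_count, records_per_block)


print(sequential_allocation(1000, 100))         # 196
print(sequential_allocation(1000, 100, False))  # 200
```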

For relative files, the Edit/FDL utility calculates the total number of buckets in the file and then allocates enough blocks to hold the required number of buckets and associated overhead. The Edit/FDL utility calculates the total number of buckets by dividing the total number of records in the file by the bucket record capacity. The overhead consists of the prolog, which occupies one block and is stored in VBN 1.
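The relative-file calculation can be sketched the same way. In this Python sketch the record cell size is an input supplied by the caller rather than computed. The final rounding applies the rule, stated at the end of this section, that allocations are rounded up to a multiple of the bucket size; applying it after the prolog block is added is this sketch's reading of that rule. The function name is hypothetical.

```python
BLOCK_BYTES = 512


def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def relative_allocation(record_count, cell_size, bucket_size):
    """Blocks to allocate for a relative file (simplified model).

    cell_size:   bytes per record cell (supplied by the caller)
    bucket_size: blocks per bucket
    """
    cells_per_bucket = (bucket_size * BLOCK_BYTES) // cell_size  # bucket record capacity
    if cells_per_bucket == 0:
        raise ValueError("bucket too small for one record cell")
    buckets = ceil_div(record_count, cells_per_bucket)
    blocks = buckets * bucket_size + 1  # plus the one-block prolog in VBN 1
    # Round the allocation up to a multiple of the bucket size.
    return ceil_div(blocks, bucket_size) * bucket_size


print(relative_allocation(1000, 101, 2))  # 202
```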

For indexed files, the Edit/FDL utility calculates the depth to determine the actual bucket size and number of buckets at each level of the index. It then allocates enough blocks to hold the required number of buckets. Areas for the data level (Level 0) have separate allocations from the areas for the index levels of each key.

In all cases, allocations are rounded up to a multiple of bucket size.

A.2 Extension Size

For sequential files, the Edit/FDL utility sets the extension size to one-tenth of the allocation size and truncates any fraction. For relative files and indexed files, the Edit/FDL utility sets the extension size to 25 percent of the allocation, rounded up to the next multiple of the bucket size.
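Both rules fit in a short Python function. This is a sketch that assumes the 25 percent is taken of the allocation size in blocks; the function name and the string values for the file organization are hypothetical.

```python
def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def extension_size(allocation, organization, bucket_size=1):
    """Default extension size in blocks, following the rules above."""
    if organization == "sequential":
        return allocation // 10  # one-tenth, fraction truncated
    if organization in ("relative", "indexed"):
        quarter = ceil_div(allocation, 4)  # 25 percent of the allocation
        return ceil_div(quarter, bucket_size) * bucket_size  # next multiple of bucket size
    raise ValueError(f"unknown organization: {organization}")


print(extension_size(205, "sequential"))   # 20
print(extension_size(202, "relative", 2))  # 52
```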

A.3 Bucket Size

Because most records that the Edit/FDL utility accesses are close to each other, it makes the buckets large enough to hold 16 records or the total record capacity of the file, whichever is smaller. The maximum bucket size is 63 blocks.
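As a rough Python sketch, the rule becomes a minimum followed by a clamp to the legal range. It assumes that `record_size` is the space one record occupies in a bucket, and it ignores bucket overhead and the cluster-size rounding noted later in this appendix. The function name is hypothetical.

```python
BLOCK_BYTES = 512
MAX_BUCKET_BLOCKS = 63


def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def default_bucket_size(record_size, total_records):
    """Blocks per bucket: room for 16 records or the whole file, whichever is smaller."""
    records_to_hold = min(16, total_records)
    blocks = ceil_div(records_to_hold * record_size, BLOCK_BYTES)
    return max(1, min(blocks, MAX_BUCKET_BLOCKS))  # clamp to the legal range 1..63


print(default_bucket_size(101, 1000))   # 4
print(default_bucket_size(5000, 1000))  # 63
```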

For indexed files, the Edit/FDL utility permits you to decide the bucket size for any particular index. The data and index levels get the same bucket size but you can use the MODIFY command to change these values.

The Edit/FDL utility calculates the default bucket size by first finding the most common index depth produced by the various bucket sizes. If you specify smaller buffers rather than fewer levels, the Edit/FDL utility establishes the default bucket size as the smallest size needed to produce the most common depth. On Surface_Plot graphs, these values are shown on the leftmost edge of each bucket size.
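Given the index depth that each candidate bucket size produces, the rule above (for the case where smaller buffers are preferred over fewer levels) reduces to a few lines of Python. In this sketch the depth table is an input. The text does not say how ties between equally common depths are broken, so preferring the shallower depth is an assumption. The function name is hypothetical.

```python
from collections import Counter


def pick_default_bucket_size(depth_by_size):
    """depth_by_size maps bucket size in blocks -> index depth it produces."""
    counts = Counter(depth_by_size.values())
    # Most common depth; ties go to the shallower depth (an assumption).
    common_depth = max(counts, key=lambda depth: (counts[depth], -depth))
    # Smallest bucket size that still produces that depth.
    return min(size for size, depth in depth_by_size.items() if depth == common_depth)


depths = {1: 4, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2}
print(pick_default_bucket_size(depths))  # 2
```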

Note

If you specify a separate bucket size for the Level 1 index, it should match the bucket size assigned to the rest of the index.

The bucket size is always a multiple of the disk cluster size. The ANALYZE/RMS_FILE primary attribute ANALYSIS_OF_KEY includes a secondary attribute, LEVEL1_RECORD_COUNT, that gives the record count for the index level immediately above the data. This attribute makes the tuning algorithm more accurate when duplicate key values are specified.

A.4 Global Buffers

The global buffer count is the number of I/O buffers that two or more processes can access. This algorithm tries to cache or "map" the whole Key 0 index (at least up to a point) into memory for quicker and more efficient access.

A.5 Index Depth

The indexed design routines simulate the loading of data buckets with records based on your data regarding key sizes, key positions, record sizes (mean and maximum), compression values, load method, and fill factors.

When the Edit/FDL utility finds the number of required data buckets, it can determine the actual number of index records in the next level up (each of which points to a data bucket). The process is repeated until all the required index records for a level can fit in one bucket, the root bucket. When a file exceeds 32 levels, the Edit/FDL utility issues an error message.
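The level-by-level loop can be sketched in Python. The real design routines derive bucket capacities from key size, key position, record sizes, compression, load method, and fill factors. This sketch assumes those capacities have already been computed and are passed in as fixed numbers. The function name is hypothetical.

```python
def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def index_depth(record_count, records_per_data_bucket, entries_per_index_bucket,
                max_levels=32):
    """Number of index levels above the data level (simplified model)."""
    # Level 0: data buckets needed to hold the records (at least one).
    entries = max(1, ceil_div(record_count, records_per_data_bucket))
    depth = 0
    while True:
        depth += 1
        if depth > max_levels:
            raise ValueError("index would exceed 32 levels")
        # One index record per bucket in the level below.
        entries = ceil_div(entries, entries_per_index_bucket)
        if entries == 1:
            return depth  # everything fits in one bucket: the root bucket


print(index_depth(10_000, 10, 50))  # 2
```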

With a line_plot, the design calculations are performed up to 63 times, once for each legal bucket size. With a surface_plot, each line of the plot is equivalent to a line_plot with a different value for the variable on the Y-axis.
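A line_plot can be pictured as repeating the depth calculation for every bucket size from 1 through 63 blocks. This Python sketch makes several simplifying assumptions. Capacities are whole records or index entries per 512-byte block. Bucket headers, compression, and fill factors are ignored. The index entry size is a single caller-supplied number. All names are hypothetical.

```python
BLOCK_BYTES = 512


def ceil_div(numerator, denominator):
    """Integer division rounded up."""
    return -(-numerator // denominator)


def depth_for(bucket_size, record_count, record_size, index_entry_size):
    """Index depth for one bucket size, or None if the bucket is too small."""
    capacity = bucket_size * BLOCK_BYTES
    per_data_bucket = capacity // record_size
    per_index_bucket = capacity // index_entry_size
    if per_data_bucket < 1 or per_index_bucket < 2:
        return None
    entries = max(1, ceil_div(record_count, per_data_bucket))  # data buckets
    depth = 0
    while True:
        depth += 1
        entries = ceil_div(entries, per_index_bucket)  # buckets at this index level
        if entries == 1:
            return depth


def line_plot(record_count, record_size, index_entry_size):
    """Depth for every legal bucket size, 1 through 63 blocks."""
    return {size: depth_for(size, record_count, record_size, index_entry_size)
            for size in range(1, 64)}


plot = line_plot(100_000, 200, 20)
print(plot[1], plot[63])  # 4 1
```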

Glossary

This glossary defines terms used in this manual.

accessor: A process that accesses a file or a record stream that accesses a record.

alternate key: An optional key within the data records in an indexed file; used by RMS to build an alternate index. See also key (indexed file) and primary key.

area: An RMS-maintained region of an indexed file. It allows you to specify placement or specific bucket sizes, or both, for particular portions of a file. An area consists of any number of buckets, and there may be from 1 to 255 areas in a file.

asynchronous record operation: An operation in which your program may regain control before the completion of a record retrieval or storage request. Completion ASTs and the Wait service are the mechanisms provided by RMS for programs to synchronize with asynchronous record operations. See also synchronous record operation.

bits per inch: The recording density of a magnetic tape. Indicates how many characters can fit on one inch of the recording surface. See also density.

block: The smallest number of consecutive bytes that RMS transfers during read and write operations. A block is 512 8-bit bytes on a Files-11 On-Disk Structure disk; on magnetic tape, a block may be anywhere from 8 to 8192 bytes.

block I/O: The set of RMS procedures that allows you direct access to the blocks of a file regardless of file organization.

block spanning: In a sequential file, the option for records to cross block boundaries.

bootstrap block: A block in the index file of a system disk. Can contain a program that loads the operating system into memory.

bucket: A storage structure, consisting of 1 to 63 blocks, used for building and processing relative and indexed files. A bucket contains one or more records or record cells. Buckets are the units of contiguous transfer between RMS buffers and the disk.

bucket split: The result of inserting records into a full bucket. To minimize bucket splits, RMS attempts to keep half of the records in the original bucket and transfer the remaining records to a newly created bucket.

buffer: A memory area used to temporarily store data. Buffers are generally categorized as being either user buffers or I/O buffers.

cluster: The basic unit of space allocation on a Files-11 On-Disk Structure volume. Consists of one or more contiguous blocks, with the number being specified when the volume is initialized.

contiguous area: A group of physically adjacent blocks.

count field: A 2-byte field prefixed to a variable-length record that specifies the number of data bytes in the record. This field may be formatted in either LSB or MSB format.

cylinder: The tracks at the same radius on all recording surfaces of a disk.

density: The number of bits per inch (bpi) of magnetic tape. Typical values are 800 bpi and 1600 bpi. See also bits per inch.

directory: A file used to locate files on a volume. A directory file contains a list of files and their unique internal identifications.

directory tree: The subdirectories created beneath a directory and the subdirectories within the subdirectories (and so forth).

disk: See volume (disk).

extent: One or more adjacent clusters allocated to a file or to a portion of a file.

FDL: See File Definition Language.

file: An organized collection of related items (records) maintained in an accessible storage area, such as disk or tape.

File Definition Language (FDL): A special-purpose language used to write file creation and run-time specifications for data files. These specifications are written in text files called FDL files; they are then used by the RMS utilities and library routines to create the actual data files.

file header: A block in the index file describing a file on a Files-11 On-Disk Structure disk, including the location of the file's extents. There is at least one file header for every file on the disk.

file organization: The physical arrangement of data in the file. You select the specific organization from those offered by RMS, based on your individual needs for efficient data storage and retrieval. See also indexed file organization, relative file organization, and sequential file organization.

Files-11 On-Disk Structure: The standard physical disk structure used by RMS.

fixed-length control field: A fixed-size area, prefixed to a VFC record, containing additional information that can be processed separately and that may have no direct relationship to the other contents of the record. For example, the fixed-length control field might contain line sequence numbers for use in editing operations.

fixed-length record format: Property of a file in which all records are the same length. This format provides simplicity in determining the exact location of a record in the file and eliminates the need to prefix a record size field to each record.

global buffer: A buffer that many processes share.

home block: A block in the index file, normally next to the bootstrap block, that identifies the volume as a Files-11 On-Disk Structure volume and provides specific information about the volume, such as volume label and protection.

index: The structure that allows retrieval of records in an indexed file by key value. See also key (indexed file).

index file: A file on each Files-11 On-Disk Structure volume that provides the means for identification and initial access to the volume. Contains the access information for all files (including itself) on the volume: bootstrap block, home block, file headers.

indexed file organization: A file organization that allows random retrieval of records by key value and sequential retrieval of records in sorted order by key value. See also key (indexed file).

interrecord gap (IRG): An interval of blank space between data records on the recording surface of a magnetic tape. The IRG enables the tape unit to decelerate, stop if necessary, and accelerate between record operations.

I/O buffer: A buffer used for performing input/output operations.

IRG: See interrecord gap.

key (indexed file): A character string, a packed decimal number, a 2- or 4-byte unsigned binary number, or a 2- or 4-byte signed integer within each data record in an indexed file. You define the length and location within the records; RMS uses the key to build an index. See also primary key, alternate key, and random access by key value.

key (relative file): The relative record number of each data record cell in a data file; RMS uses the relative record numbers to identify and access data records in a relative file in random access mode. See also relative record number.

local buffer: A buffer that is dedicated to one process.

locate mode: Technique used for a record input operation in which the data records are not copied from the I/O buffer, but a pointer is returned to the record in the I/O buffer. See also move mode.

move mode: Technique used for a record transfer in which the data records are copied between the I/O buffer and your program buffer for calculations or operations on the record. See also locate mode.

multiblock: An I/O unit that includes up to 127 blocks. Use is restricted to sequential files.

multiple-extent file: A disk file having two or more extents.

native mode: The processor's primary execution mode, in which the programmed instructions are interpreted as byte-aligned, variable-length instructions that operate on the following data types: byte, word, longword, and quadword integers; floating and double floating numbers; character strings; packed decimals; and variable-length bit fields. The other instruction execution mode is compatibility mode.

OpenVMS RMS: See RMS (Record Management Services).

primary key: The mandatory key within the data records of an indexed file; used to determine the placement of records within the file and to build the primary index. See also key (indexed file) and alternate key.

random access by key (indexed file): Retrieval of a data record in an indexed file by either a primary or alternate key within the data record. See also key (indexed file).

random access by key (relative file): Retrieval of a data record in a relative file by the relative record number of the record. See also key (relative file).

random access by record file address (RFA): Retrieval of a record by the record's unique address, which RMS returns to you. This record access mode is the only means of randomly accessing a sequential file containing variable-length records.

random access by relative record number: Retrieval of a record by its relative record number. For relative files and sequential files (on disk devices) that contain fixed-length records, random access by relative record number is synonymous with random access by key. See also random access by key (relative file) and relative record number.

read-ahead processing: A software option used for sequentially accessing sequential files using two buffers. One buffer holds records to be read from the disk. The other buffer awaits I/O completion.

record: A set of related data that your program treats as a unit.

record access mode: The manner in which RMS retrieves or stores records in a file. Available record access modes are determined by the file organization and specified by your program.

record access mode switching: Term applied to the switching from one type of record access mode to another while processing a file.

record blocking: The technique of grouping multiple records into a single block. On magnetic tape, an IRG is placed after the block rather than after each record. This technique reduces the number of I/O transfers required to read or write the data, and, in addition (for magnetic tape), it increases the amount of usable storage area. Record blocking also applies to disk files.

record cell: A fixed-length area in a relative file that can contain a record. Fixed-length record cells permit RMS to directly calculate the record's actual position in the file.

record file address (RFA): The unique address RMS returns to your program whenever it accesses a record. Using the RFA, your program can access disk records randomly regardless of file organization. The RFA is valid only for the life of the file, and when an indexed file is reorganized, each record's RFA will typically change.

record format: The way a record physically appears on the recording surface of the storage medium. The record format defines the method for determining record length.

record length: The size of a record in bytes.

record locking: A facility that prevents access to a record by more than one record stream or process until the initiating record stream or process releases the record.

Record Management Services: See RMS (Record Management Services).

record stream: The access environment for reading, writing, deleting and updating records.

relative file organization: The arrangement of records in a file in which each record occupies a cell of equal length within a bucket. Each cell is assigned a successive number, called a relative record number, which represents the cell's position relative to the beginning of the file.

relative record number: An identification number used to specify the position of a record cell relative to the beginning of the file; used as the key during random access by key mode to relative files.

reorganization: A record-by-record copy of an indexed file to another indexed file with the same key attributes as the input file.

RFA: See record file address.

RMS (Record Management Services): The file and record access subsystem of the operating system. RMS helps your application program process records within files, thereby allowing interaction between your application program and the data.

RMS-11: A set of routines that is linked with compatibility mode and PDP-11 programs and provides features similar to those of RMS. The file organizations and record formats used by RMS-11 are very similar to those of RMS; one exception is that RMS-11 does not support Prolog 3 indexed files, which are supported by RMS.

root bucket: The primary routing bucket for an index; geometrically, the top of the index tree. When a key search begins, RMS goes first to the index root bucket to determine which bucket, at the next lower level, is the next link in the bucket chain.

seek time: The time required to position the read/write heads over the selected track.

sequential file organization: The arrangement of records in a file in one-after-the-other fashion. Records appear in the order in which they were written.

sequential record access mode: Record storage or retrieval that starts at a designated point in the file and continues in one-after-the-other fashion through the file. That is, records are accessed in the order in which they physically appear in the file.

shared access: A file management technique that allows more than one user to simultaneously access a file or a group of files.

stream: An access window to a file associated with a record access control block (RAB) supporting record operation requests.
