
10.2 Fast Path (Alpha and Integrity servers Only)

Fast Path is an optional feature designed to improve I/O performance. Three factors tend to limit I/O performance for OpenVMS on SMP systems:

  1. Time spent by a CPU waiting for memory to be faulted into its cache.

  2. Contention for the SCS/IOLOCK8 spinlock.

  3. Contention for the primary CPU on which all I/O completion is processed.

Fast Path addresses these factors as follows:

  1. Select a secondary CPU for a given device or port, and cause all I/O for that device to originate and complete on that CPU. This offloads the primary CPU and reduces cache faults.

  2. Replace dependence on the SCS/IOLOCK8 spinlock by providing a port-specific spinlock whenever possible.

  3. For the most common I/O requests, preallocate resources and provide an optimized path through the mainline code.

Using Fast Path features does not require source-code changes. It does, however, require major changes to device drivers, so it has been implemented only for the newer high-performance drivers; these currently service many CI, Fibre Channel, parallel SCSI, and LAN devices.

Table 10-1 lists the supported ports for each OpenVMS Alpha version.

Table 10-1 Supported Ports for Each Version of OpenVMS Alpha and Integrity servers

Version   Supported Ports
-------   -----------------------------------
7.3-2     SMART Array 53xx, many LAN devices
7.3-1     KZPEA
7.3       CIXCD, CIPCA, KGPSA, KZPBA
7.1       CIXCD, CIPCA
7.0       CIXCD

Prior to OpenVMS Alpha Version 7.3-1, all hardware interrupts took place on the primary CPU. Interrupts from Fast Path-enabled devices had to be redirected from the primary CPU to a "preferred" CPU. This redirection still involved the primary CPU and also incurred interprocessor overhead.

Starting with OpenVMS Alpha Version 7.3-1, hardware interrupts that are targeted for a "preferred" CPU go directly to that CPU, thereby eliminating any I/O processing on the primary CPU. This major Fast Path enhancement is known as distributed interrupts.

NOTE: This feature is available on Fibre Channel, CI, and some SCSI ports on AlphaServer DS20, ES40/45, and GS series systems.

For more information about Fibre Channel, SCSI, and CI configurations, see Guidelines for OpenVMS Cluster Configurations.

10.2.1 Using Fast Path Features

Preferred CPU Selection

All Fast Path ports are assignable to CPUs. You can set a system parameter specifying the set of CPUs that are allowed to serve as preferred CPUs; this set is called the set of allowable CPUs. At any point in time, the set of CPUs that can currently have ports assigned to them, called the set of usable CPUs, is the intersection of the set of allowable CPUs and the current set of running CPUs.
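
For example, if the system parameter allows CPUs 0 through 3 to serve as preferred CPUs but CPU 3 is currently stopped, the set of usable CPUs consists of CPUs 0, 1, and 2.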

Each Fast Path port is initially assigned to a CPU by the FASTPATH_SERVER process, which runs at port initialization time. This process executes an automatic assignment algorithm that spreads Fast Path ports evenly among the usable CPUs. The FASTPATH_SERVER process also runs whenever a secondary CPU is started, and whenever the system parameter specifying the allowable CPUs is changed.

If the primary CPU is in the set of allowable CPUs, the initial distribution is biased against the primary CPU: a port is assigned to the primary CPU only after ports have been assigned to each of the other usable CPUs.

To identify a device or port's current preferred CPU, you can use either $GETDVI or the SHOW DEVICE/FULL command. To identify the Fast Path ports currently assigned to a CPU, you use the SHOW CPU/FULL command.

You can directly assign a Fast Path port to a CPU, or request that the system automatically select the port's preferred CPU from a specific set of CPUs. To do this, you either issue a $QIO or use the SET DEVICE/PREFERRED_CPU command. This also sets the port's User Preferred CPU to be the selected CPU.

You can clear the port's User Preferred CPU either by issuing a $QIO or by using the SET DEVICE/NOPREFERRED_CPU DCL command.

You can redistribute the system assignable Fast Path ports across a subset of the set of usable CPUs by calling the $IO_FASTPATH system service.

Optimizing Application Performance

Processes running on a port's preferred CPU have an inherent advantage when issuing I/O to that port, because the overhead of reassigning the I/O to the preferred CPU is avoided. An application process can use the $PROCESS_AFFINITY system service to assign itself to the preferred CPU of the device to which the majority of its I/O is sent.

With proper attention to assignment, a process's execution need never leave the preferred CPU. This provides a scalable process and I/O scheme for maximizing multiprocessor system operation. As on most RISC systems, Alpha system performance is highly dependent on the performance of the CPU memory caches. Process assignment and preferred CPU assignment are two keys to minimizing memory stalls in the application and in the operating system, thereby maximizing multiprocessor system throughput.
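
The following C fragment is a minimal sketch of this approach: it obtains the device's preferred-CPU mask with $GETDVI (see Section 10.2.2.2) and passes that mask to $PROCESS_AFFINITY. The device name is only an example, error handling is abbreviated, and the $PROCESS_AFFINITY argument list shown here is an assumption; consult the HP OpenVMS System Services Reference Manual for the definitive interface.

    /* Sketch: bind the calling process to the preferred CPU of the device
     * that receives most of its I/O.  The device name is an example, and the
     * $PROCESS_AFFINITY argument list is an assumption.
     */
    #include <descrip.h>
    #include <dvidef.h>
    #include <iosbdef.h>
    #include <starlet.h>
    #include <stdio.h>

    int main(void)
    {
        $DESCRIPTOR(dev_desc, "$1$DGA100:");      /* example device name           */
        static unsigned int pref_mask = 0;        /* 32-bit preferred-CPU bit mask */
        static unsigned short retlen = 0;
        unsigned long long select_mask, modify_mask, prev_mask = 0;
        IOSB iosb;
        int status;

        /* $GETDVI item list: one item_list_3 entry plus a terminating longword
         * of zero (assumes the default 32-bit pointer size).                   */
        struct {
            unsigned short buflen, itmcod;
            void *bufadr;
            unsigned short *retadr;
            unsigned int terminator;
        } itmlst = { sizeof pref_mask, DVI$_PREFERRED_CPU, &pref_mask, &retlen, 0 };

        status = sys$getdviw(0, 0, &dev_desc, &itmlst, &iosb, 0, 0, 0);
        if (!(status & 1) || !(iosb.iosb$w_status & 1) || pref_mask == 0) {
            printf("No preferred CPU (Fast Path disabled or device not capable)\n");
            return 1;
        }

        /* Assumed $PROCESS_AFFINITY argument order: pidadr, prcnam, select_mask,
         * modify_mask, prev_mask, flags, with the masks passed by reference.    */
        select_mask = pref_mask;
        modify_mask = pref_mask;
        status = sys$process_affinity(0, 0, &select_mask, &modify_mask, &prev_mask, 0);
        if (!(status & 1)) {
            printf("$PROCESS_AFFINITY failed, status %d\n", status);
            return 1;
        }
        printf("Bound to the preferred CPU mask %08X\n", pref_mask);
        return 0;
    }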

10.2.2 Managing Fast Path

This section describes how to manage Fast Path.

10.2.2.1 Fast Path System Parameters

There are three Fast Path system parameters:

  • FAST_PATH

  • FAST_PATH_PORTS

  • IO_PREFER_CPUS

These parameters can be used to control Fast Path as follows:

FAST_PATH

FAST_PATH is a static system parameter that enables (1) or disables (0) the Fast Path performance features for all Fast Path-capable ports.

Fast Path is enabled by default.

FAST_PATH_PORTS

FAST_PATH_PORTS is a 32-bit mask. Once Fast Path has been enabled by setting FAST_PATH to 1, FAST_PATH_PORTS can be used to selectively disable Fast Path for some specific adapter types.

The value of the FAST_PATH_PORTS system parameter is the sum of the values of the bits that have been set. Table 10-2 describes the bit mask:

Table 10-2 FAST_PATH_PORTS Bit Masks

Bit   Mask       Description
---   --------   -------------------------------------------------------------------
0     00000001   0 = Fast Path is ENABLED for KZPBA ports when FAST_PATH is set to 1.
                 1 = Fast Path is DISABLED for KZPBA ports.
1     00000002   0 = Fast Path is ENABLED for KGPSA ports when FAST_PATH is set to 1.
                 1 = Fast Path is DISABLED for KGPSA ports.
2     00000004   0 = Fast Path is ENABLED for KZPEA ports when FAST_PATH is set to 1.
                 1 = Fast Path is DISABLED for KZPEA ports.
3     00000008   0 = Fast Path is ENABLED for LAN ports when FAST_PATH is set to 1.
                 1 = Fast Path is DISABLED for LAN ports.
4     00000010   0 = Fast Path is ENABLED for KZPDC ports when FAST_PATH is set to 1.
                 1 = Fast Path is DISABLED for KZPDC ports.

The remaining bits are reserved for possible future adapter types.

The default setting for FAST_PATH_PORTS is 0; therefore, all supported ports are enabled.
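
For example, setting FAST_PATH_PORTS to 10 (the sum of the bit masks 00000002 and 00000008) disables Fast Path for KGPSA and LAN ports while leaving it enabled for KZPBA, KZPEA, and KZPDC ports.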

Note that CI drivers are not controlled by FAST_PATH_PORTS. Fast Path for CI is enabled and disabled exclusively by the FAST_PATH system parameter.

IO_PREFER_CPUS

IO_PREFER_CPUS is a dynamic system parameter that controls the set of CPUs available for use as Fast Path preferred CPUs.

IO_PREFER_CPUS is a CPU bit mask specifying the CPUs that are allowed to serve as preferred CPUs and thus can be assigned a Fast Path port. CPUs whose bit is set in the IO_PREFER_CPUS bit mask are enabled for Fast Path port assignment. IO_PREFER_CPUS defaults to -1, which specifies that all CPUs are allowed to be assigned Fast Path ports.

You may want to prevent the primary CPU from serving as a preferred CPU by clearing its bit in IO_PREFER_CPUS. This reserves the primary CPU for use by non-Fast Path I/O operations.
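
For example, on a system whose primary CPU is CPU 0, setting IO_PREFER_CPUS to a value with bit 0 clear (such as %XFE on an eight-CPU system) excludes the primary CPU from Fast Path port assignment.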

Changing the value of IO_PREFER_CPUS causes the FASTPATH_SERVER process to execute the automatic assignment algorithm that spreads Fast Path ports evenly among the new set of usable CPUs.

10.2.2.2 Identifying and Setting a Port's Preferred CPU

Following are the commands used to identify and set a preferred CPU for a port.

DCL SHOW DEVICE/FULL or $GETDVI DVI$_PREFERRED_CPU

To identify the preferred CPU for any Fast Path-capable device when Fast Path is enabled, use the DCL command SHOW DEVICE/FULL. The display shows (whether or not the device supports Fast Path) the current preferred CPU ID and, if one has been set, the User Preferred CPU ID for a port or disk device.

Alternatively, the $GETDVI system service or the DCL F$GETDVI lexical function returns the preferred CPU for a given device or file. The $GETDVI system service item code is DVI$_PREFERRED_CPU, and the F$GETDVI item code string argument is PREFERRED_CPU. The return argument is a 32-bit CPU bit mask with a single bit set, identifying the preferred CPU. A return argument containing a bit mask of zero indicates that no preferred CPU exists, either because Fast Path is disabled or because the device is not Fast Path capable. The returned bit mask can be used as the CPU bit mask input argument to the $PROCESS_AFFINITY system service to assign an application process to the optimal preferred CPU.

For an application seeking optimal Fast Path benefits, you can code each application process to identify and run on the preferred CPU of the device where the majority of the process's I/O activity occurs.

A high-availability feature of OpenVMS Cluster systems is that dual-pathed devices automatically fail over to a secondary path if the primary path becomes inoperable. Because a Fast Path device can fail over to another path or port, and thereby to another preferred CPU, an application should occasionally reissue the $GETDVI in a timer thread to check that its process assignment is still optimal.
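
The following C fragment is a minimal sketch of such a periodic recheck. It uses $BINTIM and $SETIMR to fire an AST once a minute; the AST only sets a flag, and the application's main thread, on seeing the flag, repeats the $GETDVI and $PROCESS_AFFINITY sequence outlined in Section 10.2.1. The one-minute interval and the routine names are illustrative assumptions.

    /* Sketch: arm a repeating one-minute timer whose AST flags that the
     * preferred CPU should be rechecked.  The main thread, on seeing
     * recheck_needed set, re-reads DVI$_PREFERRED_CPU and, if the mask has
     * changed, reissues $PROCESS_AFFINITY.  Names and interval are illustrative.
     */
    #include <descrip.h>
    #include <gen64def.h>
    #include <starlet.h>

    static GENERIC_64 interval;              /* binary delta time for $SETIMR */
    volatile int recheck_needed = 0;         /* polled by the main thread     */

    static void recheck_ast(unsigned long long reqidt)
    {
        recheck_needed = 1;                                   /* tell the main thread */
        (void) sys$setimr(0, &interval, recheck_ast, 0, 0);   /* rearm the timer      */
    }

    int start_preferred_cpu_monitor(void)
    {
        $DESCRIPTOR(delta_desc, "0 00:01:00.00");             /* one-minute delta time */
        int status = sys$bintim(&delta_desc, &interval);
        if (!(status & 1))
            return status;
        return sys$setimr(0, &interval, recheck_ast, 0, 0);
    }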

DCL SHOW CPU/FULL

You can use this DCL command to determine whether a CPU is enabled for use as a preferred CPU and to display the set of Fast Path ports currently assigned to that CPU.

DCL SET DEVICE /PREFERRED_CPU and /NOPREFERRED_CPU

These commands allow you to specify a CPU or a set of candidate CPUs from which the operating system chooses the CPU to assign to the Fast Path port. The chosen CPU is called the preferred CPU for this Fast Path port. The Fast Path port's interrupt processing, I/O completion processing, and I/O initiation processing are performed on this preferred CPU.

In addition to selecting the preferred CPU, these commands set the User Preferred CPU for the port. Setting the User Preferred CPU prevents the port from being reassigned to another CPU unless the User Preferred CPU is being stopped. The qualifier can be negated: when the /NOPREFERRED_CPU qualifier is specified, the User Preferred CPU is cleared for the port, but the port remains a Fast Path port, and its current preferred CPU is not changed.

If both /PREFERRED_CPU and /NOPREFERRED_CPU are specified on the same command line, /NOPREFERRED_CPU is ignored.

$QIO IO$_SETPRFPATH ! IO$M_PREFERRED_CPU [!IO$M_SYS_ASSIGNABLE]

You can change the assignment of a Fast Path port to a CPU by issuing a $QIO IO$_SETPRFPATH (Set Preferred Path) to the port device, for example, PNA0. The IO$M_PREFERRED_CPU modifier must be set, and the $QIO argument P1 must be set to either 0 or the address of a 32-bit CPU bit mask with a bit set indicating the new preferred CPU. On return from the I/O, the port and its associated devices are all assigned to a new preferred CPU. Note that explicitly setting the preferred CPU overrides any default assignment of Fast Path ports to CPUs. This interface allows you the flexibility to load balance I/O activity over multiple CPUs in an SMP system. This is important because I/O activity can change over the course of a day or week.

The $QIO passes in either a set containing one or more candidate CPUs, or 0 as a wildcard value indicating the set of usable CPUs. If the candidate set contains only one CPU, you are explicitly designating the new preferred CPU. If the candidate set contains multiple CPUs, you are requesting use of the automatic preferred CPU assignment algorithm to select a suitable CPU from the candidate set.

Including the IO$M_SYS_ASSIGNABLE modifier inhibits setting the selected CPU as the device's User Preferred CPU.

The $QIO or the SET DEVICE/PREFERRED_CPU command makes a best effort to assign the port to a CPU. However, the request can fail for the following reasons:

  • There is no intersection between the candidate set and the node's set of usable CPUs.

  • There is resource contention. If, after a reasonable effort, the request is unable to acquire a key system resource, the request fails. Key resources include the Fast Path spinlock, the CPU mutex, and a CPU transition lock.

If the $QIO or SET DEVICE/PREFERRED_CPU returns failure, you should consider retrying either immediately or after a short delay. It is possible that a large number of ports were being reassigned, and the request failed due to resource contention.
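
The following C fragment is a minimal sketch of such a reassignment. The port name, the candidate CPU, and the single retry on failure are illustrative assumptions, and error handling is abbreviated.

    /* Sketch: reassign Fast Path port PNA0 (an example name) to CPU 2 by
     * issuing IO$_SETPRFPATH with the IO$M_PREFERRED_CPU modifier.  P1 points
     * to a CPU bit mask; passing 0 instead lets the system choose from the
     * usable CPUs.  One retry covers transient resource contention.
     */
    #include <descrip.h>
    #include <iodef.h>
    #include <iosbdef.h>
    #include <starlet.h>
    #include <stdio.h>

    int main(void)
    {
        $DESCRIPTOR(port_desc, "PNA0:");
        unsigned short chan;
        unsigned int cpu_mask = 1u << 2;        /* single bit set: candidate CPU 2 */
        IOSB iosb;
        int status, attempt;

        status = sys$assign(&port_desc, &chan, 0, 0);
        if (!(status & 1)) return status;

        for (attempt = 0; attempt < 2; attempt++) {
            status = sys$qiow(0, chan, IO$_SETPRFPATH | IO$M_PREFERRED_CPU,
                              &iosb, 0, 0, &cpu_mask, 0, 0, 0, 0, 0);
            if ((status & 1) && (iosb.iosb$w_status & 1))
                break;                          /* port reassigned successfully */
            /* Otherwise retry once; the failure may be transient contention.  */
        }
        printf("Final status %d, IOSB status %d\n", status, iosb.iosb$w_status);
        (void) sys$dassgn(chan);
        return (status & 1) ? 0 : status;
    }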

$IO_FASTPATH

The $IO_FASTPATH system service performs operations on the set of Fast Path devices and the CPUs enabled for Fast Path use. The $IO_FASTPATHW system service completes synchronously; that is, it returns only after the operation is complete.

The FP$K_BALANCE_PORTS function code specifies that the system service is to distribute the set of system-assignable Fast Path ports across the intersection of a caller-supplied set of candidate CPUs and the set of usable CPUs.

10.2.3 Fast Path Restrictions

Fast Path restrictions include the following:

  • Only high-volume I/Os are optimized.

    Fast Path streamlines the operation of high-volume I/O. I/O that does not meet the definition of high-volume is not optimized.

    A high-volume Fast Path I/O is a read or write operation, without special I/O modifiers, issued to a Fast Path device at a time when the necessary resources have been preallocated and no special circumstances restrict I/O operations.

  • Send-credits resource must be managed for DSA controllers.

    Applications seeking maximum performance must ensure the availability of sufficient I/O resources.

    The only I/O resource that a Fast Path user needs to be concerned about is send credits. Send credits are extended by DSA controllers to host systems and represent the maximum number of I/Os that can be outstanding at any given point in time. If an application sends an unlimited number of simultaneous I/Os to a controller, it is likely that some I/O will back up waiting for send credits.

    You can tell whether the send-credit limit is being exceeded by using the DCL command SHOW CLUSTER/CONTINUOUS, followed by an ADD CONNECTIONS, CR_WAIT command. Rapidly increasing credit-wait counts for the disk-class driver connections (a LOC_PROC_NAME name of VMS$DISK_CL_DRVR) are a sign that an application may be incurring send-credit waits.

    To help ensure sufficient send credits, some controllers, such as the HSC and HSJ, allow the number of send credits to vary; however, not all controllers have this flexibility, and different controllers have different send-credit limits. The best workaround is to know your application's access patterns and to look for send-credit waits.

    If the number of send credits is being exhausted on one node, then add another controller to spread the load over multiple controllers. An alternative is to rework the application to load balance controller activity throughout the cluster, spreading a given controller's disk load over multiple nodes and allowing an application to exceed the send credits allotted to one node.

10.2.4 Special Considerations for Fast Path on Multi-RAD Systems

On systems supporting multiple resource affinity domains (RADs), the best performance for Fast Path ports is usually obtained by setting the Fast Path preferred CPU assignment to a CPU within the same RAD as the port.

The FASTPATH_SERVER process restricts its distribution of ports accordingly whenever possible. If a port is in a RAD with no available Fast Path CPUs, the system sets the port's preferred CPU to the primary CPU.

Because you can override this assignment by the methods described in this chapter, care should be taken that reassignment does not sacrifice the performance improvements provided by localizing activity to a single RAD.