STL Complexity Specifications

STL container, algorithm, and concept specifications include asymptotic complexity specifications. For example, iterators are required to take constant time, that is the time required by an iterator operation should be no more than a fixed constant, independent of the size of the container to which it refers.

Clearly programs will still function if a program component ignores the complexity specifications. Nonetheless, these specifications are an important part of the interface between STL components and code that uses them. If they are ignored, the performance of the resulting program will often render it useless.

As an example, consider the STL vector container. Ignoring the complexity specification, it is possible to implement vector using the same underlying data structure as list, i.e. as a doubly linked list. But for a vector of length 10,000, this would probably slow down an average computation of v[i] by something like a factor of 5,000. For a program that requires many vector accesses, such as a typical numerical computation, this is likely to change an execution time of minutes to days.

This does not preclude the use of STL algorithms in conjunction with containers or iterators that do not meet the standard complexity specifications. This is occasionally quite useful, especially if the code is either not performance critical, or other requirements on the container make the performance specifications unrealizable. But this has two potential problems. First, the algorithm may no longer be the right one, or even a reasonable one, for the problem. A different algorithm may be better tailored to actual relative costs of the container operations. Second, the algorithm is, of course, unlikely to satisfy its specified complexity constraint.

The complexity specifications in STL are, of necessity, an oversimplification. A full specification would describe exactly how the running time of an operation varies with that of the operations it invokes. The result would be rather unmanageable for the user, who would have to be keep track of large amounts of irrelevant detail. It would be overly constraining on the implementor, since overall improvements on the existing algorithms may not satisfy such detailed constraints.

Concept specifications (e.g. Forward Iterator or Container) specify complexity requirements that should be met by all instances of the concept. This is the minimum behavior required by operations (e.g. sort) parameterized with respect to the concept. Any specific instance (e.g. vector) is likely to perform better in at least some cases.

It is difficult to specify precisely when an algorithm satisfies a performance constraint. Does copying a vector on a 16-bit embedded processor take constant time? After all, the size of the vector is limited to some value less than 65,536. Thus the number of memory operations involved in the copy operation is certainly bounded by a constant. It is even conceivable that the worst case vector copy time on this processor may be less than the worst-case time for a single memory access on a machine with paged virtual memory. Nonetheless, it would be intuitively wrong to describe a vector copy or a list traversal as being a constant time operation. Even on this machine, a vector implemented as a list is unlikely to yield satisfactory performance. (Of course, so would an implementation that looped for a second for every vector access, although that would clearly run in constant time. The point here is to communicate the proper intent between implementor and user, not to guard against malicious or silly implementations.)

Fundamentally, it is difficult to define the notion of asymptotic algorithm complexity precisely for real computer hardware instead of an abstract machine model. Thus we settle for the following guidelines:

For an algorithm A to have running time O(f(n)), there must be a corresponding algorithm A' that is correct on machines with arbitrarily long pointer and size_t types, such that A and A' perform essentially the same sequence of operations on the actual hardware. (In simple cases A and A' will be the same. In other cases A may have been simplified with the knowledge that adresses are bounded.) For inputs of sufficiently large size n, A' must take at most time Cf(n), where C is a constant, independent of both n and the address size. (Pointer, size_t, and ptrdiff_t operations are presumed to take constant time independent of their size.)
All container or iterator complexity specifications refer to amortized complexity. An individual operation may take longer than specified. But any sufficiently long sequence of operations on the same container or iterator will take at most as long as the corresponding sum of the specified operation costs.
Algorithms specify either worst-case or average case performance, and identify which. Unless otherwise stated, averages assume that container elements are chosen from a finite type with more possible values than the size of the container, and that container elements are independently uniformly distributed.
A complexity specification for an operation f assumes that operations invoked by f require at most the specified runtime. But algorithms generally remain appropriate if the invoked operations are no more than a logarithmic factor slower than specified in the expected case.
If operations are more expensive than assumed by a function F in the current STL, then F will slow down at most in proportion to the added cost. Any future operations that fail to satisfy this property will make that explicit.
To make this precise, assume F is specified to use time f(m) for input of size m. F uses operations G_k, with specified running times g_k(n) on input size n. If F is used in a context in which each G_k is slower than expected by at most a factor h(n), then F slows down by at most a factor h(m). This holds because none of the current algorithms ever apply the operations G_k to inputs significantly larger than m.

STL Main Page