Applications of Randomness in System Performance Measurement
T. Blackwell.

Ph.D. Thesis, Harvard University, 1998.


Postscript Version

This thesis presents and analyzes a simple principle for building systems: that there should be a random component in all arbitrary decisions. If no randomness is used, system performance can vary widely and unpredictably due to small changes in the system workload or configuration. This makes measurements hard to reproduce and less meaningful as predictors of performance that could be expected in similar situations. To measure the sensitivity of non-randomized systems to slight configuration changes, we measured the variation in performance measures of TCP/IP and workstation memory systems as a result of "small" configuration perturbations. By "small" we mean within the range over which things may change unintentionally due to other modifications being evaluated, or within the range of accuracy that an independent researcher could reasonably achieve.

For TCP/IP, changes of a few percent in link propagation delays and other parameters caused order of magnitude shifts in bandwidth allocation between competing connections. For memory systems, changes in the essentially arbitrary order in which functions were arranged in memory caused changes in runtime of tens of percent for single benchmarks, and of a few percent when averaged across a suite of benchmarks. In both applications the measured variability is larger than performance increases often reported for new improved designs, suggesting that many published measurements of the benefits of new schemes may be erroneous or at least irreproducible.

To make TCP/IP and memory systems measurable enough to make benchmark results meaningful and convincing, randomness must be added. Methods for adding randomness to conventional program linkers, to linkers which try to optimize memory system performance by avoiding cache conflicts, and to TCP/IP are presented and analyzed. In all of the systems, various amounts of randomness can be added in many different places. We show how to choose reasonable amounts of randomness based on measuring configuration sensitivity, and propose specific recipies for randomizing TCP/IP and memory systems. Substantial reductions in the configuration sensitivity are demonstrated, making measurements much more robust and meaningful. The accuracy of the results increases with the number of runs and thus is limited only by the available computing resources.

When the overall performance of a system is strongly influenced by the worst case behavior, reducing the sensitivity of the system can also make it perform better. Using average waiting time as a metric, TCP/IP performance is shown to improve significantly when randomization is added to the sending host's congestion window calculations. Although the improvements are less than those achieved by previously proposed schemes using randomized packet discard algorithms inside the network, the proposed modifications can be implemented entirely in the sending host and so can be deployed more easily.


Copyright 2001-2005, Trevor Blackwell. Home