Average-case complexity
Script error: No such module "redirect hatnote". In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity which considers the maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity.[1] First, although some problems may be intractable in the worst-case, the inputs which elicit this behavior may rarely occur in practice, so the average-case complexity may be a more accurate measure of an algorithm's performance. Second, average-case complexity analysis provides tools and techniques to generate hard instances of problems which can be utilized in areas such as cryptography and derandomization. Third, average-case complexity allows discriminating the most efficient algorithm in practice among algorithms of equivalent best case complexity (for instance Quicksort).
Average-case analysis requires a notion of an "average" input to an algorithm, which leads to the problem of devising a probability distribution over inputs. Alternatively, a randomized algorithm can be used. The analysis of such algorithms leads to the related notion of an expected complexity.[2]Template:Rp
History and background
The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial time algorithms were already known.[3] In 1973, Donald Knuth[4] published Volume 3 of the Art of Computer Programming which extensively surveys average-case performance of algorithms for problems solvable in worst-case polynomial time, such as sorting and median-finding.
An efficient algorithm for NPScript error: No such module "Check for unknown parameters".-complete problems is generally characterized as one which runs in polynomial time for all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient for "most" inputs that occur in practice. Thus, it is desirable to study the properties of these algorithms where the average-case complexity may differ from the worst-case complexity and find methods to relate the two.
The fundamental notions of average-case complexity were developed by Leonid Levin in 1986 when he published a one-page paper[5] defining average-case complexity and completeness while giving an example of a complete problem for distNPScript error: No such module "Check for unknown parameters"., the average-case analogue of NPScript error: No such module "Check for unknown parameters"..
Definitions
Efficient average-case complexity
The first task is to precisely define what is meant by an algorithm which is efficient "on average". An initial attempt might define an efficient average-case algorithm as one which runs in expected polynomial time over all possible inputs. Such a definition has various shortcomings; in particular, it is not robust to changes in the computational model. For example, suppose algorithm Template:Mvar runs in time tA(x)Script error: No such module "Check for unknown parameters". on input Template:Mvar and algorithm Template:Mvar runs in time tA(x)2Script error: No such module "Check for unknown parameters". on input Template:Mvar; that is, Template:Mvar is quadratically slower than Template:Mvar. Intuitively, any definition of average-case efficiency should capture the idea that Template:Mvar is efficient-on-average if and only if Template:Mvar is efficient on-average. Suppose, however, that the inputs are drawn randomly from the uniform distribution of strings with length Template:Mvar, and that Template:Mvar runs in time n2Script error: No such module "Check for unknown parameters". on all inputs except the string 1nScript error: No such module "Check for unknown parameters". for which Template:Mvar takes time 2nScript error: No such module "Check for unknown parameters".. Then it can be easily checked that the expected running time of Template:Mvar is polynomial but the expected running time of Template:Mvar is exponential.[3]
To create a more robust definition of average-case efficiency, it makes sense to allow an algorithm Template:Mvar to run longer than polynomial time on some inputs but the fraction of inputs on which Template:Mvar requires larger and larger running time becomes smaller and smaller. This intuition is captured in the following formula for average polynomial running time, which balances the polynomial trade-off between running time and fraction of inputs:
for every n, t > 0Script error: No such module "Check for unknown parameters". and polynomial Template:Mvar, where tA(x)Script error: No such module "Check for unknown parameters". denotes the running time of algorithm Template:Mvar on input Template:Mvar, and Template:Mvar is a positive constant value.[6] Alternatively, this can be written as
for some constants Template:Mvar and Template:Mvar, where n = Template:AbsScript error: No such module "Check for unknown parameters"..[7] In other words, an algorithm Template:Mvar has good average-case complexity if, after running for tA(n)Script error: No such module "Check for unknown parameters". steps, Template:Mvar can solve all but a Template:SfracScript error: No such module "Check for unknown parameters". fraction of inputs of length Template:Mvar, for some ε, c > 0Script error: No such module "Check for unknown parameters"..[3]
Distributional problem
The next step is to define the "average" input to a particular problem. This is achieved by associating the inputs of each problem with a particular probability distribution. That is, an "average-case" problem consists of a language Template:Mvar and an associated probability distribution Template:Mvar which forms the pair (L, D)Script error: No such module "Check for unknown parameters"..[7] The two most common classes of distributions which are allowed are:
- Polynomial-time computable distributions (PScript error: No such module "Check for unknown parameters".-computable): these are distributions for which it is possible to compute the cumulative density of any given input Template:Mvar. More formally, given a probability distribution Template:Mvar and a string x ∈ Template:MsetnScript error: No such module "Check for unknown parameters". it is possible to compute the value in polynomial time. This implies that Pr[x]Script error: No such module "Check for unknown parameters". is also computable in polynomial time.
- Polynomial-time samplable distributions (PScript error: No such module "Check for unknown parameters".-samplable): these are distributions from which it is possible to draw random samples in polynomial time.
These two formulations, while similar, are not equivalent. If a distribution is PScript error: No such module "Check for unknown parameters".-computable it is also PScript error: No such module "Check for unknown parameters".-samplable, but the converse is not true if PScript error: No such module "Check for unknown parameters". ≠ P#PScript error: No such module "Check for unknown parameters"..[7]
AvgP and distNP
A distributional problem (L, D)Script error: No such module "Check for unknown parameters". is in the complexity class AvgPScript error: No such module "Check for unknown parameters". if there is an efficient average-case algorithm for Template:Mvar, as defined above. The class AvgPScript error: No such module "Check for unknown parameters". is occasionally called distPScript error: No such module "Check for unknown parameters". in the literature.[7]
A distributional problem (L, D)Script error: No such module "Check for unknown parameters". is in the complexity class distNPScript error: No such module "Check for unknown parameters". if Template:Mvar is in NPScript error: No such module "Check for unknown parameters". and Template:Mvar is PScript error: No such module "Check for unknown parameters".-computable. When Template:Mvar is in NPScript error: No such module "Check for unknown parameters". and Template:Mvar is PScript error: No such module "Check for unknown parameters".-samplable, (L, D)Script error: No such module "Check for unknown parameters". belongs to sampNPScript error: No such module "Check for unknown parameters"..[7]
Together, AvgPScript error: No such module "Check for unknown parameters". and distNPScript error: No such module "Check for unknown parameters". define the average-case analogues of PScript error: No such module "Check for unknown parameters". and NPScript error: No such module "Check for unknown parameters"., respectively.[7]
Reductions between distributional problems
Let (L,D)Script error: No such module "Check for unknown parameters". and (L′, D′)Script error: No such module "Check for unknown parameters". be two distributional problems. (L, D)Script error: No such module "Check for unknown parameters". average-case reduces to (L′, D′)Script error: No such module "Check for unknown parameters". (written (L, D) ≤AvgP (L′, D′)Script error: No such module "Check for unknown parameters".) if there is a function Template:Mvar that for every Template:Mvar, on input Template:Mvar can be computed in time polynomial in Template:Mvar and
- (Correctness) x ∈ LScript error: No such module "Check for unknown parameters". if and only if f(x) ∈ L′Script error: No such module "Check for unknown parameters".
- (Domination) There are polynomials Template:Mvar and Template:Mvar such that, for every Template:Mvar and Template:Mvar,
The domination condition enforces the notion that if problem (L, D)Script error: No such module "Check for unknown parameters". is hard on average, then (L′, D′)Script error: No such module "Check for unknown parameters". is also hard on average. Intuitively, a reduction should provide a way to solve an instance Template:Mvar of problem Template:Mvar by computing f(x)Script error: No such module "Check for unknown parameters". and feeding the output to the algorithm which solves Template:Mvar. Without the domination condition, this may not be possible since the algorithm which solves Template:Mvar in polynomial time on average may take super-polynomial time on a small number of inputs but Template:Mvar may map these inputs into a much larger set of Template:Mvar so that algorithm Template:Mvar no longer runs in polynomial time on average. The domination condition only allows such strings to occur polynomially as often in Template:Mvar.[6]
DistNP-complete problems
The average-case analogue to NPScript error: No such module "Check for unknown parameters".-completeness is distNPScript error: No such module "Check for unknown parameters".-completeness. A distributional problem (L′, D′)Script error: No such module "Check for unknown parameters". is distNPScript error: No such module "Check for unknown parameters".-complete if (L′, D′)Script error: No such module "Check for unknown parameters". is in distNPScript error: No such module "Check for unknown parameters". and for every (L, D)Script error: No such module "Check for unknown parameters". in distNPScript error: No such module "Check for unknown parameters"., (L, D)Script error: No such module "Check for unknown parameters". is average-case reducible to (L′, D′)Script error: No such module "Check for unknown parameters"..[7]
An example of a distNPScript error: No such module "Check for unknown parameters".-complete problem is the Bounded Halting Problem, (Template:Mvar,D)Script error: No such module "Check for unknown parameters". (for any PScript error: No such module "Check for unknown parameters".-computable D) defined as follows:
In his original paper, Levin showed an example of a distributional tiling problem that is average-case NPScript error: No such module "Check for unknown parameters".-complete.[5] A survey of known distNPScript error: No such module "Check for unknown parameters".-complete problems is available online.[6]
One area of active research involves finding new distNPScript error: No such module "Check for unknown parameters".-complete problems. However, finding such problems can be complicated due to a result of Gurevich which shows that any distributional problem with a flat distribution cannot be distNPScript error: No such module "Check for unknown parameters".-complete unless EXPScript error: No such module "Check for unknown parameters". = NEXPScript error: No such module "Check for unknown parameters"..[8] (A flat distribution Template:Mvar is one for which there exists an ε > 0Script error: No such module "Check for unknown parameters". such that for any Template:Mvar, μ(x) ≤ 2−Template:AbsεScript error: No such module "Check for unknown parameters"..) A result by Livne shows that all natural NPScript error: No such module "Check for unknown parameters".-complete problems have DistNPScript error: No such module "Check for unknown parameters".-complete versions.[9] However, the goal of finding a natural distributional problem that is DistNPScript error: No such module "Check for unknown parameters".-complete has not yet been achieved.[10]
Applications
Sorting algorithms
As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms which utilize randomness, such as Quicksort, have a worst-case running time of O(n2)Script error: No such module "Check for unknown parameters"., but an average-case running time of O(n log(n))Script error: No such module "Check for unknown parameters"., where Template:Mvar is the length of the input to be sorted.[2]
Cryptography
For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst-case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that the average-case complexity of every algorithm which "breaks" the cryptographic scheme is inefficient.[11]Template:Pn
Thus, all secure cryptographic schemes rely on the existence of one-way functions.[3] Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on hard problems such as integer factorization or computing the discrete log. Note that it is not desirable for the candidate function to be NPScript error: No such module "Check for unknown parameters".-complete since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what we actually want is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete log problems are in NP ∩ Script error: No such module "Check for unknown parameters".coNPScript error: No such module "Check for unknown parameters"., and are therefore not believed to be NPScript error: No such module "Check for unknown parameters".-complete.[7] The fact that all of cryptography is predicated on the existence of average-case intractable problems in NPScript error: No such module "Check for unknown parameters". is one of the primary motivations for studying average-case complexity.
Other results
Yao's principle, from a 1978 paper by Andrew Yao, shows that for broad classes of computational problems, average-case complexity for a hard input distribution and a deterministic algorithm adapted to that distribution is the same thing as expected complexity for a fast randomized algorithm and its worst-case input.[12]
In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a distNPScript error: No such module "Check for unknown parameters".-complete problem under the uniform distribution, then there is an average-case algorithm for every problem in NPScript error: No such module "Check for unknown parameters". under any polynomial-time samplable distribution.[13] Applying this theory to natural distributional problems remains an outstanding open question.[3]
In 1992, Ben-David et al. showed that if all languages in distNPScript error: No such module "Check for unknown parameters". have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they show that this conclusion holds under a weaker assumption: if every language in NPScript error: No such module "Check for unknown parameters". is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.[14] Thus, cryptographic one-way functions can exist only if there are distNPScript error: No such module "Check for unknown parameters". problems over the uniform distribution that are hard on average for decision algorithms.
In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a distNPScript error: No such module "Check for unknown parameters".-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in NPScript error: No such module "Check for unknown parameters"..[15] In 2003, Bogdanov and Trevisan generalized this result to arbitrary non-adaptive reductions.[16] These results show that it is unlikely that any association can be made between average-case complexity and worst-case complexity via reductions.[3]
See also
- Probabilistic analysis of algorithms
- NP-complete problems
- Worst-case complexity
- Amortized analysis
- Best, worst and average case
References
<templatestyles src="Reflist/styles.css" />
- ↑ Script error: No such module "Citation/CS1".
- ↑ a b Script error: No such module "citation/CS1".
- ↑ a b c d e f Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ a b Script error: No such module "Citation/CS1".
- ↑ a b c Script error: No such module "citation/CS1".
- ↑ a b c d e f g h i Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ R. Impagliazzo and L. Levin, "No Better Ways to Generate Hard NP Instances than Picking Uniformly at Random," in Proceedings of the 31st IEEE Sympo- sium on Foundations of Computer Science, pp. 812–821, 1990.
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "Citation/CS1".
Script error: No such module "Check for unknown parameters".
Further reading
Pedagogical presentations:
- Script error: No such module "citation/CS1".
- Script error: No such module "citation/CS1".
- Script error: No such module "citation/CS1".
- Script error: No such module "citation/CS1".
The literature of average case complexity includes the following work:
- Script error: No such module "citation/CS1".
- Script error: No such module "citation/CS1"...
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1".. See also 1989 draft.
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..
- Script error: No such module "citation/CS1"..