Law of the unconscious statistician

From Wikipedia, the free encyclopedia

In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function $g(X)$ of a random variable $X$ in terms of $g$ and the probability distribution of $X$.

The form of the law depends on the type of random variable $X$ in question. If the distribution of $X$ is discrete and one knows its probability mass function $p_X$, then the expected value of $g(X)$ is
$$\operatorname{E}[g(X)] = \sum_x g(x)\, p_X(x),$$
where the sum is over all possible values $x$ of $X$. If instead the distribution of $X$ is continuous with probability density function $f_X$, then the expected value of $g(X)$ is
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, \mathrm{d}x.$$
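
The discrete form can be illustrated with a minimal sketch. The fair six-sided die, the choice of squaring as the function, and the helper name `lotus_discrete` are assumptions made for this example only; exact rational arithmetic is used so that the result is not affected by rounding.

```python
from fractions import Fraction


def lotus_discrete(g, pmf):
    """Return the sum of g(x) * p_X(x) over all values x in the pmf."""
    return sum(g(x) * p for x, p in pmf.items())


# Hypothetical example: X is a fair six-sided die, g(x) = x^2.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
expected_square = lotus_discrete(lambda x: x * x, die_pmf)

print(expected_square)  # 91/6
```

Note that the distribution of the squared value is never computed; only the probability mass function of the die itself is used.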

Both of these special cases can be expressed in terms of the cumulative probability distribution function $F_X$ of $X$, with the expected value of $g(X)$ now given by the Lebesgue–Stieltjes integral
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, \mathrm{d}F_X(x).$$
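
A rough numerical illustration of the Stieltjes form is sketched below for a distribution that is neither purely discrete nor purely continuous. The example distribution (an exponential variable with rate 1, capped at the value 1), the function names, and the grid size are assumptions chosen for illustration. The integral is approximated by a Riemann–Stieltjes sum, which is adequate here because the integrand is continuous.

```python
import math


def capped_exponential_cdf(x):
    """CDF of X = min(T, 1), where T is exponential with rate 1 (assumed example)."""
    if x < 0.0:
        return 0.0
    if x < 1.0:
        return 1.0 - math.exp(-x)
    return 1.0  # the jump at x = 1 carries probability exp(-1)


def stieltjes_expectation(g, cdf, lo, hi, n=100_000):
    """Approximate the integral of g dF over [lo, hi] by a right-endpoint sum."""
    total = 0.0
    previous = cdf(lo)
    for i in range(1, n + 1):
        x = lo + (hi - lo) * i / n
        current = cdf(x)
        total += g(x) * (current - previous)  # g(x_i) * (F(x_i) - F(x_{i-1}))
        previous = current
    return total


approx_mean = stieltjes_expectation(lambda x: x, capped_exponential_cdf, 0.0, 1.0)
exact_mean = 1.0 - math.exp(-1.0)  # E[min(T, 1)] computed by hand
print(abs(approx_mean - exact_mean) < 1e-4)
```

The increment of the distribution function over the last grid cell includes the jump at 1, so the atom and the continuous part are handled by the same sum.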

In even greater generality, $X$ could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.

Etymology

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the aforementioned law as the very definition of the expected value of a function $g(X)$ of a random variable $X$, rather than (more formally) as a consequence of the true definition of expected value. The naming is sometimes attributed to Sheldon Ross's textbook Introduction to Probability Models, although he removed the reference in later editions. Many statistics textbooks do present the result as the definition of expected value.

Joint distributions

A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables $X$ and $Y$, a function of two variables $g$, and joint probability mass function $p_{X,Y}(x,y)$:
$$\operatorname{E}[g(X,Y)] = \sum_y \sum_x g(x,y)\, p_{X,Y}(x,y).$$
In the absolutely continuous case, with $f_{X,Y}(x,y)$ being the joint probability density function,
$$\operatorname{E}[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, \mathrm{d}x\, \mathrm{d}y.$$
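
The discrete joint form can be sketched in the same way. The two independent fair dice, the choice of the maximum as the function of two variables, and the helper name `lotus_joint` are assumptions made for this example; independence is only a convenience for building the joint table and is not required by the law.

```python
from fractions import Fraction
from itertools import product


def lotus_joint(g, joint_pmf):
    """Return the sum of g(x, y) * p_{X,Y}(x, y) over all pairs (x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


# Hypothetical example: X and Y are independent fair dice.
joint_pmf = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

expected_max = lotus_joint(max, joint_pmf)
print(expected_max)  # 161/36
```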

Special cases

A number of special cases are given here. In the simplest case, where the random variable $X$ takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if $X$ is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more delicate, since the proof in full generality requires subtle forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable is then a special case by making use of the Radon–Nikodym theorem.

Discrete case

Suppose that $X$ is a random variable which takes on only finitely or countably many different values $x_1, x_2, \ldots$, with probabilities $p_1, p_2, \ldots$. Then for any function $g$ of these values, the random variable $g(X)$ has values $g(x_1), g(x_2), \ldots$, although some of these may coincide with each other. For example, this is the case if $X$ can take on both values $1$ and $-1$ and $g(x) = x^2$.

Let $y_1, y_2, \ldots$ enumerate the possible distinct values of $g(X)$, and for each $i$ let $I_i$ denote the collection of all $j$ with $g(x_j) = y_i$. Then, according to the definition of expected value, one has
$$\operatorname{E}[g(X)] = \sum_i y_i\, p_{g(X)}(y_i).$$

Since a $y_i$ can be the image of multiple, distinct $x_j$, it holds that
$$p_{g(X)}(y_i) = \sum_{j \in I_i} p_X(x_j).$$

Then the expected value can be rewritten as
$$\sum_i y_i\, p_{g(X)}(y_i) = \sum_i y_i \sum_{j \in I_i} p_X(x_j) = \sum_i \sum_{j \in I_i} g(x_j)\, p_X(x_j) = \sum_x g(x)\, p_X(x).$$
This equality relates the average of the outputs of $g(X)$ as weighted by the probabilities of the outputs themselves to the average of the outputs of $g(X)$ as weighted by the probabilities of the outputs of $X$.

If $X$ takes on only finitely many possible values, the above is fully rigorous. However, if $X$ takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem. Because of this, it is necessary to assume the absolute convergence of the sums in question.
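
The grouping argument of the discrete case can be traced in a small finite example, where no convergence issues arise. The distribution (uniform on five integers) and the variable names are assumptions made for illustration. The dictionary `pmf_gx` plays the role of the probability mass function of $g(X)$, built by summing the probabilities of all values of $X$ that share the same image under $g$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X is uniform on {-2, -1, 0, 1, 2} and g(x) = x^2,
# so that several values of X map to the same value of g(X).
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    return x * x


# p_{g(X)}(y) is the sum of p_X(x) over all x with g(x) = y.
pmf_gx = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_gx[g(x)] += p

by_definition = sum(y * p for y, p in pmf_gx.items())  # sum of y * p_{g(X)}(y)
by_lotus = sum(g(x) * p for x, p in pmf_x.items())     # sum of g(x) * p_X(x)

print(by_definition, by_lotus)  # 2 2
```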

Continuous case

Suppose that $X$ is a random variable whose distribution has a continuous density $f$. If $g$ is a general function, then the probability that $g(X)$ is valued in a set of real numbers $K$ equals the probability that $X$ is valued in $g^{-1}(K)$, which is given by
$$\int_{g^{-1}(K)} f(x)\, \mathrm{d}x.$$
Under various conditions on $g$, the change-of-variables formula for integration can be applied to relate this to an integral over $K$, and hence to identify the density of $g(X)$ in terms of the density of $X$. In the simplest case, if $g$ is differentiable with nowhere-vanishing derivative, then the above integral can be written as
$$\int_K f(g^{-1}(y))\, (g^{-1})'(y)\, \mathrm{d}y,$$
thereby identifying $g(X)$ as possessing the density $f(g^{-1}(y))\,(g^{-1})'(y)$. The expected value of $g(X)$ is then identified as
$$\int_{-\infty}^{\infty} y\, f(g^{-1}(y))\, (g^{-1})'(y)\, \mathrm{d}y = \int_{-\infty}^{\infty} g(x)\, f(x)\, \mathrm{d}x,$$
where the equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of $g(X)$ is encoded entirely by the function $g$ and the density $f$ of $X$.
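
The two sides of this identity can be compared numerically in a simple case. The sketch below assumes an increasing function, the exponential, so that the derivative of its inverse is positive, and takes the density of the random variable to be $2x$ on the interval from 0 to 1; both choices, the helper `midpoint_integral`, and the grid size are assumptions made for illustration. Both integrals can be evaluated by hand and equal 2.

```python
import math


def midpoint_integral(func, lo, hi, n=100_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def f(x):
    """Assumed density of X: 2x on (0, 1), zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0


g = math.exp      # increasing, with nowhere-vanishing derivative
g_inv = math.log  # inverse of g


def g_inv_prime(y):
    return 1.0 / y  # derivative of the inverse of g


def density_of_gX(y):
    """Density of g(X) from the change-of-variables formula."""
    return f(g_inv(y)) * g_inv_prime(y)


# g maps the support (0, 1) of X onto (1, e).
lhs = midpoint_integral(lambda y: y * density_of_gX(y), 1.0, math.e)
rhs = midpoint_integral(lambda x: g(x) * f(x), 0.0, 1.0)
print(abs(lhs - 2.0) < 1e-6, abs(rhs - 2.0) < 1e-6)
```

The right-hand integral needs only $g$ and $f$, whereas the left-hand one requires the inverse of $g$ and its derivative.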

The assumption that $g$ is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as $g(x) = x^2$. The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, it can be found that the identity
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, \mathrm{d}x$$
holds true whenever $X$ has a density $f$ (which does not have to be continuous) and whenever $g$ is a measurable function for which $g(X)$ has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if $X$ is a random vector (with density) and $g$ is a multivariable function; the integral is then taken over the multi-dimensional range of values of $X$.
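
As a numerical check in a case where the function is not one-to-one, the sketch below takes a standard normal random variable and the squaring function. This example, the helper names, the truncation of the integration ranges, and the grid size are assumptions made for illustration. The square of a standard normal variable has the chi-squared distribution with one degree of freedom, whose density is used for the comparison; both integrals should be close to 1.

```python
import math


def midpoint_integral(func, lo, hi, n=200_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


def normal_density(x):
    """Density of a standard normal random variable."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def chi_squared_1_density(y):
    """Density of X^2 for standard normal X (chi-squared, one degree of freedom)."""
    return math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)


# Integral of g(x) f(x) with g(x) = x^2; tails beyond |x| = 10 are negligible.
second_moment_lotus = midpoint_integral(
    lambda x: x * x * normal_density(x), -10.0, 10.0
)

# Integral of y times the density of g(X); the tail beyond y = 100 is negligible.
second_moment_via_density = midpoint_integral(
    lambda y: y * chi_squared_1_density(y), 0.0, 100.0
)

print(abs(second_moment_lotus - 1.0) < 1e-4)
print(abs(second_moment_via_density - 1.0) < 1e-4)
```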

Measure-theoretic formulation

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space $(\Omega, \mu)$ and a measurable map $X$ from $\Omega$ to a measurable space $\Omega'$. The theorem then says that for any measurable function $g$ on $\Omega'$ which is valued in real numbers (or even the extended real number line), one has
$$\int_\Omega g \circ X\, \mathrm{d}\mu = \int_{\Omega'} g\, \mathrm{d}(X_\sharp \mu)$$
(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here $X_\sharp \mu$ denotes the pushforward measure on $\Omega'$. The 'discrete case' given above is the special case arising when $X$ takes on only countably many values and $\mu$ is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem. Without any major changes, the result can also be formulated in the setting of outer measures.
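
On a finite set, both sides of the measure-theoretic identity reduce to finite sums, which makes the role of the pushforward measure easy to see. The sketch below is an illustration under assumed data: the four-point space, the weights (deliberately not summing to 1, since the measure need not be a probability measure), the map, and the function are all hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure space: points of Omega with their masses under mu.
mu = {"a": Fraction(2), "b": Fraction(1, 2), "c": Fraction(3), "d": Fraction(1, 4)}

# Hypothetical measurable map X from Omega to Omega' = {0, 1, 2}.
X = {"a": 0, "b": 1, "c": 1, "d": 2}


def g(t):
    """Hypothetical real-valued function on Omega'."""
    return 10 * t + 1


# Pushforward measure: the mass of a point t is the mu-mass of its preimage under X.
pushforward = defaultdict(Fraction)
for omega, mass in mu.items():
    pushforward[X[omega]] += mass

integral_on_omega = sum(g(X[omega]) * mass for omega, mass in mu.items())
integral_on_target = sum(g(t) * mass for t, mass in pushforward.items())

print(integral_on_omega, integral_on_target)  # 183/4 183/4
```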

If $\mu$ is a σ-finite measure, the theory of the Radon–Nikodym derivative is applicable. In the special case that the measure $X_\sharp \mu$ is absolutely continuous relative to some background σ-finite measure $\nu$ on $\Omega'$, there is a real-valued function $f_X$ on $\Omega'$ representing the Radon–Nikodym derivative of the two measures, and then
$$\int_{\Omega'} g\, \mathrm{d}(X_\sharp \mu) = \int_{\Omega'} g\, f_X\, \mathrm{d}\nu.$$
In the further special case that $\Omega'$ is the real number line, as in the contexts discussed above, it is natural to take $\nu$ to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever $\mu$ is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are trivially σ-finite.)
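
The same formula can also be read with a different background measure. As a sketch, assume that the target space is countable, that the background measure is the counting measure on it, and that the measure on the domain is a probability measure; these assumptions are made here for illustration and are not part of the statement above. Every measure on a countable set is absolutely continuous relative to the counting measure, the Radon–Nikodym derivative is the probability mass function, and the formula reduces to the discrete form of the law:

```latex
f_X(x) = \frac{\mathrm{d}(X_\sharp \mu)}{\mathrm{d}\nu}(x)
       = (X_\sharp \mu)(\{x\})
       = \mu(\{\omega \in \Omega : X(\omega) = x\})
       = p_X(x),
\qquad
\int_{\Omega'} g\, f_X\, \mathrm{d}\nu = \sum_{x \in \Omega'} g(x)\, p_X(x).
```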
