Probability axioms
{{Short description|Foundations of probability theory}}
{{Probability fundamentals}}
The standard '''probability axioms''' are the foundations of [[probability theory]] introduced by Russian mathematician [[Andrey Kolmogorov]] in 1933.<ref name=":0">{{Cite book |title=Foundations of the theory of probability |url=https://archive.org/details/foundationsofthe00kolm |last=Kolmogorov |first=Andrey |publisher=Chelsea Publishing Company |year=1950 |orig-date=1933 |location=New York, US }}</ref> Like all [[Axiomatic system|axiomatic systems]], they outline the basic assumptions underlying the application of probability to fields such as pure mathematics and the physical sciences, while avoiding logical paradoxes.<ref>{{Cite web |url=https://www.stat.berkeley.edu/~aldous/Real_World/kolmogorov.html |title=What is the significance of the Kolmogorov axioms? |last=Aldous |first=David |website=David Aldous |access-date=November 19, 2019}}</ref>


The probability axioms do not specify or assume any particular [[Probability interpretations|interpretation of probability]], but may be motivated by starting from a philosophical definition of probability and arguing that the axioms are satisfied by this definition. For example,
 
* [[Cox's theorem]] derives the laws of probability based on a "logical" definition of probability as the likelihood or credibility of arbitrary logical propositions.<ref>{{Cite journal |last=Cox |first=R. T. |author-link=Richard Threlkeld Cox |year=1946 |title=Probability, Frequency and Reasonable Expectation |journal=American Journal of Physics |volume=14 |issue=1 |pages=1–10 |bibcode=1946AmJPh..14....1C |doi=10.1119/1.1990764}}</ref><ref>{{cite book |last=Cox |first=R. T. |author-link=Richard Threlkeld Cox |title=The Algebra of Probable Inference |publisher=Johns Hopkins University Press |year=1961 |location=Baltimore, MD}}</ref>
* The [[Dutch book arguments]] show that [[Rational agent|rational agents]] must make bets which are in proportion with a subjective measure of the probability of events.
 
The third axiom, [[σ-additivity]], is relatively modern, and originates with Lebesgue's [[Measure (mathematics)|measure theory]]. Some authors replace this with the strictly weaker axiom of finite additivity, which is sufficient to deal with some applications.<ref>{{Cite journal |last=Bingham |first=N.H. |date=2010 |title=Finite Additivity Versus Countable Additivity: de Finetti and Savage |url=https://www.ma.imperial.ac.uk/~bin06/Papers/favcarev.pdf |journal=Electronic J. History of Probability and Statistics |volume=6 |issue=1 |pages=1–6}}</ref>


== Kolmogorov axioms ==
In order to state the Kolmogorov axioms, the following pieces of data must be specified:
 
* The [[sample space]], <math display="inline">\Omega</math>, which is the [[Set theory|set]] of all possible [[Outcome (probability)|outcomes]] or [[Elementary event|elementary events]].
* The space of all [[Event (probability theory)|events]], which are each taken to be sets of outcomes (i.e. subsets of <math display="inline">\Omega</math>). The event space, <math display="inline">F</math>, must be a [[Σ-algebra|''{{mvar|σ}}''-algebra]] on <math display="inline">\Omega</math>.
* The probability [[Measure (mathematics)|measure]] <math display="inline">P</math> which assigns to each event <math>E \in F</math> its probability, <math>P(E)</math>.
 
Taken together, these assumptions mean that <math>(\Omega, F, P)</math> is a [[measure space]]. It is additionally assumed that <math>P(\Omega)=1</math>, making this triple a [[probability space]].<ref name=":0" />
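
For a finite sample space, the triple <math>(\Omega, F, P)</math> can be modelled directly. The following is an illustrative sketch (not part of the article), taking the event space to be the full power set of <math>\Omega</math> and building <math>P</math> from per-outcome weights; the helper names are ours.

```python
from itertools import chain, combinations
from fractions import Fraction

def power_set(omega):
    """All subsets of omega -- the largest sigma-algebra on a finite set."""
    s = sorted(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def make_probability_measure(weights):
    """Build P from per-outcome weights; P(E) sums the weights of outcomes in E."""
    def P(event):
        return sum(weights[outcome] for outcome in event)
    return P

# Example: a fair six-sided die, each outcome weighted 1/6.
omega = frozenset(range(1, 7))
F = power_set(omega)
P = make_probability_measure({k: Fraction(1, 6) for k in omega})

assert P(omega) == 1                    # second axiom: unit measure
assert all(P(E) >= 0 for E in F)        # first axiom: non-negativity
```

Exact rational arithmetic (`Fraction`) avoids floating-point round-off when checking the axioms.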


==={{Anchor|Non-negativity}}First axiom ===
The probability of an event is a non-negative real number. This assumption is implied by the fact that <math>P</math> is a measure on <math>F</math>.
:<math>P(E)\geq 0 \qquad \forall E \in F</math>


Theories which assign [[negative probability]] relax the first axiom.


=== {{Anchor|Unitarity|Normalization}}Second axiom ===
This is the assumption of [[unit measure]]: that the probability that at least one of the [[elementary event]]s in the entire sample space will occur is 1.<math display="block">P(\Omega) = 1</math>From this axiom it follows that <math>P(E)</math> is always finite, in contrast with more general [[Measure (mathematics)|measure theory]].


=== {{Anchor|Sigma additivity|Finite additivity|Countable additivity|Finitely additive}}Third axiom ===
This is the assumption of [[σ-additivity]]: Any [[countable]] sequence of [[disjoint sets]] (synonymous with ''[[Mutual exclusivity|mutually exclusive]]'' events) <math>E_1, E_2, \ldots</math> satisfies
::<math>P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).</math>
This property again is implied by the fact that <math>P</math> is a measure. Note that, by taking <math>E_1 = \Omega</math> and <math>E_i = \emptyset</math> for all <math>i>1</math>, one deduces that <math>P(\emptyset) = 0</math>. This in turn shows that σ-additivity implies finite additivity.
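
On a finite sample space, σ-additivity reduces to finite additivity, which can be checked directly. A small illustrative check (our own example, using the uniform measure on a fair die):

```python
from fractions import Fraction

def P(event):
    """Uniform measure on a fair six-sided die (an assumed example)."""
    return Fraction(len(event), 6)

evens, odds = frozenset({2, 4, 6}), frozenset({1, 3, 5})

# Additivity over disjoint events, and the resulting P(empty set) = 0:
assert evens.isdisjoint(odds)
assert P(evens | odds) == P(evens) + P(odds) == 1
assert P(frozenset()) == 0
```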
Some authors consider merely [[finitely additive]] probability spaces, in which case one just needs an [[field of sets|algebra of sets]], rather than a [[σ-algebra]].<ref>{{Cite web|url=https://plato.stanford.edu/entries/probability-interpret/#KolProCal|title=Interpretations of Probability|last=Hájek|first=Alan|date=August 28, 2019|website=Stanford Encyclopedia of Philosophy|access-date=November 17, 2019}}</ref> [[Quasiprobability distribution]]s in general relax the third axiom.


== Elementary consequences ==
In order to demonstrate that the theory generated by the Kolmogorov axioms corresponds with [[Classical definition of probability|classical probability]], some elementary consequences are typically derived.<ref>{{Cite web |last=Gerard |first=David |date=December 9, 2017 |title=Proofs from axioms |url=https://dcgerard.github.io/stat234/11_proofs_from_axioms.pdf |access-date=November 20, 2019}}</ref>
 


* Since <math>P</math> is finitely additive, we have <math>P(A) + P(A^c) = P(A\cup A^c)= P(\Omega) = 1</math>, so <math>P(A^c) = 1-P(A)</math>.
* In particular, it follows that <math>P(\emptyset) = 0</math>. The empty set is interpreted as the event that "no outcome occurs", which is impossible.
* Similarly, if <math>A \subseteq B</math>, then <math>P(B) = P(A \cup (B\setminus A)) = P(A) + P(B\setminus A) \ge P(A)</math>. In other words, <math>P</math> is [[Monotonic function|monotone]].<ref name=":1">{{Cite book |last=Ross, Sheldon M. |title=A first course in probability |year=2014 |isbn=978-0-321-79477-2 |edition=Ninth |location=Upper Saddle River, New Jersey |pages=27, 28 |oclc=827003384}}</ref>
* Since <math>\emptyset \subseteq E \subseteq \Omega</math> for any event <math>E</math>, it follows that <math>0 \le P(E) \le 1</math>.
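
These consequences can be verified exhaustively on a finite example. The following sketch (ours, not the article's) checks the complement rule, monotonicity, and the numeric bound over every pair of events for a fair die with the uniform measure:

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))          # fair die, an assumed example

def P(event):
    return Fraction(len(event), 6)      # uniform measure

# Every event: all 2^6 = 64 subsets of the sample space.
events = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(7))]

for A in events:
    assert P(omega - A) == 1 - P(A)     # complement rule
    assert 0 <= P(A) <= 1               # numeric bound
    for B in events:
        if A <= B:
            assert P(A) <= P(B)         # monotonicity
```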


By dividing <math>A \cup B </math> into the disjoint sets <math>A \setminus (A \cap B) </math>, <math>B \setminus (A \cap B)</math> and <math>A \cap B</math>, one arrives at a probabilistic version of the inclusion-exclusion principle<ref>{{Cite web |last=Jackson |first=Bill |date=2010 |title=Probability (Lecture Notes - Week 3) |url=http://www.maths.qmul.ac.uk/~bill/MTH4107/notesweek3_10.pdf |access-date=November 20, 2019 |website=School of Mathematics, Queen Mary University of London}}</ref><math display="block">P(A \cup B) = P(A) + P(B) - P(A \cap B).</math>In the case where <math>\Omega</math> is finite, the two identities are equivalent.
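
A quick numerical check of the two-event inclusion-exclusion identity (an illustrative example of ours, again with the uniform measure on a fair die):

```python
from fractions import Fraction

def P(event):
    """Uniform measure on a fair six-sided die (assumed example)."""
    return Fraction(len(event), 6)

A, B = frozenset({1, 2, 3, 4}), frozenset({3, 4, 5})

# P(A or B) = P(A) + P(B) - P(A and B): the overlap is counted once, not twice.
assert P(A | B) == P(A) + P(B) - P(A & B)
```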


In order to actually do calculations when <math>\Omega</math> is an infinite set, it is sometimes useful to generalize from a finite sample space. For example, if <math>\Omega</math> consists of all infinite sequences of tosses of a fair coin, it is not obvious how to compute the probability of any particular set of sequences (i.e. an event). If the event is "every flip is heads", then it is intuitive that the probability can be computed as:<math display="block">P(\text{infinite sequence of heads}) = \lim_{n \to \infty} P(\text{sequence of n heads}) = \lim_{n \to \infty} 2^{-n} = 0.</math>In order to make this rigorous, one has to prove that <math>P</math> is '''continuous''', in the following sense. If <math>A_j,\,\, j = 1, 2, \ldots</math> is a sequence of events increasing (or decreasing) to another event <math>A</math>, then<ref>{{Cite book |last=Evans |first=Michael |title=Probability and Statistics: The Science of Uncertainty |last2=Rosenthal |first2=Jeffrey |date=25 July 2003 |publisher=[[W. H. Freeman and Company]] |isbn=978-0716747420 |pages=27-29 |language=en-us}}</ref><math display="block">\lim_{n \to \infty} P(A_n) = P(A).</math>
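
The limiting computation above can be illustrated numerically. In this sketch (ours), the events <math>A_n</math> = "the first <math>n</math> tosses are heads" decrease to the event "every toss is heads", and their probabilities <math>2^{-n}</math> decrease toward 0:

```python
from fractions import Fraction

def p_first_n_heads(n):
    """P(first n tosses of a fair coin are all heads) = 2^(-n)."""
    return Fraction(1, 2) ** n

# The A_n decrease to A = "every toss is heads"; continuity of P then gives
# P(A) = lim P(A_n) = lim 2^(-n) = 0.
probs = [p_first_n_heads(n) for n in range(1, 60)]
assert all(a > b for a, b in zip(probs, probs[1:]))   # strictly decreasing
assert float(probs[-1]) < 1e-17                       # tending to 0
```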


== Simple example: coin toss ==
Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both).  No assumption is made as to whether the coin is fair.<ref>{{cite journal |last1=Diaconis |first1=Persi |last2=Holmes |first2=Susan |last3=Montgomery |first3=Richard |title=Dynamical Bias in the Coin Toss |journal= SIAM Review|date=2007 |volume=49 |issue=211–235 |pages=211–235 |doi=10.1137/S0036144504446436 |bibcode=2007SIAMR..49..211D |url=https://statweb.stanford.edu/~cgates/PERSI/papers/dyn_coin_07.pdf |access-date=5 January 2024}}</ref>


We may define:

:<math>\Omega = \{H, T\}</math>
:<math>F = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}</math>

Kolmogorov's axioms imply that:

:<math>P(\varnothing) = 0</math>

The probability of ''neither'' heads ''nor'' tails is 0.

:<math>P(\{H, T\}^c) = 0</math>

The probability of ''either'' heads ''or'' tails is 1.

:<math>P(\{H\}) + P(\{T\}) = 1</math>

The sum of the probability of heads and the probability of tails is 1.
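
Since the axioms do not fix the bias, any choice of <math>P(\{H\})</math> in <math>[0,1]</math> yields a valid measure. A minimal sketch (ours; the bias <code>p</code> is an assumed parameter):

```python
from fractions import Fraction

# Any bias p = P({H}) in [0, 1] is allowed; the axioms determine the rest.
p = Fraction(2, 5)

P = {
    frozenset(): Fraction(0),               # P(empty set) = 0
    frozenset({'H'}): p,
    frozenset({'T'}): 1 - p,                # complement rule
    frozenset({'H', 'T'}): Fraction(1),     # P(omega) = 1 (second axiom)
}

assert P[frozenset({'H'})] + P[frozenset({'T'})] == 1
assert all(v >= 0 for v in P.values())      # first axiom
```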

== See also ==

== References ==
{{Reflist}}

== Further reading ==