<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Conditional_mutual_information</id>
	<title>Conditional mutual information - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://debianws.lexgopc.com/wiki143/index.php?action=history&amp;feed=atom&amp;title=Conditional_mutual_information"/>
	<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Conditional_mutual_information&amp;action=history"/>
	<updated>2026-05-12T19:25:22Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>http://debianws.lexgopc.com/wiki143/index.php?title=Conditional_mutual_information&amp;diff=7173893&amp;oldid=prev</id>
		<title>128.29.17.11: Moved punctuation outside of math notation and removed distracting box.</title>
		<link rel="alternate" type="text/html" href="http://debianws.lexgopc.com/wiki143/index.php?title=Conditional_mutual_information&amp;diff=7173893&amp;oldid=prev"/>
		<updated>2025-05-16T15:00:06Z</updated>

		<summary type="html">&lt;p&gt;Moved punctuation outside of math notation and removed distracting box.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{Short description|Information theory}}&lt;br /&gt;
{{Information theory}}&lt;br /&gt;
&lt;br /&gt;
[[Image:VennInfo3Var.svg|thumb|256px|right|[[Venn diagram]] of information theoretic measures for three variables &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;y&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;z&amp;lt;/math&amp;gt;, represented by the lower left, lower right, and upper circles, respectively. The conditional mutual informations &amp;lt;math&amp;gt;I(x;z|y)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;I(y;z|x)&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;I(x;y|z)&amp;lt;/math&amp;gt; are represented by the yellow, cyan, and magenta regions, respectively.]]&lt;br /&gt;
&lt;br /&gt;
In [[probability theory]], particularly [[information theory]], the &amp;#039;&amp;#039;&amp;#039;conditional mutual information&amp;#039;&amp;#039;&amp;#039;&amp;lt;ref name = Wyner1978&amp;gt;{{cite journal|last=Wyner|first=A. D. |title=A definition of conditional mutual information for arbitrary ensembles|journal=Information and Control|year=1978|volume=38|issue=1|pages=51–59|doi=10.1016/s0019-9958(78)90026-8|doi-access=free}}&amp;lt;/ref&amp;gt;&amp;lt;ref name = Dobrushin1959&amp;gt;{{cite journal|last=Dobrushin|first=R. L. |title=General formulation of Shannon&amp;#039;s main theorem in information theory|journal=Uspekhi Mat. Nauk|year=1959|volume=14|pages=3–104}}&amp;lt;/ref&amp;gt; is, in its most basic form, the [[expected value]] of the [[mutual information]] of two random variables given the value of a third.&lt;br /&gt;
&lt;br /&gt;
==Definition==&lt;br /&gt;
For random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; with [[Support (mathematics)|support sets]] &amp;lt;math&amp;gt;\mathcal{X}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\mathcal{Y}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathcal{Z}&amp;lt;/math&amp;gt;, we define the conditional mutual information as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
I(X;Y|Z) = \int_\mathcal{Z} D_{\mathrm{KL}}( P_{(X,Y)|Z} \| P_{X|Z} \otimes P_{Y|Z} ) dP_{Z}&lt;br /&gt;
&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
This may be written in terms of the expectation operator: &amp;lt;math&amp;gt;I(X;Y|Z) = \mathbb{E}_Z [D_{\mathrm{KL}}( P_{(X,Y)|Z} \| P_{X|Z} \otimes P_{Y|Z} )]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Thus &amp;lt;math&amp;gt;I(X;Y|Z)&amp;lt;/math&amp;gt; is the expected (with respect to &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt;) [[Kullback–Leibler divergence]] from the conditional joint distribution &amp;lt;math&amp;gt;P_{(X,Y)|Z}&amp;lt;/math&amp;gt; to the product of the conditional marginals &amp;lt;math&amp;gt;P_{X|Z}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;P_{Y|Z}&amp;lt;/math&amp;gt;. Compare with the definition of [[mutual information]].&lt;br /&gt;
&lt;br /&gt;
==In terms of PMFs for discrete distributions==&lt;br /&gt;
For discrete random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; with [[Support (mathematics)|support sets]] &amp;lt;math&amp;gt;\mathcal{X}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\mathcal{Y}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathcal{Z}&amp;lt;/math&amp;gt;, the conditional mutual information &amp;lt;math&amp;gt;I(X;Y|Z)&amp;lt;/math&amp;gt; is as follows&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
I(X;Y|Z) = \sum_{z\in \mathcal{Z}} p_Z(z) \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}}&lt;br /&gt;
      p_{X,Y|Z}(x,y|z) \log \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where the marginal, joint, and/or conditional [[probability mass function]]s are denoted by &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; with the appropriate subscript. This can be simplified as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
I(X;Y|Z) = \sum_{z\in \mathcal{Z}} \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p_{X,Y,Z}(x,y,z) \log \frac{p_Z(z)p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}&lt;br /&gt;
&amp;lt;/math&amp;gt;.&lt;br /&gt;
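&lt;br /&gt;
The simplified sum above translates directly into code. The following is a minimal Python sketch (the function name and the dictionary representation of the joint PMF are illustrative choices, not a standard library interface):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from collections import defaultdict&lt;br /&gt;
from math import log2&lt;br /&gt;
&lt;br /&gt;
def conditional_mutual_information(pmf):&lt;br /&gt;
    # I(X;Y|Z) in bits from a joint PMF given as {(x, y, z): probability}.&lt;br /&gt;
    p_z = defaultdict(float)&lt;br /&gt;
    p_xz = defaultdict(float)&lt;br /&gt;
    p_yz = defaultdict(float)&lt;br /&gt;
    for (x, y, z), p in pmf.items():&lt;br /&gt;
        p_z[z] += p&lt;br /&gt;
        p_xz[(x, z)] += p&lt;br /&gt;
        p_yz[(y, z)] += p&lt;br /&gt;
    # Simplified form: sum of p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ].&lt;br /&gt;
    return sum(p * log2(p_z[z] * p / (p_xz[(x, z)] * p_yz[(y, z)]))&lt;br /&gt;
               for (x, y, z), p in pmf.items() if p &amp;gt; 0)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;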
&lt;br /&gt;
==In terms of PDFs for continuous distributions==&lt;br /&gt;
For (absolutely) continuous random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; with [[Support (mathematics)|support sets]] &amp;lt;math&amp;gt;\mathcal{X}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\mathcal{Y}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathcal{Z}&amp;lt;/math&amp;gt;, the conditional mutual information &amp;lt;math&amp;gt;I(X;Y|Z)&amp;lt;/math&amp;gt; is as follows&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
I(X;Y|Z) = \int_{\mathcal{Z}} \bigg( \int_{\mathcal{Y}} \int_{\mathcal{X}}&lt;br /&gt;
      \log \left(\frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\right) p_{X,Y|Z}(x,y|z) dx dy \bigg) p_Z(z) dz&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
where the marginal, joint, and/or conditional [[probability density function]]s are denoted by &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; with the appropriate subscript. This can be simplified as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
I(X;Y|Z) = \int_{\mathcal{Z}} \int_{\mathcal{Y}} \int_{\mathcal{X}} \log \left(\frac{p_Z(z)p_{X,Y,Z}(x,y,z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\right) p_{X,Y,Z}(x,y,z) dx dy dz&lt;br /&gt;
&amp;lt;/math&amp;gt;.&lt;br /&gt;
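&lt;br /&gt;
For jointly Gaussian variables this integral has a closed form, since each differential entropy equals &amp;lt;math&amp;gt;\tfrac{1}{2}\log\big((2\pi e)^k \det\Sigma\big)&amp;lt;/math&amp;gt; and the &amp;lt;math&amp;gt;(2\pi e)^k&amp;lt;/math&amp;gt; factors cancel in &amp;lt;math&amp;gt;I(X;Y|Z) = h(X,Z)+h(Y,Z)-h(X,Y,Z)-h(Z)&amp;lt;/math&amp;gt;, the differential-entropy analogue of an identity given in the next section. A minimal NumPy sketch, assuming scalar &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; and a positive-definite covariance matrix ordered as &amp;lt;math&amp;gt;(X,Y,Z)&amp;lt;/math&amp;gt;:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def gaussian_cmi_nats(cov):&lt;br /&gt;
    # I(X;Y|Z) in nats via log-determinants of the 3x3 covariance matrix.&lt;br /&gt;
    logdet = lambda m: np.linalg.slogdet(m)[1]&lt;br /&gt;
    s_xz = cov[np.ix_([0, 2], [0, 2])]&lt;br /&gt;
    s_yz = cov[np.ix_([1, 2], [1, 2])]&lt;br /&gt;
    return 0.5 * (logdet(s_xz) + logdet(s_yz) - logdet(cov) - np.log(cov[2, 2]))&lt;br /&gt;
&lt;br /&gt;
# Equicorrelated example with pairwise correlation 0.5.&lt;br /&gt;
cov = np.array([[1.0, 0.5, 0.5],&lt;br /&gt;
                [0.5, 1.0, 0.5],&lt;br /&gt;
                [0.5, 0.5, 1.0]])&lt;br /&gt;
print(gaussian_cmi_nats(cov))  # 0.5 * ln(9/8), about 0.0589 nats&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;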
&lt;br /&gt;
==Some identities==&lt;br /&gt;
Alternatively, we may write in terms of joint and conditional [[Entropy (information theory)|entropies]] as&amp;lt;ref&amp;gt;{{cite book |last1=Cover |first1=Thomas |author-link1=Thomas M. Cover |last2=Thomas |first2=Joy A. |title=Elements of Information Theory |edition=2nd |location=New York |publisher=[[Wiley-Interscience]] |date=2006 |isbn=0-471-24195-4}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
I(X;Y|Z) &amp;amp;= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) \\&lt;br /&gt;
         &amp;amp;= H(X|Z) - H(X|Y,Z) \\&lt;br /&gt;
         &amp;amp;= H(X|Z)+H(Y|Z)-H(X,Y|Z).&lt;br /&gt;
\end{align}&amp;lt;/math&amp;gt;&lt;br /&gt;
This can be rewritten to show its relationship to mutual information&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X;Y|Z) = I(X;Y,Z) - I(X;Z)&amp;lt;/math&amp;gt;&lt;br /&gt;
usually rearranged as &amp;#039;&amp;#039;&amp;#039;the chain rule for mutual information&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X;Y,Z) = I(X;Z) + I(X;Y|Z)&amp;lt;/math&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X;Y|Z) = I(X;Y) - (I(X;Z) - I(X;Z|Y))\,.&amp;lt;/math&amp;gt;&lt;br /&gt;
Another equivalent form of the above is&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
I(X;Y|Z) &amp;amp;= H(Z|X) + H(X) + H(Z|Y) + H(Y) - H(Z|X,Y) - H(X,Y) - H(Z)\\&lt;br /&gt;
         &amp;amp;= I(X;Y) + H(Z|X) + H(Z|Y) - H(Z|X,Y) - H(Z)&lt;br /&gt;
\end{align}\,.&amp;lt;/math&amp;gt;&lt;br /&gt;
Another equivalent form of the conditional mutual information is&lt;br /&gt;
:&amp;lt;math&amp;gt;\begin{align}&lt;br /&gt;
I(X;Y|Z) = I(X,Z;Y,Z) - H(Z)&lt;br /&gt;
\end{align}\,.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Like mutual information, conditional mutual information can be expressed as a [[Kullback–Leibler divergence]]:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt; I(X;Y|Z) = D_{\mathrm{KL}}[ p(X,Y,Z) \| p(X|Z)p(Y|Z)p(Z) ]. &amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Or as an expected value of simpler Kullback–Leibler divergences:&lt;br /&gt;
:&amp;lt;math&amp;gt; I(X;Y|Z) = \sum_{z \in \mathcal{Z}} p( Z=z ) D_{\mathrm{KL}}[ p(X,Y|z) \| p(X|z)p(Y|z) ]&amp;lt;/math&amp;gt;,&lt;br /&gt;
:&amp;lt;math&amp;gt; I(X;Y|Z) = \sum_{y \in \mathcal{Y}} p( Y=y ) D_{\mathrm{KL}}[ p(X,Z|y) \| p(X|Z)p(Z|y) ]&amp;lt;/math&amp;gt;.&lt;br /&gt;
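&lt;br /&gt;
These identities are straightforward to check numerically. The sketch below draws a random joint PMF on &amp;lt;math&amp;gt;\{0,1\}^3&amp;lt;/math&amp;gt; and compares the direct definition against the entropy form (all names are illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import itertools&lt;br /&gt;
import random&lt;br /&gt;
from math import log2&lt;br /&gt;
&lt;br /&gt;
random.seed(0)&lt;br /&gt;
keys = list(itertools.product([0, 1], repeat=3))  # coordinates: (x, y, z)&lt;br /&gt;
w = [random.random() for _ in keys]&lt;br /&gt;
pmf = {k: v / sum(w) for k, v in zip(keys, w)}&lt;br /&gt;
&lt;br /&gt;
def marginal(axes):&lt;br /&gt;
    # Marginal PMF over the chosen coordinates (0 = X, 1 = Y, 2 = Z).&lt;br /&gt;
    m = {}&lt;br /&gt;
    for k, p in pmf.items():&lt;br /&gt;
        sub = tuple(k[a] for a in axes)&lt;br /&gt;
        m[sub] = m.get(sub, 0.0) + p&lt;br /&gt;
    return m&lt;br /&gt;
&lt;br /&gt;
def H(axes):&lt;br /&gt;
    # Entropy in bits of the marginal over the chosen coordinates.&lt;br /&gt;
    return -sum(p * log2(p) for p in marginal(axes).values() if p &amp;gt; 0)&lt;br /&gt;
&lt;br /&gt;
# Entropy identity: I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).&lt;br /&gt;
via_entropies = H((0, 2)) + H((1, 2)) - H((0, 1, 2)) - H((2,))&lt;br /&gt;
&lt;br /&gt;
# Direct definition: sum of p(x,y,z) log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ].&lt;br /&gt;
p_z, p_xz, p_yz = marginal((2,)), marginal((0, 2)), marginal((1, 2))&lt;br /&gt;
direct = sum(p * log2(p_z[(z,)] * p / (p_xz[(x, z)] * p_yz[(y, z)]))&lt;br /&gt;
             for (x, y, z), p in pmf.items() if p &amp;gt; 0)&lt;br /&gt;
&lt;br /&gt;
assert abs(via_entropies - direct) &amp;lt; 1e-12&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;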
&lt;br /&gt;
==More general definition==&lt;br /&gt;
A more general definition of conditional mutual information, applicable to random variables with continuous or other arbitrary distributions, will depend on the concept of &amp;#039;&amp;#039;&amp;#039;[[regular conditional probability]]&amp;#039;&amp;#039;&amp;#039;.&amp;lt;ref&amp;gt;D. Leao, Jr. et al. &amp;#039;&amp;#039;Regular conditional probability, disintegration of probability and Radon spaces.&amp;#039;&amp;#039; Proyecciones. Vol. 23, No. 1, pp. 15–29, May 2004, Universidad Católica del Norte, Antofagasta, Chile [http://www.scielo.cl/pdf/proy/v23n1/art02.pdf PDF]&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let &amp;lt;math&amp;gt;(\Omega, \mathcal F, \mathfrak P)&amp;lt;/math&amp;gt; be a [[probability space]], and let the random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt;, and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; each be defined as a Borel-measurable function from &amp;lt;math&amp;gt;\Omega&amp;lt;/math&amp;gt; to some state space endowed with a topological structure.&lt;br /&gt;
&lt;br /&gt;
Consider the Borel measure (on the σ-algebra generated by the open sets) in the state space of each random variable defined by assigning each Borel set the &amp;lt;math&amp;gt;\mathfrak P&amp;lt;/math&amp;gt;-measure of its preimage in &amp;lt;math&amp;gt;\mathcal F&amp;lt;/math&amp;gt;.  This is called the [[pushforward measure]] &amp;lt;math&amp;gt;X _* \mathfrak P = \mathfrak P\big(X^{-1}(\cdot)\big).&amp;lt;/math&amp;gt;  The &amp;#039;&amp;#039;&amp;#039;support of a random variable&amp;#039;&amp;#039;&amp;#039; is defined to be the [[Support (measure theory)|topological support]] of this measure, i.e. &amp;lt;math&amp;gt;\mathrm{supp}\,X = \mathrm{supp}\,X _* \mathfrak P.&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now we can formally define the [[conditional probability distribution|conditional probability measure]] given the value of one (or, via the [[product topology]], more) of the random variables.  Let &amp;lt;math&amp;gt;M&amp;lt;/math&amp;gt; be a measurable subset of &amp;lt;math&amp;gt;\Omega&amp;lt;/math&amp;gt; (i.e. &amp;lt;math&amp;gt;M \in \mathcal F&amp;lt;/math&amp;gt;), and let &amp;lt;math&amp;gt;x \in \mathrm{supp}\,X&amp;lt;/math&amp;gt;.  Then, using the [[disintegration theorem]]:&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathfrak P(M | X=x) = \lim_{U \ni x}&lt;br /&gt;
  \frac {\mathfrak P(M \cap \{X \in U\})}&lt;br /&gt;
        {\mathfrak P(\{X \in U\})}&lt;br /&gt;
  \qquad \textrm{and} \qquad \mathfrak P(M|X) = \int_M d\mathfrak P\big(\omega|X=X(\omega)\big),&amp;lt;/math&amp;gt;&lt;br /&gt;
where the limit is taken over the open neighborhoods &amp;lt;math&amp;gt;U&amp;lt;/math&amp;gt; of &amp;lt;math&amp;gt;x&amp;lt;/math&amp;gt;, which are allowed to become arbitrarily small with respect to [[Subset|set inclusion]].&lt;br /&gt;
&lt;br /&gt;
Finally we can define the conditional mutual information via [[Lebesgue integration]]:&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X;Y|Z) = \int_\Omega \log&lt;br /&gt;
  \Bigl(&lt;br /&gt;
  \frac {d \mathfrak P(\omega|X,Z)\, d\mathfrak P(\omega|Y,Z)}&lt;br /&gt;
        {d \mathfrak P(\omega|Z)\, d\mathfrak P(\omega|X,Y,Z)}&lt;br /&gt;
  \Bigr)&lt;br /&gt;
  d \mathfrak P(\omega),&lt;br /&gt;
  &amp;lt;/math&amp;gt;&lt;br /&gt;
where the integrand is the logarithm of a [[Radon–Nikodym derivative]] involving some of the conditional probability measures we have just defined.&lt;br /&gt;
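&lt;br /&gt;
For discrete random variables this reduces to the earlier PMF formula (a consistency check, not spelled out in the cited source). In that case &amp;lt;math&amp;gt;\mathfrak P(M | X=x, Z=z) = \mathfrak P(M \cap \{X=x, Z=z\}) / \mathfrak P(X=x, Z=z)&amp;lt;/math&amp;gt;, so&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{d \mathfrak P(\omega|X,Z)}{d \mathfrak P(\omega)} = \frac{1}{p_{X,Z}\big(X(\omega),Z(\omega)\big)}&amp;lt;/math&amp;gt;&lt;br /&gt;
and similarly for the other conditional measures; the integrand then equals &amp;lt;math&amp;gt;\log\big(p_Z \, p_{X,Y,Z} / (p_{X,Z} \, p_{Y,Z})\big)&amp;lt;/math&amp;gt; evaluated at &amp;lt;math&amp;gt;\big(X(\omega),Y(\omega),Z(\omega)\big)&amp;lt;/math&amp;gt;, recovering the sum for discrete distributions given above.&lt;br /&gt;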
&lt;br /&gt;
==Note on notation==&lt;br /&gt;
In an expression such as &amp;lt;math&amp;gt;I(A;B|C),&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;A,&amp;lt;/math&amp;gt; &amp;lt;math&amp;gt;B,&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;C&amp;lt;/math&amp;gt; need not necessarily be restricted to representing individual random variables, but could also represent the joint distribution of any collection of random variables defined on the same [[probability space]].  As is common in [[probability theory]], we may use the comma to denote such a joint distribution, e.g. &amp;lt;math&amp;gt;I(A_0,A_1;B_1,B_2,B_3|C_0,C_1).&amp;lt;/math&amp;gt;  Hence the use of the semicolon (or occasionally a colon or even a wedge &amp;lt;math&amp;gt;\wedge&amp;lt;/math&amp;gt;) to separate the principal arguments of the mutual information symbol.  (No such distinction is necessary in the symbol for [[joint entropy]], since the joint entropy of any number of random variables is the same as the entropy of their joint distribution.)&lt;br /&gt;
&lt;br /&gt;
== Properties ==&lt;br /&gt;
===Nonnegativity===&lt;br /&gt;
It is always true that&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X;Y|Z) \ge 0&amp;lt;/math&amp;gt;,&lt;br /&gt;
for discrete, jointly distributed random variables &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt;. This follows because &amp;lt;math&amp;gt;I(X;Y|Z)&amp;lt;/math&amp;gt; is an expected value of Kullback–Leibler divergences, each of which is non-negative. This result has been used as a basic building block for proving other [[inequalities in information theory]], in particular, those known as Shannon-type inequalities. Conditional mutual information is also non-negative for continuous random variables under certain regularity conditions.&amp;lt;ref&amp;gt;{{cite book |last1=Polyanskiy |first1=Yury |last2=Wu |first2=Yihong |title=Lecture notes on information theory |date=2017 |page=30 |url=http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf}}&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Interaction information===&lt;br /&gt;
Conditioning on a third random variable may either increase or decrease the mutual information: that is, the difference &amp;lt;math&amp;gt;I(X;Y) - I(X;Y|Z)&amp;lt;/math&amp;gt;, called the [[interaction information]], may be positive, negative, or zero, even when the random variables are pairwise independent. For example, let &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; be independent with &amp;lt;math display=&amp;quot;block&amp;quot;&amp;gt;X \sim \mathrm{Bernoulli}(0.5), \quad Z \sim \mathrm{Bernoulli}(0.5), \quad Y=\left\{\begin{array}{ll} X &amp;amp; \text{if }Z=0\\ 1-X &amp;amp; \text{if }Z=1 \end{array}\right.&amp;lt;/math&amp;gt;Then &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;Y&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Z&amp;lt;/math&amp;gt; are pairwise independent and in particular &amp;lt;math&amp;gt;I(X;Y)=0&amp;lt;/math&amp;gt;, but &amp;lt;math&amp;gt;I(X;Y|Z)=1.&amp;lt;/math&amp;gt;&lt;br /&gt;
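&lt;br /&gt;
This example is easy to verify in code (a small self-contained sketch; the helper &amp;lt;code&amp;gt;mi&amp;lt;/code&amp;gt; is an illustrative name):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
from math import log2&lt;br /&gt;
&lt;br /&gt;
def mi(pmf, a, b):&lt;br /&gt;
    # I(.;.) in bits between coordinates a and b of a joint PMF.&lt;br /&gt;
    pa, pb, pab = {}, {}, {}&lt;br /&gt;
    for k, p in pmf.items():&lt;br /&gt;
        pa[k[a]] = pa.get(k[a], 0.0) + p&lt;br /&gt;
        pb[k[b]] = pb.get(k[b], 0.0) + p&lt;br /&gt;
        pab[(k[a], k[b])] = pab.get((k[a], k[b]), 0.0) + p&lt;br /&gt;
    return sum(p * log2(p / (pa[u] * pb[v])) for (u, v), p in pab.items() if p &amp;gt; 0)&lt;br /&gt;
&lt;br /&gt;
# X and Z are independent fair bits; Y = X XOR Z. Coordinates: (x, y, z).&lt;br /&gt;
pmf = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}&lt;br /&gt;
&lt;br /&gt;
print(mi(pmf, 0, 1))  # I(X;Y) = 0.0: X and Y are independent&lt;br /&gt;
for z in (0, 1):      # but given Z = z, Y determines X; p(Z = z) = 0.5 here&lt;br /&gt;
    cond = {(x, y): p / 0.5 for (x, y, zz), p in pmf.items() if zz == z}&lt;br /&gt;
    print(mi(cond, 0, 1))  # 1.0 each; averaging over z gives I(X;Y|Z) = 1 bit&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;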
&lt;br /&gt;
===Chain rule for mutual information===&lt;br /&gt;
The chain rule (as derived above) provides two ways to decompose &amp;lt;math&amp;gt;I(X;Y,Z)&amp;lt;/math&amp;gt;:&lt;br /&gt;
:&amp;lt;math&amp;gt;&lt;br /&gt;
\begin{align}&lt;br /&gt;
I(X;Y,Z) &amp;amp;= I(X;Z) + I(X;Y|Z) \\&lt;br /&gt;
         &amp;amp;= I(X;Y) + I(X;Z|Y)&lt;br /&gt;
\end{align}&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
The [[data processing inequality]] is closely related to conditional mutual information and can be proven using the chain rule.&lt;br /&gt;
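&lt;br /&gt;
A numerical illustration of that connection (a sketch with made-up transition probabilities, not a proof): for a Markov chain &amp;lt;math&amp;gt;X \to Y \to Z&amp;lt;/math&amp;gt; we have &amp;lt;math&amp;gt;I(X;Z|Y)=0&amp;lt;/math&amp;gt;, so equating the two decompositions gives &amp;lt;math&amp;gt;I(X;Z) = I(X;Y) - I(X;Y|Z) \le I(X;Y)&amp;lt;/math&amp;gt; by nonnegativity. The sketch below reuses the &amp;lt;code&amp;gt;mi&amp;lt;/code&amp;gt; helper defined earlier:&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import itertools&lt;br /&gt;
&lt;br /&gt;
# Markov chain X -&amp;gt; Y -&amp;gt; Z over {0,1}: p(z|y) does not depend on x.&lt;br /&gt;
p_x = {0: 0.3, 1: 0.7}&lt;br /&gt;
p_y_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}&lt;br /&gt;
p_z_y = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}&lt;br /&gt;
pmf = {(x, y, z): p_x[x] * p_y_x[x][y] * p_z_y[y][z]&lt;br /&gt;
       for x, y, z in itertools.product((0, 1), repeat=3)}&lt;br /&gt;
&lt;br /&gt;
def cmi(pmf, a, b, c):&lt;br /&gt;
    # I(coordinate a; coordinate b | coordinate c) in bits.&lt;br /&gt;
    pc = {}&lt;br /&gt;
    for k, p in pmf.items():&lt;br /&gt;
        pc[k[c]] = pc.get(k[c], 0.0) + p&lt;br /&gt;
    return sum(pcv * mi({(k[a], k[b]): p / pcv&lt;br /&gt;
                         for k, p in pmf.items() if k[c] == cval}, 0, 1)&lt;br /&gt;
               for cval, pcv in pc.items())&lt;br /&gt;
&lt;br /&gt;
print(cmi(pmf, 0, 2, 1))               # I(X;Z|Y) = 0, up to rounding&lt;br /&gt;
print(mi(pmf, 0, 2) &amp;lt;= mi(pmf, 0, 1))  # data processing: True&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;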
&lt;br /&gt;
==Interaction information==&lt;br /&gt;
{{main|Interaction information}}&lt;br /&gt;
The conditional mutual information is used to inductively define the &amp;#039;&amp;#039;&amp;#039;interaction information&amp;#039;&amp;#039;&amp;#039;, a generalization of mutual information, as follows:&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X_1;\ldots;X_{n+1}) = I(X_1;\ldots;X_n) - I(X_1;\ldots;X_n|X_{n+1}),&amp;lt;/math&amp;gt;&lt;br /&gt;
where&lt;br /&gt;
:&amp;lt;math&amp;gt;I(X_1;\ldots;X_n|X_{n+1}) = \mathbb{E}_{X_{n+1}} [D_{\mathrm{KL}}( P_{(X_1,\ldots,X_n)|X_{n+1}} \| P_{X_1|X_{n+1}} \otimes\cdots\otimes P_{X_n|X_{n+1}} )].&amp;lt;/math&amp;gt;&lt;br /&gt;
Because the conditional mutual information can be greater than or less than its unconditional counterpart, the interaction information can be positive, negative, or zero, which makes it hard to interpret.&lt;br /&gt;
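&lt;br /&gt;
For the three-variable case and the pairwise-independent example above, the definition gives &amp;lt;math&amp;gt;I(X;Y;Z) = I(X;Y) - I(X;Y|Z) = 0 - 1 = -1&amp;lt;/math&amp;gt; bit, a standard illustration of a negative value (reusing the &amp;lt;code&amp;gt;mi&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;cmi&amp;lt;/code&amp;gt; sketches above):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
# X and Z independent fair bits, Y = X XOR Z; mi and cmi defined earlier.&lt;br /&gt;
pmf = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}&lt;br /&gt;
print(mi(pmf, 0, 1) - cmi(pmf, 0, 1, 2))  # I(X;Y) - I(X;Y|Z) = -1.0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;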
&lt;br /&gt;
==References==&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[Category:Information theory]]&lt;br /&gt;
[[Category:Entropy and information]]&lt;/div&gt;</summary>
		<author><name>128.29.17.11</name></author>
	</entry>
</feed>