Shannon Information Measures¶
The pyinform.shannon
module provides a collection of entropy and
information measures on discrete probability distributions
(pyinform.dist.Dist
). This module forms the core of PyInform, as
all of the time series analysis functions are built upon it.
Examples¶
Example 1: Entropy and Random Numbers¶
The pyinform.shannon.entropy()
function allows us to calculate the
Shannon entropy of a distribution. Let’s try generating a random distribution
and see what the entropy looks like.
from pyinform.dist import Dist
from pyinform.shannon import entropy
import numpy as np

xs = np.random.randint(0, 10, 10000)
d = Dist(10)
for x in xs:
    d.tick(x)

print(entropy(d))        # 3.32137023165359
print(entropy(d, b=10))  # 0.9998320664331565
This is exactly what you should expect; the pseudorandom number generator does a decent job of producing integers in a uniform fashion.
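For comparison, the same quantity can be evaluated directly from the definition of entropy with plain NumPy (a minimal sketch, independent of pyinform; the variable names are illustrative):

```python
import numpy as np

# Empirical distribution of 10000 uniform draws over {0, ..., 9}.
xs = np.random.randint(0, 10, 10000)
p = np.bincount(xs, minlength=10) / len(xs)
p = p[p > 0]                  # drop empty bins so log2 is well-defined

h2 = -np.sum(p * np.log2(p))  # base-2 entropy, close to log2(10) ~ 3.32
h10 = h2 / np.log2(10)        # change of base: close to 1 in base 10
print(h2, h10)
```

For a perfectly uniform distribution the entropy would be exactly log2(10) ≈ 3.3219 bits; the empirical value falls slightly below that.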
Example 2: Mutual Information¶
How correlated are consecutive integers? Let’s find out using
mutual_info().
from pyinform.dist import Dist
from pyinform.shannon import mutual_info
import numpy as np
obs = np.random.randint(0, 10, 10000)

p_xy = Dist(100)
p_x = Dist(10)
p_y = Dist(10)

for x in obs[:-1]:
    for y in obs[1:]:
        p_x.tick(x)
        p_y.tick(y)
        p_xy.tick(10*x + y)

print(mutual_info(p_xy, p_x, p_y))        # -1.7763568394002505e-15
print(mutual_info(p_xy, p_x, p_y, b=10))  # -6.661338147750939e-16
Due to the subtleties of floating-point computation we don’t get exactly zero. Really, though, the mutual information is zero.
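This is no accident: the nested loop above ticks every cross pair, so the joint counts are exactly the outer product of the marginal counts. In exact arithmetic \(p_{X,Y}(x,y) = p_X(x) p_Y(y)\), every log term in the mutual information is \(\log 1 = 0\), and only rounding noise survives. A minimal NumPy sketch of the same construction:

```python
import numpy as np

rng = np.random.default_rng(0)
cx = np.bincount(rng.integers(0, 10, 10000), minlength=10)  # x-marginal counts
cy = np.bincount(rng.integers(0, 10, 10000), minlength=10)  # y-marginal counts

# Cross-pair tallying makes the joint the outer product of the marginals.
p_xy = np.outer(cx, cy) / (cx.sum() * cy.sum())
p_x = cx / cx.sum()
p_y = cy / cy.sum()

mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
print(mi)  # tiny, on the order of machine epsilon, but not exactly zero
```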
Example 3: Relative Entropy and Biased Random Numbers¶
Okay. Now let’s generate some binary sequences. The first will be roughly uniform, but the second will be biased toward 0.
from pyinform.dist import Dist
from pyinform.shannon import relative_entropy
import numpy as np
p = Dist(2)
q = Dist(2)

ys = np.random.randint(0, 2, 10000)
for y in ys:
    p.tick(y)

xs = np.random.randint(0, 6, 10000)
for i, _ in enumerate(xs):
    xs[i] = (((xs[i] % 5) % 4) % 3) % 2
    q.tick(xs[i])

print(relative_entropy(q, p))  # 0.3338542254583825
print(relative_entropy(p, q))  # 0.40107198925821924
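As a sanity check, the idealized distributions can be computed analytically: `(((x % 5) % 4) % 3) % 2` maps {0, ..., 5} to 1 only when x = 1, so q approaches (5/6, 1/6) while p approaches (1/2, 1/2). Evaluating the relative entropy definition on those exact distributions (a NumPy sketch, independent of pyinform) gives values near the empirical ones above:

```python
import numpy as np

p = np.array([1, 1]) / 2      # idealized unbiased distribution
q = np.array([5, 1]) / 6      # idealized biased distribution

d_qp = np.sum(q * np.log2(q / p))   # ~0.35, cf. the empirical ~0.334
d_pq = np.sum(p * np.log2(p / q))   # ~0.42, cf. the empirical ~0.401
print(d_qp, d_pq)
```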
API Documentation¶

pyinform.shannon.
entropy
(p, b=2.0)[source]¶ Compute the base-b Shannon entropy of the distribution p.
Taking \(X\) to be a random variable with \(p_X\) a probability distribution on \(X\), the base-\(b\) Shannon entropy is defined as
\[H(X) = -\sum_{x} p_X(x) \log_b p_X(x).\]
Examples:
>>> d = Dist([1,1,1,1])
>>> entropy(d)
2.0
>>> entropy(d, 4)
1.0
>>> d = Dist([2,1])
>>> entropy(d)
0.9182958340544896
>>> entropy(d, b=3)
0.579380164285695
See [Shannon1948a] for more details.
Parameters:
  p (pyinform.dist.Dist) – the distribution
  b (float) – the logarithmic base
Returns: the Shannon entropy of the distribution
Return type: float
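The doctest values can be reproduced directly from the definition; for instance, for Dist([2,1]) (a NumPy check, independent of pyinform):

```python
import numpy as np

p = np.array([2, 1]) / 3          # normalized Dist([2,1])
h2 = -np.sum(p * np.log2(p))      # base-2 entropy, ~0.9183
h3 = h2 / np.log2(3)              # base-3 entropy, ~0.5794
print(h2, h3)
```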

pyinform.shannon.
mutual_info
(p_xy, p_x, p_y, b=2.0)[source]¶ Compute the base-b mutual information between two random variables.
Mutual information provides a measure of the mutual dependence between two random variables. Let \(X\) and \(Y\) be random variables with probability distributions \(p_X\) and \(p_Y\) respectively, and \(p_{X,Y}\) the joint probability distribution over \((X,Y)\). The base-\(b\) mutual information between \(X\) and \(Y\) is defined as
\[\begin{split}I(X;Y) &= \sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\\ &= H(X) + H(Y) - H(X,Y).\end{split}\]Here the second line takes advantage of the properties of logarithms and the definition of Shannon entropy,
entropy()
. To some degree one can think of mutual information as a measure of the (linear and non-linear) correlations between random variables.
See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> mutual_info(xy, x, y)
0.214170945007629
Parameters:
  p_xy (pyinform.dist.Dist) – the joint distribution
  p_x (pyinform.dist.Dist) – the x-marginal distribution
  p_y (pyinform.dist.Dist) – the y-marginal distribution
  b (float) – the logarithmic base
Returns: the mutual information
Return type: float
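The doctest value can be recovered from the definition, assuming the counts [10, 70, 15, 5] are laid out row-major as (x, y) pairs, which is consistent with the stated marginals [80, 20] and [25, 75] (a NumPy check, independent of pyinform):

```python
import numpy as np

p_xy = np.array([[10, 70],
                 [15, 5]]) / 100     # joint p(x, y)
p_x = p_xy.sum(axis=1)               # [0.8, 0.2]
p_y = p_xy.sum(axis=0)               # [0.25, 0.75]

mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
print(mi)  # ~0.2142
```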

pyinform.shannon.
conditional_entropy
(p_xy, p_y, b=2.0)[source]¶ Compute the base-b conditional entropy given joint (p_xy) and marginal (p_y) distributions.
Conditional entropy quantifies the amount of information required to describe a random variable \(X\) given knowledge of a random variable \(Y\). With \(p_Y\) the probability distribution of \(Y\), and \(p_{X,Y}\) the joint distribution of \((X,Y)\), the base-\(b\) conditional entropy is defined as
\[\begin{split}H(X|Y) &= -\sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_Y(y)}\\ &= H(X,Y) - H(Y).\end{split}\]See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> conditional_entropy(xy, x)
0.5971071794515037
>>> conditional_entropy(xy, y)
0.5077571498797332
Parameters:
  p_xy (pyinform.dist.Dist) – the joint distribution
  p_y (pyinform.dist.Dist) – the marginal distribution
  b (float) – the logarithmic base
Returns: the conditional entropy
Return type: float
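Both doctest values follow from the identity \(H(X|Y) = H(X,Y) - H(Y)\), using the same row-major layout assumption for Dist([10,70,15,5]) as in the mutual information example (a NumPy check, independent of pyinform):

```python
import numpy as np

def H(p):
    # Base-2 Shannon entropy of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[10, 70], [15, 5]]) / 100

h_y_given_x = H(p_xy.ravel()) - H(p_xy.sum(axis=1))  # conditions on x
h_x_given_y = H(p_xy.ravel()) - H(p_xy.sum(axis=0))  # conditions on y
print(h_y_given_x, h_x_given_y)
```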

pyinform.shannon.
conditional_mutual_info
(p_xyz, p_xz, p_yz, p_z, b=2.0)[source]¶ Compute the base-b conditional mutual information given the joint (p_xyz) and marginal (p_xz, p_yz, p_z) distributions.
Conditional mutual information was introduced by [Dobrushin1959] and [Wyner1978], and more or less quantifies the average mutual information between random variables \(X\) and \(Y\) given knowledge of a third \(Z\). Following the same notations as in
conditional_entropy()
, the base-\(b\) conditional mutual information is defined as\[\begin{split}I(X;Y|Z) &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\\ &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y,Z}(x,y,z)p_{Z}(z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\\ &= H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)\end{split}\]Examples:
>>> xyz = Dist([24,24,9,6,25,15,10,5])
>>> xz = Dist([15,9,5,10])
>>> yz = Dist([9,15,10,15])
>>> z = Dist([3,5])
>>> conditional_mutual_info(xyz, xz, yz, z)
0.12594942727460323
Parameters:
  p_xyz (pyinform.dist.Dist) – the joint distribution
  p_xz (pyinform.dist.Dist) – the x,z-marginal distribution
  p_yz (pyinform.dist.Dist) – the y,z-marginal distribution
  p_z (pyinform.dist.Dist) – the z-marginal distribution
  b (float) – the logarithmic base
Returns: the conditional mutual information
Return type: float
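The three forms of the definition are equal, which can be sanity-checked on any consistent joint distribution (one whose marginals are derived from it); a NumPy sketch, independent of pyinform:

```python
import numpy as np

def H(p):
    # Base-2 Shannon entropy of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()                         # joint p(x, y, z)

p_z = p.sum(axis=(0, 1))
p_xz = p.sum(axis=1)                 # shape (x, z)
p_yz = p.sum(axis=0)                 # shape (y, z)

# Direct evaluation of the second form of the definition.
direct = np.sum(p * np.log2(p * p_z[None, None, :]
                            / (p_xz[:, None, :] * p_yz[None, :, :])))
# Entropy form: H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).
via_entropies = (H(p_xz.ravel()) + H(p_yz.ravel())
                 - H(p_z) - H(p.ravel()))
print(direct, via_entropies)  # equal up to floating-point error
```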

pyinform.shannon.
relative_entropy
(p, q, b=2.0)[source]¶ Compute the base-b relative entropy between posterior (p) and prior (q) distributions.
Relative entropy, also known as the Kullback-Leibler divergence, was introduced by Kullback and Leibler in 1951 ([Kullback1951a]). Given a random variable \(X\) and two probability distributions \(p_X\) and \(q_X\), relative entropy measures the information gained in switching from the prior \(q_X\) to the posterior \(p_X\):
\[D_{KL}(p_X \| q_X) = \sum_x p_X(x) \log_b \frac{p_X(x)}{q_X(x)}.\]Many of the information measures, e.g.
mutual_info()
,conditional_entropy()
, etc., amount to applications of relative entropy for various prior and posterior distributions.
Examples:
>>> p = Dist([4,1])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
0.27807190511263774
>>> relative_entropy(q,p)
0.32192809488736235
>>> p = Dist([1,0])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
1.0
>>> relative_entropy(q,p)
nan
Parameters:
  p (pyinform.dist.Dist) – the posterior distribution
  q (pyinform.dist.Dist) – the prior distribution
  b (float) – the logarithmic base
Returns: the relative entropy
Return type: float
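The first doctest block follows directly from the definition with p = (4/5, 1/5) and q = (1/2, 1/2) (a NumPy check, independent of pyinform); the second illustrates that the divergence is undefined when q puts mass where p has none, which pyinform reports as nan:

```python
import numpy as np

p = np.array([4, 1]) / 5
q = np.array([1, 1]) / 2

d_pq = np.sum(p * np.log2(p / q))   # ~0.2781
d_qp = np.sum(q * np.log2(q / p))   # ~0.3219
print(d_pq, d_qp)
```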
References¶
[Cover1991a]  (1, 2) T.M. Cover and J.A. Thomas (1991). “Elements of Information Theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.
[Dobrushin1959]  Dobrushin, R. L. (1959). “General formulation of Shannon’s main theorem in information theory”. Uspekhi Mat. Nauk. 14: 3–104.
[Kullback1951a]  Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79–86. doi:10.1214/aoms/1177729694. MR 39968.
[Shannon1948a]  Shannon, Claude E. (July–October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379–423. doi:10.1002/j.1538-7305.1948.tb01448.x.
[Wyner1978]  Wyner, A. D. (1978). “A definition of conditional mutual information for arbitrary ensembles”. Information and Control. 38 (1): 51–59. doi:10.1016/s0019-9958(78)90026-8.