Shannon Information Measures¶
The pyinform.shannon
module provides a collection of entropy and
information measures on discrete probability distributions
(pyinform.dist.Dist
). This module forms the core of PyInform, as
all of the time series analysis functions are built upon it.
Examples¶
Example 1: Entropy and Random Numbers¶
The pyinform.shannon.entropy()
function allows us to calculate the
Shannon entropy of a distribution. Let’s try generating a random distribution
and see what the entropy looks like.
from pyinform.dist import Dist
from pyinform.shannon import entropy
import numpy as np

xs = np.random.randint(0, 10, 10000)
d = Dist(10)
for x in xs:
    d.tick(x)

print(entropy(d))        # 3.32137023165359
print(entropy(d, b=10))  # 0.9998320664331565
This is exactly what you should expect; the pseudorandom number generator does a decent job of producing integers in a uniform fashion.
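For comparison, the same quantity can be evaluated directly from the definition of entropy with plain NumPy (a minimal sketch, independent of pyinform; the variable names are illustrative):

```python
import numpy as np

# Empirical distribution of 10000 uniform draws over {0, ..., 9}.
xs = np.random.randint(0, 10, 10000)
p = np.bincount(xs, minlength=10) / len(xs)
p = p[p > 0]                  # drop empty bins so log2 is well-defined

h2 = -np.sum(p * np.log2(p))  # base-2 entropy, close to log2(10) ~ 3.32
h10 = h2 / np.log2(10)        # change of base: close to 1 in base 10
print(h2, h10)
```

For a perfectly uniform distribution the entropy would be exactly log2(10) ≈ 3.3219 bits; the empirical value falls slightly below that.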
Example 2: Mutual Information¶
How correlated are consecutive integers? Let’s find out using
mutual_info().
from pyinform.dist import Dist
from pyinform.shannon import mutual_info
import numpy as np
obs = np.random.randint(0, 10, 10000)

p_xy = Dist(100)
p_x = Dist(10)
p_y = Dist(10)

for x in obs[:-1]:
    for y in obs[1:]:
        p_x.tick(x)
        p_y.tick(y)
        p_xy.tick(10*x + y)

print(mutual_info(p_xy, p_x, p_y))        # -1.7763568394002505e-15
print(mutual_info(p_xy, p_x, p_y, b=10))  # -6.661338147750939e-16
Due to the subtleties of floating-point computation we don’t get exactly zero. Really, though, the mutual information is zero.
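This is no accident: the nested loop above ticks every cross pair, so the joint counts are exactly the outer product of the marginal counts. In exact arithmetic \(p_{X,Y}(x,y) = p_X(x) p_Y(y)\), every log term in the mutual information is \(\log 1 = 0\), and only rounding noise survives. A minimal NumPy sketch of the same construction:

```python
import numpy as np

rng = np.random.default_rng(0)
cx = np.bincount(rng.integers(0, 10, 10000), minlength=10)  # x-marginal counts
cy = np.bincount(rng.integers(0, 10, 10000), minlength=10)  # y-marginal counts

# Cross-pair tallying makes the joint the outer product of the marginals.
p_xy = np.outer(cx, cy) / (cx.sum() * cy.sum())
p_x = cx / cx.sum()
p_y = cy / cy.sum()

mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
print(mi)  # tiny, on the order of machine epsilon, but not exactly zero
```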
Example 3: Relative Entropy and Biased Random Numbers¶
Okay. Now let’s generate some binary sequences. The first will be roughly uniform, but the second will be biased toward 0.
from pyinform.dist import Dist
from pyinform.shannon import relative_entropy
import numpy as np
p = Dist(2)
q = Dist(2)

ys = np.random.randint(0, 2, 10000)
for y in ys:
    p.tick(y)

xs = np.random.randint(0, 6, 10000)
for i, _ in enumerate(xs):
    xs[i] = (((xs[i] % 5) % 4) % 3) % 2
    q.tick(xs[i])

print(relative_entropy(q, p))  # 0.3338542254583825
print(relative_entropy(p, q))  # 0.40107198925821924
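As a sanity check, the idealized distributions can be computed analytically: `(((x % 5) % 4) % 3) % 2` maps {0, ..., 5} to 1 only when x = 1, so q approaches (5/6, 1/6) while p approaches (1/2, 1/2). Evaluating the relative entropy definition on those exact distributions (a NumPy sketch, independent of pyinform) gives values near the empirical ones above:

```python
import numpy as np

p = np.array([1, 1]) / 2      # idealized unbiased distribution
q = np.array([5, 1]) / 6      # idealized biased distribution

d_qp = np.sum(q * np.log2(q / p))   # ~0.35, cf. the empirical ~0.334
d_pq = np.sum(p * np.log2(p / q))   # ~0.42, cf. the empirical ~0.401
print(d_qp, d_pq)
```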
API Documentation¶

pyinform.shannon.
entropy
(p, b=2.0)[source]¶ Compute the base-b Shannon entropy of the distribution p.
Taking \(X\) to be a random variable with \(p_X\) a probability distribution on \(X\), the base-\(b\) Shannon entropy is defined as
\[H(X) = -\sum_{x} p_X(x) \log_b p_X(x).\]
Examples:
>>> d = Dist([1,1,1,1])
>>> entropy(d)
2.0
>>> entropy(d, 4)
1.0
>>> d = Dist([2,1])
>>> entropy(d)
0.9182958340544896
>>> entropy(d, b=3)
0.579380164285695
See [Shannon1948a] for more details.
Parameters:
  p (pyinform.dist.Dist) – the distribution
  b (float) – the logarithmic base
Returns: the Shannon entropy of the distribution
Return type: float
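The doctest values can be reproduced directly from the definition; for instance, for Dist([2,1]) (a NumPy check, independent of pyinform):

```python
import numpy as np

p = np.array([2, 1]) / 3          # normalized Dist([2,1])
h2 = -np.sum(p * np.log2(p))      # base-2 entropy, ~0.9183
h3 = h2 / np.log2(3)              # base-3 entropy, ~0.5794
print(h2, h3)
```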

pyinform.shannon.
mutual_info
(p_xy, p_x, p_y, b=2.0)[source]¶ Compute the base-b mutual information between two random variables.
Mutual information provides a measure of the mutual dependence between two random variables. Let \(X\) and \(Y\) be random variables with probability distributions \(p_X\) and \(p_Y\) respectively, and \(p_{X,Y}\) the joint probability distribution over \((X,Y)\). The base-\(b\) mutual information between \(X\) and \(Y\) is defined as
\[\begin{split}I(X;Y) &= \sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\\ &= H(X) + H(Y) - H(X,Y).\end{split}\]Here the second line takes advantage of the properties of logarithms and the definition of Shannon entropy,
entropy()
. To some degree one can think of mutual information as a measure of the (linear and non-linear) correlations between random variables.
See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> mutual_info(xy, x, y)
0.214170945007629
Parameters:
  p_xy (pyinform.dist.Dist) – the joint distribution
  p_x (pyinform.dist.Dist) – the x-marginal distribution
  p_y (pyinform.dist.Dist) – the y-marginal distribution
  b (float) – the logarithmic base
Returns: the mutual information
Return type: float
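The doctest value can be recovered from the definition, assuming the counts [10, 70, 15, 5] are laid out row-major as (x, y) pairs, which is consistent with the stated marginals [80, 20] and [25, 75] (a NumPy check, independent of pyinform):

```python
import numpy as np

p_xy = np.array([[10, 70],
                 [15, 5]]) / 100     # joint p(x, y)
p_x = p_xy.sum(axis=1)               # [0.8, 0.2]
p_y = p_xy.sum(axis=0)               # [0.25, 0.75]

mi = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))
print(mi)  # ~0.2142
```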

pyinform.shannon.
conditional_entropy
(p_xy, p_y, b=2.0)[source]¶ Compute the base-b conditional entropy given joint (p_xy) and marginal (p_y) distributions.
Conditional entropy quantifies the amount of information required to describe a random variable \(X\) given knowledge of a random variable \(Y\). With \(p_Y\) the probability distribution of \(Y\), and \(p_{X,Y}\) the joint distribution of \((X,Y)\), the base-\(b\) conditional entropy is defined as
\[\begin{split}H(X|Y) &= -\sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_Y(y)}\\ &= H(X,Y) - H(Y).\end{split}\]See [Cover1991a] for more details.
Examples:
>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> conditional_entropy(xy, x)
0.5971071794515037
>>> conditional_entropy(xy, y)
0.5077571498797332
Parameters:
  p_xy (pyinform.dist.Dist) – the joint distribution
  p_y (pyinform.dist.Dist) – the marginal distribution
  b (float) – the logarithmic base
Returns: the conditional entropy
Return type: float
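Both doctest values follow from the identity \(H(X|Y) = H(X,Y) - H(Y)\), using the same row-major layout assumption for Dist([10,70,15,5]) as in the mutual information example (a NumPy check, independent of pyinform):

```python
import numpy as np

def H(p):
    # Base-2 Shannon entropy of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[10, 70], [15, 5]]) / 100

h_y_given_x = H(p_xy.ravel()) - H(p_xy.sum(axis=1))  # conditions on x
h_x_given_y = H(p_xy.ravel()) - H(p_xy.sum(axis=0))  # conditions on y
print(h_y_given_x, h_x_given_y)
```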

pyinform.shannon.
conditional_mutual_info
(p_xyz, p_xz, p_yz, p_z, b=2.0)[source]¶ Compute the base-b conditional mutual information given the joint (p_xyz) and marginal (p_xz, p_yz, p_z) distributions.
Conditional mutual information was introduced by [Dobrushin1959] and [Wyner1978], and more or less quantifies the average mutual information between random variables \(X\) and \(Y\) given knowledge of a third \(Z\). Following the same notations as in
conditional_entropy()
, the base-\(b\) conditional mutual information is defined as\[\begin{split}I(X;Y|Z) &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\\ &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y,Z}(x,y,z)p_{Z}(z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\\ &= H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)\end{split}\]Examples:
>>> xyz = Dist([24,24,9,6,25,15,10,5])
>>> xz = Dist([15,9,5,10])
>>> yz = Dist([9,15,10,15])
>>> z = Dist([3,5])
>>> conditional_mutual_info(xyz, xz, yz, z)
0.12594942727460323
Parameters:
  p_xyz (pyinform.dist.Dist) – the joint distribution
  p_xz (pyinform.dist.Dist) – the x,z-marginal distribution
  p_yz (pyinform.dist.Dist) – the y,z-marginal distribution
  p_z (pyinform.dist.Dist) – the z-marginal distribution
  b (float) – the logarithmic base
Returns: the conditional mutual information
Return type: float
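The three forms of the definition are equal, which can be sanity-checked on any consistent joint distribution (one whose marginals are derived from it); a NumPy sketch, independent of pyinform:

```python
import numpy as np

def H(p):
    # Base-2 Shannon entropy of a probability vector.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()                         # joint p(x, y, z)

p_z = p.sum(axis=(0, 1))
p_xz = p.sum(axis=1)                 # shape (x, z)
p_yz = p.sum(axis=0)                 # shape (y, z)

# Direct evaluation of the second form of the definition.
direct = np.sum(p * np.log2(p * p_z[None, None, :]
                            / (p_xz[:, None, :] * p_yz[None, :, :])))
# Entropy form: H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).
via_entropies = (H(p_xz.ravel()) + H(p_yz.ravel())
                 - H(p_z) - H(p.ravel()))
print(direct, via_entropies)  # equal up to floating-point error
```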

pyinform.shannon.
relative_entropy
(p, q, b=2.0)[source]¶ Compute the base-b relative entropy between posterior (p) and prior (q) distributions.
Relative entropy, also known as the Kullback-Leibler divergence, was introduced by Kullback and Leibler in 1951 ([Kullback1951a]). Given a random variable \(X\) and two probability distributions \(p_X\) and \(q_X\), relative entropy measures the information gained in switching from the prior \(q_X\) to the posterior \(p_X\):
\[D_{KL}(p_X \| q_X) = \sum_x p_X(x) \log_b \frac{p_X(x)}{q_X(x)}.\]Many of the information measures, e.g.
mutual_info()
,conditional_entropy()
, etc., amount to applications of relative entropy for various prior and posterior distributions.
Examples:
>>> p = Dist([4,1])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
0.27807190511263774
>>> relative_entropy(q,p)
0.32192809488736235
>>> p = Dist([1,0])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
1.0
>>> relative_entropy(q,p)
nan
Parameters:
  p (pyinform.dist.Dist) – the posterior distribution
  q (pyinform.dist.Dist) – the prior distribution
  b (float) – the logarithmic base
Returns: the relative entropy
Return type: float
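The first doctest block follows directly from the definition with p = (4/5, 1/5) and q = (1/2, 1/2) (a NumPy check, independent of pyinform); the second illustrates that the divergence is undefined when q puts mass where p has none, which pyinform reports as nan:

```python
import numpy as np

p = np.array([4, 1]) / 5
q = np.array([1, 1]) / 2

d_pq = np.sum(p * np.log2(p / q))   # ~0.2781
d_qp = np.sum(q * np.log2(q / p))   # ~0.3219
print(d_pq, d_qp)
```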
References¶
[Cover1991a]  (1, 2) T.M. Cover and J.A. Thomas (1991). “Elements of Information Theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.
[Dobrushin1959]  Dobrushin, R. L. (1959). “General formulation of Shannon’s main theorem in information theory”. Uspekhi Mat. Nauk. 14: 3–104.
[Kullback1951a]  Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79–86. doi:10.1214/aoms/1177729694. MR 39968.
[Shannon1948a]  Shannon, Claude E. (July–October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379–423. doi:10.1002/j.1538-7305.1948.tb01448.x.
[Wyner1978]  Wyner, A. D. (1978). “A definition of conditional mutual information for arbitrary ensembles”. Information and Control. 38 (1): 51–59. doi:10.1016/s0019-9958(78)90026-8.