Shannon Information Measures

The pyinform.shannon module provides a collection of entropy and information measures on discrete probability distributions (pyinform.dist.Dist). This module forms the core of PyInform, as all of the time series analysis functions are built upon it.

Examples

Example 1: Entropy and Random Numbers

The pyinform.shannon.entropy() function allows us to calculate the Shannon entropy of a distribution. Let’s try generating a random distribution and see what the entropy looks like.

from pyinform.dist import Dist
from pyinform.shannon import entropy
import numpy as np

xs = np.random.randint(0,10,10000)
d = Dist(10)
for x in xs:
    d.tick(x)
print(entropy(d))       # 3.32137023165359
print(entropy(d, b=10)) # 0.9998320664331565

This is exactly what you should expect: the pseudo-random number generator does a decent job of producing integers in a uniform fashion.
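
For contrast, a biased distribution should have noticeably lower entropy. The following sketch is not part of the original example; it simply reuses the same API with a bias chosen for illustration:

from pyinform.dist import Dist
from pyinform.shannon import entropy
import numpy as np

# Bias the samples toward small values by taking the elementwise minimum of
# two independent uniform draws; the support is still {0, ..., 9}.
xs = np.minimum(np.random.randint(0, 10, 10000),
                np.random.randint(0, 10, 10000))

d = Dist(10)
for x in xs:
    d.tick(x)

print(entropy(d)) # roughly 3.0, below the uniform maximum of log2(10) ~ 3.32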

Example 2: Mutual Information

How correlated are consecutive integers? Let’s find out using mutual_info().

from pyinform.dist import Dist
from pyinform.shannon import mutual_info
import numpy as np

obs = np.random.randint(0, 10, 10000)

p_xy = Dist(100)
p_x  = Dist(10)
p_y  = Dist(10)

for x in obs[:-1]:
    for y in obs[1:]:
        p_x.tick(x)
        p_y.tick(y)
        p_xy.tick(10*x + y)

print(mutual_info(p_xy, p_x, p_y))       # -1.7763568394002505e-15
print(mutual_info(p_xy, p_x, p_y, b=10)) # -6.661338147750939e-16

Due to the subtleties of floating-point computation, we don’t get exactly zero. Really, though, the mutual information is zero.
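
By contrast, if the second variable is simply a copy of the first, the mutual information equals the entropy of the variable itself. Here is a minimal sketch along those lines (a variation on the example above, not from the original text):

from pyinform.dist import Dist
from pyinform.shannon import mutual_info
import numpy as np

obs = np.random.randint(0, 10, 10000)

p_xy = Dist(100)
p_x  = Dist(10)
p_y  = Dist(10)

# Pair each observation with itself so the two variables are perfectly dependent.
for x in obs:
    p_x.tick(x)
    p_y.tick(x)
    p_xy.tick(10*x + x)

print(mutual_info(p_xy, p_x, p_y)) # roughly log2(10) ~ 3.32, i.e. I(X;X) = H(X)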

Example 3: Relative Entropy and Biased Random Numbers

Okay. Now let’s generate some binary sequences. The first will be roughly uniform, but the second will be biased toward 0.

from pyinform.dist import Dist
from pyinform.shannon import relative_entropy
import numpy as np

p = Dist(2)
q = Dist(2)

ys = np.random.randint(0, 2, 10000)
for y in ys:
    p.tick(y)

xs = np.random.randint(0, 6, 10000)
for i, _ in enumerate(xs):
    xs[i] = (((xs[i] % 5) % 4) % 3) % 2
    q.tick(xs[i])

print(relative_entropy(q,p)) # 0.3338542254583825
print(relative_entropy(p,q)) # 0.40107198925821924
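
The nested modulus expression is what produces the bias: it sends 1 to 1 and collapses every other value in {0, ..., 5} to 0, so roughly five draws in six become 0. A tiny sketch (not part of the original example) makes the mapping explicit:

# Show where each value in {0, ..., 5} ends up under the chain of moduli.
for x in range(6):
    print(x, (((x % 5) % 4) % 3) % 2) # only x = 1 maps to 1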

API Documentation

pyinform.shannon.entropy(p, b=2.0)[source]

Compute the base-b Shannon entropy of the distribution p.

Taking \(X\) to be a random variable with \(p_X\) a probability distribution on \(X\), the base-\(b\) Shannon entropy is defined as

\[H(X) = -\sum_{x} p_X(x) \log_b p_X(x).\]

Examples:

>>> d = Dist([1,1,1,1])
>>> entropy(d)
2.0
>>> entropy(d, 4)
1.0
>>> d = Dist([2,1])
>>> entropy(d)
0.9182958340544896
>>> entropy(d, b=3)
0.579380164285695

See [Shannon1948a] for more details.
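
As a sanity check, the values above can be reproduced directly from the definition. The following sketch (not part of the original documentation) evaluates -sum p log2 p by hand for the Dist([2,1]) case:

import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import entropy

p = np.array([2, 1]) / 3.0     # probabilities implied by the counts [2, 1]

print(-np.sum(p * np.log2(p))) # 0.9182958340544896
print(entropy(Dist([2, 1])))   # the same value from pyinform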

Parameters:

p (Dist): the distribution
b (float): the logarithmic base

Returns:

the Shannon entropy of the distribution

Return type:

float

pyinform.shannon.mutual_info(p_xy, p_x, p_y, b=2.0)[source]

Compute the base-b mutual information between two random variables.

Mutual information provides a measure of the mutual dependence between two random variables. Let \(X\) and \(Y\) be random variables with probability distributions \(p_X\) and \(p_Y\) respectively, and \(p_{X,Y}\) the joint probability distribution over \((X,Y)\). The base-\(b\) mutual information between \(X\) and \(Y\) is defined as

\[\begin{split}I(X;Y) &= \sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\\ &= H(X) + H(Y) - H(X,Y).\end{split}\]

Here the second line takes advantage of the properties of logarithms and the definition of Shannon entropy, entropy().

To some degree, one can think of mutual information as a measure of the (linear and non-linear) correlations between random variables.

See [Cover1991a] for more details.

Examples:

>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> mutual_info(xy, x, y)
0.214170945007629
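
The value above is consistent with the identity I(X;Y) = H(X) + H(Y) - H(X,Y). A brief sketch (not in the original documentation) checks it using entropy():

from pyinform.dist import Dist
from pyinform.shannon import entropy, mutual_info

xy = Dist([10, 70, 15, 5])
x  = Dist([80, 20])
y  = Dist([25, 75])

# Both expressions should agree up to floating-point rounding.
print(entropy(x) + entropy(y) - entropy(xy)) # roughly 0.2142
print(mutual_info(xy, x, y))                 # 0.214170945007629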

Parameters:

p_xy (Dist): the joint distribution
p_x (Dist): the marginal distribution of the first variable
p_y (Dist): the marginal distribution of the second variable
b (float): the logarithmic base

Returns:

the mutual information

Return type:

float

pyinform.shannon.conditional_entropy(p_xy, p_y, b=2.0)[source]

Compute the base-b conditional entropy given joint (p_xy) and marginal (p_y) distributions.

Conditional entropy quantifies the amount of information required to describe a random variable \(X\) given knowledge of a random variable \(Y\). With \(p_Y\) the probability distribution of \(Y\), and \(p_{X,Y}\) the joint distribution of \((X,Y)\), the base-\(b\) conditional entropy is defined as

\[\begin{split}H(X|Y) &= -\sum_{x,y} p_{X,Y}(x,y) \log_b \frac{p_{X,Y}(x,y)}{p_Y(y)}\\ &= H(X,Y) - H(Y).\end{split}\]

See [Cover1991a] for more details.

Examples:

>>> xy = Dist([10,70,15,5])
>>> x = Dist([80,20])
>>> y = Dist([25,75])
>>> conditional_entropy(xy, x)
0.5971071794515037
>>> conditional_entropy(xy, y)
0.5077571498797332
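
These values are consistent with the identity H(X|Y) = H(X,Y) - H(Y). A short sketch (not part of the original documentation) verifies the second call using entropy():

from pyinform.dist import Dist
from pyinform.shannon import entropy, conditional_entropy

xy = Dist([10, 70, 15, 5])
y  = Dist([25, 75])

# H(X,Y) - H(Y) should match conditional_entropy(xy, y) up to rounding.
print(entropy(xy) - entropy(y))   # roughly 0.5078
print(conditional_entropy(xy, y)) # 0.5077571498797332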

Parameters:

p_xy (Dist): the joint distribution
p_y (Dist): the marginal distribution of the conditioning variable
b (float): the logarithmic base

Returns:

the conditional entropy

Return type:

float

pyinform.shannon.conditional_mutual_info(p_xyz, p_xz, p_yz, p_z, b=2.0)[source]

Compute the base-b conditional mutual information given the joint (p_xyz) and marginal (p_xz, p_yz, p_z) distributions.

Conditional mutual information was introduced by [Dobrushin1959] and [Wyner1978], and more or less quantifies the average mutual information between random variables \(X\) and \(Y\) given knowledge of a third, \(Z\). Following the same notation as in conditional_entropy(), the base-\(b\) conditional mutual information is defined as

\[\begin{split}I(X;Y|Z) &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y|Z}(x,y|z)}{p_{X|Z}(x|z)p_{Y|Z}(y|z)}\\ &= \sum_{x,y,z} p_{X,Y,Z}(x,y,z) \log_b \frac{p_{X,Y,Z}(x,y,z)p_{Z}(z)}{p_{X,Z}(x,z)p_{Y,Z}(y,z)}\\ &= H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).\end{split}\]

Examples:

>>> xyz = Dist([24,24,9,6,25,15,10,5])
>>> xz = Dist([15,9,5,10])
>>> yz = Dist([9,15,10,15])
>>> z = Dist([3,5])
>>> conditional_mutual_info(xyz, xz, yz, z)
0.12594942727460323
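
Again, the result agrees with the entropy form of the definition, I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z). A short sketch (not from the original documentation) checks this with entropy():

from pyinform.dist import Dist
from pyinform.shannon import entropy, conditional_mutual_info

xyz = Dist([24, 24, 9, 6, 25, 15, 10, 5])
xz  = Dist([15, 9, 5, 10])
yz  = Dist([9, 15, 10, 15])
z   = Dist([3, 5])

# Both expressions should agree up to floating-point rounding.
print(entropy(xz) + entropy(yz) - entropy(z) - entropy(xyz)) # roughly 0.1259
print(conditional_mutual_info(xyz, xz, yz, z))               # 0.12594942727460323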

Parameters:

p_xyz (Dist): the joint distribution
p_xz (Dist): the joint marginal distribution of X and Z
p_yz (Dist): the joint marginal distribution of Y and Z
p_z (Dist): the marginal distribution of Z
b (float): the logarithmic base

Returns:

the conditional mutual information

Return type:

float

pyinform.shannon.relative_entropy(p, q, b=2.0)[source]

Compute the base-b relative entropy between posterior (p) and prior (q) distributions.

Relative entropy, also known as the Kullback-Leibler divergence, was introduced by Kullback and Leibler in 1951 ([Kullback1951a]). Given a random variable \(X\) and two probability distributions \(p_X\) and \(q_X\) on it, relative entropy measures the information gained in switching from the prior \(q_X\) to the posterior \(p_X\):

\[D_{KL}(p_X || q_X) = \sum_x p_X(x) \log_b \frac{p_X(x)}{q_X(x)}.\]

Many of the other information measures, e.g. mutual_info() and conditional_entropy(), amount to applications of relative entropy for particular choices of prior and posterior distributions.

Examples:

>>> p = Dist([4,1])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
0.27807190511263774
>>> relative_entropy(q,p)
0.32192809488736235
>>> p = Dist([1,0])
>>> q = Dist([1,1])
>>> relative_entropy(p,q)
1.0
>>> relative_entropy(q,p)
nan
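
The nan in the final call arises because q places positive probability on an outcome to which p assigns zero probability, so the relative entropy is not finite. The first pair of values can be reproduced directly from the definition; the sketch below (not part of the original documentation) evaluates the sum by hand for the first call:

import numpy as np
from pyinform.dist import Dist
from pyinform.shannon import relative_entropy

p = np.array([4, 1]) / 5.0 # posterior probabilities implied by Dist([4,1])
q = np.array([1, 1]) / 2.0 # prior probabilities implied by Dist([1,1])

print(np.sum(p * np.log2(p / q)))                   # roughly 0.2781
print(relative_entropy(Dist([4, 1]), Dist([1, 1]))) # 0.27807190511263774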

Parameters:

p (Dist): the posterior distribution
q (Dist): the prior distribution
b (float): the logarithmic base

Returns:

the relative entropy

Return type:

float

References

[Cover1991a] T.M. Cover and J.A. Thomas (1991). “Elements of Information Theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.
[Dobrushin1959] Dobrushin, R. L. (1959). “General formulation of Shannon’s main theorem in information theory”. Uspekhi Mat. Nauk. 14: 3-104.
[Kullback1951a] Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79-86. doi:10.1214/aoms/1177729694. MR 39968.
[Shannon1948a] Shannon, Claude E. (July-October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379-423. doi:10.1002/j.1538-7305.1948.tb01448.x.
[Wyner1978] Wyner, A. D. (1978). “A definition of conditional mutual information for arbitrary ensembles”. Information and Control. 38 (1): 51-59. doi:10.1016/s0019-9958(78)90026-8.