Time Series Measures¶
The Empirical Distributions and Shannon Information Measures come together to make information measures on time series almost trivial to implement. Every such measure amounts to constructing distributions and applying an information measure.
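As a quick illustration of that recipe, the first-order entropy of a short series can be computed by hand by accumulating the observations into a distribution and handing it to the Shannon entropy function. This is a minimal sketch assuming the Dist and shannon.entropy APIs described in those two sections (see them for the exact signatures):
>>> from pyinform.dist import Dist
>>> from pyinform.shannon import entropy
>>> xs = [0,0,1,1,1,1,0,0,0]
>>> d = Dist(2)                  # empirical distribution over the 2 observed states
>>> for x in xs:
...     _ = d.tick(x)            # count each observation
...
>>> h = entropy(d, b=2)          # ~0.991 bits; block_entropy(xs, k=1) below reports the same value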
Notation¶
Throughout this section we will denote random variables as \(X, Y, \ldots\), and let \(x_i, y_i, \ldots\) represent the \(i\)-th time step of a time series drawn from a random variable. Many of the measures consider \(k\)-histories (a.k.a. \(k\)-blocks) of the time series, e.g. \(x^{(k)}_i = \{x_{i-k+1}, x_{i-k+2}, \ldots, x_i\}\).
For the sake of conciseness, when denoting probability distributions, we will only make the random variable explicit in situations where the notation is ambiguous. Generally, we will write \(p(x_i)\), \(p(x^{(k)}_i)\) and \(p(x^{(k)}_i, x_{i+1})\) to denote the empirical probability of observing the state \(x_i\), the probability of observing the \(k\)-history \(x^{(k)}_i\), and the joint probability of observing \((x^{(k)}_i, x_{i+1})\), respectively.
Please report any notational ambiguities as an issue.
Subtle Details¶
The library takes several liberties in the way in which the time series measures are implemented.
The Base: States and Logarithms¶
The word “base” has two different meanings in the context of the information measures on time series. It could refer to the base of the time series itself, that is, the number of unique states in the time series. For example, the time series \(\{0,2,1,0,0\}\) has a base of 3. On the other hand, it could refer to the base of the logarithm used in computing the information content of the empirical distributions. The problem is that these two meanings clash. The base of the time series affects the range of values the measure can produce, and the base of the logarithm represents a rescaling of those values.
The following measures use one of two conventions. The measures of information dynamics (e.g. Active Information, Entropy Rate and Transfer Entropy) take as an argument the base of the state and use that as the base of the logarithm. The result is that the time-averaged values of those measures are in the unit range. An exception to this rule is the block entropy. It too uses this convention, but its value will not be in the unit range unless the block size \(k\) is 1 or the specified base is \(2^k\) (or you could just divide by \(k\)). The second convention is to take both the base of the time series and the base of the logarithm. This is about as unambiguous as it gets. This approach is used for the measures that do not make explicit use of a history length (or block size), e.g. Mutual Information, Conditional Entropy, etc.
Coming releases may revise the handling of the bases, but until then each function’s documentation will specify how the base is used.
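As a quick check of the first convention, the block entropy of a binary series with block size \(k = 2\) can exceed 1 bit, and dividing by \(k\) brings it back into the unit range (this uses the same series and value that appear in the Block Entropy examples below):
>>> h = block_entropy([0,0,1,1,1,1,0,0,0], k=2)
>>> h
1.811278124459133
>>> round(h / 2, 6)
0.905639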
Multiple Initial Conditions¶
PyInform tries to provide handling of multiple initial conditions. The “proper” way to handle initial conditions is a bit contested. One completely reasonable approach is to apply the information measures to each initial condition’s time series independently and then average. One can think of this approach as conditioning the measure on the initial condition. The second approach is to independently use all of the initial conditions to construct the various probability distributions. You can think of this approach as rolling the uncertainty of the initial condition into the measure. [1]
The current implementation takes the second approach. The accepted time series can be up to 2-D, with each row representing the time series for a different initial condition. We chose the second approach because the “measure then average” method can still be done with the current implementation. For an example of this, see the example section of Active Information.
Subsequent releases may provide a mechanism for specifying how the user prefers the initial conditions to be handled, but at the moment the user has to make it happen manually.
[1] There are actually at least three ways to handle multiple initial conditions. The third method is related to the first described in the text by the addition of the entropy of the distribution over initial conditions; in this approach, the initial condition is itself treated as a random variable.
Active Information¶
Active information (AI) was introduced in [Lizier2012] to quantify information storage in distributed computation. Active information is defined in terms of a temporally local variant

\[a_{X,i}(k) = \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)\, p(x_{i+1})},\]

where the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global active information is defined as

\[A_X(k) = \langle a_{X,i}(k) \rangle_i = \sum_{x^{(k)}_i,\, x_{i+1}} p(x^{(k)}_i, x_{i+1}) \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)\, p(x_{i+1})}.\]

Strictly speaking, the local and average active information are defined as

\[a_{X,i} = \lim_{k \rightarrow \infty} a_{X,i}(k) \quad \textrm{and} \quad A_X = \lim_{k \rightarrow \infty} A_X(k),\]
but we do not provide limiting functionality in this library (yet!).
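In practice, the limit can be approximated by computing the active information for increasing history lengths and watching for the estimates to level off. A minimal sketch, where the random binary series and the seed are purely illustrative:
>>> import numpy as np
>>> rng = np.random.default_rng(2018)                   # illustrative seed
>>> xs = rng.integers(2, size=1000)                     # a long base-2 series
>>> estimates = [active_info(xs, k=k) for k in range(1, 6)]
>>> # inspect `estimates` for convergence as k grows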
Examples¶
A Single Initial Condition¶
The typical usage is to provide the time series as a sequence (or numpy.ndarray) and the history length as an integer, and let active_info() sort out the rest:
>>> active_info([0,0,1,1,1,1,0,0,0], k=2)
0.3059584928680418
>>> active_info([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[-0.19264508, 0.80735492, 0.22239242, 0.22239242, -0.36257008,
1.22239242, 0.22239242]])
Multiple Initial Conditions¶
What about multiple initial conditions? We’ve got that covered!
>>> active_info([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]], k=2)
0.35987902873686084
>>> active_info([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]], k=2, local=True)
array([[ 0.80735492, -0.36257008, 0.63742992, 0.63742992, -0.77760758,
0.80735492, -1.19264508],
[ 0.80735492, 0.80735492, 0.22239242, 0.80735492, 0.80735492,
0.22239242, 0.80735492]])
As mentioned in Subtle Details, averaging the AI over the initial conditions does not give the same result as constructing the distributions using all of the initial conditions together.
>>> import numpy as np
>>> series = np.asarray([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]])
>>> np.apply_along_axis(active_info, 1, series, 2).mean()
0.5845395307173363
Or if you are feeling verbose:
>>> ai = np.empty(len(series))
>>> for i, xs in enumerate(series):
... ai[i] = active_info(xs, k=2)
...
>>> ai
array([0.30595849, 0.86312057])
>>> ai.mean()
0.5845395307173363
API Documentation¶
pyinform.activeinfo.active_info(series, k, local=False)
Compute the average or local active information of a time series with history length k.
- Parameters
series (sequence or numpy.ndarray) – the time series
k (int) – the history length
local (bool) – compute the local active information
- Returns
the average or local active information
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series has no initial conditions
ValueError – if the time series is greater than 2-D
InformError – if an error occurs within the inform C call
Block Entropy¶
Block entropy, also known as N-gram entropy [Shannon1948], is the standard Shannon entropy applied to the time series (or sequence) of \(k\)-histories of a time series (or sequence):

\[H(X^{(k)}) = -\sum_{x^{(k)}_i} p(x^{(k)}_i) \log_2 p(x^{(k)}_i),\]

which of course reduces to the traditional Shannon entropy for \(k = 1\). Much as with Active Information, the ideal usage is to take \(k \rightarrow \infty\).
Examples¶
A Single Initial Condition¶
The typical usage is to provide the time series as a sequence (or numpy.ndarray) and the block size as an integer, and let block_entropy() sort out the rest:
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=1)
0.9910760598382222
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=1, local=True)
array([[0.84799691, 0.84799691, 1.169925 , 1.169925 , 1.169925 ,
1.169925 , 0.84799691, 0.84799691, 0.84799691]])
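Since the block entropy with \(k = 1\) is just the Shannon entropy of the state distribution (here 5 zeros and 4 ones), the value above can be checked directly with numpy:
>>> import numpy as np
>>> xs = [0,0,1,1,1,1,0,0,0]
>>> p = np.bincount(xs) / len(xs)                       # [5/9, 4/9]
>>> bool(np.isclose(-(p * np.log2(p)).sum(), block_entropy(xs, k=1)))
True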
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=2)
1.811278124459133
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[1.4150375, 3. , 1.4150375, 1.4150375, 1.4150375, 3. ,
1.4150375, 1.4150375]])
Multiple Initial Conditions¶
Do we support multiple initial conditions? Of course we do!
>>> series = [[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]]
>>> block_entropy(series, k=2)
1.936278124459133
>>> block_entropy(series, k=2, local=True)
array([[1.4150375, 2.4150375, 2.4150375, 2.4150375, 2.4150375, 2. ,
1.4150375, 1.4150375],
[2. , 1.4150375, 2.4150375, 2. , 1.4150375, 2.4150375,
2. , 1.4150375]])
Or you can compute the block entropy on each initial condition and average:
>>> import numpy as np
>>> np.apply_along_axis(block_entropy, 1, series, 2).mean()
1.686278124459133
API Documentation¶
pyinform.blockentropy.block_entropy(series, k, local=False)
Compute the (local) block entropy of a time series with block size k.
- Parameters
series (sequence or numpy.ndarray) – the time series
k (int) – the block size
local (bool) – compute the local block entropy
- Returns
the average or local block entropy
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series has no initial conditions
ValueError – if the time series is greater than 2-D
InformError – if an error occurs within the inform C call
Conditional Entropy¶
Conditional entropy is a measure of the amount of information required to describe a random variable \(Y\) given knowledge of another random variable \(X\). When applied to time series, two time series are used to construct the empirical distributions, and then conditional_entropy() can be applied to yield

\[H(Y|X) = -\sum_{x_i, y_i} p(x_i, y_i) \log_2 \frac{p(x_i, y_i)}{p(x_i)}.\]

This can be viewed as the time average of the local conditional entropy

\[h_i(Y|X) = -\log_2 \frac{p(x_i, y_i)}{p(x_i)}.\]
See [Cover1991] for more information.
Examples¶
>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> conditional_entropy(xs,ys) # H(Y|X)
0.5971071794515037
>>> conditional_entropy(ys,xs) # H(X|Y)
0.5077571498797332
>>> conditional_entropy(xs, ys, local=True)
array([3. , 3. , 0.19264508, 0.19264508, 0.19264508,
0.19264508, 0.19264508, 0.19264508, 0.19264508, 0.19264508,
0.19264508, 0.19264508, 0.19264508, 0.19264508, 0.19264508,
0.19264508, 0.4150375 , 0.4150375 , 0.4150375 , 2. ])
>>> conditional_entropy(ys, xs, local=True)
array([1.32192809, 1.32192809, 0.09953567, 0.09953567, 0.09953567,
0.09953567, 0.09953567, 0.09953567, 0.09953567, 0.09953567,
0.09953567, 0.09953567, 0.09953567, 0.09953567, 0.09953567,
0.09953567, 0.73696559, 0.73696559, 0.73696559, 3.9068906 ])
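As a sanity check, conditional entropy obeys the chain rule \(H(Y|X) = H(X,Y) - H(X)\). A small numpy verification of the \(H(Y|X)\) value above, encoding each \((x_i, y_i)\) pair as a single state to build the joint distribution:
>>> import numpy as np
>>> joint = np.bincount(2 * np.asarray(xs) + np.asarray(ys)) / len(xs)   # p(x, y) over the four (x, y) pairs
>>> h_xy = -(joint[joint > 0] * np.log2(joint[joint > 0])).sum()
>>> px = np.bincount(xs) / len(xs)
>>> h_x = -(px * np.log2(px)).sum()
>>> bool(np.isclose(h_xy - h_x, conditional_entropy(xs, ys)))
True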
API Documentation¶
pyinform.conditionalentropy.conditional_entropy(xs, ys, local=False)
Compute the (local) conditional entropy between two time series. This function expects the condition to be the first argument.
- Parameters
xs (a sequence or numpy.ndarray) – the time series drawn from the conditional distribution
ys (a sequence or numpy.ndarray) – the time series drawn from the target distribution
local (bool) – compute the local conditional entropy
- Returns
the local or average conditional entropy
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series have different shapes
InformError – if an error occurs within the inform C call
Entropy Rate¶
Entropy rate (ER) quantifies the amount of information needed to describe the next state of \(X\) given observations of its \(k\)-history \(X^{(k)}\). In other words, it is the entropy of the time series conditioned on the \(k\)-histories. The local entropy rate

\[h_{X,i}(k) = -\log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)}\]

can be averaged to obtain the global entropy rate

\[H_X(k) = \langle h_{X,i}(k) \rangle_i = -\sum_{x^{(k)}_i,\, x_{i+1}} p(x^{(k)}_i, x_{i+1}) \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)}.\]

Much as with Active Information, the local and average entropy rates are formally obtained in the limit

\[h_{X,i} = \lim_{k \rightarrow \infty} h_{X,i}(k) \quad \textrm{and} \quad H_X = \lim_{k \rightarrow \infty} H_X(k),\]
but we do not provide limiting functionality in this library (yet!).
See [Cover1991] for more details.
Examples¶
A Single Initial Condition¶
Let’s apply the entropy rate to a single initial condition. Typically, you will just provide the time series and the history length, and let entropy_rate() take care of the rest:
>>> entropy_rate([0,0,1,1,1,1,0,0,0], k=2)
0.6792696431662095
>>> entropy_rate([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[1. , 0. , 0.5849625, 0.5849625, 1.5849625, 0. ,
1. ]])
>>> entropy_rate([0,0,1,1,1,1,2,2,2], k=2)
0.39355535745192416
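For a fixed history length \(k\), the entropy rate and the active information are complementary: they sum to the Shannon entropy of the “next state” distribution built from the same \(k\)-history samples (i.e. the states xs[k:]). A quick numpy check using the first series above:
>>> import numpy as np
>>> xs = [0,0,1,1,1,1,0,0,0]
>>> k = 2
>>> future = np.asarray(xs[k:])                         # the 7 states that follow a full 2-history
>>> p = np.bincount(future) / len(future)
>>> h_future = -(p * np.log2(p)).sum()
>>> bool(np.isclose(active_info(xs, k=k) + entropy_rate(xs, k=k), h_future))
True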
Multiple Initial Conditions¶
Of course multiple initial conditions are handled.
>>> series = [[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]]
>>> entropy_rate(series, k=2)
0.6253491072973907
>>> entropy_rate(series, k=2, local=True)
array([[0.4150375, 1.5849625, 0.5849625, 0.5849625, 1.5849625, 0. ,
2. ],
[0. , 0.4150375, 0.5849625, 0. , 0.4150375, 0.5849625,
0. ]])
API Documentation¶
pyinform.entropyrate.entropy_rate(series, k, local=False)
Compute the average or local entropy rate of a time series with history length k.
- Parameters
series (sequence or numpy.ndarray) – the time series
k (int) – the history length
local (bool) – compute the local entropy rate
- Returns
the average or local entropy rate
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series has no initial conditions
ValueError – if the time series is greater than 2-D
InformError – if an error occurs within the inform C call
Mutual Information¶
Mutual information (MI) is a measure of the amount of mutual dependence between two random variables. When applied to time series, two time series are used to construct the empirical distributions, and then mutual_info() can be applied. Locally, MI is defined as

\[i_i(X, Y) = \log_2 \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}.\]

The mutual information is then just the time average of \(i_i(X, Y)\).
See [Cover1991] for more details.
Examples¶
>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> mutual_info(xs, ys)
0.21417094500762912
>>> mutual_info(xs, ys, local=True)
array([-1. , -1. , 0.22239242, 0.22239242, 0.22239242,
0.22239242, 0.22239242, 0.22239242, 0.22239242, 0.22239242,
0.22239242, 0.22239242, 0.22239242, 0.22239242, 0.22239242,
0.22239242, 1.5849625 , 1.5849625 , 1.5849625 , -1.5849625 ])
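Mutual information, block entropy and conditional entropy are tied together by the identity \(I(X;Y) = H(Y) - H(Y|X)\), which can be verified here with the functions from the previous sections:
>>> import numpy as np
>>> bool(np.isclose(mutual_info(xs, ys), block_entropy(ys, k=1) - conditional_entropy(xs, ys)))
True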
API Documentation¶
pyinform.mutualinfo.mutual_info(xs, ys, local=False)
Compute the (local) mutual information between two time series.
- Parameters
xs (a sequence or numpy.ndarray) – a time series
ys (a sequence or numpy.ndarray) – a time series
local (bool) – compute the local mutual information
- Returns
the local or average mutual information
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series have different shapes
InformError – if an error occurs within the inform C call
Relative Entropy¶
Relative entropy, also known as the Kullback-Leibler divergence, measures the
amount of information gained in switching from a prior \(q_X\) to a
posterior distribution \(p_X\) over the same support. That is \(q_X\)
and \(P\) represent hypotheses of the distribution of some random variable
\(X.\) Time series data sampled from the posterior and prior can be used to
estiamte those distributions, and the relative entropy can the be computed via
a call to relative_entropy()
. The result is
which has as its local counterpart
Note that the average in moving from the local to the non-local relative entropy is taken over the posterior distribution.
See [Kullback1951] and [Cover1991] for more information.
Examples¶
>>> xs = [0,1,0,0,0,0,0,0,0,1]
>>> ys = [0,1,1,1,1,0,0,1,0,0]
>>> relative_entropy(xs, ys)
0.27807190511263774
>>> relative_entropy(ys, xs)
0.3219280948873624
>>> xs = [0,0,0,0]
>>> ys = [0,1,1,0]
>>> relative_entropy(xs, ys)
1.0
>>> relative_entropy(ys, xs)
nan
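The nan in the last call arises because the prior (estimated from xs = [0,0,0,0]) assigns zero probability to state 1, which the posterior does observe, so the divergence is undefined. The first pair of values can also be checked against the definition with numpy:
>>> import numpy as np
>>> xs = [0,1,0,0,0,0,0,0,0,1]
>>> ys = [0,1,1,1,1,0,0,1,0,0]
>>> p = np.bincount(xs) / len(xs)                       # posterior: [0.8, 0.2]
>>> q = np.bincount(ys) / len(ys)                       # prior: [0.5, 0.5]
>>> bool(np.isclose((p * np.log2(p / q)).sum(), relative_entropy(xs, ys)))
True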
API Documentation¶
pyinform.relativeentropy.relative_entropy(xs, ys, local=False)
Compute the local or global relative entropy between two time series, treating each as observations from a distribution.
- Parameters
xs (a sequence or numpy.ndarray) – the time series sampled from the posterior distribution
ys (a sequence or numpy.ndarray) – the time series sampled from the prior distribution
local (bool) – compute the local relative entropy
- Returns
the local or global relative entropy
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series have different shapes
InformError – if an error occurs within the inform C call
Transfer Entropy¶
Transfer entropy (TE) was introduced by [Schreiber2000] to quantify information transfer between an information source and destination, conditioning out shared history effects. TE was originally formulated considering only the source and destination; however, many systems of interest have more than just those two components. As such, it may be necessary to condition the probabilities on the states of all “background” components in the system. These two forms are sometimes called apparent and complete transfer entropy, respectively ([Lizier2008]).
This implementation of TE allows the user to condition the probabilities on any number of background processes, within hardware limits of course. For the subsequent description, take \(X\) to be the source, \(Y\) the target, and \(\mathcal{W}=\left\{W_1, \ldots, W_l\right\}\) to be the background processes against which we’d like to condition. For example, we might take the state of two nodes in a dynamical network as the source and target, while all other nodes in the network are treated as the background. Transfer entropy is then defined in terms of a time-local variant

\[t_{X \rightarrow Y, \mathcal{W}, i}(k) = \log_2 \frac{p(y_{i+1}, x_i \mid y^{(k)}_i, W_i)}{p(y_{i+1} \mid y^{(k)}_i, W_i)\, p(x_i \mid y^{(k)}_i, W_i)},\]

where \(W_i\) denotes the joint state of the background processes \(\mathcal{W}\) at time \(i\). Averaging in time we have

\[T_{X \rightarrow Y, \mathcal{W}}(k) = \langle t_{X \rightarrow Y, \mathcal{W}, i}(k) \rangle_i.\]

As in the case of Active Information and Entropy Rate, the transfer entropy is formally defined as the limit of the \(k\)-history transfer entropy as \(k \rightarrow \infty\):

\[t_{X \rightarrow Y, \mathcal{W}, i} = \lim_{k \rightarrow \infty} t_{X \rightarrow Y, \mathcal{W}, i}(k) \quad \textrm{and} \quad T_{X \rightarrow Y, \mathcal{W}} = \lim_{k \rightarrow \infty} T_{X \rightarrow Y, \mathcal{W}}(k),\]
but we do not provide limiting functionality in this library (yet!).
See [Schreiber2000], [Kaiser2002] and [Lizier2008] for more details.
Examples¶
One initial condition, no background¶
Just give us a couple of time series and tell us the history length and we’ll give you a number
>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> transfer_entropy(xs, ys, k=2)
0.6792696431662097
>>> transfer_entropy(ys, xs, k=2)
0.0
or an array if you ask for it
>>> transfer_entropy(xs, ys, k=2, local=True)
array([[1. , 0. , 0.5849625, 0.5849625, 1.5849625, 0. ,
1. ]])
>>> transfer_entropy(ys, xs, k=2, local=True)
array([[0., 0., 0., 0., 0., 0., 0.]])
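Notice that the local values of transfer_entropy(xs, ys, k=2, local=True) above are exactly the local entropy rate values of ys from the Entropy Rate section: because ys is xs delayed by one step, the current state of X resolves all of the uncertainty left in the next state of Y once its own 2-history is accounted for, so the (apparent) transfer entropy saturates at the entropy rate of the target. A quick check:
>>> import numpy as np
>>> bool(np.isclose(transfer_entropy(xs, ys, k=2), entropy_rate(ys, k=2)))
True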
Two initial conditions, no background¶
Uhm, yes we can! (Did you really expect anything less?)
>>> xs = [[1,0,0,0,0,1,1,1,1], [1,1,1,1,0,0,0,1,1]]
>>> ys = [[0,0,1,1,1,1,0,0,0], [1,0,0,0,0,1,1,1,0]]
>>> transfer_entropy(xs, ys, k=2)
0.693536138896192
>>> transfer_entropy(xs, ys, k=2, local=True)
array([[1.32192809, 0. , 0.73696559, 0.73696559, 1.32192809,
0. , 0.73696559],
[0. , 0.73696559, 0.73696559, 1.32192809, 0. ,
0.73696559, 1.32192809]])
One initial condition, one background process¶
>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> ws = [0,1,1,1,1,0,1,1,1]
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.2857142857142857
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[1., 0., 0., 0., 0., 0., 1.]])
One initial condition, two background processes¶
>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> ws = [[1,0,1,0,1,1,1,1,1], [1,1,0,1,0,1,1,1,1]]
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.0
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[0., 0., 0., 0., 0., 0., 0.]])
Two initial conditions, two background processes¶
>>> xs = [[1,1,0,1,0,1,1,0,0],[0,1,0,1,1,1,0,0,1]]
>>> ys = [[1,1,1,0,1,1,1,0,0],[0,0,1,0,1,1,1,0,0]]
>>> ws = [[[1,1,0,1,1,0,1,0,1],[1,1,1,0,1,1,1,1,0]],
... [[1,1,1,1,0,0,0,0,1],[0,0,0,1,1,1,1,0,1]]]
>>> transfer_entropy(xs, ys, k=2)
0.5364125003090668
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.3396348215831049
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[ 1. , 0. , 0. , -0.4150375, 0. ,
0. , 1. ],
[ 0. , 0.5849625, 1. , 0.5849625, 0. ,
1. , 0. ]])
API Documentation¶
pyinform.transferentropy.transfer_entropy(source, target, k, condition=None, local=False)
Compute the local or average transfer entropy from one time series to another with target history length k. Optionally, time series can be provided against which to condition.
- Parameters
source (sequence or numpy.ndarray) – the source time series
target (sequence or numpy.ndarray) – the target time series
k (int) – the history length
condition (sequence or numpy.ndarray) – time series of any conditions
local (bool) – compute the local transfer entropy
- Returns
the average or local transfer entropy
- Return type
float or numpy.ndarray
- Raises
ValueError – if the time series have different shapes
ValueError – if either time series has no initial conditions
ValueError – if either time series is greater than 2-D
InformError – if an error occurs within the inform C call
References¶
- Cover1991
T.M. Cover and J.A. Thomas (1991). “Elements of Information Theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.
- Kaiser2002
A. Kaiser and T. Schreiber, “Information transfer in continuous processes”, Physica D: Nonlinear Phenomena, Volume 166, Issues 1–2, 1 June 2002, Pages 43-62, ISSN 0167-2789.
- Kullback1951
Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79-86. doi:10.1214/aoms/1177729694. MR 39968.
- Lizier2008
J.T. Lizier, M. Prokopenko and A.Y. Zomaya, “Local information transfer as a spatiotemporal filter for complex systems”, Phys. Rev. E 77, 026110, 2008.
- Lizier2012
J.T. Lizier, M. Prokopenko and A.Y. Zomaya, “Local measures of information storage in complex distributed computation”, Information Sciences, vol. 208, pp. 39-54, 2012.
- Schreiber2000
T. Schreiber, “Measuring information transfer”, Phys. Rev. Lett. 85 (2), pp. 461-464, 2000.
- Shannon1948
Shannon, Claude E. (July-October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379-423. doi:10.1002/j.1538-7305.1948.tb01448.x.