Time Series Measures

The Empirical Distributions and Shannon Information Measures come together to make information measures on time series almost trivial to implement. Every such measure amounts to constructing distributions and applying an information measure.

Notation

Throughout this section we will denote random variables as \(X, Y, \ldots\), and let \(x_i, y_i, \ldots\) represent the \(i\)-th time step of a time series drawn from a random variable. Many of the measures consider \(k\)-histories (a.k.a. \(k\)-blocks) of the time series, e.g. \(x^{(k)}_i = \{x_{i-k+1}, x_{i-k+2}, \ldots, x_i\}\).

For the sake of conciseness, when denoting probability distributions, we will only make the random variable explicit in situations where the notation is ambiguous. Generally, we will write \(p(x_i)\), \(p(x^{(k)}_i)\) and \(p(x^{(k)}_i, x_{i+1})\) to denote the empirical probability of observing the state \(x_i\), the probability of observing the \(k\)-history \(x^{(k)}_i\), and the joint probability of observing the pair \((x^{(k)}_i, x_{i+1})\), respectively.
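
For example, the \(k = 2\) histories of the series \(\{0,0,1,1,1,1,0,0,0\}\), and the counts behind their empirical probabilities, can be tabulated in a couple of lines of plain Python (an illustrative sketch, not part of the PyInform API):

>>> from collections import Counter
>>> xs = [0,0,1,1,1,1,0,0,0]
>>> histories = [tuple(xs[i-1:i+1]) for i in range(1, len(xs))]   # the k = 2 histories
>>> histories
[(0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0)]
>>> Counter(histories)   # counts behind the empirical probabilities p(x^(k)_i)
Counter({(0, 0): 3, (1, 1): 3, (0, 1): 1, (1, 0): 1})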

Please report any notational ambiguities as an issue.

Subtle Details

The library takes several liberties in the way in which the time series measures are implemented.

The Base: States and Logarithms

The word “base” has two different meanings in the context of information measures on time series. It could refer to the base of the time series itself, that is, the number of unique states in the time series. For example, the time series \(\{0,2,1,0,0\}\) has a base of 3. On the other hand, it could refer to the base of the logarithm used in computing the information content of the empirical distributions. The problem is that these two meanings clash. The base of the time series affects the range of values the measure can produce, and the base of the logarithm represents a rescaling of those values.

The following measures use one of two conventions. The measures of information dynamics (e.g. Active Information, Entropy Rate and Transfer Entropy) take as an argument the base of the state and use that as the base of the logarithm. The result is that the time-averaged values of those measures are in the unit range. An exception to this rule is the block entropy. It too uses this convention, but its value will not be in the unit range unless the block size \(k\) is 1 or the specified base is \(2^k\) (or you could just divide by \(k\)). The second convention is to take both the base of the time series and the base of the logarithm. This is about as unambiguous as it gets. This approach is used for the measures that do not make explicit use of a history length (or block size), e.g. Mutual Information, Conditional Entropy, etc.
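
For example, using the series from the Block Entropy examples below, dividing the block entropy by the block size \(k\) brings the value back into the unit range:

>>> from pyinform.blockentropy import block_entropy
>>> h = block_entropy([0,0,1,1,1,1,0,0,0], k=2)   # base-2 series, block size 2
>>> round(h, 6)
1.811278
>>> round(h / 2, 6)   # dividing by k = 2 lands the value in the unit range
0.905639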

Coming releases may revise the handling of the bases, but until then each function’s documentation will specify how the base is used.

Multiple Initial Conditions

PyInform provides support for multiple initial conditions. The “proper” way to handle initial conditions is a bit contested. One completely reasonable approach is to apply the information measures to each initial condition’s time series independently and then average. One can think of this approach as conditioning the measure on the initial condition. The second approach is to use the time series from all of the initial conditions together to construct the various probability distributions. You can think of this approach as rolling the uncertainty of the initial condition into the measure. 1

The current implementation takes the second approach. The accepted time series can be up to 2-D, with each row representing the time series for a different initial condition. We chose the second approach because the “measure then average” method can still be done with the current implementation. For an example of this, see the example section of Active Information.

Subsequent releases may provide a mechanism for specifying how the user prefers the initial conditions to be handled, but at the moment the user has to make it happen manually.

1

There are actually at least three ways to handle multiple initial conditions; the third is related to the first method described in the text by the addition of the entropy of the distribution over initial conditions. In this approach, the initial condition is treated as a random variable.

Active Information

Active information (AI) was introduced in [Lizier2012] to quantify information storage in distributed computation. Active information is defined in terms of a temporally local variant

\[a_{X,i}(k) = \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)p(x_{i+1})},\]

where the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global active information is defined as

\[A_X(k) = \langle a_{X,i}(k) \rangle_{i} = \sum_{x^{(k)}_i,\, x_{i+1}} p(x^{(k)}_i, x_{i+1}) \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)p(x_{i+1})}.\]

Strictly speaking, the local and average active information are defined as

\[a_{X,i} = \lim_{k \rightarrow \infty} a_{X,i}(k) \quad \textrm{and} \quad A_X = \lim_{k \rightarrow \infty} A_X(k),\]

but we do not provide limiting functionality in this library (yet!).
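
The estimator is simple enough to write out by hand. The sketch below (plain Python; ai_by_hand is an illustrative name, not part of PyInform) tabulates the \((x^{(k)}_i, x_{i+1})\) pairs, forms the empirical distributions, and averages the local values; it agrees with active_info() on the series used in the examples below.

>>> from collections import Counter
>>> from math import log2
>>> def ai_by_hand(xs, k):
...     pairs = [(tuple(xs[i-k:i]), xs[i]) for i in range(k, len(xs))]
...     n, joint = len(pairs), Counter(pairs)
...     hist = Counter(h for h, _ in pairs)
...     fut = Counter(x for _, x in pairs)
...     return sum(c/n * log2((c/n) / ((hist[h]/n) * (fut[x]/n)))
...                for (h, x), c in joint.items())
...
>>> round(ai_by_hand([0,0,1,1,1,1,0,0,0], 2), 6)
0.305958
>>> from pyinform.activeinfo import active_info
>>> round(active_info([0,0,1,1,1,1,0,0,0], k=2), 6)
0.305958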

Examples

A Single Initial Condition

The typical usage is to provide the time series as a sequence (or numpy.ndarray) and the history length as an integer, and let active_info() sort out the rest:

>>> active_info([0,0,1,1,1,1,0,0,0], k=2)
0.3059584928680418
>>> active_info([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[-0.19264508,  0.80735492,  0.22239242,  0.22239242, -0.36257008,
         1.22239242,  0.22239242]])

Multiple Initial Conditions

What about multiple initial conditions? We’ve got that covered!

>>> active_info([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]], k=2)
0.35987902873686084
>>> active_info([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]], k=2, local=True)
array([[ 0.80735492, -0.36257008,  0.63742992,  0.63742992, -0.77760758,
         0.80735492, -1.19264508],
       [ 0.80735492,  0.80735492,  0.22239242,  0.80735492,  0.80735492,
         0.22239242,  0.80735492]])

As mentioned in Subtle Details, averaging the AI over the initial conditions does not give the same result as constructing the distributions using all of the initial conditions together.

>>> import numpy as np
>>> series = np.asarray([[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]])
>>> np.apply_along_axis(active_info, 1, series, 2).mean()
0.5845395307173363

Or if you are feeling verbose:

>>> ai = np.empty(len(series))
>>> for i, xs in enumerate(series):
...     ai[i] = active_info(xs, k=2)
...
>>> ai
array([0.30595849, 0.86312057])
>>> ai.mean()
0.5845395307173363

API Documentation

pyinform.activeinfo.active_info(series, k, local=False)[source]

Compute the average or local active information of a time series with history length k.

Parameters
  • series (sequence or numpy.ndarray) – the time series

  • k (int) – the history length

  • local (bool) – compute the local active information

Returns

the average or local active information

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series has no initial conditions

  • ValueError – if the time series is greater than 2-D

  • InformError – if an error occurs within the inform C call

Block Entropy

Block entropy, also known as N-gram entropy [Shannon1948], is the standard Shannon entropy applied to the time series (or sequence) of \(k\)-histories of a time series (or sequence):

\[H(X^{(k)}) = -\sum_{x^{(k)}_i} p(x^{(k)}_i) \log_2 p(x^{(k)}_i)\]

which of course reduces to the traditional Shannon entropy for k == 1. Much as with Active Information, the ideal usage is to take \(k \rightarrow \infty\).
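
In other words, the estimate is just the Shannon entropy of the empirical distribution over \(k\)-blocks. A from-scratch sketch (plain Python; block_entropy_by_hand is an illustrative name, not part of PyInform) that reproduces the first example below:

>>> from collections import Counter
>>> from math import log2
>>> def block_entropy_by_hand(xs, k):
...     blocks = Counter(tuple(xs[i:i+k]) for i in range(len(xs) - k + 1))
...     n = sum(blocks.values())
...     return -sum(c/n * log2(c/n) for c in blocks.values())
...
>>> round(block_entropy_by_hand([0,0,1,1,1,1,0,0,0], 2), 6)
1.811278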

Examples

A Single Initial Condition

The typical usage is to provide the time series as a sequence (or numpy.ndarray) and the block size as an integer, and let block_entropy() sort out the rest:

>>> block_entropy([0,0,1,1,1,1,0,0,0], k=1)
0.9910760598382222
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=1, local=True)
array([[0.84799691, 0.84799691, 1.169925  , 1.169925  , 1.169925  ,
        1.169925  , 0.84799691, 0.84799691, 0.84799691]])
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=2)
1.811278124459133
>>> block_entropy([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[1.4150375, 3.       , 1.4150375, 1.4150375, 1.4150375, 3.       ,
        1.4150375, 1.4150375]])

Multiple Initial Conditions

Do we support multiple initial conditions? Of course we do!

>>> series = [[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]]
>>> block_entropy(series, k=2)
1.936278124459133
>>> block_entropy(series, k=2, local=True)
array([[1.4150375, 2.4150375, 2.4150375, 2.4150375, 2.4150375, 2.       ,
        1.4150375, 1.4150375],
       [2.       , 1.4150375, 2.4150375, 2.       , 1.4150375, 2.4150375,
        2.       , 1.4150375]])

Or you can compute the block entropy on each initial condition and average:

>>> import numpy as np
>>> np.apply_along_axis(block_entropy, 1, series, 2).mean()
1.686278124459133

API Documentation

pyinform.blockentropy.block_entropy(series, k, local=False)[source]

Compute the (local) block entropy of a time series with block size k.

Parameters
  • series (sequence or numpy.ndarray) – the time series

  • k (int) – the block size

  • local (bool) – compute the local block entropy

Returns

the average or local block entropy

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series has no initial conditions

  • ValueError – if the time series is greater than 2-D

  • InformError – if an error occurs within the inform C call

Conditional Entropy

Conditional entropy is a measure of the amount of information required to describe a random variable \(Y\) given knowledge of another random variable \(X\). When applied to time series, two time series are used to construct the empirical distributions and then conditional_entropy() can be applied to yield

\[H(Y|X) = -\sum_{x_i, y_i} p(x_i, y_i) \log_2 \frac{p(x_i, y_i)}{p(x_i)}.\]

This can be viewed as the time-average of the local conditional entropy

\[h_{i}(Y|X) = -\log_2 \frac{p(x_i, y_i)}{p(x_i)}.\]

See [Cover1991] for more information.
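
A useful consistency check is the chain rule \(H(Y|X) = H(X,Y) - H(X)\). The sketch below (plain Python, independent of PyInform) verifies it for the example series by treating each \((x_i, y_i)\) pair as a single joint state; the result matches conditional_entropy(xs, ys) below.

>>> from collections import Counter
>>> from math import log2
>>> def entropy(states):
...     counts, n = Counter(states), len(states)
...     return -sum(c/n * log2(c/n) for c in counts.values())
...
>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> round(entropy(list(zip(xs, ys))) - entropy(xs), 6)   # H(X,Y) - H(X)
0.597107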

Examples

>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> conditional_entropy(xs,ys)      # H(Y|X)
0.5971071794515037
>>> conditional_entropy(ys,xs)      # H(X|Y)
0.5077571498797332
>>> conditional_entropy(xs, ys, local=True)
array([3.        , 3.        , 0.19264508, 0.19264508, 0.19264508,
       0.19264508, 0.19264508, 0.19264508, 0.19264508, 0.19264508,
       0.19264508, 0.19264508, 0.19264508, 0.19264508, 0.19264508,
       0.19264508, 0.4150375 , 0.4150375 , 0.4150375 , 2.        ])
>>> conditional_entropy(ys, xs, local=True)
array([1.32192809, 1.32192809, 0.09953567, 0.09953567, 0.09953567,
       0.09953567, 0.09953567, 0.09953567, 0.09953567, 0.09953567,
       0.09953567, 0.09953567, 0.09953567, 0.09953567, 0.09953567,
       0.09953567, 0.73696559, 0.73696559, 0.73696559, 3.9068906 ])

API Documentation

pyinform.conditionalentropy.conditional_entropy(xs, ys, local=False)[source]

Compute the (local) conditional entropy between two time series.

This function expects the condition to be the first argument.

Parameters
  • xs (a sequence or numpy.ndarray) – the time series drawn from the conditional distribution

  • ys (a sequence or numpy.ndarray) – the time series drawn from the target distribution

  • local (bool) – compute the local conditional entropy

Returns

the local or average conditional entropy

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series have different shapes

  • InformError – if an error occurs within the inform C call

Entropy Rate

Entropy rate (ER) quantifies the amount of information needed to describe the next state of \(X\) given observations of \(X^{(k)}\). In other words, it is the entropy of the time series conditioned on the \(k\)-histories. The local entropy rate

\[h_{X,i}(k) = -\log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)}\]

can be averaged to obtain the global entropy rate

\[H_X(k) = \langle h_{X,i}(k) \rangle_{i} = -\sum_{x^{(k)}_i,\, x_{i+1}} p(x^{(k)}_i, x_{i+1}) \log_2 \frac{p(x^{(k)}_i, x_{i+1})}{p(x^{(k)}_i)}.\]

Much as with Active Information, the local and average entropy rates are formally obtained in the limit

\[h_{X,i} = \lim_{k \rightarrow \infty} h_{X,i}(k) \quad \textrm{and} \quad H_X = \lim_{k \rightarrow \infty} H_X(k),\]

but we do not provide limiting functionality in this library (yet!).

See [Cover1991] for more details.
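
Since the local entropy rate is the negative log-probability of the next state given its \(k\)-history, adding it to the local active information cancels the history terms: \(a_{X,i}(k) + h_{X,i}(k) = -\log_2 p(x_{i+1})\). Averaging, the active information and entropy rate should therefore sum to the entropy of the empirical next-state distribution. A quick check for the first example below (assuming, as the numbers confirm, that both estimators are built from the same \(n - k\) samples):

>>> from collections import Counter
>>> from math import log2
>>> from pyinform.activeinfo import active_info
>>> from pyinform.entropyrate import entropy_rate
>>> xs = [0,0,1,1,1,1,0,0,0]
>>> futures = Counter(xs[2:])          # the n - k "future" states for k = 2
>>> n = sum(futures.values())
>>> h_future = -sum(c/n * log2(c/n) for c in futures.values())
>>> round(active_info(xs, k=2) + entropy_rate(xs, k=2), 6) == round(h_future, 6)
True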

Examples

A Single Initial Condition

Let’s apply the entropy rate to a single initial condition. Typically, you will just provide the time series and the history length, and let entropy_rate() take care of the rest:

>>> entropy_rate([0,0,1,1,1,1,0,0,0], k=2)
0.6792696431662095
>>> entropy_rate([0,0,1,1,1,1,0,0,0], k=2, local=True)
array([[1.       , 0.       , 0.5849625, 0.5849625, 1.5849625, 0.       ,
        1.       ]])
>>> entropy_rate([0,0,1,1,1,1,2,2,2], k=2)
0.39355535745192416

Multiple Initial Conditions

Of course multiple initial conditions are handled.

>>> series = [[0,0,1,1,1,1,0,0,0], [1,0,0,1,0,0,1,0,0]]
>>> entropy_rate(series, k=2)
0.6253491072973907
>>> entropy_rate(series, k=2, local=True)
array([[0.4150375, 1.5849625, 0.5849625, 0.5849625, 1.5849625, 0.       ,
        2.       ],
       [0.       , 0.4150375, 0.5849625, 0.       , 0.4150375, 0.5849625,
        0.       ]])

API Documentation

pyinform.entropyrate.entropy_rate(series, k, local=False)[source]

Compute the average or local entropy rate of a time series with history length k.

Parameters
  • series (sequence or numpy.ndarray) – the time series

  • k (int) – the history length

  • local (bool) – compute the local entropy rate

Returns

the average or local entropy rate

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series has no initial conditions

  • ValueError – if the time series is greater than 2-D

  • InformError – if an error occurs within the inform C call

Mutual Information

Mutual information (MI) is a measure of the amount of mutual dependence between two random variables. When applied to time series, two time series are used to construct the empirical distributions and then mutual_info() can be applied. Locally MI is defined as

\[i_{i}(X,Y) = \log_2 \frac{p(x_i, y_i)}{p(x_i)p(y_i)}.\]

The mutual information is then just the time average of \(i_{i}(X,Y)\).

\[I(X,Y) = \sum_{x_i, y_i} p(x_i, y_i) \log_2 \frac{p(x_i, y_i)}{p(x_i)p(y_i)}.\]

See [Cover1991] for more details.
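
Equivalently, \(I(X,Y) = H(X) + H(Y) - H(X,Y)\). The sketch below (plain Python, independent of PyInform) checks this decomposition against the example value below:

>>> from collections import Counter
>>> from math import log2
>>> def entropy(states):
...     counts, n = Counter(states), len(states)
...     return -sum(c/n * log2(c/n) for c in counts.values())
...
>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> round(entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys))), 6)
0.214171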

Examples

>>> xs = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
>>> ys = [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1]
>>> mutual_info(xs, ys)
0.21417094500762912
>>> mutual_info(xs, ys, local=True)
array([-1.        , -1.        ,  0.22239242,  0.22239242,  0.22239242,
        0.22239242,  0.22239242,  0.22239242,  0.22239242,  0.22239242,
        0.22239242,  0.22239242,  0.22239242,  0.22239242,  0.22239242,
        0.22239242,  1.5849625 ,  1.5849625 ,  1.5849625 , -1.5849625 ])

API Documentation

pyinform.mutualinfo.mutual_info(xs, ys, local=False)[source]

Compute the (local) mutual information between two time series.

Parameters
  • xs (a sequence or numpy.ndarray) – a time series

  • ys (a sequence or numpy.ndarray) – a time series

  • local (bool) – compute the local mutual information

Returns

the local or average mutual information

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series have different shapes

  • InformError – if an error occurs within the inform C call

Relative Entropy

Relative entropy, also known as the Kullback-Leibler divergence, measures the amount of information gained in switching from a prior distribution \(q_X\) to a posterior distribution \(p_X\) over the same support. That is, \(q_X\) and \(p_X\) represent hypotheses about the distribution of some random variable \(X\). Time series data sampled from the posterior and prior can be used to estimate those distributions, and the relative entropy can then be computed via a call to relative_entropy(). The result is

\[D_{KL}(p||q) = \sum_{x_i} p(x_i) \log_2 \frac{p(x_i)}{q(x_i)}\]

which has as its local counterpart

\[d_{KL, i}(p||q) = \log_2 \frac{p(x_i)}{q(x_i)}.\]

Note that in moving from the local to the average relative entropy, the average is taken with respect to the posterior distribution \(p_X\).

See [Kullback1951] and [Cover1991] for more information.
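
The estimate involves only the two empirical state distributions, so it is easy to reproduce by hand. The sketch below (plain Python, independent of PyInform) matches the first example; note that the nan in the second example arises because the prior assigns zero probability to a state that the posterior observes.

>>> from collections import Counter
>>> from math import log2
>>> xs = [0,1,0,0,0,0,0,0,0,1]     # sampled from the posterior p
>>> ys = [0,1,1,1,1,0,0,1,0,0]     # sampled from the prior q
>>> p = {s: c/len(xs) for s, c in Counter(xs).items()}
>>> q = {s: c/len(ys) for s, c in Counter(ys).items()}
>>> round(sum(p[s] * log2(p[s]/q[s]) for s in p), 6)
0.278072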

Examples

>>> xs = [0,1,0,0,0,0,0,0,0,1]
>>> ys = [0,1,1,1,1,0,0,1,0,0]
>>> relative_entropy(xs, ys)
0.27807190511263774
>>> relative_entropy(ys, xs)
0.3219280948873624
>>> xs = [0,0,0,0]
>>> ys = [0,1,1,0]
>>> relative_entropy(xs, ys)
1.0
>>> relative_entropy(ys, xs)
nan

API Documentation

pyinform.relativeentropy.relative_entropy(xs, ys, local=False)[source]

Compute the local or global relative entropy between two time series treating each as observations from a distribution.

Parameters
  • xs (a sequence or numpy.ndarray) – the time series sampled from the posterior distribution

  • ys (a sequence or numpy.ndarray) – the time series sampled from the prior distribution

  • local (bool) – compute the local relative entropy

Returns

the local or global relative entropy

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series have different shapes

  • InformError – if an error occurs within the inform C call

Transfer Entropy

Transfer entropy (TE) was introduced by [Schreiber2000] to quantify information transfer between an information source and destination, conditioning out shared history effects. TE was originally formulated considering only the source and destination; however, many systems of interest have more than just those two components. As such, it may be necessary to condition the probabilities on the states of all “background” components in the system. These two forms are sometimes called “apparent” and “complete” transfer entropy, respectively ([Lizier2008]).

This implementation of TE allows the user to condition the probabilities on any number of background processes, within hardware limits of course. For the subsequent description, take \(X\) to be the source, \(Y\) the target, and \(\mathcal{W}=\left\{W_1, \ldots, W_l\right\}\) to be the background processes against which we’d like to condition. For example, we might take the state of two nodes in a dynamical network as the source and target, while all other nodes in the network are treated as the background. Transfer entropy is then defined in terms of a time-local variant:

\[t_{X \rightarrow Y,\mathcal{W},i}(k) = \log_2{\frac{p(y_{i+1}, x_i~|~y^{(k)}_i, W_{\{1,i\}},\ldots,W_{\{l,i\}})}{p(y_{i+1}~|~y^{(k)}_i, W_{\{1,i\}},\ldots,W_{\{l,i\}})p(x_i~|~y^{(k)}_i,W_{\{1,i\}},\ldots,W_{\{l,i\}})}}\]

Averaging in time we have

\[T_{X \rightarrow Y,\mathcal{W}}(k) = \langle t_{X \rightarrow Y,\mathcal{W},i}(k) \rangle_i\]

As in the case of Active Information and Entropy Rate, the transfer entropy is formally defined as the limit of the \(k\)-history transfer entropy as \(k \rightarrow \infty\):

\[t_{X \rightarrow Y,\mathcal{W},i} = \lim_{k \rightarrow \infty} t_{X \rightarrow Y,\mathcal{W},i}(k) \quad \textrm{and} \quad T_{X \rightarrow Y,\mathcal{W}} = \lim_{k \rightarrow \infty} T_{X \rightarrow Y,\mathcal{W}}(k),\]

but we do not provide limiting functionality in this library (yet!).

See [Schreiber2000], [Kaiser2002] and [Lizier2008] for more details.
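
For the unconditioned (apparent) case, the estimator can be written out directly from the definition: tabulate the (target history, source state, next target state) triples and average the local values. The sketch below (plain Python; te_by_hand is an illustrative name, not the library's implementation) agrees with transfer_entropy() on the first example below.

>>> from collections import Counter
>>> from math import log2
>>> def te_by_hand(src, tgt, k):
...     triples = [(tuple(tgt[j-k:j]), src[j-1], tgt[j]) for j in range(k, len(tgt))]
...     n, joint = len(triples), Counter(triples)
...     hist = Counter(h for h, s, f in triples)
...     hist_src = Counter((h, s) for h, s, f in triples)
...     hist_fut = Counter((h, f) for h, s, f in triples)
...     return sum(c/n * log2(c * hist[h] / (hist_src[(h, s)] * hist_fut[(h, f)]))
...                for (h, s, f), c in joint.items())
...
>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> round(te_by_hand(xs, ys, 2), 6)
0.67927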

Examples

One initial condition, no background

Just give us a couple of time series and tell us the history length, and we’ll give you a number:

>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> transfer_entropy(xs, ys, k=2)
0.6792696431662097
>>> transfer_entropy(ys, xs, k=2)
0.0

or an array if you ask for it

>>> transfer_entropy(xs, ys, k=2, local=True)
array([[1.       , 0.       , 0.5849625, 0.5849625, 1.5849625, 0.       ,
        1.       ]])
>>> transfer_entropy(ys, xs, k=2, local=True)
array([[0., 0., 0., 0., 0., 0., 0.]])

Two initial conditions, no background

Uhm, yes we can! (Did you really expect anything less?)

>>> xs = [[1,0,0,0,0,1,1,1,1], [1,1,1,1,0,0,0,1,1]]
>>> ys = [[0,0,1,1,1,1,0,0,0], [1,0,0,0,0,1,1,1,0]]
>>> transfer_entropy(xs, ys, k=2)
0.693536138896192
>>> transfer_entropy(xs, ys, k=2, local=True)
array([[1.32192809, 0.        , 0.73696559, 0.73696559, 1.32192809,
        0.        , 0.73696559],
       [0.        , 0.73696559, 0.73696559, 1.32192809, 0.        ,
        0.73696559, 1.32192809]])

One initial condition, one background process

>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> ws = [0,1,1,1,1,0,1,1,1]
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.2857142857142857
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[1., 0., 0., 0., 0., 0., 1.]])

One initial condition, two background processes

>>> xs = [0,1,1,1,1,0,0,0,0]
>>> ys = [0,0,1,1,1,1,0,0,0]
>>> ws = [[1,0,1,0,1,1,1,1,1], [1,1,0,1,0,1,1,1,1]]
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.0
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[0., 0., 0., 0., 0., 0., 0.]])

Two initial conditions, two background processes

>>> xs = [[1,1,0,1,0,1,1,0,0],[0,1,0,1,1,1,0,0,1]]
>>> ys = [[1,1,1,0,1,1,1,0,0],[0,0,1,0,1,1,1,0,0]]
>>> ws = [[[1,1,0,1,1,0,1,0,1],[1,1,1,0,1,1,1,1,0]],
...       [[1,1,1,1,0,0,0,0,1],[0,0,0,1,1,1,1,0,1]]]
>>> transfer_entropy(xs, ys, k=2)
0.5364125003090668
>>> transfer_entropy(xs, ys, k=2, condition=ws)
0.3396348215831049
>>> transfer_entropy(xs, ys, k=2, condition=ws, local=True)
array([[ 1.       ,  0.       ,  0.       , -0.4150375,  0.       ,
         0.       ,  1.       ],
       [ 0.       ,  0.5849625,  1.       ,  0.5849625,  0.       ,
         1.       ,  0.       ]])

API Documentation

pyinform.transferentropy.transfer_entropy(source, target, k, condition=None, local=False)[source]

Compute the local or average transfer entropy from one time series to another with target history length k. Optionally, time series can be provided against which to condition.

Parameters
  • source (sequence or numpy.ndarray) – the source time series

  • target (sequence or numpy.ndarray) – the target time series

  • k (int) – the history length

  • condition (sequence or numpy.ndarray) – time series of any conditions

  • local (bool) – compute the local transfer entropy

Returns

the average or local transfer entropy

Return type

float or numpy.ndarray

Raises
  • ValueError – if the time series have different shapes

  • ValueError – if either time series has no initial conditions

  • ValueError – if either time series is greater than 2-D

  • InformError – if an error occurs within the inform C call

References

Cover1991

T.M. Cover and J.A. Thomas (1991). “Elements of information theory” (1st ed.). New York: Wiley. ISBN 0-471-06259-6.

Kaiser2002

A. Kaiser and T. Schreiber, “Information transfer in continuous processes”, Physica D: Nonlinear Phenomena, Volume 166, Issues 1–2, 1 June 2002, Pages 43-62, ISSN 0167-2789.

Kullback1951

Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics. 22 (1): 79-86. doi:10.1214/aoms/1177729694. MR 39968.

Lizier2008

J.T. Lizier, M. Prokopenko and A. Zomaya, “Local information transfer as a spatiotemporal filter for complex systems”, Phys. Rev. E 77, 026110, 2008.

Lizier2012

J.T. Lizier, M. Prokopenko and A.Y. Zomaya, “Local measures of information storage in complex distributed computation” Information Sciences, vol. 208, pp. 39-54, 2012.

Schreiber2000

T. Schreiber, “Measuring information transfer”, Phys. Rev. Lett. 85 (2) pp. 461-464, 2000.

Shannon1948

Shannon, Claude E. (July-October 1948). “A Mathematical Theory of Communication”. Bell System Technical Journal. 27 (3): 379-423. doi:10.1002/j.1538-7305.1948.tb01448.x.