Utilities
State Binning
All of the currently implemented time series measures are defined only on discretely-valued time series. In practice, however, continuously-valued time series are ubiquitous. There are two approaches to accommodating continuous values.
The simplest is to bin the time series, forcing the values into discrete states. This method has its downsides: the binning is often somewhat unphysical, and it can introduce bias. What’s more, without some kind of guiding principle it can be difficult to decide which binning approach to use.
The second approach attempts to infer continuous probability distributions from continuous data. This is potentially more robust, but more technically difficult. Unfortunately, PyInform does not yet have an implementation of information measures on continuous distributions.
This module (pyinform.utils.binning) provides a basic binning facility via the bin_series() function.

pyinform.utils.binning.series_range(series)
Compute the range of a continuously-valued time series.
Examples:
>>> from pyinform import utils
>>> utils.series_range([0,1,2,3,4,5])
(5, 0, 5)
>>> utils.series_range([0.1, 8.5, 0.02, -6.3])
(14.8, -6.3, 8.5)
Parameters:  series (sequence) – the time series
Returns: the range and the minimum/maximum values
Return type: 3-tuple (float, float, float)
Raises:  InformError – if an error occurs within the inform C call
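The return convention above (range first, then minimum and maximum) can be illustrated with a minimal pure-Python sketch. The name series_range_sketch is hypothetical and this is not PyInform's implementation, which delegates to the inform C library.

```python
def series_range_sketch(series):
    """Return (range, minimum, maximum) for a non-empty sequence of numbers.

    Hypothetical helper for illustration only; it mirrors the documented
    return order of series_range but does not call PyInform.
    """
    if len(series) == 0:
        raise ValueError("series is empty")
    lo, hi = min(series), max(series)
    return (hi - lo, lo, hi)


print(series_range_sketch([0, 1, 2, 3, 4, 5]))  # (5, 0, 5)
```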

pyinform.utils.binning.bin_series(series, b=None, step=None, bounds=None)
Bin a continuously-valued time series.
The binning can be performed in any one of three ways.
1. Specified Number of Bins
The first is binning the time series into b uniform bins (with b an integer).
>>> from pyinform import utils
>>> import numpy as np
>>> xs = 10 * np.random.rand(20)
>>> xs
array([ 6.62004974,  7.24471972,  0.76670198,  2.66306833,  4.32200795,
        8.84902227,  6.83491844,  7.05008074,  3.79287646,  6.50844032,
        8.68804879,  6.79543773,  0.3222078 ,  7.39576325,  7.54150189,
        1.06422897,  1.91958431,  2.34760945,  3.90139184,  3.08885353])
>>> utils.bin_series(xs, b=2)
(array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0], dtype=int32), 2, 4.263407236635026)
>>> utils.bin_series(xs, b=3)
(array([2, 2, 0, 0, 1, 2, 2, 2, 1, 2, 2, 2, 0, 2, 2, 0, 0, 0, 1, 0], dtype=int32), 3, 2.8422714910900173)
With this approach, the binned sequence (as a numpy.ndarray), the number of bins, and the size of each bin are returned.
This binning method is useful if, for example, the user wants to bin several time series to the same base.
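As a rough illustration of what binning into b uniform bins involves, the following pure-Python sketch divides the observed range into b equal-width intervals. The function name bin_uniform_sketch is hypothetical, and the handling of the maximum value (clamped into the last bin) is an assumption chosen to be consistent with the output shown above; PyInform's actual implementation lives in the inform C library.

```python
def bin_uniform_sketch(series, b):
    """Bin values into b equal-width bins spanning [min(series), max(series)].

    Hypothetical illustration, not PyInform's implementation.
    Returns (binned values, number of bins, bin width).
    """
    if b < 1:
        raise ValueError("b must be a positive integer")
    lo, hi = min(series), max(series)
    step = (hi - lo) / b
    if step == 0:
        # Degenerate case: every value is identical, so use a single state.
        return [0] * len(series), b, step
    # The maximum value would otherwise fall into bin b, so clamp it
    # into the last valid bin, b - 1 (an assumption of this sketch).
    binned = [min(int((x - lo) / step), b - 1) for x in series]
    return binned, b, step


print(bin_uniform_sketch([0.0, 1.0, 2.0, 3.0, 4.0], 2))
# ([0, 0, 1, 1, 1], 2, 2.0)
```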
2. Fixed Size Bins
The second type of binning produces bins of a specific size, step:
>>> utils.bin_series(xs, step=4.0)
(array([1, 1, 0, 0, 0, 2, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 0, 0, 0, 0], dtype=int32), 3, 4.0)
>>> utils.bin_series(xs, step=2.0)
(array([3, 3, 0, 1, 1, 4, 3, 3, 1, 3, 4, 3, 0, 3, 3, 0, 0, 1, 1, 1], dtype=int32), 5, 2.0)
As in the previous case the binned sequence, the number of bins, and the size of each bin are returned.
This approach is appropriate when the system at hand has a particular sensitivity or precision, e.g. if the system is sensitive down to 5.0 mV changes in potential.
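Fixed-size binning can be sketched in the same spirit: each value is assigned to a bin according to how many whole steps it lies above the minimum. The name bin_by_step_sketch is hypothetical, and anchoring the first bin at the series minimum is an assumption that happens to agree with the example output above; it is not taken from PyInform's source.

```python
def bin_by_step_sketch(series, step):
    """Bin values into fixed-width bins of size step, starting at min(series).

    Hypothetical illustration, not PyInform's implementation.
    Returns (binned values, number of bins, step).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    lo = min(series)
    # Count how many whole steps each value lies above the minimum.
    binned = [int((x - lo) / step) for x in series]
    # The number of bins follows from the largest bin index in use.
    return binned, max(binned) + 1, step


print(bin_by_step_sketch([0.0, 1.0, 4.5, 9.0], 4.0))
# ([0, 0, 1, 2], 3, 4.0)
```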
3. Thresholds
The third type of binning breaks the real number line into segments with specified boundaries or thresholds, and the time series is binned according to this partitioning. The bounds are expected to be provided in ascending order:
>>> utils.bin_series(xs, bounds=[2.0, 7.5])
(array([1, 1, 0, 1, 1, 2, 1, 1, 1, 1, 2, 1, 0, 1, 2, 0, 0, 1, 1, 1], dtype=int32), 3, [2.0, 7.5])
>>> utils.bin_series(xs, bounds=[2.0])
(array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1], dtype=int32), 2, [2.0])
Unlike the previous two types of binning, this approach returns the specific bounds rather than the bin sizes. The binned sequence and the number of bins are returned as before.
This approach is useful in situations where the system has natural thresholds, e.g. the polarized/hyperpolarized states of a neuron.
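Threshold binning amounts to finding, for each value, how many bounds lie at or below it. A minimal sketch using the standard library's bisect module follows. The name bin_by_bounds_sketch is hypothetical, and placing a value that equals a bound into the upper bin is an assumption of this sketch; the text above does not say how PyInform treats that edge case.

```python
from bisect import bisect_right


def bin_by_bounds_sketch(series, bounds):
    """Bin values by ascending thresholds; k bounds produce k + 1 bins.

    Hypothetical illustration, not PyInform's implementation.
    Returns (binned values, number of bins, bounds).
    """
    bounds = list(bounds)
    if bounds != sorted(bounds):
        raise ValueError("bounds must be in ascending order")
    # bisect_right counts the bounds that are <= x, which is the bin index.
    binned = [bisect_right(bounds, x) for x in series]
    return binned, len(bounds) + 1, bounds


print(bin_by_bounds_sketch([1.9, 2.3, 7.4, 7.6], [2.0, 7.5]))
# ([0, 1, 1, 2], 3, [2.0, 7.5])
```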
Parameters:  series (sequence) – the continuously-valued time series
 b (int) – the desired number of uniform bins
 step (float) – the desired size of each uniform bin
 bounds (sequence) – the (finite) bounds of each bin
Returns: the binned sequence, the number of bins and either the bin sizes or bin bounds
Return type: either (numpy.ndarray, int, float) or (numpy.ndarray, int, sequence)
Raises:  ValueError – if no keyword argument is provided
 ValueError – if more than one keyword argument is provided
 InformError – if an error occurs in the inform C call
State Coalescing

pyinform.utils.coalesce.coalesce_series(series)
Coalesce a time series into as few contiguous states as possible.
The magic of information measures is that the actual values of a time series are irrelevant. For example, \(\{0,1,0,1,1\}\) has the same entropy as \(\{2,9,2,9,9\}\) (possibly up to a rescaling). This gives us the freedom to shift around the values of a time series as long as we do not change the relative number of states.
This function thus provides a way of “compressing” a time series into as small a base as possible. For example:
>>> utils.coalesce_series([2,9,2,9,9])
(array([0, 1, 0, 1, 1], dtype=int32), 2)
Why is this useful? Many of the measures use the base of the time series to determine how much memory to allocate; the larger the base, the higher the memory usage. It also affects the overall performance as the combinatorics climb exponentially with the base.
The two standard use cases for this function are to reduce the base of a time series
>>> utils.coalesce_series([0,2,0,2,0,2])
(array([0, 1, 0, 1, 0, 1], dtype=int32), 2)
or ensure that the states are non-negative
>>> utils.coalesce_series([-8,2,6,-2,4])
(array([0, 2, 4, 1, 3], dtype=int32), 5)
Notice that the encoding that is used ensures that the ordering of the states stays the same, e.g. \(\{-8 \rightarrow 0, -2 \rightarrow 1, 2 \rightarrow 2, 4 \rightarrow 3, 6 \rightarrow 4\}\). This isn’t strictly necessary, so we are going to call it a “feature”.
Parameters:  series (sequence) – the time series to coalesce
Returns: the coalesced time series and its base
Return type: the 2-tuple (numpy.ndarray, int)
Raises:  InformError – if an error occurs in the inform C call
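The order-preserving relabeling described above, and the claim that it leaves entropy unchanged, can be checked with a short pure-Python sketch. Both coalesce_sketch and shannon_entropy are hypothetical helpers written for this illustration; they are not PyInform functions, and the sketch simply reports the number of distinct states as the base.

```python
from collections import Counter
from math import isclose, log2


def coalesce_sketch(series):
    """Relabel states as 0, 1, 2, ... while preserving their ordering.

    Hypothetical illustration, not PyInform's implementation.
    Returns (relabeled series, number of distinct states).
    """
    states = sorted(set(series))
    relabel = {state: index for index, state in enumerate(states)}
    return [relabel[state] for state in series], len(states)


def shannon_entropy(series):
    """Shannon entropy (in bits) of the empirical state distribution."""
    n = len(series)
    return -sum((c / n) * log2(c / n) for c in Counter(series).values())


original = [-8, 2, 6, -2, 4]
coalesced, base = coalesce_sketch(original)
print(coalesced, base)  # [0, 2, 4, 1, 3] 5

# Relabeling only renames states, so the entropy is unchanged.
assert isclose(shannon_entropy(original), shannon_entropy(coalesced))
```

Because only the state frequencies enter the entropy, any one-to-one relabeling would give the same result; sorting the states first is what keeps their ordering intact.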
State Encoding
State encoding is a necessity when complex time series are being analyzed. For
example, a \(k\)-history must be encoded as an integer in order to “observe”
it using a Dist. What if you are interested in correlating
the aggregate state of one group of nodes with that of another? You’d need to
encode the groups’ states as integers. This module
(pyinform.utils.encoding) provides such functionality.
Attention
As a practical matter, these utility functions should only be used as a stopgap while a solution for your problem is implemented in the core Inform library. “Why?” you ask? Well, these functions are about as efficient as they can be for one-off state encodings, but most of the time you are interested in encoding sequences of states. This can be done much more efficiently if you encode the entire sequence at once. You need domain-specific information to make that happen.
This being said, these functions aren’t bad; just be aware that they may turn into a bottleneck in whatever you are implementing.
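To make the efficiency remark concrete, here is one way an entire sequence of \(k\)-histories might be encoded in a single pass: instead of re-encoding every window from scratch, each code is derived from the previous one by dropping the oldest digit and appending the newest. The function encode_histories_sketch is hypothetical and is only a sketch of the general idea under a big-endian convention; it is not how the Inform library is known to do it.

```python
def encode_histories_sketch(series, k, b):
    """Encode every length-k window of a base-b series as a big-endian integer.

    Hypothetical illustration of encoding a whole sequence at once.
    Each new code is computed from the previous one in constant time.
    """
    if k < 1 or len(series) < k:
        raise ValueError("need 1 <= k <= len(series)")
    top = b ** (k - 1)  # place value of the oldest digit in a window
    code = 0
    for digit in series[:k]:
        code = code * b + digit
    codes = [code]
    # Pair each digit leaving the window with the digit entering it.
    for old, new in zip(series, series[k:]):
        code = (code - old * top) * b + new
        codes.append(code)
    return codes


print(encode_histories_sketch([0, 1, 1, 0, 1], k=2, b=2))  # [1, 3, 2, 1]
```

The rolling update costs a fixed amount of work per time step, whereas encoding each window independently costs work proportional to k per step.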

pyinform.utils.encoding.encode(state, b=None)
Encode a base-b array of integers into a single integer.
This function uses a big-endian encoding scheme. That is, the most significant bits of the encoded integer are determined by the left-most end of the unencoded state.
>>> from pyinform.utils import *
>>> encode([0,0,1], b=2)
1
>>> encode([0,1,0], b=3)
3
>>> encode([1,0,0], b=4)
16
>>> encode([1,0,4], b=5)
29
If b is not provided (or is None), the base is inferred from the state with a minimum value of 2.
>>> from pyinform.utils import *
>>> encode([0,0,2])
2
>>> encode([0,2,0])
6
>>> encode([1,2,1])
16
See also decode().
Parameters:  state (sequence) – the state to encode
 b (int) – the base in which to encode
Returns: the encoded state
Return type: int
Raises:  ValueError – if the state is empty
 InformError – if an error occurs in the inform C call
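The big-endian scheme and the base inference rule can be written out in a few lines of pure Python. The name encode_sketch is hypothetical; it reproduces the documented examples but is not PyInform's implementation, and raising ValueError for out-of-range digits is an assumption of this sketch.

```python
def encode_sketch(state, b=None):
    """Encode a base-b sequence of digits as one integer, big-endian.

    Hypothetical illustration, not PyInform's implementation.
    If b is None, the base is inferred as max(state) + 1, with a minimum of 2.
    """
    if len(state) == 0:
        raise ValueError("state is empty")
    if b is None:
        b = max(2, max(state) + 1)
    code = 0
    for digit in state:
        if not 0 <= digit < b:
            raise ValueError("digit out of range for the given base")
        # Shift the accumulated value one place left, then add the new digit.
        code = code * b + digit
    return code


print(encode_sketch([1, 0, 4], b=5))  # 29
print(encode_sketch([1, 2, 1]))       # 16
```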

pyinform.utils.encoding.decode(encoding, b, n=None)
Decode an integer into a base-b array with n digits.
The provided encoded state is decoded using the big-endian encoding scheme.
>>> decode(2, b=2, n=2)
array([1, 0], dtype=int32)
>>> decode(6, b=2, n=3)
array([1, 1, 0], dtype=int32)
>>> decode(6, b=3, n=2)
array([2, 0], dtype=int32)
Note that the base b must be provided, but the number of digits n is optional. If it is provided then the decoded state will have exactly that many elements.
>>> decode(2, b=2, n=4)
array([0, 0, 1, 0], dtype=int32)
However, if n is too small to contain a full representation of the state, an error will be raised.
>>> decode(6, b=2, n=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/workspace/pyinform/utils/encoding.py", line 126, in decode
    error_guard(e)
  File "/home/ubuntu/workspace/pyinform/error.py", line 57, in error_guard
    raise InformError(e,func)
pyinform.error.InformError: an inform error occurred - "encoding/decoding failed"
If n is not provided, the length of the decoded state is as small as possible:
>>> decode(1, b=2)
array([1], dtype=int32)
>>> decode(1, b=3)
array([1], dtype=int32)
>>> decode(3, b=2)
array([1, 1], dtype=int32)
>>> decode(3, b=3)
array([1, 0], dtype=int32)
>>> decode(3, b=4)
array([3], dtype=int32)
Of course encode() and decode() play well together.
>>> for i in range(100):
...     assert(encode(decode(i, b=2)) == i)
...
>>>
See also encode().
Parameters:  encoding (int) – the encoded state
 b (int) – the desired base
 n (int) – the desired number of digits
Returns: the decoded state
Return type: numpy.ndarray
Raises:  InformError – if n is too small to contain the decoding
 InformError – if an error occurs within the inform C call
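Decoding is the reverse process: repeatedly divide by the base, collect the remainders, and pad with leading zeros when n is given. The sketch below uses a hypothetical name, decode_sketch, returns a plain list rather than a numpy.ndarray, and raises ValueError where PyInform raises InformError. Decoding 0 to a single zero digit is also an assumption of this sketch.

```python
def decode_sketch(encoding, b, n=None):
    """Decode a non-negative integer into base-b digits, big-endian.

    Hypothetical illustration, not PyInform's implementation.
    If n is given, the result is left-padded with zeros to exactly n digits.
    """
    if encoding < 0 or b < 2:
        raise ValueError("encoding must be >= 0 and b must be >= 2")
    digits = []
    value = encoding
    while value > 0:
        # Peel off the least significant digit first.
        value, digit = divmod(value, b)
        digits.append(digit)
    if not digits:
        digits.append(0)  # assumption: 0 decodes to a single zero digit
    if n is not None:
        if n < len(digits):
            raise ValueError("n is too small to contain the decoding")
        digits.extend([0] * (n - len(digits)))
    digits.reverse()  # most significant digit first
    return digits


print(decode_sketch(2, b=2, n=4))  # [0, 0, 1, 0]
print(decode_sketch(6, b=3, n=2))  # [2, 0]
```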