Utilities

State Binning

All of the currently implemented time series measures are defined only on discretely-valued time series. In practice, however, continuously-valued time series are ubiquitous. There are two approaches to accommodating continuous values.

The simplest is to bin the time series, forcing the values into discrete states. This method has its downsides: the binning is often somewhat unphysical, and it can introduce bias. What’s more, without some kind of guiding principle it can be difficult to decide which binning approach to use.

The second approach attempts to infer continuous probability distributions from continuous data. This is potentially more robust, but more technically difficult. Unfortunately, PyInform does not yet have an implementation of information measures on continuous distributions.

This module (pyinform.utils.binning) provides a basic binning facility via the bin_series() function.

pyinform.utils.binning.series_range(series)

Compute the range of a continuously-valued time series.

Examples:

>>> from pyinform import utils
>>> utils.series_range([0,1,2,3,4,5])
(5.0, 0.0, 5.0)
>>> utils.series_range([-0.1, 8.5, 0.02, -6.3])
(14.8, -6.3, 8.5)

Parameters: series (sequence) – the time series
Returns: the range and the minimum/maximum values
Return type: 3-tuple (float, float, float)
Raises: InformError – if an error occurs within the inform C call
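To make the layout of the returned tuple concrete, here is a minimal pure-Python sketch that reproduces the first example above. The name `series_range_sketch` is hypothetical, and this is not the PyInform implementation, which delegates to the Inform C library.

```python
def series_range_sketch(series):
    """Return (range, minimum, maximum) for a non-empty sequence of numbers."""
    if len(series) == 0:
        raise ValueError("series is empty")
    lo = float(min(series))
    hi = float(max(series))
    # The range comes first, followed by the two extremes.
    return (hi - lo, lo, hi)


print(series_range_sketch([0, 1, 2, 3, 4, 5]))  # (5.0, 0.0, 5.0)
```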

pyinform.utils.binning.bin_series(series, b=None, step=None, bounds=None)

Bin a continuously-valued time series.

The binning can be performed in any one of three ways.

1. Specified Number of Bins

The first is binning the time series into b uniform bins (with b an integer).

>>> import numpy as np
>>> from pyinform import utils
>>> np.random.seed(2019)
>>> xs = 10 * np.random.rand(20)
>>> xs
array([9.03482214, 3.93080507, 6.23969961, 6.37877401, 8.80499069,
       2.99172019, 7.0219827 , 9.03206161, 8.81381926, 4.05749798,
       4.52446621, 2.67070324, 1.6286487 , 8.89214695, 1.48476226,
       9.84723485, 0.32361219, 5.15350754, 2.01129047, 8.86010874])
>>> utils.bin_series(xs, b=2)
(array([1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1],
      dtype=int32), 2, 4.761811327822174)
>>> utils.bin_series(xs, b=3)
(array([2, 1, 1, 1, 2, 0, 2, 2, 2, 1, 1, 0, 0, 2, 0, 2, 0, 1, 0, 2],
      dtype=int32), 3, 3.1745408852147823)

With this approach the binned sequence (as a numpy.ndarray), the number of bins, and the size of each bin are returned.

This binning method is useful if, for example, the user wants to bin several time series to the same base.
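The idea behind uniform binning can be sketched in a few lines of plain Python. This is an illustration only: `bin_uniform_sketch` is a hypothetical name, and the edge-case handling (clamping the maximum into the last bin, sending a constant series to bin 0) is an assumption that is consistent with the examples above but not taken from the library source.

```python
def bin_uniform_sketch(series, b):
    """Bin `series` into `b` equal-width bins spanning [min, max]."""
    if b < 1:
        raise ValueError("b must be a positive integer")
    lo, hi = min(series), max(series)
    size = (hi - lo) / b
    if size == 0:
        # All values are identical, so everything falls into bin 0.
        return [0] * len(series), b, size
    # The maximum would land in bin b, so clamp it into the last bin.
    binned = [min(int((x - lo) / size), b - 1) for x in series]
    return binned, b, size


print(bin_uniform_sketch([0.0, 1.0, 2.0, 3.0, 4.0], b=2))
# ([0, 0, 1, 1, 1], 2, 2.0)
```

Binning two different series with the same `b` gives sequences that share the base `b`, even though their bin sizes will generally differ.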

2. Fixed Size Bins

The second type of binning produces bins of a specific size step.

>>> utils.bin_series(xs, step=4.0)
(array([2, 0, 1, 1, 2, 0, 1, 2, 2, 0, 1, 0, 0, 2, 0, 2, 0, 1, 0, 2],
      dtype=int32), 3, 4.0)
>>> utils.bin_series(xs, step=2.0)
(array([4, 1, 2, 3, 4, 1, 3, 4, 4, 1, 2, 1, 0, 4, 0, 4, 0, 2, 0, 4],
      dtype=int32), 5, 2.0)

As in the previous case the binned sequence, the number of bins, and the size of each bin are returned.

This approach is appropriate when the system at hand has a particular sensitivity or precision, e.g. if the system is sensitive down to 5.0mV changes in potential.
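A fixed-step binning can be sketched the same way. The function name `bin_step_sketch` is hypothetical, and the rule used for the bin count (one more than the largest bin index that occurs) is an assumption that matches the outputs shown above rather than a statement about the library internals.

```python
def bin_step_sketch(series, step):
    """Bin `series` into bins of width `step`, starting at the minimum."""
    if step <= 0:
        raise ValueError("step must be positive")
    lo = min(series)
    # Each value's bin is the number of whole steps it lies above the minimum.
    binned = [int((x - lo) / step) for x in series]
    # Assumption: the number of bins is one more than the largest index used.
    return binned, max(binned) + 1, step


print(bin_step_sketch([0.0, 1.0, 2.5, 5.0], step=2.0))
# ([0, 0, 1, 2], 3, 2.0)
```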

3. Thresholds

The third type of binning breaks the real number line into segments with specified boundaries or thresholds, and the time series is binned according to this partitioning. The bounds are expected to be provided in ascending order.

>>> utils.bin_series(xs, bounds=[2.0, 7.5])
(array([2, 1, 1, 1, 2, 1, 1, 2, 2, 1, 1, 1, 0, 2, 0, 2, 0, 1, 1, 2],
      dtype=int32), 3, [2.0, 7.5])
>>> utils.bin_series(xs, bounds=[2.0])
(array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1],
      dtype=int32), 2, [2.0])

Unlike the previous two types of binning, this approach returns the specific bounds rather than the bin sizes. The other two return values, the binned sequence and the number of bins, are the same as before.

This approach is useful in situations where the system has natural thresholds, e.g. the polarized/hyperpolarized states of a neuron.
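Threshold binning amounts to finding, for each value, which segment of the number line it falls in. The sketch below uses the standard library `bisect` module. The name `bin_bounds_sketch` is hypothetical, and placing a value that equals a bound in the upper segment is an assumption, since the examples above do not exercise that case.

```python
from bisect import bisect_right


def bin_bounds_sketch(series, bounds):
    """Assign each value the index of the segment it falls in."""
    if list(bounds) != sorted(bounds):
        raise ValueError("bounds must be in ascending order")
    # bisect_right counts the bounds that are <= x, which is the segment index.
    # Assumption: a value equal to a bound goes to the upper segment.
    binned = [bisect_right(bounds, x) for x in series]
    # n bounds split the line into n + 1 segments.
    return binned, len(bounds) + 1, list(bounds)


print(bin_bounds_sketch([1.6, 2.9, 7.0, 9.0], bounds=[2.0, 7.5]))
# ([0, 1, 1, 2], 3, [2.0, 7.5])
```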

Parameters

series (sequence) – the continuously-valued time series
b (int) – the desired number of uniform bins
step (float) – the desired size of each uniform bin
bounds (sequence) – the (finite) bounds of each bin

Returns

the binned sequence, the number of bins and either the bin sizes or bin bounds

Return type

either (numpy.ndarray, int, float) or (numpy.ndarray, int, sequence)

Raises

ValueError – if no keyword argument is provided
ValueError – if more than one keyword argument is provided
InformError – if an error occurs in the inform C call
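The two ValueError conditions say that exactly one of b, step and bounds must be supplied. A check of that kind might look like the sketch below; `check_binning_args_sketch` is a hypothetical helper written for illustration, and the error messages are not those of the library.

```python
def check_binning_args_sketch(b=None, step=None, bounds=None):
    """Return the name of the single binning option that was supplied."""
    options = (("b", b), ("step", step), ("bounds", bounds))
    given = [name for name, value in options if value is not None]
    if not given:
        raise ValueError("one of b, step or bounds must be provided")
    if len(given) > 1:
        raise ValueError("only one of b, step or bounds may be provided")
    return given[0]


print(check_binning_args_sketch(step=2.0))  # step
```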

State Coalescing

pyinform.utils.coalesce.coalesce_series(series)

Coalesce a time series into as few contiguous states as possible.

The magic of information measures is that the actual values of a time series are irrelevant. For example, \(\{0,1,0,1,1\}\) has the same entropy as \(\{2,9,2,9,9\}\) (possibly up to a rescaling). This gives us the freedom to shift around the values of a time series as long as we do not change the relative number of states.

This function thus provides a way of “compressing” a time series into as small a base as possible. For example

>>> from pyinform import utils
>>> utils.coalesce_series([2,9,2,9,9])
(array([0, 1, 0, 1, 1], dtype=int32), 2)

Why is this useful? Many of the measures use the base of the time series to determine how much memory to allocate; the larger the base, the higher the memory usage. It also affects the overall performance as the combinatorics climb exponentially with the base.

The two standard use cases for this function are to reduce the base of a time series

>>> utils.coalesce_series([0,2,0,2,0,2])
(array([0, 1, 0, 1, 0, 1], dtype=int32), 2)

or ensure that the states are non-negative

>>> utils.coalesce_series([-8,2,6,-2,4])
(array([0, 2, 4, 1, 3], dtype=int32), 5)

Notice that the encoding that is used ensures that the ordering of the states stays the same, e.g. \(\{-8 \rightarrow 0, -2 \rightarrow 1, 2 \rightarrow 2, 4 \rightarrow 3, 6 \rightarrow 4\}\). This isn’t strictly necessary, so we are going to call it a “feature”.

Parameters: series (sequence) – the time series to coalesce
Returns: the coalesced time series and its base
Return type: the 2-tuple (numpy.ndarray, int)
Raises: InformError – if an error occurs in the inform C call
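The order-preserving relabeling described above can be sketched by ranking the distinct values. `coalesce_sketch` is a hypothetical name, and this pure-Python version only illustrates the mapping; it is not how the library computes it.

```python
def coalesce_sketch(series):
    """Relabel states as 0..n-1, preserving the ordering of the original values."""
    # Sort the distinct values and map each one to its rank.
    rank = {value: i for i, value in enumerate(sorted(set(series)))}
    # The base of the coalesced series is the number of distinct states.
    return [rank[value] for value in series], len(rank)


print(coalesce_sketch([-8, 2, 6, -2, 4]))  # ([0, 2, 4, 1, 3], 5)
print(coalesce_sketch([2, 9, 2, 9, 9]))    # ([0, 1, 0, 1, 1], 2)
```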

State Encoding

State encoding is a necessity when complex time series are being analyzed. For example, a \(k\)-history must be encoded as an integer in order to “observe” it using a Dist. What if you are interested in correlating the aggregate state of one group of nodes with that of another? You’d need to encode the groups’ states as integers. This module (pyinform.utils.encoding) provides such functionality.

Attention

As a practical matter, these utility functions should only be used as a stop-gap while a solution for your problem is implemented in the core Inform library. “Why?” you ask? Well, these functions are about as efficient as they can be for one-off state encodings, but most of the time you are interested in encoding sequences of states. This can be done much more efficiently if you encode the entire sequence at once. You need domain-specific information to make that happen.

This being said, these functions aren’t bad; just be aware that they may turn into a bottleneck in whatever you are implementing.
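As one illustration of why encoding a whole sequence at once can be cheaper, consider encoding every \(k\)-history of a series. Rather than encoding each window from scratch, each code can be derived from the previous one in constant time. The sketch below assumes the left-most-significant convention that encode() uses; `encode_histories_sketch` is a hypothetical name and is not part of PyInform.

```python
def encode_histories_sketch(series, k, b):
    """Encode every length-k window of a base-b series as an integer."""
    if k < 1 or len(series) < k:
        raise ValueError("need 1 <= k <= len(series)")
    top = b ** (k - 1)  # place value of the oldest state in a window
    code = 0
    for x in series[:k]:
        code = code * b + x  # encode the first window from scratch
    codes = [code]
    for old, new in zip(series, series[k:]):
        # Drop the oldest state, shift by one digit, append the newest state.
        code = (code - old * top) * b + new
        codes.append(code)
    return codes


print(encode_histories_sketch([0, 1, 1, 0, 1], k=2, b=2))  # [1, 3, 2, 1]
```

Each update costs a constant number of arithmetic operations regardless of `k`, whereas encoding every window independently costs work proportional to `k` per window.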

pyinform.utils.encoding.encode(state, b=None)

Encode a base-b array of integers into a single integer.

This function uses a big-endian encoding scheme. That is, the most significant bits of the encoded integer are determined by the left-most end of the unencoded state.

>>> from pyinform import utils
>>> utils.encode([0,0,1], b=2)
1
>>> utils.encode([0,1,0], b=3)
3
>>> utils.encode([1,0,0], b=4)
16
>>> utils.encode([1,0,4], b=5)
29

If b is not provided (or is None), the base is inferred from the state with a minimum value of 2.

>>> utils.encode([0,0,2])
2
>>> utils.encode([0,2,0])
6
>>> utils.encode([1,2,1])
16
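The encoding and the base inference described above can be sketched in plain Python using Horner's rule. `encode_sketch` is a hypothetical name, and the range check on the digits is an assumption made for illustration; the library's own error handling may differ.

```python
def encode_sketch(state, b=None):
    """Encode base-b digits as one integer, left-most digit most significant."""
    if len(state) == 0:
        raise ValueError("state is empty")
    if b is None:
        # Infer the base from the largest digit, but never go below 2.
        b = max(2, max(state) + 1)
    code = 0
    for digit in state:
        if not 0 <= digit < b:
            raise ValueError("digit out of range for the base")
        code = code * b + digit  # Horner's rule
    return code


print(encode_sketch([1, 0, 4], b=5))  # 29
print(encode_sketch([1, 2, 1]))       # 16
```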