Utilities

State Binning

All of the currently implemented time series measures are only defined on discretely-valued time series. However, in practice continuously-valued time series are ubiquitous. There are two approaches to accommodating continuous values.

The simplest is to bin the time series, forcing the values into discrete states. This method has its downsides, namely that the binning is often a bit unphysical and it can introduce bias. What’s more, without some kind of guiding principle it can be difficult to decide exactly which binning approach to use.

The second approach attempts to infer continuous probability distributions from continuous data. This is potentially more robust, but more technically difficult. Unfortunately, PyInform does not yet have an implementation of information measures on continuous distributions.

This module (pyinform.utils.binning) provides a basic binning facility via the bin_series() function.

pyinform.utils.binning.series_range(series)

Compute the range of a continuously-valued time series.

Examples:

>>> from pyinform import utils
>>> utils.series_range([0,1,2,3,4,5])
(5, 0, 5)
>>> utils.series_range([-0.1, 8.5, 0.02, -6.3])
(14.8, -6.3, 8.5)
Parameters: series (sequence) – the time series
Returns: the range and the minimum/maximum values
Return type: 3-tuple (float, float, float)
Raises: InformError – if an error occurs within the inform C call
pyinform.utils.binning.bin_series(series, b=None, step=None, bounds=None)

Bin a continuously-valued time series.

The binning can be performed in any one of three ways.

1. Specified Number of Bins

The first is binning the time series into b uniform bins (with b an integer).

>>> from pyinform import utils
>>> import numpy as np
>>> xs = 10 * np.random.rand(20)
>>> xs
array([ 6.62004974,  7.24471972,  0.76670198,  2.66306833,  4.32200795,
        8.84902227,  6.83491844,  7.05008074,  3.79287646,  6.50844032,
        8.68804879,  6.79543773,  0.3222078 ,  7.39576325,  7.54150189,
        1.06422897,  1.91958431,  2.34760945,  3.90139184,  3.08885353])
>>> utils.bin_series(xs, b=2)
(array([1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0], dtype=int32), 2, 4.263407236635026)
>>> utils.bin_series(xs, b=3)
(array([2, 2, 0, 0, 1, 2, 2, 2, 1, 2, 2, 2, 0, 2, 2, 0, 0, 0, 1, 0], dtype=int32), 3, 2.8422714910900173)

With this approach the binned sequence (as a numpy.ndarray), the number of bins, and the size of each bin are returned.

This binning method is useful if, for example, the user wants to bin several time series to the same base.
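To make the idea concrete, here is a minimal plain-Python sketch of uniform binning. It is not the PyInform implementation: the helper name bin_uniform is hypothetical, and clamping the maximum value into the last bin is an assumption chosen so that the result agrees with the example output above.

```python
import math

def bin_uniform(series, b):
    """Bin a sequence of floats into b equal-width bins (illustrative only)."""
    lo, hi = min(series), max(series)
    step = (hi - lo) / b  # width of each bin
    if step == 0:
        # All values are identical, so everything lands in bin 0.
        return [0] * len(series), b, step
    # Assumption: clamp so the maximum value falls in the last bin (b - 1).
    binned = [min(int(math.floor((x - lo) / step)), b - 1) for x in series]
    return binned, b, step

binned, nbins, step = bin_uniform([0.0, 1.0, 2.5, 4.0], b=2)
print(binned, nbins, step)  # [0, 0, 1, 1] 2 2.0
```

The bin width is simply the range of the series divided by b, which is why the third return value changes with the data while the number of bins stays fixed.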

2. Fixed Size Bins

The second type of binning produces bins of a specific size step:

>>> utils.bin_series(xs, step=4.0)
(array([1, 1, 0, 0, 0, 2, 1, 1, 0, 1, 2, 1, 0, 1, 1, 0, 0, 0, 0, 0], dtype=int32), 3, 4.0)
>>> utils.bin_series(xs, step=2.0)
(array([3, 3, 0, 1, 1, 4, 3, 3, 1, 3, 4, 3, 0, 3, 3, 0, 0, 1, 1, 1], dtype=int32), 5, 2.0)

As in the previous case the binned sequence, the number of bins, and the size of each bin are returned.

This approach is appropriate when the system at hand has a particular sensitivity or precision, e.g. if the system is sensitive down to 5.0mV changes in potential.
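A rough plain-Python sketch of fixed-size binning follows. The helper name bin_by_step is hypothetical, and the rule used here (measure each value's offset from the series minimum and count how many whole steps fit) is an assumption that happens to reproduce the example output above; the library may treat edge cases differently.

```python
import math

def bin_by_step(series, step):
    """Bin a sequence of floats into bins of width `step` (illustrative only)."""
    lo = min(series)
    # Assumption: bins start at the series minimum, so the minimum is in bin 0.
    binned = [int(math.floor((x - lo) / step)) for x in series]
    # The number of bins follows from the data rather than being chosen up front.
    nbins = max(binned) + 1
    return binned, nbins, step

binned, nbins, step = bin_by_step([0.5, 1.0, 3.0, 5.5, 9.0], step=4.0)
print(binned, nbins)  # [0, 0, 0, 1, 2] 3
```

In contrast to binning by a number of bins, the step is fixed here and the base of the resulting series depends on the range of the data.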

3. Thresholds

The third type of binning breaks the real number line into segments with specified boundaries or thresholds, and the time series is binned according to this partitioning. The bounds are expected to be provided in ascending order:

>>> utils.bin_series(xs, bounds=[2.0, 7.5])
(array([1, 1, 0, 1, 1, 2, 1, 1, 1, 1, 2, 1, 0, 1, 2, 0, 0, 1, 1, 1], dtype=int32), 3, [2.0, 7.5])
>>> utils.bin_series(xs, bounds=[2.0])
(array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1], dtype=int32), 2, [2.0])

Unlike the previous two types of binning, this approach returns the specific bounds rather than the bin sizes. The other two return values, the binned sequence and the number of bins, are the same as before.

This approach is useful in situations where the system has natural thresholds, e.g. the polarized/hyperpolarized states of a neuron.
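Threshold binning can be sketched with the standard library's bisect module. The helper name bin_by_bounds is hypothetical, and placing a value that is exactly equal to a bound into the upper bin is an assumption; the text above does not say how the library handles that case.

```python
from bisect import bisect_right

def bin_by_bounds(series, bounds):
    """Bin a sequence of floats using ascending thresholds (illustrative only)."""
    bounds = list(bounds)  # expected to be in ascending order
    # bisect_right counts how many bounds are <= x, which is the bin index.
    # Assumption: a value equal to a bound goes into the bin above that bound.
    binned = [bisect_right(bounds, x) for x in series]
    # k thresholds split the real line into k + 1 segments.
    return binned, len(bounds) + 1, bounds

binned, nbins, bounds = bin_by_bounds([0.7, 2.6, 8.8, 7.4, 1.9], bounds=[2.0, 7.5])
print(binned, nbins)  # [0, 1, 2, 1, 0] 3
```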

Parameters:
  • series (sequence) – the continuously-valued time series
  • b (int) – the desired number of uniform bins
  • step (float) – the desired size of each uniform bin
  • bounds (sequence) – the (finite) bounds of each bin
Returns:

the binned sequence, the number of bins and either the bin sizes or bin bounds

Return type:

either (numpy.ndarray, int, float) or (numpy.ndarray, int, sequence)

Raises:
  • ValueError – if no keyword argument is provided
  • ValueError – if more than one keyword argument is provided
  • InformError – if an error occurs in the inform C call

State Coalescing

pyinform.utils.coalesce.coalesce_series(series)

Coalesce a timeseries into as few contiguous states as possible.

The magic of information measures is that the actual values of a time series are irrelevant. For example, \(\{0,1,0,1,1\}\) has the same entropy as \(\{2,9,2,9,9\}\) (possibly up to a rescaling). This gives us the freedom to shift around the values of a time series as long as we do not change the relative number of states.

This function thus provides a way of “compressing” a time series into as small a base as possible. For example

>>> utils.coalesce_series([2,9,2,9,9])
(array([0, 1, 0, 1, 1], dtype=int32), 2)

Why is this useful? Many of the measures use the base of the time series to determine how much memory to allocate; the larger the base, the higher the memory usage. It also affects the overall performance as the combinatorics climb exponentially with the base.

The two standard use cases for this function are to reduce the base of a time series

>>> utils.coalesce_series([0,2,0,2,0,2])
(array([0, 1, 0, 1, 0, 1], dtype=int32), 2)

or ensure that the states are non-negative

>>> utils.coalesce_series([-8,2,6,-2,4])
(array([0, 2, 4, 1, 3], dtype=int32), 5)

Notice that the encoding that is used ensures that the ordering of the states stays the same, e.g. \(\{-8 \rightarrow 0, -2 \rightarrow 1, 2 \rightarrow 2, 4 \rightarrow 3, 6 \rightarrow 4\}\). This isn’t strictly necessary, so we are going to call it a “feature”.

Parameters: series (sequence) – the time series to coalesce
Returns: the coalesced time series and its base
Return type: the 2-tuple (numpy.ndarray, int)
Raises: InformError – if an error occurs in the inform C call
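The order-preserving relabeling described above can be sketched in a few lines of plain Python. The helper names coalesce and shannon_entropy are hypothetical and this is not how the library implements it; the entropy function is included only to check that relabeling leaves the entropy unchanged.

```python
import math
from collections import Counter

def coalesce(series):
    """Relabel states as 0, 1, 2, ... while preserving their order (illustrative only)."""
    states = sorted(set(series))
    relabel = {state: index for index, state in enumerate(states)}
    # Returns the relabeled series and the number of distinct states.
    return [relabel[x] for x in series], len(states)

def shannon_entropy(series):
    """Shannon entropy (in bits) of the empirical state distribution."""
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in Counter(series).values())

coalesced, base = coalesce([-8, 2, 6, -2, 4])
print(coalesced, base)  # [0, 2, 4, 1, 3] 5

# Relabeling changes the values but not the distribution over states.
before = shannon_entropy([2, 9, 2, 9, 9])
after = shannon_entropy(coalesce([2, 9, 2, 9, 9])[0])
print(math.isclose(before, after))  # True
```

Note that this sketch reports the number of distinct states as the base; whether the library enforces a minimum base for single-state series is not covered here.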

State Encoding

State encoding is a necessity when complex time series are being analyzed. For example, a \(k\)-history must be encoded as an integer in order to “observe” it using a Dist. What if you are interested in correlating the aggregate state of one group of nodes with that of another? You’d need to encode the groups’ states as integers. This module (pyinform.utils.encoding) provides such functionality.

Attention

As a practical matter, these utility functions should only be used as a stop-gap while a solution for your problem is implemented in the core Inform library. “Why?” you ask? Well, these functions are about as efficient as they can be for one-off state encodings, but most of the time you are interested in encoding sequences of states. This can be done much more efficiently if you encode the entire sequence at once. You need domain-specific information to make that happen.

This being said, these functions aren’t bad; just be aware that they may turn into a bottleneck in whatever you are implementing.
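As one illustration of why encoding a whole sequence at once can be cheaper, consider the specific case of encoding every \(k\)-history of a time series. The sketch below is plain Python with hypothetical helper names, not part of PyInform. The naive version re-encodes each window from scratch, while the rolling version updates the previous code by dropping the oldest state and appending the newest.

```python
def encode_window(window, b):
    """Big-endian encoding of a short window of base-b states."""
    code = 0
    for digit in window:
        code = code * b + digit
    return code

def history_codes_naive(series, k, b):
    """Encode each k-history independently: O(k) work per window."""
    return [encode_window(series[i:i + k], b) for i in range(len(series) - k + 1)]

def history_codes_rolling(series, k, b):
    """Encode all k-histories with O(1) work per step after the first window."""
    if len(series) < k:
        return []
    top = b ** (k - 1)  # weight of the oldest (most significant) state
    code = encode_window(series[:k], b)
    codes = [code]
    for i in range(k, len(series)):
        # Remove the oldest state, shift left by one digit, append the new state.
        code = (code - series[i - k] * top) * b + series[i]
        codes.append(code)
    return codes

series = [0, 1, 1, 0, 1]
print(history_codes_naive(series, k=2, b=2))    # [1, 3, 2, 1]
print(history_codes_rolling(series, k=2, b=2))  # [1, 3, 2, 1]
```

The rolling update relies on knowing that consecutive windows overlap, which is the kind of domain-specific information mentioned above.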

pyinform.utils.encoding.encode(state, b=None)

Encode a base-b array of integers into a single integer.

This function uses a big-endian encoding scheme. That is, the most significant bits of the encoded integer are determined by the left-most end of the unencoded state.

>>> from pyinform.utils import *
>>> encode([0,0,1], b=2)
1
>>> encode([0,1,0], b=3)
3
>>> encode([1,0,0], b=4)
16
>>> encode([1,0,4], b=5)
29

If b is not provided (or is None), the base is inferred from the state, with a minimum base of 2.

>>> from pyinform.utils import *
>>> encode([0,0,2])
2
>>> encode([0,2,0])
6
>>> encode([1,2,1])
16

See also decode().

Parameters:
  • state (sequence) – the state to encode
  • b (int) – the base in which to encode
Returns:

the encoded state

Return type:

int

Raises:
  • ValueError – if the state is empty
  • InformError – if an error occurs in the inform C call
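The big-endian scheme amounts to reading the state as the digits of a base-b number. A minimal plain-Python sketch follows; encode_sketch is a hypothetical name, not the library function, and the digit range check is an addition of this sketch rather than documented behavior.

```python
def encode_sketch(state, b=None):
    """Encode a sequence of base-b digits as one integer, big-endian (illustrative only)."""
    if len(state) == 0:
        raise ValueError("cannot encode an empty state")
    if b is None:
        # Infer the base from the largest digit, with a minimum base of 2.
        b = max(2, max(state) + 1)
    code = 0
    for digit in state:
        if not 0 <= digit < b:
            # Assumption of this sketch: out-of-range digits are rejected.
            raise ValueError("digit out of range for the given base")
        # Horner's method: the left-most digit ends up most significant.
        code = code * b + digit
    return code

print(encode_sketch([1, 0, 4], b=5))  # 29, i.e. 1*25 + 0*5 + 4
print(encode_sketch([1, 2, 1]))       # 16, base inferred as 3
```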
pyinform.utils.encoding.decode(encoding, b, n=None)

Decode an integer into a base-b array with n digits.

The provided encoded state is decoded using the big-endian encoding scheme.

>>> decode(2, b=2, n=2)
array([1, 0], dtype=int32)
>>> decode(6, b=2, n=3)
array([1, 1, 0], dtype=int32)
>>> decode(6, b=3, n=2)
array([2, 0], dtype=int32)

Note that the base b must be provided, but the number of digits n is optional. If it is provided then the decoded state will have exactly that many elements.

>>> decode(2, b=2, n=4)
array([0, 0, 1, 0], dtype=int32)

However, if n is too small to contain a full representation of the state, an error will be raised.

>>> decode(6, b=2, n=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/workspace/pyinform/utils/encoding.py", line 126, in decode
    error_guard(e)
  File "/home/ubuntu/workspace/pyinform/error.py", line 57, in error_guard
    raise InformError(e,func)
pyinform.error.InformError: an inform error occurred - "encoding/decoding failed"

If n is not provided, the length of the decoded state is as small as possible:

>>> decode(1, b=2)
array([1], dtype=int32)
>>> decode(1, b=3)
array([1], dtype=int32)
>>> decode(3, b=2)
array([1, 1], dtype=int32)
>>> decode(3, b=3)
array([1, 0], dtype=int32)
>>> decode(3, b=4)
array([3], dtype=int32)

Of course encode() and decode() play well together.

>>> for i in range(100):
...     assert(encode(decode(i, b=2)) == i)
...
>>>

See also encode().

Parameters:
  • encoding (int) – the encoded state
  • b (int) – the desired base
  • n (int) – the desired number of digits
Returns:

the decoded state

Return type:

numpy.ndarray

Raises:
  • InformError – if n is too small to contain the decoding
  • InformError – if an error occurs within the inform C call
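Decoding reverses the process by repeatedly dividing by the base. The sketch below is plain Python with a hypothetical name, decode_sketch. It returns a list rather than a numpy.ndarray, raises ValueError where the library raises InformError, and its handling of an encoding of 0 (a single zero digit) is an assumption.

```python
def decode_sketch(encoding, b, n=None):
    """Decode an integer into big-endian base-b digits (illustrative only)."""
    digits = []
    remaining = encoding
    while remaining > 0:
        # Peel off the least significant digit first.
        remaining, digit = divmod(remaining, b)
        digits.append(digit)
    if not digits:
        digits.append(0)  # assumption: 0 decodes to a single zero digit
    if n is not None:
        if len(digits) > n:
            raise ValueError("n is too small to contain the decoding")
        # Pad with leading zeros (appended here because the list is still reversed).
        digits.extend([0] * (n - len(digits)))
    digits.reverse()  # most significant digit first
    return digits

print(decode_sketch(6, b=2, n=3))  # [1, 1, 0]
print(decode_sketch(2, b=2, n=4))  # [0, 0, 1, 0]
print(decode_sketch(3, b=3))       # [1, 0]
```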