Empirical Distributions
The pyinform.dist.Dist class provides an empirical distribution, i.e. a histogram, representing the observed frequencies of some fixed-size set of events. This class is the basis for all of the fundamental information measures on discrete probability distributions.
Examples
Example 1: Construction
You can construct a distribution with a specified number of unique observables. A distribution constructed this way is invalid until at least one observation has been made.
>>> from pyinform.dist import Dist
>>> d = Dist(5)
>>> d.valid()
False
>>> d.counts()
0
>>> len(d)
5
Alternatively, you can construct a distribution from a list (or NumPy array) of observation counts:
>>> d = Dist([0,0,1,2,1,0,0])
>>> d.valid()
True
>>> d.counts()
4
>>> len(d)
7
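To make the relationship between len(d), d.counts() and d.valid() concrete, here is a minimal pure-Python sketch that mimics the two results above with a plain list of counts. The helper names support_size, total_count and is_valid are hypothetical and are not part of PyInform; the validity rule is taken from the description of valid() in the API documentation (non-empty support and at least one observation).

```python
def support_size(hist):
    """Number of distinct observable events (what len(d) reports)."""
    return len(hist)

def total_count(hist):
    """Total number of observations (what d.counts() reports)."""
    return sum(hist)

def is_valid(hist):
    """Assumed rule: non-empty support and at least one observation."""
    return support_size(hist) > 0 and total_count(hist) > 0

empty = [0] * 5                  # stands in for Dist(5)
seeded = [0, 0, 1, 2, 1, 0, 0]   # stands in for Dist([0,0,1,2,1,0,0])

print(support_size(empty), total_count(empty), is_valid(empty))     # 5 0 False
print(support_size(seeded), total_count(seeded), is_valid(seeded))  # 7 4 True
```

This is only an illustration of the bookkeeping; the real class stores its counts in the underlying C library.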
Example 2: Making Observations
Once a distribution has been constructed, we can begin making observations. There are two methods for doing so. The first uses the standard indexing operations, treating the distribution similarly to a list:
>>> d = Dist(5)
>>> for i in range(len(d)):
...     d[i] = i*i
...
>>> list(d)
[0, 1, 4, 9, 16]
The second method is to make incremental changes to the distribution. This is useful when making observations of a time series:
>>> obs = [1,0,1,2,2,1,2,3,2,2]
>>> d = Dist(max(obs) + 1)
>>> for event in obs:
...     assert(d[event] == d.tick(event) - 1)
...
>>> list(d)
[1, 3, 5, 1]
It is important to remember that Dist keeps track of your events exactly as you provide them, so values that never occur still occupy a zero-count slot in the support. For example:
>>> obs = [1, 1, 3, 5, 1, 3, 7, 9]
>>> d = Dist(max(obs) + 1)
>>> for event in obs:
...     assert(d[event] == d.tick(event) - 1)
...
>>> list(d)
[0, 3, 0, 2, 0, 1, 0, 1, 0, 1]
>>> d[3]
2
>>> d[7]
1
If you know there are “gaps” in your time series, e.g. no even numbers, then you can use the utility function coalesce_series() to get rid of them:
>>> from pyinform import utils
>>> obs = [1, 1, 3, 5, 1, 3, 7, 9]
>>> utils.coalesce_series(obs)
(array([0, 0, 1, 2, 0, 1, 3, 4], dtype=int32), 5)
>>> coal, b = utils.coalesce_series(obs)
>>> d = Dist(b)
>>> for event in coal:
...     assert(d[event] == d.tick(event) - 1)
...
>>> list(d)
[3, 2, 1, 1, 1]
>>> d[1]
2
>>> d[3]
1
This can significantly improve memory usage in situations where the range of possible states is large, but is sparsely sampled in the time series.
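The relabeling step can be sketched in pure Python as follows. This is not PyInform's implementation: the function name coalesce is hypothetical, and the sketch assumes that states are relabeled in sorted order, which is consistent with the output shown above.

```python
def coalesce(series):
    """Relabel the distinct values of series as 0..b-1 in sorted order.

    Returns the relabeled series and b, the number of distinct states.
    """
    states = sorted(set(series))
    index = {value: i for i, value in enumerate(states)}
    return [index[value] for value in series], len(states)

obs = [1, 1, 3, 5, 1, 3, 7, 9]
coal, b = coalesce(obs)
print(coal, b)   # [0, 0, 1, 2, 0, 1, 3, 4] 5

# Tally the relabeled events into a histogram with no empty slots.
counts = [0] * b
for event in coal:
    counts[event] += 1
print(counts)    # [3, 2, 1, 1, 1]
```

The histogram needs only 5 slots instead of the 10 required when indexing by the raw values.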
Example 3: Probabilities
Once some observations have been made, we can start asking for probabilities. As in the previous examples, there are multiple ways of doing this. The first is to just ask for the probability of a given event.
>>> d = Dist([3,0,1,2])
>>> d.probability(0)
0.5
>>> d.probability(1)
0.0
>>> d.probability(2)
0.16666666666666666
>>> d.probability(3)
0.3333333333333333
Sometimes it is nice to just dump the probabilities out to an array:
>>> d = Dist([3,0,1,2])
>>> d.dump()
array([ 0.5 , 0. , 0.16666667, 0.33333333])
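The numbers above are simply each count divided by the total number of observations. A minimal pure-Python sketch of that arithmetic is given below; the function name empirical_probabilities is hypothetical, and raising ValueError for an empty histogram mirrors the documented behavior of probability() and dump() on an invalid distribution.

```python
def empirical_probabilities(counts):
    """Return count / total for every event in the support."""
    total = sum(counts)
    if total == 0:
        # No observations yet: probabilities are undefined.
        raise ValueError("no observations have been made")
    return [count / total for count in counts]

probs = empirical_probabilities([3, 0, 1, 2])
print(probs)  # [0.5, 0.0, 0.16666666666666666, 0.3333333333333333]
```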
Example 4: Shannon Entropy
Once you have a distribution you can do lots of fun things with it. In this example, we will compute the Shannon entropy of a time series of observed values.
from math import log2
from pyinform.dist import Dist

obs = [1,0,1,2,2,1,2,3,2,2]
d = Dist(max(obs) + 1)
for event in obs:
    d.tick(event)

h = 0.
for p in d.dump():
    # log2(p) <= 0, so subtract to accumulate a non-negative entropy
    h -= p * log2(p)

print(h) # 1.68547529723
Of course, PyInform provides a function for this: pyinform.shannon.entropy().
from pyinform.dist import Dist
from pyinform.shannon import entropy

obs = [1,0,1,2,2,1,2,3,2,2]
d = Dist(max(obs) + 1)
for event in obs:
    d.tick(event)

print(entropy(d)) # 1.6854752972273344
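The hand-written loop in this example works because every event in the support was observed at least once. If some event has a zero count, math.log2 is asked for log2(0) and raises ValueError. The sketch below, in plain Python on a list of counts, skips empty bins using the usual convention that 0 * log2(0) contributes nothing. The function name shannon_entropy is hypothetical, and nothing here is a claim about how pyinform.shannon.entropy() is implemented.

```python
from math import log2

def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution of counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations have been made")
    h = 0.0
    for count in counts:
        if count > 0:  # skip empty bins: 0 * log2(0) is taken as 0
            p = count / total
            h -= p * log2(p)
    return h

# Counts from the example above: obs = [1,0,1,2,2,1,2,3,2,2]
h_dense = shannon_entropy([1, 3, 5, 1])
print(h_dense)  # approximately 1.6855

# A histogram with gaps no longer triggers a math domain error.
h_sparse = shannon_entropy([0, 3, 0, 2, 0, 1, 0, 1, 0, 1])
print(h_sparse)
```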
API Documentation

class pyinform.dist.Dist(n)
Dist is a class designed to represent empirical probability distributions, i.e. histograms, for cleanly logging observations of time series data.
The premise behind this class is that it allows PyInform to define the standard entropy measures on distributions. This reduces functions such as pyinform.activeinfo.active_info() to building distributions and then applying standard entropy measures.
__init__(n)
Construct a distribution.
If the parameter n is an integer, the distribution is constructed with a zeroed support of size n. If n is a list or numpy.ndarray, the sequence is treated as the underlying support.
Examples:
>>> d = Dist(5)
>>> d = Dist([0,0,1,2])
Parameters: n (int, list or numpy.ndarray) – the support for the distribution
Raises:
    ValueError – if support is empty or multidimensional
    MemoryError – if memory allocation fails within the C call

__len__()
Determine the size of the support of the distribution.
Examples:
>>> len(Dist(5))
5
>>> len(Dist([0,1,5]))
3
See also counts().
Returns: the size of the support
Return type: int

__getitem__(event)
Get the number of observations made of event.
Examples:
>>> d = Dist(2)
>>> (d[0], d[1])
(0, 0)
>>> d = Dist([0,1])
>>> (d[0], d[1])
(0, 1)
See also __setitem__(), tick() and probability().
Parameters: event (int) – the observed event
Returns: the number of observations of event
Return type: int
Raises: IndexError – if event < 0 or len(self) <= event

__setitem__(event, value)
Set the number of observations of event to value.
If value is negative, then the observation count is set to zero.
Examples:
>>> d = Dist(2)
>>> for i, _ in enumerate(d):
...     d[i] = i*i
...
>>> list(d)
[0, 1]
>>> d = Dist([0,1,2,3])
>>> for i, n in enumerate(d):
...     d[i] = 2 * n
...
>>> list(d)
[0, 2, 4, 6]
See also __getitem__() and tick().
Parameters:
    event (int) – the observed event
    value (int) – the number of observations
Raises: IndexError – if event < 0 or len(self) <= event

resize(n)
Resize the support of the distribution in place.
If the distribution…
    shrinks - the last len(self) - n elements are lost, the rest are preserved
    grows - the last n - len(self) elements are zeroed
    is unchanged - well, that sorta says it all, doesn’t it?
Examples:
>>> d = Dist(5)
>>> d.resize(3)
>>> len(d)
3
>>> d.resize(8)
>>> len(d)
8
>>> d = Dist([1,2,3,4])
>>> d.resize(2)
>>> list(d)
[1, 2]
>>> d.resize(4)
>>> list(d)
[1, 2, 0, 0]
Parameters: n (int) – the desired size of the support
Raises:
    ValueError – if the requested size is zero
    MemoryError – if memory allocation fails in the C call
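The shrink and grow rules can be illustrated with a plain list of counts. In this sketch the function name resized is hypothetical, and it returns a new list rather than modifying its argument, whereas Dist.resize works in place. Rejecting negative sizes as well as zero is an assumption made for the sketch; the documentation above only mentions a size of zero.

```python
def resized(counts, n):
    """Return a copy of counts truncated or zero-padded to length n."""
    if n < 1:
        # Assumption: treat any non-positive size as an error.
        raise ValueError("support size must be positive")
    if n <= len(counts):
        # Shrinking (or no change): keep the first n counts.
        return counts[:n]
    # Growing: append n - len(counts) zeroed slots.
    return counts + [0] * (n - len(counts))

shrunk = resized([1, 2, 3, 4], 2)
print(shrunk)              # [1, 2]
print(resized(shrunk, 4))  # [1, 2, 0, 0]
```

Note that shrinking and then growing does not restore the discarded counts.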

copy()
Perform a deep copy of the distribution.
Examples:
>>> d = Dist([1,2,3])
>>> e = d
>>> e[0] = 3
>>> list(e)
[3, 2, 3]
>>> list(d)
[3, 2, 3]
>>> f = d.copy()
>>> f[0] = 1
>>> list(f)
[1, 2, 3]
>>> list(d)
[3, 2, 3]
Returns: the copied distribution
Return type: pyinform.dist.Dist

counts()
Return the number of observations made thus far.
Examples:
>>> d = Dist(5)
>>> d.counts()
0
>>> d = Dist([1,0,3,2])
>>> d.counts()
6
See also __len__().
Returns: the number of observations
Return type: int

valid()
Determine if the distribution is a valid probability distribution, i.e. if the support is not empty and at least one observation has been made.
Examples:
>>> d = Dist(5)
>>> d.valid()
False
>>> d = Dist([0,0,0,1])
>>> d.valid()
True
See also __len__() and counts().
Returns: a boolean signifying that the distribution is valid
Return type: bool

tick(event)
Make a single observation of event, and return the total number of observations of said event.
Examples:
>>> d = Dist(5)
>>> for i, _ in enumerate(d):
...     assert(d.tick(i) == 1)
...
>>> list(d)
[1, 1, 1, 1, 1]
>>> d = Dist([0,1,2,3])
>>> for i, _ in enumerate(d):
...     assert(d.tick(i) == i + 1)
...
>>> list(d)
[1, 2, 3, 4]
See also __getitem__() and __setitem__().
Parameters: event (int) – the observed event
Returns: the total number of observations of event
Return type: int
Raises: IndexError – if event < 0 or len(self) <= event

probability(event)
Compute the empirical probability of an event.
Examples:
>>> d = Dist([1,1,1,1])
>>> for i, _ in enumerate(d):
...     assert(d.probability(i) == 1./4)
...
See also __getitem__() and dump().
Parameters: event (int) – the observed event
Returns: the empirical probability of event
Return type: float
Raises:
    ValueError – if not self.valid()
    IndexError – if event < 0 or len(self) <= event

dump()
Compute the empirical probability of each observable event and return the result as an array.
Examples:
>>> d = Dist([1,2,2,1])
>>> d.dump()
array([ 0.16666667, 0.33333333, 0.33333333, 0.16666667])
See also probability().
Returns: the empirical probabilities of all observable events
Return type: numpy.ndarray
Raises:
    ValueError – if not self.valid()
    RuntimeError – if the dump fails in the C call
    IndexError – if event < 0 or len(self) <= event
