Inform is a cross-platform C library designed for performing information analysis of complex systems.
- The inform_dist struct provides discrete, empirical probability distributions. These form the basis for all of the information-theoretic measures.
- A collection of information measures built upon the distribution struct provides the core algorithms of the library; they are exposed through the shannon.h header.
- A host of measures of the information dynamics on time series are built upon the core information measures. Each measure is housed in its own header, e.g. active_info.h.
In addition to the core components, a small collection of utilities is also provided.
Getting Started
Installation from Source
To build Inform, you will need CMake. Most users can install it via their package manager, e.g. apt-get, pacman, or yum on Linux, or homebrew, macports, or fink on OS X.
Linux, OS X, and Windows (Bash, MinGW and Cygwin)
Once CMake is installed, building, testing, and installing the library is a snap:
λ cmake . -DCMAKE_BUILD_TYPE=Release -DEXAMPLES=Yes
λ make all tests
λ sudo make install
Windows with MSVC
Building with MSVC is a bit more involved. Open the Visual C++ MSBuild command prompt (it should be in your Start menu). You can then run cmake, build, and test from the prompt:
λ cmake -DCMAKE_BUILD_TYPE=Release -DEXAMPLES=Yes -G"Visual Studio 14 2015"
λ msbuild /p:Configuration=Release ALL_BUILD.vcxproj
λ test\Release\inform_unittest.exe
Installation requires the user to manually copy the headers and libraries to wherever they are needed. This project-by-project approach is fairly standard for Windows development.
Binary Installation
Precompiled binaries can be found at https://github.com/elife-asu/inform/releases.
Getting Help
Inform is developed to help anyone interested in applying information-theoretic techniques get things done quickly and painlessly. We cannot do that without your feedback. We host the project’s source code and issue tracker on Github. Please create an issue if you find a bug, an error in this documentation, or have a feature you’d like to request. Your contribution will make Inform a better tool for everyone.
If you are interested in contributing to Inform, please contact the developers, and we’ll get you up and running!
- Inform Source Repository
- Issue Tracker
Related Projects
Inform Community
While Inform is a great tool to use directly in C or C++, significant effort has gone into making it easy to wrap for higher-level languages. Here are a few of the wrappers that are under active development:
Package Name | Language | Repository | Website |
---|---|---|---|
PyInform | Python | | |
rinform | R | | |
Inform.jl | Julia | forthcoming | |
InformWolfram | Mathematica/Wolfram Language | forthcoming | |
Intellectual Relatives
Inform was not created in an intellectual vacuum. Many similar projects have preceded it and are under active development. All of those projects have advantages and disadvantages compared to Inform. If Inform doesn’t meet your needs, I hope you will let us know, and try one of these:
- JIDT: Java Information Dynamics Toolkit (Java)
- TRENTOOL: TRansfer ENtropy TOOLbox (Matlab)
- dit: discrete information theory (Python)
- MuTE: MuTE Toolbox - The Dynamic DIrected Links Detector (Matlab)
- MVGC: Multivariate Granger Causality Toolbox (Matlab)
- ACSS: Algorithmic Complexity for Short Strings (R)
- OACC: Online Algorithmic Complexity Calculator (web-based, R)
Copyright and Licensing
Copyright © 2016-2018 ELIFE, Arizona State University. Free use of this software is granted under the terms of the MIT License.
See the LICENSE file for details.
Relevant Publications
- Moore, D.G., Valentini, G., Walker, S.I. and Levin, M. (2018) "Inform: Efficient Information-Theoretic Analysis of Collective Behaviors". Frontiers in Robotics & AI. (under review).
- Moore, D.G., Valentini, G., Walker, S.I. and Levin, M. (2018) "Inform: A Toolkit for Information-Theoretic Analysis of Complex Systems". In: Proceedings of the 2017 IEEE Symposium on Computational Intelligence, Symposium on Artificial Life, IEEE Press. (in press).
Support
This project was supported in part by a grant provided by the Templeton World Charity Foundation as part of the Power of Information Initiative.
Error Handling
Introduction
Much of the time invested in Inform has been ensuring that errors are properly handled. Since Inform is designed to be used both natively and wrapped for use in another language, we needed a simple way to provide error information while still being complete. The result was the inform_error enumeration.
Most of Inform's functions take a pointer to an inform_error as a last argument. This allows the function to pass error information up the call stack. Below is a typical example of how to handle errors that arise during normal usage:
#include <inform/active_info.h> // also brings in inform/error.h
#include <stdio.h>

size_t const n = 1, m = 7, k = 2;
// expect only binary states
int const base = 2;
// the state 2 (at index 4) is invalid for a base-2 series
int const series[] = {0,0,1,1,2,0,1};
// Don't forget to initialize the error
inform_error err = INFORM_SUCCESS;
double ai = inform_active_info(series, n, m, k, base, &err);
if (inform_failed(&err))
{
    // ERROR: time series has states inconsistent with the expected base
    fprintf(stderr, "ERROR: %s\n", inform_strerror(&err));
    return -1;
}
Most functions also return a value that signals an erroneous result (e.g. NaN). If the user is uninterested in the specific error message, they can simply ignore it by passing a NULL pointer.
double ai = inform_active_info(series, n, m, k, base, NULL);
if (isnan(ai))
{
fprintf(stderr, "ERROR: an error occurred in computing AI\n");
return -1;
}
We strongly recommend that you handle errors. This is both good style and helps immensely when there’s actually a problem.
This covers the most common use cases for non-developers. If you are a developer, we strongly suggest that you review the rest of the error handling API. Otherwise, you’d probably be better served by moving on to the more interesting stuff such as Time Series Measures.
API Documentation
All of the error handling API is provided by the inform/error.h header. The user rarely has to include this header explicitly since most of the work-horse headers bring it along.
Type Definitions | |
---|---|
Macros | |
Error Tests | |
Error Messages | |
Time Series Measures
The original purpose of Inform was to analyze time series data. This explains why most of Inform's functionality resides in functions specifically optimized for analyzing time series. The API was designed to be easy to use in C or C++, or to be wrapped in a higher-level language, e.g. Python. This means that we avoided some of the "niceties" of C, such as extensive use of macros and other generic atrocities, in favor of wrappability. Keep this in mind as you learn the API.
Many information measures have "local" variants which compute a time series of point-wise values. These local variants have names similar to their averaged or global counterparts, e.g. inform_active_info and inform_local_active_info. We have been meticulous in ensuring that function and parameter names are consistent across measures. If you notice some inconsistency, please report it as an issue.
Notation
Throughout the discussion of time series measures, we will try to use a consistent notation. We will denote random variables as \(X,Y,\ldots\), and let \(x_i,y_i,\ldots\) represent the \(i\)-th time step of a time series drawn from the associated random variable. Many of the measures consider \(k\)-histories (a.k.a \(k\)-blocks) of the time series, e.g. \(x_i^{(k)} = \left\{x_{i-k+1}, x_{i-k+2},\ldots,x_i\right\}\).
When denoting probability distributions, we will only make the random variable explicit in situations where the notation is ambiguous. We will typically write \(p(x_i)\), \(p(x_i^{(k)})\), and \(p(x_i^{(k)}, x_{i+1})\) to denote the empirical probability of observing the \(x_i\) state, the \(x_i^{(k)}\) \(k\)-history, and the joint probability of observing \(\left(x_i^{(k)}, x_{i+1}\right)\).
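For instance, given the short series \(\{0,1,1,0,1\}\) and \(k = 2\), the observed \(2\)-histories are \((0,1)\), \((1,1)\), \((1,0)\), and \((0,1)\), so the empirical probability \(p(x_i^{(2)} = (0,1))\) is \(\frac{2}{4}\), while each of the other two observed histories has probability \(\frac{1}{4}\).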
Please report any notational ambiguities as an issue.
Implementation Details
The Base: States and Logarithms
The word "base" has two different meanings in the context of information measures on time series. It could refer to the base of the time series itself, that is the number of unique states in the time series. For example, the time series \(\{0,2,1,0,0\}\) is a base-3 time series. On the other hand, it could refer to the base of the logarithm used in computing the information content of the inferred probability distributions. The problem is that these two meanings clash. The base of the time series affects the range of values the measure can produce, and the base of the logarithm represents a rescaling of those values.
In this library we deal with this by always using base-2 logarithms and having the user specify the base of the time series (don't worry, we set an error if the provided base doesn't make sense). All of this ensures that the library is as simple as reasonably possible.
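For instance, with base-2 logarithms a base-2 time series can carry at most \(\log_2 2 = 1\) bit per observation, while a base-4 time series can carry up to \(\log_2 4 = 2\) bits; changing the base of the logarithm would merely rescale these values.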
Multiple Initial Conditions
You generally need a lot of data to infer a probability distribution. An experimentalist or simulator might then collect data over multiple trials or initial conditions. Most of Inform's time series measures allow the user to pass in a two-dimensional, rectangular array, with each row representing a time series from a different initial condition. From this the probability distributions are inferred, and the desired value is calculated. This has the downside of requiring that the user store all of the data in memory, but it has the advantage of being fast and simple. Trade-offs, man…
A subsequent release, likely v1.1.0, will allow the initial conditions to have time series of different lengths and will provide accumulator implementations of all of these measures that let the user incrementally construct the distributions. This will lift some of the memory burden at the expense of runtime performance.
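To make this concrete, here is a minimal sketch that passes two initial conditions to inform_active_info; the call and argument order mirror the example in the Error Handling section above, with the rows of the 2-D array laid out contiguously in memory.

#include <inform/active_info.h>
#include <stdio.h>

int main(void)
{
    // two initial conditions (rows), eight time steps each, stored row-major
    int const series[16] = {
        0, 0, 1, 1, 1, 0, 0, 1,  // first initial condition
        1, 0, 0, 1, 0, 0, 1, 0,  // second initial condition
    };
    size_t const n = 2, m = 8, k = 2;
    int const base = 2;

    inform_error err = INFORM_SUCCESS;
    // histories from both rows contribute to the same empirical distributions
    double ai = inform_active_info(series, n, m, k, base, &err);
    if (inform_failed(&err))
    {
        fprintf(stderr, "ERROR: %s\n", inform_strerror(&err));
        return -1;
    }
    printf("active information: %f\n", ai);
    return 0;
}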
Calling Conventions
All of the time series functions described in this section use the same basic calling conventions and use the same (or similar) argument names where possible.
Argument Name | Description |
---|---|
series | A 2-D or 3-D, finite-state time series in contiguous, row-major form |
l | The number of "sources" or variables in a 3-D time series |
n | The number of initial conditions per "source" |
m | The number of time steps per "source" |
b | The base of the time series |
k | The history length |
err | An error argument |
Average measures generally return a double-precision, floating-point value while local variants return a pointer to an appropriately shaped, contiguous array. Local measures accept an argument, often named after the function (e.g. local active information takes an ai argument), which is used to store the computed local values. If that argument is NULL, then the function allocates an array.
We will try to note any deviations from these conventions.
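For example, here is a minimal sketch using inform_local_active_info (mentioned earlier). The exact position of the ai output argument is an assumption based on the conventions above; passing NULL for it asks the function to allocate the array, which the caller then frees.

#include <inform/active_info.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int const series[9] = {0, 0, 1, 1, 1, 0, 0, 1, 0};
    size_t const n = 1, m = 9, k = 2;
    int const base = 2;

    inform_error err = INFORM_SUCCESS;
    // NULL for the output argument: the function allocates the local values
    double *ai = inform_local_active_info(series, n, m, k, base, NULL, &err);
    if (inform_failed(&err))
    {
        fprintf(stderr, "ERROR: %s\n", inform_strerror(&err));
        return -1;
    }
    // with one initial condition there are m - k (history, next-state) pairs
    for (size_t i = 0; i < m - k; ++i)
        printf("%f ", ai[i]);
    printf("\n");

    free(ai);
    return 0;
}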
Active Information
Active information (AI) was introduced in [Lizier2012] to quantify information storage in distributed computations. Active information is defined in terms of a temporally local variant

\[ a_i(k) = \log_2 \frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})\,p(x_{i+1})}, \]

where the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global active information is defined as

\[ A(k) = \langle a_i(k) \rangle_i = \sum_{x_i^{(k)},\,x_{i+1}} p(x_i^{(k)}, x_{i+1}) \log_2 \frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})\,p(x_{i+1})}. \]

Strictly speaking, the local and average active information are defined in the limit of infinite history length,

\[ a_i = \lim_{k \to \infty} a_i(k) \quad \text{and} \quad A = \lim_{k \to \infty} A(k), \]

but we do not provide limiting functionality in this library (yet!).
Block Entropy
Block entropy, also known as \(N\)-gram entropy [Shannon1948], is the standard Shannon entropy of the \(k\)-histories of a time series:

\[ H(X^{(k)}) = -\sum_{x_i^{(k)}} p(x_i^{(k)}) \log_2 p(x_i^{(k)}), \]

which reduces to the traditional Shannon entropy for \(k=1\).
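As a purely illustrative computation, consider the binary series \(\{0,1,1,0,1,0,0,0,1\}\) with \(k = 1\). The empirical probabilities are \(p(0) = \frac{5}{9}\) and \(p(1) = \frac{4}{9}\), so

\[ H(X^{(1)}) = -\tfrac{5}{9}\log_2\tfrac{5}{9} - \tfrac{4}{9}\log_2\tfrac{4}{9} \approx 0.991 \text{ bits}. \]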
Conditional Entropy
Conditional entropy is a measure of the amount of information required to describe a random variable \(Y\) given knowledge of another random variable \(X\). When applied to time series, two time series are used to construct the empirical distributions, and inform_shannon_ce can be applied to yield

\[ H(Y|X) = -\sum_{x_i,\,y_i} p(x_i, y_i) \log_2 \frac{p(x_i, y_i)}{p(x_i)}. \]

This can be viewed as the time-average of the local conditional entropy

\[ h_i(Y|X) = -\log_2 \frac{p(x_i, y_i)}{p(x_i)}. \]
See [Cover1991] for more information.
Cross Entropy
Cross entropy between two distributions \(p_X\) and \(q_X\) measures the amount of information needed to identify events using a coding scheme optimized for \(q_X\) when \(p_X\) is the "real" distribution over \(X\):

\[ H(p, q) = -\sum_{x} p(x) \log_2 q(x). \]
Cross entropy’s local variant is equivalent to the self-information of \(q_X\), and as such is implemented by inform_local_block_entropy.
See [Cover1991] for more details.
Effective Information
Effective information is a causal measure aimed at teasing out the causal structure of a dynamical system. In essence, it is the mutual information between an "intervention" distribution — a probability distribution over initial states — and the distribution after one time step:
where \(A\) is the transition probability matrix. Functionality to construct a transition probability matrix from time series is provided by the [inform_tpm] function.
See [Tononi2003], [Hoel2013] or [Hoel2017] for more information.
Entropy Rate
Entropy rate quantifies the amount of information needed to describe the next state of \(X\) given observations of \(X^{(k)}\). In other words, it is the entropy of the time series conditioned on the \(k\)-histories. The local entropy rate

\[ h_i(k) = -\log_2 \frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})} \]

can be averaged to obtain the global entropy rate

\[ H(k) = \langle h_i(k) \rangle_i = -\sum_{x_i^{(k)},\,x_{i+1}} p(x_i^{(k)}, x_{i+1}) \log_2 \frac{p(x_i^{(k)}, x_{i+1})}{p(x_i^{(k)})}. \]

Much as with active information, the local and average entropy rates are formally obtained in the limit of infinite history length,

\[ h_i = \lim_{k \to \infty} h_i(k) \quad \text{and} \quad H = \lim_{k \to \infty} H(k), \]

but we do not provide limiting functionality in this library (yet!).
See [Cover1991] for more details.
Excess Entropy
Formally, the excess entropy is the mutual information between two adjacent, semi-infinite blocks of variables:
Of course, we cannot take the limit in practice, so we implement the finite form:
We can think of excess entropy as a slight generalization of active information or a special case of predictive information.
See [Crutchfield2003] and [Feldman2003] for more details.
Information Flow
Information flow (IF) was introduced by Ay and Polani as a measure of the "strength of a causal effect" ([Ay2008]). Unlike many of the other measures in Inform, information flow was designed around interventions: rather than allowing dynamics to progress naturally, some subset of the random variables are forced into a particular state. If no interventions are performed, then IF reduces to conditional mutual information.
Practically, there is no way for Inform to know from a time series whether or not any interventions have actually been performed upon the system. As such, the onus is on the user to determine whether this function is in fact computing Ay and Polani's information flow.
In accordance with [Ay2008], information flow is defined as
All probabilities are estimated from sequences of observations — the order of those sequences is irrelevant. A subsequent version will allow the user to specify intervention distributions.
Evidence Of Integration
Evidence of Integration (EoI) was introduced in [Biehl2016] as a means of identifying representations of agents in dynamical systems. Given a collection of random variables, \(\mathcal{X}=\{X_1,X_2,\ldots,X_l\}\), we can form (non-trivial) partitionings of \(\mathcal{X}\):
Each partitioning \(\mathcal{Z} = \{Z_1, \ldots, Z_m\}\) consists of a collection of joint random variables built from \(\mathcal{X}\). The configurations of \(\mathcal{Z}\) are in one-to-one correspondence with the configurations of \(\mathcal{X}\). Given such a partitioning, we can ask what the pointwise mutual information is:
If \(z=\{z_1,\ldots,z_m\}\) corresponds to the configuration \(x=\{x_1,\ldots,x_l\}\), we refer to the above mutual information as the "evidence of integration of \(x\) with respect to the partitioning \(\mathcal{Z}\)".
We say that a configuration \(x\) is integrated if and only if the evidence of integration of \(x\) is positive for all partitionings of the system.
Mutual Information
Mutual information (MI) is a measure of the amount of mutual dependence between two or more random variables. Locally, MI is defined as

\[ i_i(X_1, \ldots, X_l) = \log_2 \frac{p(x_{1,i}, \ldots, x_{l,i})}{p(x_{1,i}) \cdots p(x_{l,i})}. \]

The mutual information is then just the time average of \(i_i(X_1,\ldots,X_l)\):

\[ I(X_1, \ldots, X_l) = \sum_{x_{1,i}, \ldots, x_{l,i}} p(x_{1,i}, \ldots, x_{l,i}) \log_2 \frac{p(x_{1,i}, \ldots, x_{l,i})}{p(x_{1,i}) \cdots p(x_{l,i})}. \]
See [Cover1991] for more details.
Partial Information Decomposition
Predictive Information
Formally, the predictive information is the mutual information between a finite-history and a semi-infinite future:
Of course, we cannot take the limit in practice, so we implement the finite form:
We can think of active information and excess entropy as special cases of predictive information.
See [Bialek2001a] and [Bialek2001b] for more details.
Relative Entropy
Relative entropy, also known as the Kullback-Leibler divergence, measures the amount of information gained in switching from a prior distribution \(q_X\) to a posterior distribution \(p_X\) over the same support:

\[ D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}. \]

The local counterpart is

\[ d_i(p \,\|\, q) = \log_2 \frac{p(x_i)}{q(x_i)}. \]
Note that the average in moving from the local to the non-local relative entropy is taken over the posterior distribution.
See [Kullback1951] and [Cover1991] for more information.
Separable Information
Separable information (SI) was introduced in [Lizier2010] as a method for detecting information modification in distributed computations. Given a random variable \(Y\) and a collection of potentially informative sources \(V_Y\) (wherein \(Y \not\in V_Y\)), separable information quantifies the total information gained about \(Y\) from separate observations of histories of \(Y\) and "transfer contributions" from each of the sources \(X \in V_Y\). In other words, it is the sum of the Active Information of \(Y\) and the (apparent) Transfer Entropy from each source to \(Y\):

\[ s_{Y,i}(k) = a_{Y,i}(k) + \sum_{X \in V_Y} t_{X \to Y,\, i}(k). \]
Transfer Entropy
Transfer entropy (TE) was introduced in [Schreiber2000] to quantify information transfer between an information source and destination, conditioning out shared history effects. TE was originally formulated considering only the source and destination; however, many systems of interest have more than just those two components. As such, it may be further necessary to condition the probabilities on the states of all "background" components in the system. These two forms are sometimes referred to as apparent and complete transfer entropy, respectively ([Lizier2008]).
Our implementation of TE allows the user to condition the probabilities on any number of background processes, within hardware limits. For the subsequent description, take \(X\) to be the source, \(Y\) the target, and \(\mathcal{W} = \left\{W_1, \ldots, W_l\right\}\) the background processes against which we’d like to condition. For example, we might take the state of two nodes in a dynamical network as the source and target, while all other nodes in the network are treated as the background. Transfer entropy is then defined in terms of a time-local variant

\[ t_{X \to Y,\, i}(k) = \log_2 \frac{p(y_{i+1}, x_i \,|\, y_i^{(k)}, W_i)}{p(y_{i+1} \,|\, y_i^{(k)}, W_i)\; p(x_i \,|\, y_i^{(k)}, W_i)}, \]

where \(W_i = w_{1,i},\ldots,w_{l,i}\) are the states of each of the background nodes at time step \(i\), and the probabilities are constructed empirically from the entire time series. From the local variant, the temporally global transfer entropy is defined as

\[ T_{X \to Y}(k) = \langle t_{X \to Y,\, i}(k) \rangle_i = \sum_{y_{i+1},\, x_i,\, y_i^{(k)},\, W_i} p(y_{i+1}, x_i, y_i^{(k)}, W_i) \log_2 \frac{p(y_{i+1}, x_i \,|\, y_i^{(k)}, W_i)}{p(y_{i+1} \,|\, y_i^{(k)}, W_i)\; p(x_i \,|\, y_i^{(k)}, W_i)}. \]

Strictly speaking, the local and average transfer entropy are defined in the limit of infinite history length,

\[ t_{X \to Y,\, i} = \lim_{k \to \infty} t_{X \to Y,\, i}(k) \quad \text{and} \quad T_{X \to Y} = \lim_{k \to \infty} T_{X \to Y}(k), \]

but we do not provide limiting functionality in this library (yet!).
Utilities
Binning Time Series
Black-Boxing Time Series
It is often useful when analyzing complex systems to black-box components of the system. This process amounts to grouping components of the system and treating them as a single entity, without regard for its small-scale structure. As an example, consider you have two Boolean random variables \(X\) and \(Y\), and you observe time series of each simultaneously:
It may be worthwhile to consider these observations as observations of the joint variable \((X,Y)\):
The joint observations can then be naturally encoded, for example, as base-4 states
We refer to this process of mapping the observations of \(X\) and \(Y\) to the encoded observations of \((X,Y)\) as black boxing. In this case, the black boxing procedure is performed in "space" (you might think of \(X\) and \(Y\) having locations in space). We may also black box in time. The canonical example of this is considering the \(k\)-history of a random variable:
and the observations of \(X^{(2)}\) can be encoded as base-4 states:
We provide a basic black-boxing function, [inform_black_box], that allows the user to black-box in both space and into the future and past of a collection of random variables. The [inform_black_box_parts] function allows the user to black-box time series based on a partitioning scheme (useful in the implementation of integration measures such as Evidence Of Integration).
Coalescing Time Series
Information measures over time series are invariant under any change of encoding that does not change the relative occurrence of each symbol. For example, the following two time series have the same entropy:

\[ A = \{2,4,4,2,4,2,2\} \quad \text{and} \quad B = \{0,1,1,0,1,0,0\}. \]
Each of the time series has an effective base of 2, but their apparent bases are different: 5 and 2, respectively. The result is that computations using \(A\) require \(\log_2{5}\times\) more memory due to larger potential state spaces. They also tend to run a bit slower since many information-theoretic algorithms have runtimes that trend with the volume of the state space.
The inform_coalesce function maps a time series to a new time series whose effective base is the same as its apparent base: e.g. it maps \(A \mapsto B\).
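As a rough sketch of how this might look in code (assuming that inform_coalesce takes the source series, its length, a destination buffer of the same length, and an error argument, and that it returns the effective base; the inform/utilities.h header is likewise an assumption):

#include <inform/utilities.h>   // assumed aggregate header for the utility functions
#include <stdio.h>

int main(void)
{
    // apparent base 5 (largest state is 4), but only two distinct symbols
    int const series[7] = {2, 4, 4, 2, 4, 2, 2};
    int coal[7];

    inform_error err = INFORM_SUCCESS;
    // assumed to fill `coal` with the coalesced series and return its effective base
    int const b = inform_coalesce(series, 7, coal, &err);
    if (inform_failed(&err))
    {
        fprintf(stderr, "ERROR: %s\n", inform_strerror(&err));
        return -1;
    }
    printf("effective base: %d\n", b);
    for (size_t i = 0; i < 7; ++i)
        printf("%d ", coal[i]);
    printf("\n");
    return 0;
}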
Encoding/Decoding States
Many of Inform's implementations require that states be encoded as integers. Two functions, [inform_encode] and [inform_decode] handle encoding and decoding states to and from integers, respectively.
Partitioning Time Series
Many analyses of complex systems consider partitioning of the system into components or modules. One example of this is Evidence Of Integration.
For generality, we represent a partitioning of \(N\) items into \(1 \leq M \leq N\) partitions as a sequence of integers \(p_1, \ldots, p_N\) where \(0 \leq p_i < M\) is the partition to which the \(i\)-th item belongs.
As an example, suppose we partitioned \(\{X_1, X_2, X_3\}\) as \(\{\{X_1\}, \{X_2,X_3\}\}\). This would be represented as \((0,1,1)\) because \(X_1\) belongs to the zeroth partition and \(X_2,X_3\) belong to the first partition.
We provide two functions to facilitate the generation of all unique partitionings of a system of N elements: [inform_first_partitioning] and [inform_next_partitioning].
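A minimal sketch of how these two functions might be used together; the exact signatures are assumptions (inform_first_partitioning is taken to allocate and return the first partitioning of a given size, and inform_next_partitioning to advance it in place, returning zero once the partitionings are exhausted), as is the inform/utilities.h header:

#include <inform/utilities.h>   // assumed aggregate header for the utility functions
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t const size = 3;
    // the first partitioning of three elements (assumed to be heap-allocated)
    size_t *parts = inform_first_partitioning(size);
    if (!parts) return -1;

    do
    {
        // print the partition label of each element, e.g. "0 1 1"
        for (size_t i = 0; i < size; ++i)
            printf("%zu ", parts[i]);
        printf("\n");
    } while (inform_next_partitioning(parts, size));

    free(parts);
    return 0;
}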
Random Time Series
It is sometimes useful to generate random time series, particularly when testing functions. Since we’ve implemented a few functions which take care of the memory allocation, etc…, we decided to expose them.
Time Series to TPM
Some information measures are defined on transition probability matrices (TPM), e.g. Effective Information. For this reason, we provide a function [inform_tpm] which estimates the one-time-step TPM from time series data.
Empirical Distributions
The inform_dist struct provides an empirical distribution, i.e. a histogram, representing the observed frequencies of some fixed-sized set of events. This structure is the basis for all of the fundamental information measures provided on discrete probability distributions.
The purpose of the inform_dist struct is to make writing custom implementations of information-theoretic measures possible. Most of the measures implemented in Inform use this structure at their core.
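To give a flavor of the API, here is a minimal sketch that builds a small histogram by hand; the inform/dist.h header, the exact signatures of inform_dist_alloc, inform_dist_tick, and inform_dist_get, and the inform_dist_free deallocator are assumptions based on the function list below.

#include <inform/dist.h>   // assumed header for the distribution API
#include <stdio.h>

int main(void)
{
    // a distribution over 2 possible events, assumed to start with zero observations
    inform_dist *dist = inform_dist_alloc(2);
    if (!dist) return -1;

    // record observations of a short binary series
    int const series[7] = {0, 0, 1, 1, 0, 0, 1};
    for (size_t i = 0; i < 7; ++i)
        inform_dist_tick(dist, (size_t) series[i]);

    // the series contains 4 zeros and 3 ones
    printf("counts: %u %u\n",
        (unsigned) inform_dist_get(dist, 0),
        (unsigned) inform_dist_get(dist, 1));

    inform_dist_free(dist);
    return 0;
}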
Type | |
---|---|
Allocation | inform_dist_alloc, inform_dist_realloc, inform_dist_copy, inform_dist_dup, inform_dist_create, inform_dist_infer, inform_dist_approximate, inform_dist_uniform |
Deallocation | |
Accessors/Mutators | inform_dist_size, inform_dist_counts, inform_dist_is_valid, inform_dist_get, inform_dist_set, inform_dist_tick, inform_dist_accumulate |
Probabilities | |
Distribution Type
Allocation
Deallocation
Accessors/Mutators
Probabilities
Shannon Information Measures
The inform/shannon.h
header provides a collection of entropy and information measures on
discrete probability distributions (inform_dist
). These
functions provide the core of Inform as all of the time series analysis functions are
build upon them.
Entropy | |
---|---|
Mutual Information | inform_shannon_pmi, inform_shannon_mi, inform_shannon_multi_pmi, inform_shannon_multi_mi |
Conditional Entropy | |
Conditional Mutual Information | |
Relative Entropy | |
Cross Entropy | |
References
- [Bialek2001a] Bialek, W., Nemenman, I. and Tishby, N. (2001) "Predictability, complexity, and learning". Neural Computation. 13 (11): 2409-2463. doi:10.1162/089976601753195969.
- [Bialek2001b] Bialek, W., Nemenman, I. and Tishby, N. (2001) "Complexity through nonextensivity". Physica A. 302 (1): 89-99. doi:10.1016/S0378-4371(01)00444-7.
- [Biehl2016] Biehl, M., Ikegami, T. and Polani, D. (2016) "Towards information based spatiotemporal patterns as a foundation for agent representation in dynamical systems". Proceedings of the Artificial Life Conference 2016. 722-729. doi:10.7551/978-0-262-33936-0-ch115.
- [Cover1991] Cover, T.M. and Thomas, J.A. (1991) "Elements of Information Theory". New York: Wiley. ISBN 0-471-06259-6.
- [Crutchfield2003] Crutchfield, J.P. and Feldman, D.P. (2003) "Regularities unseen, randomness observed: Levels of entropy convergence". Chaos. 13 (1): 25-54. doi:10.1063/1.1530990.
- [Feldman2003] Feldman, D.P. and Crutchfield, J.P. (2003) "Structural information in two-dimensional patterns: Entropy convergence and excess entropy". Physical Review E. 67 (5): 051104. doi:10.1103/PhysRevE.67.051104.
- [Hoel2013] Hoel, E.P., Albantakis, L. and Tononi, G. (2013) "Quantifying causal emergence shows that macro can beat micro". Proceedings of the National Academy of Sciences. 110 (49): 19790-19795. doi:10.1073/pnas.1314922110.
- [Hoel2017] Hoel, E.P. (2017) "When the map is better than the territory". Entropy. 19 (5): 188. doi:10.3390/e19050188.
- [Kullback1951] Kullback, S. and Leibler, R.A. (1951) "On information and sufficiency". Annals of Mathematical Statistics. 22 (1): 79-86. doi:10.1214/aoms/1177729694. MR 39968.
- [Lizier2008] Lizier, J.T., Prokopenko, M. and Zomaya, A.Y. (2008) "Local information transfer as a spatiotemporal filter for complex systems". Physical Review E. 77 (2): 026110. doi:10.1103/PhysRevE.77.026110.
- [Lizier2010] Lizier, J.T., Prokopenko, M. and Zomaya, A.Y. (2010) "Information modification and particle collisions in distributed computation". Chaos. 20 (3): 037109. doi:10.1063/1.3486801.
- [Lizier2012] Lizier, J.T., Prokopenko, M. and Zomaya, A.Y. (2012) "Local measures of information storage in complex distributed computation". Information Sciences. 208: 39-54. doi:10.1016/j.ins.2012.04.016.
- [Schreiber2000] Schreiber, T. (2000) "Measuring information transfer". Physical Review Letters. 85 (2): 461-464. doi:10.1103/PhysRevLett.85.461.
- [Shannon1948] Shannon, C.E. (July-October 1948) "A Mathematical Theory of Communication". Bell System Technical Journal. 27 (3): 379-423. doi:10.1002/j.1538-7305.1948.tb01448.x.
- [Tononi2003] Tononi, G. and Sporns, O. (2003) "Measuring information integration". BMC Neuroscience. 4 (1): 31. doi:10.1186/1471-2202-4-31.