In [1]:
import nbsetup
import numpy as np
import ensemblepy as ep

In [4]:
def pmf(sequence):
    """ Calculates the pmf for a given boolean sequence """
    p1, total = sequence.count("1"), len(sequence)
    return [p1/total, 1 - p1/total]

def run_cases(cases):
    """ Calculates complexity and ensemble entropies for a list of boolean sequences """
    measures = ep.measures([pmf(c) for c in cases], with_meta=True)
    # Print each measure next to its human-readable label
    for k, v in measures.items():
        print(ep.LEGEND[k][0], v)
    return measures

# Example: an accessible walkthrough

Suppose we have three systems, A, B and C, which we treat as ensembles. Each system yields only a small amount of data, so we want to know whether we can combine the results and treat them as a standard statistical system, or whether the systems are \emph{ergodically complex} and we therefore need more data.

## Ordered case

The only measurement each system produces is a sequence of booleans, e.g. A = 1110000000, B = 1100000000, C = 1110000000.

Using Shannon's formula (\ref{equation_shannon_entropy}), a common method\cite{ref_selforg} for approximating the information entropy \(H_i\) of a message string like this is to estimate the probability of a 0 or a 1 from all the data in the measured sequence. For ensemble A we have \(P_A(0) = 7/10, P_A(1) = 3/10\), giving an information entropy (using the natural logarithm) of
\[
H_A = -\sum p_i \log{p_i}
= -0.7\log(0.7)-0.3\log(0.3)
= 0.6109
\]
Using the same method we find \(H_B = 0.5004, H_C = 0.6109\).
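
The three values above can be reproduced with a few lines of plain Python. This is a minimal sketch that assumes the natural logarithm (which is what the quoted figures correspond to); `shannon_entropy` and `binary_pmf` are illustrative helper names and are not part of `ensemblepy`.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in nats (natural log), skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def binary_pmf(sequence):
    """Estimate [P(1), P(0)] from a string of '0' and '1' characters."""
    p1 = sequence.count("1") / len(sequence)
    return [p1, 1 - p1]

sequences = {"A": "1110000000", "B": "1100000000", "C": "1110000000"}
entropies = {name: shannon_entropy(binary_pmf(s)) for name, s in sequences.items()}

for name, h in entropies.items():
    print(f"H_{name} = {h:.4f}")
# prints H_A = 0.6109, H_B = 0.5004, H_C = 0.6109
```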

To find \(H_\varepsilon\) we conceptually assume the system is ergodic, so that observations from all ensembles can be combined into a single new system E = 111000000011000000001110000000, which contains 8 ones and 22 zeros:
\[
H_\varepsilon
= -\frac{8}{30}\log\left(\frac{8}{30}\right)-\frac{22}{30}\log\left(\frac{22}{30}\right)
= 0.5799
\]
This is the same as the entropy of the mean ensemble probability distribution \(\overline{P}=\{\overline{p_1}, \overline{p_0}\}\).
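
The equivalence between pooling the observations and averaging the ensemble distributions can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the two routes agree here because all three sequences have the same length, so each ensemble carries equal weight.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in nats, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def binary_pmf(sequence):
    """Estimate [P(1), P(0)] from a string of '0' and '1' characters."""
    p1 = sequence.count("1") / len(sequence)
    return [p1, 1 - p1]

sequences = ["1110000000", "1100000000", "1110000000"]

# Route 1: concatenate every observation into one long sequence
pooled_sequence = "".join(sequences)
h_pooled = shannon_entropy(binary_pmf(pooled_sequence))

# Route 2: average the per-ensemble pmfs, then take the entropy of the mean
pmfs = [binary_pmf(s) for s in sequences]
mean_pmf = [sum(column) / len(pmfs) for column in zip(*pmfs)]
h_mean_pmf = shannon_entropy(mean_pmf)

print(f"pooled: {h_pooled:.4f}, mean pmf: {h_mean_pmf:.4f}")
```

With sequences of unequal length the simple average in route 2 would need to be weighted by sequence length to match route 1.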

Finally, using (\ref{equation_complexity}) we get
\[
C_\varepsilon
= \sqrt{\frac{(H_\varepsilon-H_A)^2+(H_\varepsilon-H_B)^2+(H_\varepsilon-H_C)^2}{3 H_\varepsilon}} \approx 0.07
\]
This is low, which matches our intuition that the three systems can be combined and treated as a single statistical system.
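
The complexity value can also be computed by hand, as a cross-check against the library output. This sketch implements the formula exactly as written in this section, with equal weights for the three ensembles; it is not taken from the `ensemblepy` source, and the helper names are assumptions.

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy in nats, skipping zero-probability terms."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

def binary_pmf(sequence):
    """Estimate [P(1), P(0)] from a string of '0' and '1' characters."""
    p1 = sequence.count("1") / len(sequence)
    return [p1, 1 - p1]

sequences = ["1110000000", "1100000000", "1110000000"]

# Entropy of each ensemble and of the pooled observations
h_individual = [shannon_entropy(binary_pmf(s)) for s in sequences]
h_pooled = shannon_entropy(binary_pmf("".join(sequences)))

# Sum of squared deviations from the pooled entropy, normalised by n * H_pooled
sum_sq = sum((h_pooled - h) ** 2 for h in h_individual)
complexity = math.sqrt(sum_sq / (len(h_individual) * h_pooled))

print(f"C_epsilon = {complexity:.4f}")
# prints C_epsilon = 0.0688
```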

In [5]:
ordered = run_cases(["1110000000", "1100000000", "1110000000"])

Mean ensemble entropy 0.5740436758826583
Pooled entropy 0.5799151714181009
Ensemble divergence 0.005871495535442603
Ensemble complexity 0.06881250337735914
Entropies of individual ensembles [0.6108643020548935, 0.5004024235381879, 0.6108643020548935]
Weights of each ensemble [0.33333333 0.33333333 0.33333333]


## Ergodically complex case

In [6]:
erg = run_cases(["1000000000", "0111111111", "0111111111"])

Mean ensemble entropy 0.3250829733914482
Pooled entropy 0.6571577614973405
Ensemble divergence 0.3320747881058923
Ensemble complexity 0.409638798265074
Entropies of individual ensembles [0.3250829733914482, 0.3250829733914482, 0.3250829733914482]
Weights of each ensemble [0.33333333 0.33333333 0.33333333]


In [7]:
erg['complexity']/ordered['complexity']

5.952970436472369

## Disordered case

In [8]:
disordered = run_cases(["0111101011", "1110101101", "1111101110"])

Mean ensemble entropy 0.5740436758826583
Pooled entropy 0.5799151714181009
Ensemble divergence 0.005871495535442601
Ensemble complexity 0.06881250337735914
Entropies of individual ensembles [0.6108643020548935, 0.6108643020548935, 0.5004024235381879]
Weights of each ensemble [0.33333333 0.33333333 0.33333333]
