# A Wright-Fisher simulation implemented in C via Cython.

This tutorial implements a Wright-Fisher simulation with mutation and recombination using [Cython](http://www.cython.org).  Cython is two things:

* A grammar/dialect of Python that allows static typing of both Python and C/C++ types.
* A static compiler that turns Cython code into C or C++ code, which is then compiled into a Python extension module.

Cython has a learning curve of its own. A lot of what is shown below reflects best practices.  For those, we refer you to the Cython documentation.

Here, we avoid all use of [numpy](http://www.numpy.org) until we have to talk to [msprime](http://msprime.readthedocs.io).  We replace all numpy functionality with the equivalent routines from the excellent GNU Scientific Library, or [GSL](https://www.gnu.org/software/gsl/doc/html/index.html). Yes, numpy is fast!  Numpy is written in C!  But, numpy has to talk back and forth to Python, meaning we can out-perform it by writing routines that execute completely on the C side.

This example is closer to reality for those working in lower-level languages.  First, we must build our world, which means defining data types (structs, in this case), functions acting on those types, and a bunch of auxiliary code to manage memory and handle errors.  After all that, we can code up the `simplify` and `evolve` functions. Such is the price of speed.
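To make the "build our world" step concrete before diving into the Cython, here is a minimal pure-Python sketch of the pattern that the C structs below follow: a pre-allocated column, a cursor pointing at the next free slot, and capacity doubling when the column fills up. The class name `NodeColumn` and its methods are hypothetical and exist only for illustration; the real code uses `malloc`/`realloc` on raw pointers.

```python
from array import array


class NodeColumn:
    """Hypothetical Python stand-in for a C struct holding one growable column."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.time = array("d", [0.0] * capacity)  # pre-allocated storage
        self.next_node = 0  # index of the next free slot

    def add(self, t):
        # Double the storage when it is full, as a realloc-based C version would.
        if self.next_node >= self.capacity:
            self.time.extend([0.0] * self.capacity)
            self.capacity *= 2
        self.time[self.next_node] = t
        self.next_node += 1
        return 0  # a C version would return nonzero if allocation failed

    def clear(self):
        # "Clearing" only resets the cursor; the memory is kept for reuse.
        self.next_node = 0


col = NodeColumn(capacity=2)
for generation in range(5):
    col.add(float(generation))
```

Resetting the cursor instead of freeing memory is what lets the temporary tables be reused after each simplification without reallocating.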

First, we load an extension allowing us to write Cython in a notebook:

In [1]:
%load_ext Cython
# Set ourselves up for some plotting, too
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

The next code block is our Cython code.  The notebook environment will magically compile it from Cython to C, from C to a compiled Python module, and load the module into memory.

The code block is unavoidably long.  It defines a single Python function called `evolve`, which may then be run later in the notebook.  All of the functions marked `cdef` are visible only as C functions to other functions within the module.  This limited function scope is why we must write everything in one giant block.

Some comments:

* We use [CythonGSL](https://github.com/twiecki/CythonGSL) to get access to GSL types and functions in Cython.  Conda or pip install it if you want to use it for your projects.
* The memory management and error handling on the C side is minimal, and the error handling in particular is naive.
* The recombination function is 100% executed in C.  Mutation is very close to that, except for where we use the `dict` to manage the infinitely-many sites mutation model.
* We get copy-free transfer (I think...) from C to numpy arrays via Cython's typed memory views.
* The `cimport` commands below bring names into scope. It is considered best practice to only `cimport` the symbols you use, but that quickly gets tedious here, and I gave myself a break and imported everything from `gsl_vector`.
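The role of the `dict` in the infinitely-many sites model can be sketched in plain Python: every new mutation position is redrawn until it does not collide with a position already in use. This is only an illustration under stated assumptions. The function name `new_mutation_positions` is hypothetical, Python's `random.Random` stands in for the GSL generator, and the number of mutations is passed in directly rather than drawn from a Poisson distribution as the Cython code does.

```python
import random


def new_mutation_positions(nmut, lookup, rng):
    """Draw nmut positions on [0, 1) that are not already keys of lookup."""
    positions = []
    for _ in range(nmut):
        pos = rng.random()
        while pos in lookup:  # redraw on the (rare) collision
            pos = rng.random()
        lookup[pos] = True  # record the position so it is never reused
        positions.append(pos)
    return positions


rng = random.Random(42)
lookup = {}
first = new_mutation_positions(5, lookup, rng)
second = new_mutation_positions(5, lookup, rng)
```

Because `lookup` persists across calls, positions drawn in later generations cannot repeat earlier ones.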

Some details:

For reasons I don't understand, attempting to pickle a `collections.namedtuple` instance in a `cdef` function raises an exception.  The workaround is a Cython class decorated with `@cython.auto_pickle(True)`.  A side-effect is that the semantics for un-pickling the metadata differ from our Python examples.  Clearly, metadata is one of the trickier corners of `msprime/tskit`.
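The Cython code in this notebook encodes each mutation's metadata with `struct.pack('id', generation, pos)`, and the notebook later decodes it with `struct.unpack('id', ...)`. The round trip looks like this in isolation. The values are made up for illustration, and the record size depends on the platform's native alignment, so it is queried rather than assumed.

```python
import struct

# Illustrative values: the generation a mutation arose in, and its position
generation, pos = 3, 0.25

# 'i' is a C int and 'd' is a C double, packed with native alignment
packed = struct.pack('id', generation, pos)

# The size of one record, including any alignment padding
record_size = struct.calcsize('id')

# unpack returns a plain tuple, not a named object
unpacked = struct.unpack('id', packed)
```

Since `unpack` returns a bare tuple, any field names have to be reattached by the reader of the metadata.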

In [31]:
%%cython -3 -lgsl -lgslcblas -lm

import msprime
import numpy as np
import struct
cimport numpy as np
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, realloc, free
from libc.stdint cimport int32_t, uint32_t

from cython_gsl.gsl_rng cimport gsl_rng
from cython_gsl.gsl_rng cimport gsl_rng_mt19937
from cython_gsl.gsl_rng cimport gsl_rng_alloc
from cython_gsl.gsl_rng cimport gsl_rng_free
from cython_gsl.gsl_rng cimport gsl_rng_set
from cython_gsl.gsl_rng cimport gsl_rng_uniform
from cython_gsl.gsl_random cimport gsl_ran_flat
from cython_gsl.gsl_random cimport gsl_ran_poisson
from cython_gsl.gsl_vector cimport *
from cython_gsl.gsl_sort cimport gsl_sort_vector

cdef int32_t * malloc_int32_t(size_t n):
    return <int32_t*>malloc(n*sizeof(int32_t))

cdef int32_t * realloc_int32_t(void * x, size_t n):
    return <int32_t*>realloc(x,n*sizeof(int32_t))

cdef double * malloc_double(size_t n):
    return <double*>malloc(n*sizeof(double))

cdef double * realloc_double(double * x, size_t n):
    return <double*>realloc(<double *>x,n*sizeof(double))

cdef struct Mutations:
    double * pos
    int32_t * time
    int32_t * node
    size_t next_mutation, capacity
    
cdef int init_Mutations(Mutations * m):
    m.next_mutation = 0
    m.capacity = 10000
    m.pos = malloc_double(m.capacity)
    if m.pos == NULL:
        return -1
    m.time = malloc_int32_t(m.capacity)
    if m.time == NULL:
        return -1
    m.node = malloc_int32_t(m.capacity)
    if m.node == NULL:
        return -1
    return 0

cdef int realloc_Mutations(Mutations * m):
    m.capacity *= 2
    m.pos = realloc_double(m.pos,
                          m.capacity)
    if m.pos == NULL:
        return -1
    m.time = realloc_int32_t(m.time,
                            m.capacity)
    if m.time == NULL:
        return -1
    m.node = realloc_int32_t(m.node,
                            m.capacity)
    if m.node == NULL:
        return -1
    return 0

cdef void free_Mutations(Mutations * m):
    free(m.pos)
    free(m.time)
    free(m.node)
    m.next_mutation = 0
    m.capacity = 10000
    
cdef int add_mutation(double pos,
                     int32_t generation,
                     int32_t node,
                     list metadata,
                     Mutations * m):
    cdef int rv = 0
    if m.next_mutation+1 >= m.capacity:
        rv = realloc_Mutations(m)
        if rv != 0:
            return rv
    m.pos[m.next_mutation] = pos
    m.time[m.next_mutation] = generation
    m.node[m.next_mutation] = node
    m.next_mutation+=1
    metadata.append(struct.pack('id',generation,pos))
    return rv
    
cdef struct Nodes:
    double * time
    size_t next_node, capacity
    
cdef int init_Nodes(Nodes * n):
    n.next_node = 0
    n.capacity = 10000
    n.time = malloc_double(n.capacity)
    if n.time == NULL:
        return -1
    return 0

cdef int realloc_Nodes(Nodes * n):
    n.capacity *= 2
    n.time = realloc_double(n.time,
                            n.capacity)
    if n.time == NULL:
        return -1
    return 0
    
cdef void free_Nodes(Nodes * n):
    if n.time != NULL:
        free(n.time)
    n.next_node = 0
    n.capacity = 10000

cdef int add_node(double t, Nodes *n):
    cdef int rv = 0
    if n.next_node >= n.capacity:
        rv = realloc_Nodes(n)
        if rv != 0:
            return rv
    n.time[n.next_node] = t
    n.next_node+=1
    return rv
    
cdef struct Edges:
    double *left
    double *right
    int32_t *parent
    int32_t *child
    size_t next_edge, capacity
    
cdef int init_Edges(Edges * e):
    e.next_edge = 0
    e.capacity = 10000
    e.left = malloc_double(e.capacity)
    if e.left == NULL:
        return -1
    e.right = malloc_double(e.capacity)
    if e.right == NULL:
        return -1
    e.parent = malloc_int32_t(e.capacity)
    if e.parent == NULL:
        return -1
    e.child = malloc_int32_t(e.capacity)
    if e.child == NULL:
        return -1
    return 0
   
cdef int realloc_Edges(Edges * e):
    e.capacity *= 2
    e.left = realloc_double(e.left,e.capacity)
    if e.left == NULL:
        return -1
    e.right = realloc_double(e.right,e.capacity)
    if e.right == NULL:
        return -1
    e.parent = realloc_int32_t(e.parent,e.capacity)
    if e.parent == NULL:
        return -1
    e.child = realloc_int32_t(e.child,e.capacity)
    if e.child == NULL:
        return -1
    return 0

cdef void free_Edges(Edges * e):
    free(e.left)
    free(e.right)
    free(e.parent)
    free(e.child)
    e.next_edge = 0
    e.capacity = 10000
    
cdef int add_edge(double left, double right,
             int32_t parent, int32_t child,
             Edges * edges):
    cdef int rv=0
    if edges.next_edge+1 >= edges.capacity:
        rv = realloc_Edges(edges)
        if rv != 0:
            return rv
        
    edges.left[edges.next_edge] = left
    edges.right[edges.next_edge] = right
    edges.parent[edges.next_edge] = parent
    edges.child[edges.next_edge] = child
    edges.next_edge += 1
    return rv

cdef struct Tables:
    Nodes nodes
    Edges edges
    Mutations mutations
    gsl_rng * rng
    
cdef int init_Tables(Tables * t, int seed):
    cdef int rv = 0
    rv = init_Nodes(&t.nodes)
    if rv != 0:
        return rv
    rv = init_Edges(&t.edges)
    if rv != 0:
        return rv
    rv = init_Mutations(&t.mutations)
    if rv != 0:
        return rv
    t.rng = gsl_rng_alloc(gsl_rng_mt19937)
    if t.rng == NULL:
        return -1
    gsl_rng_set(t.rng, seed)
    return rv

cdef void free_Tables(Tables * t):
    free_Nodes(&t.nodes)
    free_Edges(&t.edges)
    free_Mutations(&t.mutations)
    gsl_rng_free(t.rng)
    
cdef int infsites(double mu, int32_t generation,
                  int32_t next_offspring_index,
                  Tables * tables,
                  list metadata,
                  dict lookup):
    cdef unsigned nmut = gsl_ran_poisson(tables.rng, mu)
    cdef unsigned i = 0
    cdef double pos
    cdef int rv = 0
    for i in range(nmut):
        pos = gsl_rng_uniform(tables.rng)
        while pos in lookup:
            pos = gsl_rng_uniform(tables.rng)
        rv = add_mutation(pos,
                         generation,
                         next_offspring_index,
                         metadata,
                         &tables.mutations)
        if rv != 0:
            return rv
        lookup[pos] = True
    return rv

cdef int value_present_vector(gsl_vector * v, double x,
                              size_t start, size_t stop):
    cdef size_t i
    for i in range(start,stop):
        if gsl_vector_get(v,i)==x:
            return 1
    return 0

cdef int poisson_recombination(double r,
                               size_t pg1, size_t pg2,
                               int32_t next_offspring_id,
                               Tables * tables):
    cdef unsigned nbreaks = gsl_ran_poisson(tables.rng, r)
    cdef gsl_vector * b = NULL
    cdef unsigned i = 0#,drew_zero=0
    cdef double x
    cdef int rv = 0
    cdef double left,right
    cdef int32_t p
    if nbreaks == 0:
        # The parent passes the 
        # entire region onto the child
        rv = add_edge(0.0,1.0,pg1,
                      next_offspring_id,
                      &tables.edges)
        if rv != 0:
            return rv
    else:
        b = gsl_vector_calloc(nbreaks+2) 
        while i < nbreaks:
            x = gsl_rng_uniform(tables.rng)
            while value_present_vector(b,x,0,i)==1:
                x = gsl_rng_uniform(tables.rng)
            gsl_vector_set(b,i,x)
            i += 1
        if gsl_vector_get(b,0) == 0.0:
            pg1,pg2 = pg2,pg1
            # We already have a zero
            # in there, so we need
            # to adjust size so that the 
            # 1.0 we insert below goes 
            # into the right place
            b.size -= 1
        else:
            # shift all values by 1
            # index and set element
            # 0 to 0
            # for i in range(b.size):
            #     print(gsl_vector_get(b,i))
            # print("-----")
            for i in range(b.size-1,0,-1):
                gsl_vector_set(b,i,
                              gsl_vector_get(b,i-1))
            gsl_vector_set(b,0,0.0)
                
        gsl_vector_set(b,b.size-1,1.0)
        gsl_sort_vector(b)
        # print("nbreaks=",nbreaks)
        # for i in range(b.size):
        #     print(gsl_vector_get(b,i))
        # print("//")
        # if drew_zero == 1:
        #     pg1,pg2 = pg2,pg1
        for i in range(b.size-1):
            left = gsl_vector_get(b,i)
            right = gsl_vector_get(b,i+1)
            rv = add_edge(left,right,pg1,
                          next_offspring_id,
                          &tables.edges)
            if rv != 0:
                gsl_vector_free(b)
                return rv
            pg1,pg2 = pg2,pg1
    gsl_vector_free(b)
    return 0

cdef int make_offspring(double mu, double r,
                        size_t generation,
                        size_t pg1, size_t pg2,
                        int32_t next_offspring_index,
                        list metadata,
                        dict lookup,
                        Tables * tables):
    cdef int rv
    rv = poisson_recombination(r,pg1,pg2,
                               next_offspring_index,
                               tables)
    if rv != 0:
        return -2
                
    rv = infsites(mu,generation+1,
                  next_offspring_index,
                  tables,metadata,lookup)
    if rv != 0:
        return -3
            
    rv = add_node(generation+1, &tables.nodes)
    if rv != 0:
        return -4
   
    return 0

# "except *" is needed so that exceptions raised in a
# cdef void function propagate to the caller
cdef void handle_error_code(int error, Tables * tables) except *:
    """
    Only to be called after make_offspring
    """
    if error == 0:
        return
    print("Error occurred")
    free_Tables(tables)
    if error == -2:
        raise RuntimeError("error during recombination")
    elif error == -3:
        # make_offspring returns -3 when infsites fails
        raise RuntimeError("error during mutation")
    elif error == -4:
        raise RuntimeError("error adding nodes")
    else:
        raise ValueError("invalid error code")

cdef int simplify(Tables * tables, 
            double dt,
            list metadata,
            object nodes,
            object edges,
            object sites,
            object mutations):
    cdef int rv = 0,gap
    
    cdef size_t i
    cdef np.ndarray[double,ndim=1] dview,lview,rview
    cdef np.ndarray[int32_t,ndim=1] pview,cview
    # Reverse time for our new nodes
    cdef gsl_vector_view vt
    vt = gsl_vector_view_array(tables.nodes.time,<size_t>tables.nodes.next_node)
    cdef double tmax,tmin
    gsl_vector_minmax(&vt.vector,&tmin,&tmax)
    for i in range(tables.nodes.next_node):
        tables.nodes.time[i] = -1.0*(tables.nodes.time[i]-tmax)
    gsl_vector_minmax(&vt.vector,&tmin,&tmax)
    
    nodes.set_columns(time=nodes.time+dt,flags=nodes.flags)
    gap=nodes.time.min()-tmax
    if gap != 1:
        return -1
    dview = np.asarray(<double[:tables.nodes.next_node]>tables.nodes.time)
    nodes.append_columns(time=dview,
                        flags=np.ones(tables.nodes.next_node,dtype=np.uint32))
    
    
    lview = np.asarray(<double[:tables.edges.next_edge]>tables.edges.left)
    rview = np.asarray(<double[:tables.edges.next_edge]>tables.edges.right)
    pview = np.asarray(<int32_t[:tables.edges.next_edge]>tables.edges.parent)
    cview = np.asarray(<int32_t[:tables.edges.next_edge]>tables.edges.child)
    edges.append_columns(left=lview,
                        right=rview,
                        parent=pview,
                        child=cview)
    
    # We are trying to be as fast as possible here,
    # so we'll use the more cumbersome 
    # append_columns interface instead of the 
    # much slower (but easier to understand)
    # add_rows
    cdef size_t nsites = len(sites)
    if tables.mutations.next_mutation > 0:
        encoded, offset = msprime.pack_bytes(metadata)
        dview = np.asarray(<double[:tables.mutations.next_mutation]>tables.mutations.pos)
        sites.append_columns(position=dview,
                            ancestral_state=np.zeros(len(dview),dtype=np.int8)+ord('0'),
                            ancestral_state_offset=np.arange(len(dview)+1,dtype=np.uint32))
        pview = np.asarray(<int32_t[:tables.mutations.next_mutation]>tables.mutations.node)
        mutations.append_columns(site=np.arange(tables.mutations.next_mutation,
                                                dtype=np.int32)+nsites,
                                node=pview,
                                derived_state=np.ones(len(dview),
                                                      dtype=np.int8)+ord('0'),
                                derived_state_offset=np.arange(len(dview)+1,
                                                              dtype=np.uint32),
                                metadata_offset=offset, metadata=encoded
                                )
    
    msprime.sort_tables(nodes=nodes,edges=edges,
                       sites=sites,mutations=mutations)
    
    samples = np.where(nodes.time == 0)[0]
    msprime.simplify_tables(samples=samples.tolist(),
                            nodes=nodes,
                            edges=edges,
                            sites=sites,
                            mutations=mutations)
    
    # "clear" our temp containers
    tables.nodes.next_node = 0
    tables.mutations.next_mutation = 0
    tables.edges.next_edge = 0
                          
    return rv

def evolve(int N, int ngens, double theta, double rho, int gc, int seed):
    nodes = msprime.NodeTable()
    edges = msprime.EdgeTable()
    sites = msprime.SiteTable()
    mutations = msprime.MutationTable()
    
    cdef double mu = theta/<double>(4*N)
    cdef double r = rho/<double>(4*N)
    
    cdef int rv
    cdef size_t i, generation
    cdef Tables tables
    rv = init_Tables(&tables, seed)
    if rv != 0:
        free_Tables(&tables)
        raise RuntimeError("could not initialize tables")
        
    for i in range(2*<size_t>N):
        nodes.add_row(time=0.0,
                      flags=msprime.NODE_IS_SAMPLE)
        
    cdef int32_t next_offspring_index, first_parental_index
    next_offspring_index = len(nodes)
    first_parental_index = 0
    cdef size_t parent1, parent2,pindex
    cdef int32_t p1g1, p1g2, p2g1, p2g2
    cdef dict lookup = {}
    cdef list metadata = []
    cdef size_t last_gen_gc = 0
    for generation in range(<size_t>(ngens)):
        if generation>0 and generation%gc == 0.0:
            rv = simplify(&tables,
                         generation-last_gen_gc,
                         metadata,
                         nodes,edges,sites,mutations)
            if rv != 0:
                free_Tables(&tables)
                raise RuntimeError("simplification error")
            lookup = {spos:True for spos in sites.position}
            # print(sites.position)
            # print(lookup)
            # print("lookup reset to",len(lookup),len(sites))
            metadata.clear()
            last_gen_gc=generation
            next_offspring_index = len(nodes)
            first_parental_index = 0
        else:
            first_parental_index = next_offspring_index - 2*N
            
        for pindex in range(0,2*N,2):
            parent1=<size_t>gsl_ran_flat(tables.rng,0.0,<double>N)
            parent2=<size_t>gsl_ran_flat(tables.rng,0.0,<double>N)
            p1g1 = first_parental_index + 2*parent1
            p1g2 = p1g1 + 1
            p2g1 = first_parental_index + 2*parent2
            p2g2 = p2g1 + 1
            
            if gsl_rng_uniform(tables.rng) < 0.5:
                p1g1, p1g2 = p1g2, p1g1
            if gsl_rng_uniform(tables.rng) < 0.5:
                p2g1, p2g2 = p2g2, p2g1
                
            rv = make_offspring(mu,r,generation,
                                p1g1,p1g2,
                                next_offspring_index,
                                metadata,
                                lookup,
                                &tables)
            handle_error_code(rv,&tables)
            next_offspring_index+=1
            rv = make_offspring(mu,r,generation,
                                p2g1,p2g2,
                                next_offspring_index,
                                metadata,
                                lookup,
                                &tables)
            # print(generation,pindex,len(lookup),len(sites))
            assert(len(lookup)>=len(sites))
            handle_error_code(rv,&tables)
            next_offspring_index+=1
        
    if tables.nodes.next_node > 0:
        rv=simplify(&tables,
                    generation+1-last_gen_gc,
                    metadata,
                    nodes,edges,sites,mutations)
        if rv == -1:
            free_Tables(&tables)
            raise RuntimeError("simplification error")
    
    free_Tables(&tables)
    return msprime.load_tables(nodes=nodes,edges=edges,
                               sites=sites,mutations=mutations)

In [32]:
%%time
ts = evolve(100, 1000, 100.0, 100.0, 1, 42)

CPU times: user 2.26 s, sys: 518 ms, total: 2.78 s
Wall time: 2.78 s


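Check that the output is invariant to how often we simplify. In the run recorded below, the tables match for every interval up to 909, but the check fails at `gc = 938`, where the site counts differ (767 vs. 739):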

In [33]:
for gc in range(10,1000,29):
    print(gc)
    ts2 = evolve(100, 1000, 100.0, 100.0, gc, 42)
    print(len(ts.tables.sites),len(ts2.tables.sites))
    assert(ts2.tables.nodes == ts.tables.nodes)
    assert(ts2.tables.edges == ts.tables.edges)
    assert(ts2.tables.sites == ts.tables.sites)
    assert(ts2.tables.mutations == ts.tables.mutations)

10
767 767
39
767 767
68
767 767
97
767 767
126
767 767
155
767 767
184
767 767
213
767 767
242
767 767
271
767 767
300
767 767
329
767 767
358
767 767
387
767 767
416
767 767
445
767 767
474
767 767
503
767 767
532
767 767
561
767 767
590
767 767
619
767 767
648
767 767
677
767 767
706
767 767
735
767 767
764
767 767
793
767 767
822
767 767
851
767 767
880
767 767
909
767 767
938
767 739


AssertionError: 

In [None]:
%%time
ts = evolve(1000,10000,100.0,1000.0,1000,42)

In [None]:
%%prun -l 10 -s cumulative
ts = evolve(1000,10000,100.0,1000.0,1000,42)

In [None]:
mdata = msprime.unpack_bytes(ts.tables.mutations.metadata,
                             ts.tables.mutations.metadata_offset)

In [None]:
for i in mdata:
    md = struct.unpack('id',i)

# Comparison to msprime

In this section, we compare the distribution of outputs to msprime using [pylibseq](https://github.com/molpopgen/pylibseq), a Python interface to [libsequence](http://molpopgen.github.io/libsequence/).

In [None]:
from IPython.display import SVG
import msprime
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
from libsequence.polytable import SimData
from libsequence.summstats import PolySIM
from libsequence.msprime import make_SimData
import concurrent.futures
import pandas as pd
from collections import namedtuple

SummStats=namedtuple('SummStats',['S','pi','D','hprime','rmin'])

Let's take a quick tour of pylibseq:

In [None]:
# Simulate data with msprime
ts = msprime.simulate(10,mutation_rate=1,random_seed=666)

# Get it into the format expected by pylibseq
d = make_SimData(ts)

# This should look familiar! :)
print(d)

In [None]:
# Create object to calculate summary stats
x = PolySIM(d)
# Calculate a few:
print(x.thetapi(),x.tajimasd(),x.hprime(),x.rm())

In [None]:
%%time
msprime_raw_data=[]
for i in msprime.simulate(20,mutation_rate=100.0/4.0,
                          recombination_rate=100.0/4.0,
                          num_replicates=1000,
                          random_seed=42):
    d = make_SimData(i)
    ps = PolySIM(d)
    # A little check that the two pieces of code agree
    assert(ps.numpoly() == i.num_mutations)
    msprime_raw_data.append(SummStats(ps.numpoly(),
                                      ps.thetapi(),ps.tajimasd(),
                                      ps.hprime(),ps.rm()))

To run the forward simulations, we will use multiple Python processes via Python 3's [`concurrent.futures`](https://docs.python.org/3/library/concurrent.futures.html) library. The short of it is that we need a Python function to send out to different processes and return results, which will be pickled into a future back in the main process.
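The submit-and-gather pattern can be sketched with a toy worker. Everything here is a stand-in: `run_batch` and `gather` are hypothetical names, and the sketch uses a `ThreadPoolExecutor` only so that it runs in any context. `ProcessPoolExecutor` exposes the same `submit` interface, in which case each returned list is pickled on its way back to the main process.

```python
import concurrent.futures


def run_batch(nreps, repid):
    """Hypothetical stand-in for a batch of simulations; returns a list."""
    return [(repid, i) for i in range(nreps)]


def gather(executor, nworkers, nreps):
    """Submit one batch per worker and collect results as they finish."""
    results = []
    futures = {executor.submit(run_batch, nreps, i): i
               for i in range(nworkers)}
    for fut in concurrent.futures.as_completed(futures):
        # result() re-raises any exception that occurred in the worker
        results.extend(fut.result())
    return results


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    results = gather(executor, 2, 3)
```

Batches complete in no particular order, so anything that depends on ordering should carry its own identifier, as `repid` does here.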

In [None]:
def run_forward_sim(nreps,repid):
    """
    Run our forward sim, calculate
    a bunch of stats, and return 
    the list.
    """
    # Not the best seeding scheme, 
    # but good enough for now...
    np.random.seed(repid+1)
    msp_rng = msprime.RandomGenerator(repid+1)
    # Do not scale the seeds by repid: repid == 0
    # would turn every seed into 0
    seeds = np.random.randint(0,1000000,nreps)
    sims = []
    for i in range(nreps):
        ts = evolve(1000,20000,0.0,100.0,1000,seeds[i])
        samples = np.random.choice(2000,20,replace=False)
        ts2 = ts.simplify(samples=samples.tolist())
        n=msprime.NodeTable()
        e=msprime.EdgeTable()
        s=msprime.SiteTable()
        m=msprime.MutationTable()
        ts2.dump_tables(nodes=n,edges=e)
        mutgen = msprime.MutationGenerator(
            msp_rng, 100.0/(float(4*1000)))
        mutgen.generate(n,e,s,m)
        # print(len(n),len(e),len(s),len(m),n.time.max())
        ts2=msprime.load_tables(nodes=n,edges=e,sites=s,mutations=m)
        # print(samples)
        # print(n.time[samples])
        # print(s)
        # print(m)
        # Simplify from entire pop down
        # to random sample of n << 2N
        # slist = samples.tolist()
        # slist.append(8)
        # ts2=ts.simplify(slist)
        # print(ts2.num_mutations)
        # print(len(ts2.tables.nodes),
        #      len(ts2.tables.edges),
        #      len(ts2.tables.sites),
        #      len(ts2.tables.mutations))
        d=make_SimData(ts2)
        ps=PolySIM(d)
        sims.append(SummStats(ps.numpoly(),
                              ps.thetapi(),
                              ps.tajimasd(),
                              ps.hprime(),
                              ps.rm()))
    return sims

In [None]:
%%time
x=run_forward_sim(1,3511)
print(x)

In the next bit, we map our function into four separate processes.

**Note:** We could use a `concurrent.futures.ThreadPoolExecutor` instead of the process pool executor.  However, some of our Cython functions rely on Python types (`list`, `dict`, etc.), meaning that the Global Interpreter Lock is a barrier to efficient concurrency.  In practice, we've found it better to take the hit of pickling between processes so that the simulations can run at 100% CPU in different processes.

In [None]:
%%time
fwd_sim_data=[]
np.random.seed(101)
with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(run_forward_sim,250,i): i for i in range(4)}
    for fut in concurrent.futures.as_completed(futures):
        fn = fut.result()
        fwd_sim_data.extend(fn)

In [None]:
msprime_df = pd.DataFrame(msprime_raw_data)
msprime_df['engine'] = ['msprime']*len(msprime_df.index)
fwd_df = pd.DataFrame(fwd_sim_data)
fwd_df['engine']=['forward']*len(fwd_df)
summstats_df = pd.concat([msprime_df,fwd_df])

In [None]:
sns.set(style="darkgrid")
g = sns.FacetGrid(summstats_df,col="engine",margin_titles=True)
bins = np.linspace(summstats_df.pi.min(),summstats_df.pi.max(),20)
g.map(plt.hist,'pi',bins=bins,color="steelblue",lw=0);

In [None]:
len(fwd_df.index)

In [None]:
from scipy.stats import ks_2samp

In [None]:
print(summstats_df.groupby(['engine']).agg(['mean','std']))

The stats are clearly off:

In [None]:
ks_2samp(fwd_df.pi,msprime_df.pi)

In [None]:
ks_2samp(fwd_df.S,msprime_df.S)

In [None]:
ks_2samp(fwd_df.D,msprime_df.D)

In [None]:
ks_2samp(fwd_df.rmin,msprime_df.rmin)
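For readers unfamiliar with the test used above, the two-sample Kolmogorov-Smirnov statistic is the largest absolute difference between the empirical CDFs of the two samples. The following pure-Python sketch computes only that statistic. The helper name `ks_statistic` is hypothetical, it is not how scipy implements `ks_2samp`, and it does not compute a p-value.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a = sorted(a)
    b = sorted(b)
    d = 0.0
    # The empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every data point.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


same = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2], [3, 4])
```

A value near 0 means the two samples have similar distributions, while a value near 1 means they barely overlap.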