Martin Bagic edited this page Aug 18, 2022 · 1 revision

The model description is split into two parts. The first part, Data, specifies which data AEGIS is manipulating and how the data is structured. The second part, Algorithm, explains how the data is being manipulated during the simulation.

Data

Core data in AEGIS can be captured by three variables: ages $\vec a$, genomes $\bf G$ and phenotypes $\bf P$. These encode individual-specific life histories. AEGIS also computes some derived variables, notably the age structure, death tables, average genomes and median phenotypes.

Ages

$\vec a = (a_i)$ with $a_i \in [1,L]$ and $i \in [1,n]$; $n$ is the population size at a specific time point and $L$ is the maximum attainable lifespan (parameter MAX_LIFESPAN).

Genomes

$$ \mathbf G = (g_{i,b}) = \begin{bmatrix} g_{1,1} & g_{1,2} & ... & g_{1,B} \\ g_{2,1} & g_{2,2} & ... & g_{2,B} \\ \vdots & \vdots & \ddots & \vdots \\ g_{n,1} & g_{n,2} & ... & g_{n,B} \\ \end{bmatrix} = \begin{bmatrix} \vec g_1 \\ \vec g_2 \\ \vdots \\ \vec g_n \end{bmatrix} $$

where $\vec g_i$ is the genome of the individual $i$ and $n$ is the population size at a specific time point. $B \in \mathbb{N}$ is the number of bits per genome. $g_{i,b} \in \{0, 1\}$ denotes the state of the bit: $0$ if inactive and $1$ if active. When a mutation occurs, the bit switches from $0$ to $1$ or from $1$ to $0$.

Phenotypes

$$ \mathbf P = (p^s_{i,x}, p^r_{i,x}) = \begin{bmatrix} p_{1,1}^s & p_{1,2}^s & ... & p_{1,L}^s, & p_{1,1}^r & p_{1,2}^r & ... & p_{1,L}^r \\ p_{2,1}^s & ... & ... & p_{2,L}^s, & p_{2,1}^r & ... & ... & p_{2,L}^r \\ \vdots & & & & \vdots \\ p_{n,1}^s & ... & ... & p_{n,L}^s, & p_{n,1}^r & ... & ... & p_{n,L}^r \end{bmatrix} = \begin{bmatrix} \vec p_1 \\ \vec p_2 \\ \vdots \\ \vec p_n \end{bmatrix} $$

for $i \in [1,n]$; $n$ is the population size at a specific time point. An individual $i$ at age $a_i$ has a probability $p_{i,a_i}^r$ to reproduce and a probability $p^s_{i,a_i}$ to survive and reach age $a_i + 1$. $p_{i,a_i}^{trait} \in [0,1]$ for $trait \in \{s,r\}$.

$\mathbf P$ can be computed from $\mathbf G$ with a matrix multiplication: $\mathbf G \mathbf M + \mathbf P_0 = \mathbf P$. $\mathbf P_0$ is the initial phenotype, a matrix of the same shape as $\mathbf P$ whose entries $p^s \in [0,1]$ and $p^r \in [0,1]$ are the default survival and reproduction probabilities, i.e. the phenotype obtained when the whole genome is filled with zeros $(g_{i,b} = 0\ \forall i, \forall b)$.

$$ \mathbf{M} = (m_{b,l}) = \begin{bmatrix} m_{1,1} & m_{1,2} & ... & m_{1,2L} \\ m_{2,1} & m_{2,2} & ... & m_{2,2L} \\ \vdots & \vdots & \ddots & \vdots \\ m_{B,1} & m_{B,2} & ... & m_{B,2L} \\ \end{bmatrix} $$ where $m_{b,l} \in \mathbb{R}$. Note that locus $b$ is pleiotropic when there are two or more column indices $l$ such that $m_{b,l} \ne 0$; i.e. it affects multiple traits.
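As a minimal sketch of the mapping $\mathbf G \mathbf M + \mathbf P_0 = \mathbf P$, the NumPy snippet below uses toy sizes ($n = 2$, $B = 3$, $L = 2$). The function name `compute_phenotypes` and the clipping to $[0, 1]$ are assumptions made for illustration; the text only states that phenotype entries are probabilities, not how AEGIS keeps them in range.

```python
import numpy as np

def compute_phenotypes(G, M, P0):
    """Return P = G @ M + P0, clipped to [0, 1] (the clipping is an assumption)."""
    P = G @ M + P0
    return np.clip(P, 0.0, 1.0)

# Toy example: n = 2 individuals, B = 3 bits, L = 2 ages (so 2L = 4 columns).
G = np.array([[0, 0, 0],
              [1, 0, 1]])
# Columns 0..L-1 hold survival at ages 1..L, columns L..2L-1 hold reproduction.
M = np.array([[0.25,  0.0,  0.0, 0.0],    # bit 1 raises survival at age 1
              [0.0,   0.0,  0.0, 0.0],    # bit 2 has no effect
              [0.0,  -0.25, 0.0, 0.25]])  # bit 3 is pleiotropic (two nonzero entries)
P0 = np.full((2, 4), 0.5)  # default probabilities for an all-zero genome

P = compute_phenotypes(G, M, P0)
```

The first individual carries only zeros and therefore keeps the default row of $\mathbf P_0$; the second individual's row is shifted by the sum of the rows of $\mathbf M$ that correspond to its active bits.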

Age structure

$$\begin{bmatrix} A_1 & A_2 & ... & A_L \end{bmatrix}$$ where $A_x = |\{i : a_i = x\}|$ is the number of individuals of age $x$.
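The age structure is a plain count per age, which can be sketched with `numpy.bincount`. The helper name `age_structure` is hypothetical and is not taken from the AEGIS code base.

```python
import numpy as np

def age_structure(ages, max_lifespan):
    """Count individuals at each age 1..max_lifespan."""
    ages = np.asarray(ages, dtype=int)
    # bincount counts ages 0..max_lifespan; drop the unused slot for age 0.
    return np.bincount(ages, minlength=max_lifespan + 1)[1:max_lifespan + 1]

counts = age_structure([1, 3, 3, 2, 3], max_lifespan=4)
```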

Death tables

$$\begin{bmatrix} D_1^{c} & D_2^c & ... & D_L^c \end{bmatrix}$$ where $D_x^c = |\{i : a_i = x,\ i \text{ died},\ \text{cause of death is } c\}|$ is the number of individuals of age $x$ that died of cause $c$; $c \in \{ \text{overcrowding}, \text{genetics} \}$.

Average genomes

$$\begin{bmatrix} \frac{1}{n} \sum_{i=1}^n g_{i,1} & \frac{1}{n} \sum_{i=1}^n g_{i,2} & ... & \frac{1}{n} \sum_{i=1}^n g_{i,B} \end{bmatrix}$$ where $\frac{1}{n} \sum_{i=1}^n g_{i,b}$ is the proportion of the population that has a bit $1$ at position $b$.

Median phenotypes

$$\begin{bmatrix} med_i (p_{i,1}^s) & ... & ... & med_i (p_{i,L}^s), & med_i (p_{i,1}^r) & ... & ... & med_i (p_{i,L}^r) \end{bmatrix}$$ where $med_i (p_{i,x}^s)$ is the population-median probability to survive at age $x$ and $med_i (p_{i,x}^r)$ is the population-median probability to reproduce at age $x$.
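Average genomes and median phenotypes are both column-wise reductions over the population axis. The sketch below uses toy arrays with $n = 3$; the variable names are illustrative only.

```python
import numpy as np

# Toy genomes: n = 3 individuals, B = 3 bits.
G = np.array([[0, 1, 1],
              [1, 1, 0],
              [0, 1, 0]])
# Toy phenotypes: n = 3 individuals, 2L = 4 columns (survival, then reproduction).
P = np.array([[0.9, 0.5, 0.0, 0.2],
              [0.7, 0.5, 0.0, 0.4],
              [0.8, 0.1, 0.0, 0.6]])

average_genome = G.mean(axis=0)          # fraction of 1-bits at each position
median_phenotype = np.median(P, axis=0)  # per-column median over individuals
```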

Algorithm

The simulation is not continuous but discrete; i.e. it runs in stages. The number of stages is defined by the parameter STAGES_PER_SIMULATION_. During every stage, a few steps are performed (relevant code in Ecosystem.run_stage). These are described below.

Overcrowding check

If the population size $n$ exceeds the maximum population size MAX_POPULATION_SIZE, individuals incur a survival penalty, i.e. some individuals will die. The number and the selection of individuals to kill depend on the parameter OVERSHOOT_EVENT. The default setting of that parameter is starvation, in which the number of dying individuals increases with time (as long as the population size exceeds the specified threshold) and the selection of individuals is random (the young are as susceptible as the old). Other settings are available in which the selection is biased toward the young or the old and in which the number of dying individuals is higher or lower.
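The starvation setting can be illustrated with a rough sketch. The exact schedule used by AEGIS is not given here, so the rule below is purely hypothetical: the death probability doubles with each consecutive overcrowded stage, starting from an invented `base_mortality`. It only demonstrates the two stated properties: mortality grows while the population stays above the threshold, and victims are chosen independently of age.

```python
import numpy as np

def starvation_mask(n, max_population_size, stages_overcrowded, rng,
                    base_mortality=0.05):
    """Return a boolean survival mask of length n.

    Hypothetical rule: while n exceeds the cap, every individual dies with
    the same probability, which doubles for each consecutive overcrowded
    stage (capped at 1). Age plays no role in who dies.
    """
    if n <= max_population_size:
        return np.ones(n, dtype=bool)  # no penalty at or below the cap
    mortality = min(1.0, base_mortality * 2 ** stages_overcrowded)
    return rng.random(n) >= mortality  # True = survives the check

rng = np.random.default_rng(0)
alive = starvation_mask(n=1000, max_population_size=500,
                        stages_overcrowded=3, rng=rng)
```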

Intrinsic mortality

Each individual $i$ of age $a_i$ has a genetically encoded probability $p_{i, a_i}^s$ to survive and reach age $a_i + 1$; it therefore dies with probability $1 - p_{i, a_i}^s$.
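The survival step amounts to one Bernoulli draw per individual. In the sketch below, `P_surv` is assumed to be the survival half of $\mathbf P$ with shape $(n, L)$, and the function name `survival_draw` is hypothetical.

```python
import numpy as np

def survival_draw(P_surv, ages, rng):
    """Bernoulli survival draw using each individual's age-specific p^s.

    P_surv has shape (n, L); column x - 1 holds the survival probability at age x.
    """
    ages = np.asarray(ages)
    rows = np.arange(len(ages))
    p = P_surv[rows, ages - 1]        # p^s_{i, a_i} for every individual
    return rng.random(len(ages)) < p  # True = survives to age a_i + 1
```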

Reproduction

Each individual $i$ has a genetically encoded probability to reproduce $p_{i, a_i}^r$ at age $a_i$ as long as they have reached the age of maturity ($a_i >$ MATURATION_AGE). Reproduction can be sexual or asexual, depending on the parameter REPRODUCTION_MODE.
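Selecting the reproducing individuals combines a maturity mask with a Bernoulli draw. The sketch assumes `P_repr` is the reproduction half of $\mathbf P$ with shape $(n, L)$ and uses the strict inequality as written above; the function name is hypothetical.

```python
import numpy as np

def reproduction_draw(P_repr, ages, maturation_age, rng):
    """Return a boolean mask of individuals that reproduce in this stage."""
    ages = np.asarray(ages)
    rows = np.arange(len(ages))
    p = P_repr[rows, ages - 1]       # p^r_{i, a_i} for every individual
    mature = ages > maturation_age   # strict inequality, as in the text
    return mature & (rng.random(len(ages)) < p)
```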

If the reproduction is asexual, the offspring genomes are constructed by taking copies of the parental genomes and then introducing random mutations.

If the reproduction is sexual, the construction of offspring genomes is more complex (relevant code in Reproducer.__call__) as it also entails recombination and random pairing. First, every reproducing individual emits two germ cells into the genetic pool. Each germ cell is generated by copying the genome, recombining it (at the rate set by RECOMBINATION_RATE) and then splitting it, which makes it haploid, while discarding the other half. Next, germ cells randomly fuse to create diploid genomes. Self-fertilization is not allowed, so if only one individual can reproduce, no reproduction occurs. After pairing, random mutations are introduced into the constructed genomes.
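The germ-cell and pairing steps can be sketched as follows. This is not the code in Reproducer.__call__; it rests on several assumptions. A diploid genome is stored as an array of shape $(2, K)$ (two homologous strands), RECOMBINATION_RATE is read as a per-position crossover probability, every parent contributes exactly two gametes, and selfing is avoided by reshuffling until no pair shares a parent. All function names are hypothetical.

```python
import numpy as np

def make_gamete(genome, recombination_rate, rng):
    """Recombine a diploid genome of shape (2, K) and keep one haploid strand."""
    k = genome.shape[1]
    # A crossover at a position switches which strand is being copied.
    crossovers = rng.random(k) < recombination_rate
    start = rng.integers(2)                       # random starting strand
    strand = (start + np.cumsum(crossovers)) % 2  # strand used at each position
    return genome[strand, np.arange(k)]

def pair_gametes(gametes, parent_ids, rng):
    """Randomly fuse gametes into diploid genomes, never within one parent.

    Assumes every parent contributed exactly two gametes.
    """
    parent_ids = np.asarray(parent_ids)
    if len(np.unique(parent_ids)) < 2:
        # A lone parent cannot reproduce without self-fertilization.
        return np.empty((0, 2, gametes.shape[1]), dtype=gametes.dtype)
    while True:  # rejection sampling: reshuffle until no pair is a selfing
        order = rng.permutation(len(gametes))
        a, b = order[0::2], order[1::2]
        if np.all(parent_ids[a] != parent_ids[b]):
            return np.stack([gametes[a], gametes[b]], axis=1)

rng = np.random.default_rng(0)
parents = rng.integers(0, 2, size=(3, 2, 8))  # 3 reproducing parents, K = 8
gametes = np.array([make_gamete(g, 0.1, rng) for g in parents for _ in range(2)])
parent_ids = np.repeat(np.arange(3), 2)       # two gametes per parent
offspring = pair_gametes(gametes, parent_ids, rng)
```

Three parents emit six gametes, which fuse into three diploid offspring genomes. Rejection sampling is chosen for brevity rather than efficiency.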

In both sexual and asexual reproduction, every bit has some probability of mutating after the genomes are generated. $p^m_{0\rightarrow1}$ is the probability that a bit $0$ mutates to $1$ and $p^m_{1\rightarrow0}$ the probability that a bit $1$ mutates to $0$. Their ratio $p^m_{0\rightarrow1} : p^m_{1\rightarrow0}$ is set by the parameter MUTATION_RATIO, and their sum $p^m_{0\rightarrow1} + p^m_{1\rightarrow0}$ is set by the parameter G_muta_initial.
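Given a ratio $r = p^m_{0\rightarrow1} / p^m_{1\rightarrow0}$ and a sum $s = p^m_{0\rightarrow1} + p^m_{1\rightarrow0}$, the two probabilities follow as $p^m_{1\rightarrow0} = s / (1 + r)$ and $p^m_{0\rightarrow1} = r s / (1 + r)$. The sketch below assumes that MUTATION_RATIO is supplied as the single number $r$ and G_muta_initial as $s$; how AEGIS encodes these parameters internally is not stated above, and both function names are hypothetical.

```python
import numpy as np

def mutation_probabilities(ratio, total):
    """Split total = p01 + p10 so that p01 / p10 == ratio."""
    p10 = total / (1.0 + ratio)
    p01 = total - p10
    return p01, p10

def mutate(genomes, p01, p10, rng):
    """Flip each bit independently: 0 -> 1 with p01, 1 -> 0 with p10."""
    flip_prob = np.where(genomes == 1, p10, p01)
    flips = rng.random(genomes.shape) < flip_prob
    return np.where(flips, 1 - genomes, genomes)

p01, p10 = mutation_probabilities(ratio=0.25, total=0.5)
```

With a ratio of 0.25 and a sum of 0.5, the split is approximately 0.1 for $0 \rightarrow 1$ and 0.4 for $1 \rightarrow 0$.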

Aging

Each individual $i$ ages, so that their age $a_i$ is incremented, becoming $a_i+1$. Individuals whose age exceeds the maximum attainable lifespan ($a_i >$ MAX_LIFESPAN) are removed from the population.
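A minimal sketch of the aging step, assuming ages and genomes are stored as parallel NumPy arrays indexed by individual (the function name `age_and_cull` is hypothetical):

```python
import numpy as np

def age_and_cull(ages, genomes, max_lifespan):
    """Increment every age and drop individuals older than max_lifespan."""
    ages = np.asarray(ages) + 1
    keep = ages <= max_lifespan  # individuals with age > max_lifespan are removed
    return ages[keep], genomes[keep]
```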

Data recording

After the whole stage is executed, collected data is saved as is described here.