Martin Bagic edited this page Aug 18, 2022 · 1 revision

Folder tree

This subsection describes how the files are organized into subfolders.

{path/to/file}/
    progress.log
    {ecosystem-number}/
        output-summary.json
        phenomap.csv
        snapshots/
            demography/
                {stage}.feather
                ...
            genotypes/
                {stage}.feather
                ...
            phenotypes/
                {stage}.feather
        visor/
            genotypes.csv
            phenotypes.csv
            spectra/
                age_at_birth.csv
                age_at_end_of_sim.csv
                age_at_genetic.csv
                age_at_overshoot.csv
                age_at_season_shift.csv
                cumulative_ages.csv
        pickles/
            {stage}
            ...
        popgen/
            allele_frequencies.csv
            genotype_frequencies.csv
            reference_genome_gsample.csv
            reference_genome.csv
            sfs.csv
            simple.csv

{path/to/file} is the path to the directory set by the user when they run the command aegis {path/to/file} to start the simulation. {stage} is the stage at which the file was recorded.
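The layout above can be navigated programmatically. The following is a minimal sketch, not part of AEGIS itself: the helper name `list_snapshots` is hypothetical, and it assumes that `{stage}` in the file names is an integer, which this page does not state.

```python
from pathlib import Path


def list_snapshots(output_dir, ecosystem, kind):
    """Return snapshot files of one kind, sorted by stage.

    Assumption: file names look like "<integer stage>.feather".
    kind is one of "demography", "genotypes" or "phenotypes".
    """
    folder = Path(output_dir) / str(ecosystem) / "snapshots" / kind
    # Keep only files whose stem is purely numeric (the assumed stage).
    files = [p for p in folder.glob("*.feather") if p.stem.isdigit()]
    # Sort numerically so that stage 1000 comes after stage 200.
    return sorted(files, key=lambda p: int(p.stem))
```

For example, `list_snapshots("path/to/file", 0, "genotypes")[-1]` would give the most recent genotypes snapshot of ecosystem 0, provided the naming assumption holds.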

Example files can be found here. 🚧

File types

Output files contain either cross-sectional data (data at the time of recording) or longitudinal data (data accumulated over multiple stages).

Snapshots

Snapshots are files that contain cross-sectional data. Three kinds of snapshot files are created: genotypes, phenotypes and demography. They are written at the frequency set by the parameter SNAPSHOT_RATE_ and saved in .feather format.

The genotypes files contain $\bf G$ (as described here); i.e., the genomes of all individuals at the time of recording, with the first column holding the first bit, the second column the second bit, and so on. The phenotypes files contain $\bf P$; i.e., all the phenotypes, with the first column holding the first value from the .phenotype variable, the second column the second value, and so on.
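As an illustration of working with this column layout, here is a small sketch that reads one snapshot and computes the proportion of 1s at each genome position. It is not AEGIS code: the function names are hypothetical, `pandas.read_feather` needs pandas with pyarrow installed, and the sketch assumes the genotypes table holds only 0/1 (or boolean) values with no extra index column.

```python
def load_snapshot(path):
    """Read one .feather snapshot into a pandas DataFrame.

    pandas is imported here so that proportion_of_ones can be
    used on plain Python lists without pandas installed.
    """
    import pandas as pd

    return pd.read_feather(path)


def proportion_of_ones(genomes):
    """Per-position proportion of 1s.

    genomes: rows are individuals, columns are bits (0 or 1),
    following the column layout described above.
    """
    rows = [list(row) for row in genomes]
    if not rows:
        return []
    n = len(rows)
    # zip(*rows) walks the table column by column.
    return [sum(column) / n for column in zip(*rows)]
```

With a real snapshot this could be called as `proportion_of_ones(load_snapshot(path).to_numpy())`.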

Visor files

Visor files contain longitudinal data. As the simulation progresses, data are accumulated and periodically recorded as a single row that is appended to the relevant table; the frequency of appending is set by the parameter VISOR_RATE_.

  • genotypes.csv contains average genomes (described here); i.e., each row contains the proportion of 1s at every position in the genome
  • phenotypes.csv contains median phenotypes (described here); i.e., each row contains the median phenotypic value of every trait in the phenotype

Spectra files

Spectra files contain longitudinal data: the age structure, multiple death tables (one for each cause of death) and a birth table. All structures are averaged over the recording period.

  • cumulative_ages.csv contains the age structure averaged over the recording period

  • age_at_genetic.csv contains the death structure for death caused by mutations

  • age_at_overshoot.csv contains the death structure for death caused by overcrowding

  • age_at_season_shift.csv contains the death structure for death caused by reaching the end of the season (applicable when annualism is activated)

  • age_at_birth.csv contains the birth table; i.e., the number of offspring born to parents at various ages from 0 to MAX_LIFESPAN
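Tables like these lend themselves to simple summaries. The sketch below computes a count-weighted mean age, for instance the mean parental age at birth or the mean age at death from one cause. It assumes that one record of a spectra table has already been read into a list of counts indexed by age from 0 to MAX_LIFESPAN; the on-disk details (header row, index column) are not specified on this page, and the function name `mean_age` is hypothetical.

```python
def mean_age(counts):
    """Mean age weighted by counts, where counts[a] is the number of events at age a."""
    total = sum(counts)
    if total == 0:
        return None  # no events recorded in this period
    return sum(age * n for age, n in enumerate(counts)) / total
```

For example, `mean_age([0, 2, 2])` gives 1.5, because two events occurred at age 1 and two at age 2.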

Pickles

These binary files contain Python pickles of the Ecosystem instances recorded at various stages. They can be used to initialize the population of a new simulation.
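For inspecting a pickle outside of a simulation, a generic loader along the following lines should work. This is a sketch using the standard `pickle` module, not the mechanism AEGIS uses to seed a new simulation, and the function name `load_pickle` is hypothetical. Unpickling an Ecosystem instance requires the AEGIS package to be importable, since pickle needs the class definition to rebuild the object.

```python
import pickle


def load_pickle(path):
    """Unpickle one recorded object from disk.

    Only unpickle files you created yourself: pickle can execute
    arbitrary code while loading.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```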

Popgen

These files contain population genetic statistics about the simulated population.

Miscellaneous files

progress.log

This file is updated throughout the simulation and can be used to monitor its progress. It reports the estimated time to completion (ETA), the time needed to run one million stages at the current speed (t1M), the current runtime, and the simulation speed (number of stages per minute, stg/min). LOGGING_RATE_ determines how often the file is written to.
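The quantities in the log are related by simple arithmetic. The sketch below follows directly from the definitions given above; how AEGIS computes and formats these values internally is not stated here, and the function and parameter names are hypothetical.

```python
def t1m_minutes(stages_per_minute):
    """Minutes needed to run one million stages at the given speed."""
    return 1_000_000 / stages_per_minute


def eta_minutes(total_stages, current_stage, stages_per_minute):
    """Minutes left if the remaining stages run at the given speed."""
    return (total_stages - current_stage) / stages_per_minute
```

At 5000 stg/min, for example, t1M is 200 minutes, and a run of 100000 stages that has reached stage 40000 has about 12 minutes left.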

output-summary.json

This file is created at the end of the simulation and contains basic information about the run, such as the extinction status of the population, the seed used for the random number generator, and the start and end times of the simulation.
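The summary can be read with the standard `json` module. This is a minimal sketch with a hypothetical function name; the key names inside the file are not listed on this page, so the example only loads the file and leaves the lookup to the reader.

```python
import json


def load_summary(path):
    """Read output-summary.json and return the parsed JSON value."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Assuming the top level of the file is a JSON object, `sorted(load_summary(path))` lists the available keys.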

phenomap.csv

This file contains the pleiotropic map calculated from the PHENOMAP_SPECS parameter. When that parameter is [], there is no pleiotropy and the map is simply an identity matrix.
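To check whether a loaded map is free of pleiotropy, one can test it against the identity matrix. The sketch below assumes the map has already been read into a list of numeric rows; the exact layout of phenomap.csv is not described on this page, and the function name `is_identity` is hypothetical.

```python
def is_identity(matrix, tol=1e-12):
    """True if matrix is square with 1s on the diagonal and 0s elsewhere."""
    n = len(matrix)
    for i, row in enumerate(matrix):
        if len(row) != n:
            return False  # not square
        for j, value in enumerate(row):
            expected = 1.0 if i == j else 0.0
            if abs(value - expected) > tol:
                return False
    return True
```

Any nonzero off-diagonal entry makes the check fail, which in this reading would indicate a pleiotropic effect.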