<a name='top'></a>
<h1>Sequence Simulator: a Training Notebook</h1>
<p>...</p>
<h3><b>Sections:</b></h3>
<ul>
    <li><a href='#Imports'>Imports</a></li>
    <li><a href='#Basic'>Basic simulation</a></li>
    <li><a href='#Results'>Retrieving results</a></li>
    <li><a href='#Vars'>Variations</a></li>
</ul>

<a name='Imports'></a>
<h2>Imports</h2>
<a href='#top'>Back to top</a>

In [1]:
#import the Python modules for running the models
import sequence_simulator as ss
from concentrations import concentrations_generator as cg
#some other Python packages are required to run the scripts below
import pandas as pd
import matplotlib.pyplot as plt

<a name='Basic'></a>
<h2>Basic simulation</h2>
<a href='#top'>Back to top</a>

<p>To run a basic simulation the following need to be provided:</p>
<ul>
    <li>Codon-specific tRNA concentrations need to be calculated from tRNA concentration data, which need to be provided via a .csv-file such as the enclosed 'N_GCN_Scer.csv' file.</li>
    <li>A sequence to be simulated needs to be specified.</li>
    <li>Translation initiation and termination rates need to be set.</li>
    <li>An exit condition needs to be set (a condition telling the simulator when to stop simulating).</li>
</ul>

<p>The processing of tRNA abundance data is identical to that used for the codon simulator.</p>

In [2]:
tRNAs = pd.read_csv('N_GCN_Scer.csv')
codons = pd.read_csv('codons.csv')
matrices = cg.make_matrix(tRNAs, codons)
concs = cg.make_concentrations(matrices, tRNAs, codons, concentration_col_name='Seq').to_csv(path_or_buf = None)

<p>A basic simulation is set up as follows:</p>

In [3]:
#instantiate an instance of the simulator class
sim = ss.SequenceSimulator()
#load the calculated tRNA concentrations for each codon
sim.load_concentrations_from_string(concs)
#specify the RNA sequence to be simulated
sim.input_MRNA("ATGTTCAGCGAATTAATTAACTTCCAAAATGAAGGTCATGAGTGCCAATGCCAATGTGGTAGCTGCAAAAATAATGAACAATGCCAAAAATCATGTAGCTGCCCAACGGGGTGTAACAGCGACGACAAATGCCCCTGCGGTAACAAGTCTGAAGAAACCTGA")
#specify translation initiation and termination rates to be used
sim.set_initiation_rate(2)
sim.set_termination_rate(100)
#specify a termination condition
sim.set_time_limit(150) # simulate 2.5 minutes (150 seconds)
#run the simulation
sim.run()
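
<p>Before running, it can be worth checking that the input looks like a complete open reading frame. The sketch below uses only the Python standard library. It assumes (the simulator does not state this) that the properties worth checking are a start codon, a length divisible by three, a terminal stop codon and no internal stop codons. 'split_codons' and 'STOP_CODONS' are hypothetical helper names and are not part of sequence_simulator.</p>

```python
# Hypothetical helpers for illustration; not part of sequence_simulator.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(sequence):
    """Split a nucleotide string into consecutive, non-overlapping triplets."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


# the sequence passed to sim.input_MRNA in the basic simulation
seq = "ATGTTCAGCGAATTAATTAACTTCCAAAATGAAGGTCATGAGTGCCAATGCCAATGTGGTAGCTGCAAAAATAATGAACAATGCCAAAAATCATGTAGCTGCCCAACGGGGTGTAACAGCGACGACAAATGCCCCTGCGGTAACAAGTCTGAAGAAACCTGA"

codons = split_codons(seq)
starts_with_atg = codons[0] == "ATG"
ends_with_stop = codons[-1] in STOP_CODONS
# positions of any stop codons before the final codon
internal_stops = [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

print(len(codons), starts_with_atg, ends_with_stop, internal_stops)
```

<p>For this sequence the check reports 54 codons, which agrees with ribosome positions running from 0 to 53 in the simulation output.</p>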

<a name='Results'></a>
<h2>Retrieving results</h2>
<a href='#top'>Back to top</a>

<p>Results can be retrieved in various ways, depending on which parameters are of interest.</p>

<p><b>Changes in ribosome positions</b>, and the time elapsed between changes, can be accessed via the 'ribosome_positions_history' and 'dt_history' attributes of sim.</p>

In [4]:
positions = sim.ribosome_positions_history
timesteps = sim.dt_history
#these can usefully be inspected in a dataframe
pd.DataFrame({"Ribosomes positions": positions, "Time elapsed in secs":timesteps})

Unnamed: 0,Ribosomes positions,Time elapsed in secs
0,[],0.000000
1,[0],0.326240
2,[1],0.813548
3,[2],0.145209
4,[3],0.038119
...,...,...
799,"[5, 16, 26, 36, 51]",0.016009
800,"[6, 16, 26, 36, 51]",0.555027
801,"[6, 16, 26, 36, 52]",0.300657
802,"[6, 16, 26, 36, 53]",0.002872


<p>In this case the simulation starts with an empty RNA ('[]'). After 0.33 seconds the first ribosome initiates and takes position 0, after a further 0.81 seconds the ribosome moves from position 0 to position 1, etc. Entries with more than one number, e.g. '[5, 16, 26, 36, 51]', denote polysomes with multiple ribosomes and their respective positions.</p>
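
<p>Because 'dt_history' holds the time between consecutive changes, the total simulated time at each change is the running sum of these values. The sketch below illustrates this with the first five rows of the output above, copied into plain lists so that it runs without the simulator; with a real run, 'sim.ribosome_positions_history' and 'sim.dt_history' would take their place.</p>

```python
from itertools import accumulate

# first five rows of the output above, copied by hand
positions = [[], [0], [1], [2], [3]]
timesteps = [0.0, 0.326240, 0.813548, 0.145209, 0.038119]

# running sum: simulated time at which each change happened
elapsed = list(accumulate(timesteps))
# number of ribosomes on the mRNA after each change
ribosome_counts = [len(p) for p in positions]

for time_point, count, pos in zip(elapsed, ribosome_counts, positions):
    print(f"t = {time_point:.3f} s, ribosomes = {count}, positions = {pos}")
```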

<p>The <b>average decoding time of each codon</b> in the sequence can be accessed using the 'average_times' attribute of sim.</p>

In [5]:
#print average decoding times of the first ten codons
print(sim.average_times[:10])

[0.5081542134284973, 0.5131803154945374, 1.6684316396713257, 0.7047904133796692, 1.0152301788330078, 0.7200312614440918, 0.7845287919044495, 0.7498359680175781, 1.1521186828613281, 0.5792173147201538]
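
<p>One use of the per-codon averages is to locate slow positions. The following sketch works on the ten values printed above, copied into a list so that it runs without the simulator; with a real run, 'sim.average_times' would be used instead.</p>

```python
# values copied from the printed output above (first ten codons)
average_times = [0.5081542134284973, 0.5131803154945374, 1.6684316396713257,
                 0.7047904133796692, 1.0152301788330078, 0.7200312614440918,
                 0.7845287919044495, 0.7498359680175781, 1.1521186828613281,
                 0.5792173147201538]

# index of the codon with the longest average decoding time
slowest_index = max(range(len(average_times)), key=average_times.__getitem__)
# how much slower it is than the fastest codon in this window
ratio = average_times[slowest_index] / min(average_times)

print(slowest_index, round(ratio, 2))
```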


<p>The <b>average decoding time for the entire mRNA</b> can be calculated from the output of the 'get_elongation_duration' method of sim. This method returns a tuple of two lists: the first list contains the times it took individual ribosomes to translate the sequence, and the second list contains the indices of the timesteps at which each of these ribosomes terminated.</p>

In [6]:
#unpack the tuple of lists returned by the method
durations, indices = sim.get_elongation_duration()
pd.DataFrame({"Elongation duration": durations, "Index":indices})

Unnamed: 0,Elongation duration,Index
0,29.724009,1
1,32.152184,13
2,31.624758,35
3,32.171276,65
4,36.445595,109
5,38.107761,173
6,39.941414,222
7,35.722572,287
8,38.269253,370
9,42.732876,402
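
<p>The method returns one duration per terminated ribosome, so the average still has to be computed from the list. A minimal sketch using the ten durations listed above, copied by hand so that it runs without the simulator; with a real run, the first list returned by 'sim.get_elongation_duration()' would be used instead.</p>

```python
from statistics import mean

# elongation durations copied from the output above
durations = [29.724009, 32.152184, 31.624758, 32.171276, 36.445595,
             38.107761, 39.941414, 35.722572, 38.269253, 42.732876]

# average time a ribosome needed to translate the whole sequence
mean_duration = mean(durations)
print(f"mean elongation duration: {mean_duration:.2f}")
```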


<a name='Vars'></a>
<h2>Variations</h2>
<a href='#top'>Back to top</a>

<p><b>Reading in sequences from fasta files</b>. As well as being specified as strings, sequences can be read in from fasta-formatted files. 'path_to_fastaFile' must be replaced with a valid file path (and the relevant line uncommented) for this cell to load a sequence without error.</p>

In [7]:
# for a fasta file with a single sequence:
#sim.load_MRNA(path_to_fastaFile) 
#for a fasta file with multiple sequences:
#sim.load_MRNA(path_to_fastaFile, "sequence_name")

<p><b>Setting different stop conditions</b>. Of the different methods for setting stop conditions, only one can be active at a time: applying any of these methods overrides all previously applied ones.</p>

In [8]:
#simulate for 100 seconds (system time)
sim.set_time_limit(100)
#simulate until 500 ribosomes have translated the mRNA
sim.set_finished_ribosomes(500)
#simulate until 5000 ribosome movements have occurred
sim.set_iteration_limit(5000)

<p><b>Changing rate parameters.</b> This is done similarly to the codon simulator, but the 'get_propensities' method of the sequence simulator retrieves one parameter set for each codon of the sequence. Propensities can be changed with the 'set_propensities' method, either for all codons in parallel, or for a single codon only.</p>

In [9]:
#retrieve the inbuilt reaction rates (propensities)
propensities = sim.get_propensities()
#print the propensities of the sixth codon
print(propensities[5])

{'WC1f': 1230.60498046875, 'WC1r': 85.0, 'WC2f': 190.0, 'WC2r': 0.23000000417232513, 'WC3f': 260.0, 'WC4f': 1000.0, 'WC5f': 1000.0, 'WC6f': 1000.0, 'WCdiss': 60.0, 'dec7f': 200.0, 'near1f': 5111.74365234375, 'near1r': 85.0, 'near2f': 190.0, 'near2r': 80.0, 'near3f': 0.4000000059604645, 'near4f': 1000.0, 'near5f': 1000.0, 'near6f': 60.0, 'neardiss': 1000.0, 'non1f': 20162.990234375, 'non1r': 100000.0, 'trans1f': 2040.0, 'trans1r': 140.0, 'trans2': 250.0, 'trans3': 350.0, 'trans4': 1000.0, 'trans5': 1000.0, 'trans6': 1000.0, 'trans7': 1000.0, 'trans8': 1000.0, 'trans9': 1000.0, 'wobble1f': 94.66191864013672, 'wobble1r': 85.0, 'wobble2f': 190.0, 'wobble2r': 1.0, 'wobble3f': 25.0, 'wobble4f': 1000.0, 'wobble5f': 1000.0, 'wobble6f': 6.400000095367432, 'wobblediss': 1.100000023841858}


In [10]:
#set the 'WCdiss' reaction rate for the second codon to 5
propensities[1]['WCdiss'] = 5
sim.set_propensities(propensities)

# OR #

#set the 'WCdiss' reaction rates for all codons to 5
for propensity in propensities:
    propensity['WCdiss'] = 5
#set the new propensities in the simulator
sim.set_propensities(propensities)
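
<p>Since the propensities are edited in place, it may be useful to keep an untouched copy so that the inbuilt values can be restored later. The sketch below uses a small stand-in list of dictionaries with the same shape as the output of 'get_propensities' (one dictionary per codon); the stand-in values and the variable name 'defaults' are illustrative only.</p>

```python
import copy

# stand-in for sim.get_propensities(): one dict per codon (illustrative subset of keys)
propensities = [{'WCdiss': 60.0, 'WC1r': 85.0} for _ in range(3)]

# keep an independent copy before modifying anything
defaults = copy.deepcopy(propensities)

# change 'WCdiss' for all codons
for propensity in propensities:
    propensity['WCdiss'] = 5

# the copy is unaffected and could later be passed back with
# sim.set_propensities(defaults) to restore the original rates
print(propensities[0]['WCdiss'], defaults[0]['WCdiss'])
```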

<p><b>Controlling the starting ribosome population</b>. Unless otherwise specified, the simulation starts with an empty sequence, in which case it takes time for the system to reach a steady state. Parameters like ribosome collisions will be sensitive to this. The transition to the steady state can be accelerated by pre-seeding the sequence with ribosomes, either with the deterministic ribosome density, or at positions explicitly specified by the user.</p>

In [11]:
# prepopulate to steady state
sim.set_prepopulate(True)

# OR #

# set ribosomes to specified initial positions
sim.set_ribosome_positions([1, 11, 21])
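
<p>For longer sequences, a list of starting positions can be generated rather than typed out. The helper below is a hypothetical sketch, not part of sequence_simulator. The default spacing of 10 codons simply mirrors the example positions above; the minimum spacing the simulator accepts is not stated here and should be checked before use.</p>

```python
def evenly_spaced_positions(n_codons, spacing=10, start=1):
    """Return codon indices start, start + spacing, ... below n_codons."""
    if spacing < 1:
        raise ValueError("spacing must be at least 1")
    return list(range(start, n_codons, spacing))


# reproduces the positions used in the example above
initial_positions = evenly_spaced_positions(30)
print(initial_positions)

# these could then be passed on with sim.set_ribosome_positions(initial_positions)
```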

<p><b>Model all non-cognate reactions.</b> Ribosomal reactions with non-cognates make up the majority of reactions during a translation cycle, but because non-cognate contacts are very fleeting their contribution to ribosomal step times is negligible. Disregarding non-cognate reactions (the default) reduces computation time without notably affecting results. Modelling of non-cognates can be reinstated with the 'set_nonCognate' method of the simulator.</p>

In [12]:
# enable the non-cognate pathway. Disabled by default
sim.set_nonCognate(True) 

<p><b>Record ribosome states in detail</b>. By default, the simulation only records when ribosomes change position. It is also possible to record all internal state changes by calling the 'set_log_codon_states' method of sim with the argument True.</p>

In [13]:
# if sim is run now, internal state changes of all ribosomes are recorded
# (this greatly increases the amount of data generated)
sim.set_log_codon_states(True)
# set the simulator back to the default
sim.set_log_codon_states(False)