### Introduction
This workflow simulates placing the E. coli core genome under the regulation of specific transcription factors and mutating selected reactions. The program then calculates the resulting change in metabolic flux. The changes can be visualized on an Escher metabolic map.

### Set up system

In [None]:
import sys  # access to interpreter and system-level settings
from os.path import join
# import xlrd

# Loading pandas, a library for data manipulation
import pandas as pd
# import lxml

# Loading numpy, a library for numerical computation
import numpy as np

# loading matplotlib, a library for visualization
import matplotlib.pyplot as plt
%matplotlib inline

from biolabsim import Host, Strain, Ecol
from biolabsim import measure_EnzymeLevel1, Help_GenomeGenerator, make_UpdateExpression, Help_StrainCharacterizer, Help_PromoterStrength, Help_Expr2Flux
from Bio.Seq import Seq
from pandas.core.frame import DataFrame
from copy import deepcopy

# import escher
# from escher import Builder

print('System ready')

In [None]:
# changed or new functions for this workflow
%run "TFMetEngSim.ipynb" 

print('Functions ready')

### List of reactions regulated by certain TFs
Depending on the aim of the experiment and the medium used, choose the transcription factor(s) from this list. Each list holds the reaction IDs (RctIDs) that the named regulator activates or represses.

In [None]:
# for 'CRP1'
CRP_activated = ['PFL','PGK','ACALD','ACKr','PPCK','AKGDH','CS','SUCCt2_2','SUCDi','SUCOAS','FBA','FBP','FORt2','FORt','FRUpts2','FUM','FUMt2_2','GAPD','GLCpts','GLNS','MALt2_2','MDH','PDH']
# for 'CRP2'
CRP_repressed = ['PPCK','AKGDH','GLCpts','GLNS','GLUDy','GLUSy','ICL','MALS','PDH']
# for 'Cra1'
Cra_activated = ['ACONTa','ACONTb','PPCK','PPS','CYTBD','FRUpts2','GLCpts','ICDHyr','ICL','MALS']
# for 'Cra2'
Cra_repressed = ['PFK','PGK','ALCD2x','PPC','ACONTa','ACONTb','PYK','ENO','TPI','FBA','FRUpts2','G6PDH2r','GAPD','GLCpts','PDH']
# for 'ArcA1'
ArcA_activated = ['PFL','ACKr','AKGDH','CYTBD','SUCDi','SUCOAS','FUM']
# for 'ArcA2'
ArcA_repressed = ['ACONTa','ACONTb','AKGDH','CS','SUCCt2_2','D_LACt2','SUCDi','SUCOAS','FUM','GLCpts','GLNS','ICDHyr','MALS','MALt2_2','MDH','NADH16','PDH']
# for 'FNR1'
FNR_activated = ['PFL','PIt2r','ALCD2x','ACKr','ACONTa','ACONTb','AKGDH','CYTBD','FRD7','FUM','GLUDy','NH4t','PDH']
# for 'FNR2'
FNR_repressed = ['PFL','AKGDH','CYTBD','SUCDi','SUCOAS','FORt2','FORt','G6PDH2r','GLNS','GLUSy','GND','NADH16','NH4t','PDH']
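
Several reactions appear in both the activated and the repressed list of the same regulator, which matters when both regulator entries are added to one strain. The sketch below finds those overlaps for the CRP lists with a set intersection. The helper `dual_regulated` is a hypothetical name introduced here for illustration; it is not part of biolabsim.

```python
# CRP lists copied from the cell above.
CRP_activated = ['PFL','PGK','ACALD','ACKr','PPCK','AKGDH','CS','SUCCt2_2','SUCDi','SUCOAS','FBA','FBP','FORt2','FORt','FRUpts2','FUM','FUMt2_2','GAPD','GLCpts','GLNS','MALt2_2','MDH','PDH']
CRP_repressed = ['PPCK','AKGDH','GLCpts','GLNS','GLUDy','GLUSy','ICL','MALS','PDH']


def dual_regulated(activated, repressed):
    """Return the reaction IDs present in both lists, sorted alphabetically.

    Hypothetical helper for illustration only.
    """
    return sorted(set(activated) & set(repressed))


crp_overlap = dual_regulated(CRP_activated, CRP_repressed)
print(crp_overlap)
```

The same call can be applied to the Cra, ArcA and FNR pairs before choosing which regulators to add.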

### Set up wildtype and add transcription factor regulation
Pass the name of the transcription factor as a string in the second argument of `add_TranscriptionFactor()` and the RctIDs of the targeted enzymes as a list of strings in the third argument. The fourth argument sets the regulator type (`'Activator'` or `'Repressor'`). Add more rows for additional regulators if needed.

In [None]:
wtHost = Ecol()
print('Wild type growth rate: {:.2f}'.format(wtHost.strain.objective))

# adding regulator TF to dataframe and genome of wt
WTdf = wtHost.strain.genes_df
WTdf = add_TranscriptionFactor(WTdf, 'Cra1', Cra_activated, 'Activator')
WTdf = add_TranscriptionFactor(WTdf, 'Cra2', Cra_repressed, 'Repressor')

# update wildtype dataframe's Expression and Expr2Flux
WTdf = Update_DF(WTdf)
# putting the genes dataframe back to the wild type
wtHost.strain.genes_df = WTdf
# Generating new Genome
WTGenome_new = Help_GenomeGenerator(WTdf, 500, .6)
wtHost.strain.genome = Seq(WTGenome_new)

wtHost.strain.genes_df

# TODO: still needs a setting for which values the TF is assigned

### Mutate the Host
Insert the RctIDs of the proteins you want to mutate into `Mutant_Proteins` as a list of strings. 
Insert the mutated sequence into `insertSeq` as a string.
The sequence to be replaced is fixed to `ATTGA`, which is always present.

In [None]:
# Generating mutant
# We will actually define the mutations by hand and delete the automatically generated genome for now

myHost = deepcopy(wtHost)

Mutant_Proteins = ['Cra1','Cra2']
insertSeq = 'CCCCC'

# mutate Promoter in Genome and genes_df
MutGenome, MTdf = Help_mutatePromoter(Mutant_Proteins, myHost.strain, insertSeq, cutSeq='ATTGA')
myHost.strain.genome = Seq(MutGenome)
myHost.strain.genes_df = MTdf.copy()

# updating expression of regulated enzymes
AllNewDF = measure_EnzymeLevel('Ecol', wtHost.strain, myHost.strain)
CorrRegExpr = calculate_mutatedRegulatedExpression(WTdf, AllNewDF)
AllNewDF['NewExpr'] = CorrRegExpr
AllNewDF['NewFlux'] = AllNewDF['NewExpr']*WTdf['Expr2Flux']
AllNewDF
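
The promoter substitution described above can be pictured on a single toy sequence. This is a minimal sketch, not the implementation of `Help_mutatePromoter`: the function `replace_promoter_motif` and the toy promoter are made up for illustration, and replacing only the first occurrence of the motif is an assumption.

```python
def replace_promoter_motif(promoter, insert_seq, cut_seq='ATTGA'):
    """Replace the first occurrence of cut_seq in promoter with insert_seq.

    Hypothetical helper; raises if the motif is missing so that a silent
    no-op mutation cannot go unnoticed.
    """
    if cut_seq not in promoter:
        raise ValueError('motif {} not found in promoter'.format(cut_seq))
    return promoter.replace(cut_seq, insert_seq, 1)


toy_promoter = 'GGTTATTGATATAAT'  # made-up sequence containing ATTGA once
mutated = replace_promoter_motif(toy_promoter, 'CCCCC')
print(mutated)
```

Because `insertSeq` and `cutSeq` have the same length in this workflow, the substitution leaves the promoter length unchanged.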

### Calculation of mutant flux distribution
The updated expression activities are converted to reference flux values, and new bounds are set in the GSMM (genome-scale metabolic model) according to these reference fluxes. The wild-type and mutant flux distributions are stored as a list of two dictionaries for visualization in Escher.

In [None]:
solution,_ = Help_FluxCalculator('Ecol', myHost.strain, AllNewDF)
RefFluxDict = WTdf.set_index('RctID')['Fluxes'].to_dict()
NewFluxDict = dict(zip(WTdf['RctID'].values, solution))
AllFluxDict = [RefFluxDict, NewFluxDict]
# {WTdf['RctID'].tolist(): solution}
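
The conversion from expression to flux and the pairing of wild-type and mutant values can be traced by hand on a few reactions. The sketch below uses plain dictionaries and made-up numbers; only the relation `NewFlux = NewExpr * Expr2Flux` and the two-dictionary layout are taken from the cells above, and it does not reproduce the flux balance step in `Help_FluxCalculator`.

```python
# Made-up values for three reactions; not real model output.
ref_flux = {'PFK': 7.0, 'PYK': 2.0, 'ENO': 15.0}    # wild-type reference fluxes
expr2flux = {'PFK': 3.5, 'PYK': 1.0, 'ENO': 5.0}    # expression-to-flux factors
new_expr = {'PFK': 1.0, 'PYK': 2.0, 'ENO': 4.0}     # mutant expression levels

# Reference flux of the mutant: expression scaled by the per-reaction factor.
new_flux = {rct: new_expr[rct] * expr2flux[rct] for rct in ref_flux}

# Same layout as AllFluxDict: [wild type, mutant].
all_flux = [ref_flux, new_flux]

for rct in ref_flux:
    print(rct, ref_flux[rct], '->', new_flux[rct])
```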

### Visualization of flux differences
Escher is used to visualize the flux differences on the central carbon metabolism map. Open the output HTML file `Escher-Metabol-Map.html` by double-clicking it in the JupyterLab environment and `Trust` the file.

Then load the `out2.json` file written by the cell below via Data -> Load reaction data.

In [None]:
# create json file of AllFluxDict

import json

flux_dictionary = AllFluxDict
with open('out2.json', 'w') as f:
    json.dump(flux_dictionary, f)
    
print('created json file')
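
Before loading the file into Escher, it can help to confirm that it reads back as a list of two reaction-to-flux dictionaries. The following sketch writes made-up values to a temporary file and checks the round trip; the helper `write_and_check` and the file name `toy_fluxes.json` are hypothetical and not part of the workflow.

```python
import json
import os
import tempfile

# Made-up flux values standing in for [wild type, mutant].
flux_pair = [
    {'PFK': 7.0, 'PYK': 2.0},
    {'PFK': 3.5, 'PYK': 2.5},
]


def write_and_check(flux_data, path):
    """Write flux_data as JSON, read it back and validate its shape."""
    with open(path, 'w') as f:
        json.dump(flux_data, f)
    with open(path) as f:
        loaded = json.load(f)
    if not (isinstance(loaded, list) and len(loaded) == 2):
        raise ValueError('expected a list of two flux dictionaries')
    return loaded


with tempfile.TemporaryDirectory() as tmp_dir:
    loaded = write_and_check(flux_pair, os.path.join(tmp_dir, 'toy_fluxes.json'))

print(loaded == flux_pair)
```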