# Chapters 6 and 7 Assignments

## Assignment Dictionaries

Protein synthesis starts with transcription, in which a gene's DNA is copied into RNA nucleotides, followed by translation of the RNA codons into protein.
Conventionally, we write a DNA sequence from the 5’-end to the 3’-end. During transcription, however, RNA polymerase reads the template (anti-sense) strand from its 3’-end to its 5’-end. The resulting mRNA is complementary to the anti-sense strand and therefore matches the sense DNA strand, with uracil (U) in place of thymine (T). This is because RNA polymerase can only add nucleotides to the 3’-end of the growing mRNA chain. Because only one strand is copied and the chain grows continuously, no Okazaki fragments are needed, unlike in DNA replication.
Example: The DNA sequence ATGTCTAAA is transcribed into AUGUCUAAA.
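
The two base-pairing steps behind this example can be sketched in a few lines of Python. This is only an illustration: the names `DNA_COMPLEMENT`, `TEMPLATE_TO_MRNA`, `sense`, `template` and `mrna` are chosen here and are not part of the assignment.

```python
# Base-pairing rules: DNA -> complementary DNA, and template DNA -> mRNA.
# (Illustrative names, not prescribed by the assignment.)
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TEMPLATE_TO_MRNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

sense = "ATGTCTAAA"  # sense strand, written 5' -> 3'

# Template (anti-sense) strand, written 3' -> 5' so it lines up with `sense`.
template = "".join(DNA_COMPLEMENT[base] for base in sense)

# Each template base is paired with its RNA partner.
mrna = "".join(TEMPLATE_TO_MRNA[base] for base in template)

print(template)  # TACAGATTT
print(mrna)      # AUGUCUAAA
```

The mRNA ends up identical to the sense strand with every T replaced by U, which is a quick sanity check on any transcription script.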

Given a conversion table:

![](conversion_table.png)

and this 5’- to 3’-end DNA sequence of 717 nucleotides for the Green Fluorescent Protein (GFP) mutant 3 extracted from Aequorea victoria:

In [3]:
dna_sequence = (
         'ATGTCTAAAGGTGAAGAATTATTCACTGGTGTTGTCCCAATTTTGGTTGAATTAGATGGTGATGTTAATGGT'
         'CACAAATTTTCTGTCTCCGGTGAAGGTGAAGGTGATGCTACTTACGGTAAATTGACCTTAAAATTTATTTGT'
         'ACTACTGGTAAATTGCCAGTTCCATGGCCAACCTTAGTCACTACTTTCGGTTATGGTGTTCAATGTTTTGCT'
         'AGATACCCAGATCATATGAAACAACATGACTTTTTCAAGTCTGCCATGCCAGAAGGTTATGTTCAAGAAAGA'
         'ACTATTTTTTTCAAAGATGACGGTAACTACAAGACCAGAGCTGAAGTCAAGTTTGAAGGTGATACCTTAGTT'
         'AATAGAATCGAATTAAAAGGTATTGATTTTAAAGAAGATGGTAACATTTTAGGTCACAAATTGGAATACAAC'
         'TATAACTCTCACAATGTTTACATCATGGCTGACAAACAAAAGAATGGTATCAAAGTTAACTTCAAAATTAGA'
         'CACAACATTGAAGATGGTTCTGTTCAATTAGCTGACCATTATCAACAAAATACTCCAATTGGTGATGGTCCA'
         'GTCTTGTTACCAGACAACCATTACTTATCCACTCAATCTGCCTTATCCAAAGATCCAAACGAAAAGAGAGAC'
         'CACATGGTCTTGTTAGAATTTGTTACTGCTGCTGGTATTACCCATGGTATGGATGAATTGTACAAATAA'
)

Use the DNA sequence and the conversion table to:
1. Write a Python script to transcribe this sequence to mRNA as it occurs in a biological organism. That is, determine the complementary DNA first, and use that to work out the mRNA.

2. Use the following dictionary in a Python script to obtain the translation (protein sequence) of the Green Fluorescent Protein using the mRNA sequence you obtained.

In [4]:
codon2aa = {
    "UUU": "F", "UUC": "F", "UUA": "L", "UUG": "L", "CUU": "L",
    "CUC": "L", "CUA": "L", "CUG": "L", "AUU": "I", "AUC": "I",
    "AUA": "I", "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",
    "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S", "AGU": "S",
    "AGC": "S", "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",
    "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T", "GCU": "A",
    "GCC": "A", "GCA": "A", "GCG": "A", "UAU": "Y", "UAC": "Y",
    "CAU": "H", "CAC": "H", "CAA": "Q", "CAG": "Q", "AAU": "N",
    "AAC": "N", "AAA": "K", "AAG": "K", "GAU": "D", "GAC": "D",
    "GAA": "E", "GAG": "E", "UGU": "C", "UGC": "C", "UGG": "W",
    "CGU": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R",
    "AGG": "R", "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "AUG": "<Met>", "UAA": "<STOP>", "UAG": "<STOP>", "UGA": "<STOP>"
}

In [7]:
dna2cdna = {
    'A': 'T',
    'C': 'G',
    'G': 'C',
    'T': 'A'
}


dna2mrna = {
    'A': 'U',
    'T': 'A',
    'G': 'C',
    'C': 'G'
}

In [12]:
#1.

# Complementary (template) DNA strand, built base by base.
cDNA_sequence = ''.join(dna2cdna[base] for base in dna_sequence)

# Transcribe the template strand into mRNA.
rna_sequence = ''.join(dna2mrna[base] for base in cDNA_sequence)
print(rna_sequence)

AUGUCUAAAGGUGAAGAAUUAUUCACUGGUGUUGUCCCAAUUUUGGUUGAAUUAGAUGGUGAUGUUAAUGGUCACAAAUUUUCUGUCUCCGGUGAAGGUGAAGGUGAUGCUACUUACGGUAAAUUGACCUUAAAAUUUAUUUGUACUACUGGUAAAUUGCCAGUUCCAUGGCCAACCUUAGUCACUACUUUCGGUUAUGGUGUUCAAUGUUUUGCUAGAUACCCAGAUCAUAUGAAACAACAUGACUUUUUCAAGUCUGCCAUGCCAGAAGGUUAUGUUCAAGAAAGAACUAUUUUUUUCAAAGAUGACGGUAACUACAAGACCAGAGCUGAAGUCAAGUUUGAAGGUGAUACCUUAGUUAAUAGAAUCGAAUUAAAAGGUAUUGAUUUUAAAGAAGAUGGUAACAUUUUAGGUCACAAAUUGGAAUACAACUAUAACUCUCACAAUGUUUACAUCAUGGCUGACAAACAAAAGAAUGGUAUCAAAGUUAACUUCAAAAUUAGACACAACAUUGAAGAUGGUUCUGUUCAAUUAGCUGACCAUUAUCAACAAAAUACUCCAAUUGGUGAUGGUCCAGUCUUGUUACCAGACAACCAUUACUUAUCCACUCAAUCUGCCUUAUCCAAAGAUCCAAACGAAAAGAGAGACCACAUGGUCUUGUUAGAAUUUGUUACUGCUGCUGGUAUUACCCAUGGUAUGGAUGAAUUGUACAAAUAA
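
As an alternative to looking up each base in a dictionary, the same two steps can be written with Python's built-in `str.maketrans` and `str.translate`. The sketch below is one possible variant, not the required solution; `COMPLEMENT_TABLE`, `TRANSCRIBE_TABLE`, `transcribe` and `short_dna` are hypothetical names introduced for the example.

```python
# Translation tables built once with str.maketrans (standard library).
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")   # DNA -> complementary DNA
TRANSCRIBE_TABLE = str.maketrans("ACGT", "UGCA")   # template DNA -> mRNA


def transcribe(sense_dna):
    """Return the mRNA for a sense-strand DNA sequence (hypothetical helper)."""
    template = sense_dna.translate(COMPLEMENT_TABLE)
    return template.translate(TRANSCRIBE_TABLE)


short_dna = "ATGTCTAAAGGTGAA"  # first 15 nucleotides of the GFP sequence
print(transcribe(short_dna))   # AUGUCUAAAGGUGAA
```

`str.translate` replaces every character in a single pass, so it avoids building the result one character at a time.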


In [13]:
#2

# Translate the mRNA codon by codon, stopping at the first stop codon.
amino_acids = []
for n in range(0, len(rna_sequence) - 2, 3):
    aa = codon2aa[rna_sequence[n: n+3]]
    amino_acids.append(aa)
    if aa == '<STOP>':
        break

protein_sequence = ''.join(amino_acids)
print(protein_sequence)

<Met>SKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFGYGVQCFARYPDH<Met>KQHDFFKSA<Met>PEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYI<Met>ADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDH<Met>VLLEFVTAAGITHG<Met>DELYK<STOP>
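
Because `<Met>` and `<STOP>` are multi-character tokens, the stop signal is easiest to detect while each codon is being translated, before the residues are joined into one string. The following is a minimal sketch under that assumption. `DEMO_CODONS` is a small excerpt of the codon table chosen for the demo, and `translate` is a hypothetical helper name.

```python
# Small excerpt of the codon table, enough for this demo only.
DEMO_CODONS = {"AUG": "<Met>", "UCU": "S", "AAA": "K", "GGU": "G", "UAA": "<STOP>"}


def translate(mrna, table):
    """Translate mRNA codon by codon, ending at the first stop codon."""
    residues = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = table[mrna[start:start + 3]]
        residues.append(residue)
        if residue == "<STOP>":
            break
    return "".join(residues)


print(translate("AUGUCUAAAUAAGGU", DEMO_CODONS))  # <Met>SK<STOP>
```

In the demo sequence the final codon `GGU` is never translated, because translation ends at `UAA`.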


## Assignment Functions

Write a function with the following features:

- Name the function `get_basic_stats()` and let it take a single input argument that may contain any number of arrays, e.g. a tuple of arrays.

- Using a for loop, calculate the mean and the variance of each array with the provided functions `mean` and `variance`, i.e. call those functions from within `get_basic_stats()`.

- Calculate the standard deviation of each array as the square root of its variance. You will have to import the function `sqrt` from the `math` module.

- Return a single array containing (in that order) the mean, the variance, and the standard deviation for each array.

To test the function, combine three arrays in a tuple as follows:


In [None]:
my_arrays = (
    [1, 2, 3, 4, 5],
    [7, 7, 7, 7],
    [1.0, 0.9, 1.2, 1.12, 0.95, 0.76],
)

Call the function `get_basic_stats()` with this tuple as its argument and store the output in a variable. Display the results in the following form:

```
'STD of array' index, ':' STD
```

The result for the above arrays should be:


```
STD of array 0 :  1.4142135623730951
STD of array 1 :  0.0
STD of array 2 :  0.14357537702854514
```

In [2]:
def mean(arr):
    """
    Calculates the mean of an array.

    :param arr: Array of numbers.
    :type arr: list, tuple, set
    :return: Mean of the values in the array.
    :rtype: float
    """
    summation = sum(arr)
    length = len(arr)

    result = summation / length

    return result

def variance(arr):
    """
    Calculates the variance of an array.

    :param arr: Array of numbers.
    :type arr: list, tuple, set
    :return: Variance of the values in the array.
    :rtype: float
    """
    arr_mean = mean(arr)
    denominator = len(arr)

    numerator = 0

    for num in arr:
        numerator += (num - arr_mean) ** 2

    result = numerator / denominator

    return result

def stan_dev(arr):
    """
    Calculates the standard deviation of an array.

    :param arr: Array of numbers.
    :type arr: list, tuple, set
    :return: Standard deviation of the values in the array.
    :rtype: float
    """
    from math import sqrt

    var = variance(arr)

    result = sqrt(var)

    return result

In [8]:
def get_basic_stats(nested_array):
    """
    Calculates the basic statistics (mean, variance and standard deviation) of each array in a nested array.

    :param nested_array: Collection of arrays of numbers.
    :type nested_array: list, tuple
    :return: One (mean, variance, standard deviation) tuple per inner array.
    :rtype: list
    """

    result = list()
    for inner_arr in nested_array:
        mean_ia = mean(inner_arr)
        var_ia = variance(inner_arr)
        stan_dev_ia = stan_dev(inner_arr)
        result.append((mean_ia, var_ia, stan_dev_ia))
    return result

In [10]:
my_arrays = (
    [1, 2, 3, 4, 5],
    [7, 7, 7, 7],
    [1.0, 0.9, 1.2, 1.12, 0.95, 0.76],
)

basic_stats_my_arrays = get_basic_stats(my_arrays)
bsma = basic_stats_my_arrays

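# enumerate() gives the position directly; list.index() would return the
# first match and report the wrong index if two arrays had equal statistics.
for index, res in enumerate(bsma):
    print('STD of array', index, ': ', res[2])

STD of array 0 :  1.4142135623730951
STD of array 1 :  0.0
STD of array 2 :  0.14357537702854514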
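
As a cross-check, the same standard deviations can be reproduced with Python's standard `statistics` module. This sketch assumes the population definition (dividing by the number of values, as the `variance` function in this notebook does), which corresponds to `statistics.pvariance` and `statistics.pstdev`. The variable names are illustrative.

```python
import math
import statistics

my_arrays = (
    [1, 2, 3, 4, 5],
    [7, 7, 7, 7],
    [1.0, 0.9, 1.2, 1.12, 0.95, 0.76],
)

for index, arr in enumerate(my_arrays):
    # pvariance/pstdev divide by n (population statistics).
    pop_var = statistics.pvariance(arr)
    pop_std = statistics.pstdev(arr)
    # The sample versions (statistics.variance/stdev) divide by n - 1 instead.
    assert math.isclose(pop_std, math.sqrt(pop_var))
    print('STD of array', index, ': ', pop_std)
```

The printed values should agree with the results above up to floating-point rounding in the last digits.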
