# Bioinformatics Project #1
## 1. Introduction

At the heart of every living organism is DNA (deoxyribonucleic acid). It is double-stranded and built from four nucleotides: Adenine, Thymine, Cytosine, and Guanine (A, T, C, G).  The two strands are antiparallel: one runs 5’ to 3’ and the other 3’ to 5’, a direction set by the orientation of the sugar/phosphate backbone.  Adenine and Guanine are purines; Thymine and Cytosine are pyrimidines.  Adenine always pairs with Thymine, and Guanine always pairs with Cytosine. <br> <br>

Using Python, you will follow the central dogma of biology by taking a strand of DNA through replication, transcription, and translation.  The Python skills involved are string manipulation, looping, and basic calculations. <br>


## 2. Counting DNA Base Pairs

Given this 99-base template strand of DNA (written 3’ to 5’), count the number of A, T, C, and G bases.  Return the counts as well as each base’s percentage of the total. <br>

Hint: you can store counts of items using a dictionary. `{'A':0,'T':0...}`<br>

`3’-TACTCTCGTTCTTGCAGCTTGTCAGTACTTTCAGAATCATGGTGTGCATGGTAGAATGACTCTTATAACGAACTTCGACATGGCAATAACCCCCCGATT-5’`


In [1]:
dna_string = 'TACTCTCGTTCTTGCAGCTTGTCAGTACTTTCAGAATCATGGTGTGCATGGTAGAATGACTCTTATAACGAACTTCGACATGGCAATAACCCCCCGATT'

In [2]:
dna_dict = {'A':0, 'T':0, 'C':0, 'G':0}

In [3]:
for letter in dna_string:
    dna_dict[letter] += 1 # add one to the running count for this base

In [4]:
for letter in dna_dict.keys():
    print(f'{letter} occurs {dna_dict[letter]} times')

A occurs 25 times
T occurs 31 times
C occurs 24 times
G occurs 19 times
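
The same tally can also be produced with `collections.Counter` from the standard library. The sketch below is one possible alternative, not the method the exercise requires; the function name `count_bases` is made up for illustration, and the choice to reject characters other than A, T, C, and G is an assumption.

```python
from collections import Counter

def count_bases(sequence):
    """Count A, T, C and G in a DNA string (hypothetical helper)."""
    counts = Counter(sequence.upper())
    # Assumption: anything other than the four bases is treated as an error
    unexpected = set(counts) - set("ATCG")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Report all four bases, including any that never occur
    return {base: counts[base] for base in "ATCG"}

# First 10 bases of the template strand
print(count_bases("TACTCTCGTT"))  # {'A': 1, 'T': 5, 'C': 3, 'G': 1}
```

A `Counter` returns 0 for keys it has never seen, so the dictionary does not need to be initialized with zeros first.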


In [9]:
string_sum = len(dna_string) # counts the number of items in the string.

for letter in dna_dict.keys():
    print(f'{letter} is {(dna_dict[letter]/string_sum)*100:.2f}% of total string') # ':.2f' rounds to two decimal places

A is 25.25% of total string
T is 31.31% of total string
C is 24.24% of total string
G is 19.19% of total string
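
If the percentages are needed again later, the calculation can be wrapped in a small function that returns them instead of printing them. This is only a sketch: `base_percentages` is a hypothetical name, and raising an error on an empty string is an assumption made to avoid dividing by zero.

```python
def base_percentages(sequence):
    """Return each base's share of the sequence as a percentage (hypothetical helper)."""
    total = len(sequence)
    if total == 0:
        # Assumption: an empty sequence is an error rather than 0% for every base
        raise ValueError("sequence is empty")
    # str.count returns how many times a base occurs in the string
    return {base: sequence.count(base) / total * 100 for base in "ATCG"}

# First 10 bases of the template strand
for base, pct in base_percentages("TACTCTCGTT").items():
    print(f"{base} is {pct:.2f}% of total string")
```

For a sequence made up only of A, T, C, and G, the four values should add up to 100, which is a quick sanity check on the counts.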


## 3. Replication: Creating Complement DNA strands

Given the above 99-base template strand (3’ to 5’) of DNA, return the complementary coding strand, which reads 5’ to 3’ when written in the same left-to-right order. For example, if the template strand has an A, then your coding strand at the same position will be a T.


In [6]:
comp_dict = {'A':'T','T':'A','C':'G','G':'C'} # dictionary of conversions for complement strand

In [7]:
dna_complement = '' # initialize an empty string

for letter in dna_string:
    dna_complement = dna_complement + comp_dict[letter]
    
print(f'complement strand is: {dna_complement}')

complement strand is: ATGAGAGCAAGAACGTCGAACAGTCATGAAAGTCTTAGTACCACACGTACCATCTTACTGAGAATATTGCTTGAAGCTGTACCGTTATTGGGGGGCTAA
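
The base-by-base lookup above can also be written with Python's built-in `str.maketrans` and `str.translate`, which swap every character in a single pass. This is a sketch of an alternative, not a required approach; `COMPLEMENT_TABLE` and `complement` are names made up for illustration.

```python
# Translation table mapping each base to its pairing partner:
# A<->T and C<->G (hypothetical name)
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")

def complement(strand):
    """Return the base-by-base complement of a DNA strand (hypothetical helper)."""
    return strand.translate(COMPLEMENT_TABLE)

# First 10 bases of the template strand
print(complement("TACTCTCGTT"))  # ATGAGAGCAA
```

Unlike the dictionary lookup, `str.translate` leaves any character missing from the table unchanged instead of raising a `KeyError`, so invalid input passes through silently.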


## 4. Transcription: Turning DNA to mRNA sequences 

RNA polymerase reads the template strand of DNA in the 3’ to 5’ direction and builds the mRNA 5’ to 3’. The resulting sequence is the same as the coding strand, except that Thymine is replaced by Uracil. Return the transcribed mRNA fragment from the template strand of DNA.  Remember, A is paired with Uracil in mRNA.


In [8]:
# lazy way is to take the complement coding strand and change all the T's to U's
mRNA = dna_complement.replace('T','U')
print(f'mRNA strand is: {mRNA}')

mRNA strand is: AUGAGAGCAAGAACGUCGAACAGUCAUGAAAGUCUUAGUACCACACGUACCAUCUUACUGAGAAUAUUGCUUGAAGCUGUACCGUUAUUGGGGGGCUAA
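
The mRNA can also be built directly from the template strand, without going through the coding strand first, by pairing each template base with its RNA partner. A minimal sketch, assuming the template is written 3’ to 5’ as in this exercise; `TEMPLATE_TO_MRNA` and `transcribe` are hypothetical names.

```python
# Template DNA base -> mRNA base; A on the template pairs with U in the mRNA
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_strand):
    """Build the mRNA from a template strand written 3' to 5' (hypothetical helper)."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_strand)

# First 10 bases of the template strand
print(transcribe("TACTCTCGTT"))  # AUGAGAGCAA
```

Both routes should give the same string, since complementing and then swapping T for U is the same mapping applied in two steps.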


## 5. Translation: Converting mRNA sequences to proteins

Protein translation occurs when ribosomes attach to the mRNA and bring in tRNA.  Each tRNA carries a single amino acid.  These amino acids are joined together into a polypeptide chain to create a protein.  The mRNA is read three bases at a time, and each triplet (codon) specifies one amino acid or a stop signal.  Using the codon chart below, translate the mRNA into an amino acid chain.


In [10]:
# create a translation dictionary
translation = {
    'UUU':'PHE','UUC':'PHE',
    'UUA':'LEU','UUG':'LEU','CUU':'LEU','CUC':'LEU','CUA':'LEU','CUG':'LEU',
    'AUU':'ILE','AUC':'ILE','AUA':'ILE',
    'AUG':'MET',
    'GUU':'VAL','GUC':'VAL','GUA':'VAL','GUG':'VAL',
    'UCU':'SER','UCC':'SER','UCA':'SER','UCG':'SER',
    'CCU':'PRO','CCC':'PRO','CCA':'PRO','CCG':'PRO',
    'ACU':'THR','ACC':'THR','ACA':'THR','ACG':'THR',
    'GCU':'ALA','GCC':'ALA','GCA':'ALA','GCG':'ALA',
    'UAU':'TYR','UAC':'TYR',
    'UAA':'STOP','UAG':'STOP','UGA':'STOP',
    'CAU':'HIS','CAC':'HIS',
    'CAA':'GLN','CAG':'GLN',
    'AAU':'ASN','AAC':'ASN',
    'AAA':'LYS','AAG':'LYS',
    'GAU':'ASP','GAC':'ASP',
    'GAA':'GLU','GAG':'GLU',
    'UGU':'CYS','UGC':'CYS',
    'UGG':'TRP',
    'CGU':'ARG','CGC':'ARG','CGA':'ARG','CGG':'ARG',
    'AGU':'SER','AGC':'SER',
    'AGA':'ARG','AGG':'ARG',
    'GGU':'GLY','GGC':'GLY','GGA':'GLY','GGG':'GLY'
}

In [16]:
split_mRNA = [mRNA[i:i+3] for i in range(0,len(mRNA),3)]  # split mRNA into a list of codons
translated = [translation[codon] for codon in split_mRNA] # look up the amino acid for each codon

# return as a string
AA_string = ' '.join(translated)

AA_string

'MET ARG ALA ARG THR SER ASN SER HIS GLU SER LEU SER THR THR ARG THR ILE LEU LEU ARG ILE LEU LEU GLU ALA VAL PRO LEU LEU GLY GLY STOP'
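
A ribosome releases the chain when it reaches a stop codon, so a translation routine may want to end there rather than convert every triplet in the string. The sketch below is one way to do that, under stated assumptions: reading starts at the first base (it does not search for a start codon), a trailing remainder of one or two bases is ignored, and the stop codon itself is left out of the result. The standard codon table is restated in a compact form so the block runs on its own; `CODONS_BY_AMINO_ACID`, `CODON_TABLE`, and `translate` are hypothetical names.

```python
# Standard codon table, grouped by amino acid to keep it compact
CODONS_BY_AMINO_ACID = {
    "PHE": "UUU UUC", "LEU": "UUA UUG CUU CUC CUA CUG",
    "ILE": "AUU AUC AUA", "MET": "AUG", "VAL": "GUU GUC GUA GUG",
    "SER": "UCU UCC UCA UCG AGU AGC", "PRO": "CCU CCC CCA CCG",
    "THR": "ACU ACC ACA ACG", "ALA": "GCU GCC GCA GCG",
    "TYR": "UAU UAC", "STOP": "UAA UAG UGA", "HIS": "CAU CAC",
    "GLN": "CAA CAG", "ASN": "AAU AAC", "LYS": "AAA AAG",
    "ASP": "GAU GAC", "GLU": "GAA GAG", "CYS": "UGU UGC", "TRP": "UGG",
    "ARG": "CGU CGC CGA CGG AGA AGG", "GLY": "GGU GGC GGA GGG",
}

# Invert the grouping into a codon -> amino acid lookup
CODON_TABLE = {
    codon: amino_acid
    for amino_acid, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}

def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    amino_acids = []
    # Step through complete codons only; a 1-2 base remainder is ignored
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break
        amino_acids.append(amino_acid)
    return amino_acids

print(" ".join(translate("AUGAGAGCAUAAGGG")))  # MET ARG ALA
```

In the example, the `GGG` after `UAA` is never translated because the loop ends at the stop codon.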