## Problem

Write a program that converts nucleotide sequences to protein sequences.

Input

Codon to Amino-Acid mapping: `./data/codons.txt`    
Nucleotide sequences: `./data/sequence.fasta`  

Output

Protein sequences in FASTA format

---

Nucleotide FASTA: `./data/sequence.fasta`  
\>seq1  
ATGTCACACCGC  
\>seq2  
ATGGCCAATACAAAC  
...  
... 

Protein FASTA:  
\>seq1  
MSHR  
\>seq2  
MANTN  
...  
...

---

### Step 1: Read the codon mapping and store it

In [None]:
codonDict = dict()  # initialize empty dictionary

# open a file handle; the with-block closes it automatically
with open("./data/codons.txt", "r") as mfh:
    mfh.readline()  # read the header line, ignore it

    # iterate over each line and save the information
    for line in mfh:
        line = line.strip()  # remove surrounding whitespace
        if not line:
            continue  # skip blank lines
        lineList = line.split('\t')  # split the line into a list
        aa = lineList[1]  # save the single-letter code (SLC) to aa
        aaList = lineList[2].split(",")  # split the codons into a list
        # iterate over each codon and map it to its amino acid
        for codon in aaList:
            codonDict[codon.strip()] = aa

print(codonDict)
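
The layout of `codons.txt` is not shown here. The loop above implies a header line followed by tab-separated rows, with the single-letter code in the second column and a comma-separated codon list in the third. The sketch below applies the same parsing logic to a small in-memory sample under that assumption. The sample rows and the helper name `parse_codon_table` are illustrative and are not taken from the real file.

```python
import io

# Hypothetical sample mimicking the assumed layout of codons.txt:
# header line, then name <TAB> single-letter code <TAB> comma-separated codons.
SAMPLE = (
    "AminoAcid\tSLC\tCodons\n"
    "Methionine\tM\tATG\n"
    "Serine\tS\tTCA,TCC,TCG,TCT,AGC,AGT\n"
    "Histidine\tH\tCAC,CAT\n"
)


def parse_codon_table(handle):
    """Return a dict mapping each codon to its single-letter amino acid code."""
    codon_to_aa = {}
    next(handle)  # skip the header line
    for line in handle:
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        aa = fields[1]
        for codon in fields[2].split(","):
            codon_to_aa[codon.strip().upper()] = aa
    return codon_to_aa


table = parse_codon_table(io.StringIO(SAMPLE))
print(table["ATG"], table["AGT"], table["CAT"])  # M S H
```

Passing a file handle instead of a path keeps the function easy to test, since an `io.StringIO` object can stand in for the real file.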

### Step 2: User-defined function to translate

In [None]:
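seq = "ATGACTGCATGTACGT"

def nuc2prot(seq):
    """Translate a nucleotide sequence, stopping at the first STOP codon."""
    prot_seq = ""

    # step through complete codons; 1-2 trailing bases are ignored
    for i in range(0, len(seq)-2, 3):
        # normalize case to match the uppercase STOP codons below
        seqCodon = seq[i:i+3].upper()

        # handle STOP codons
        if seqCodon in ("TAA", "TAG", "TGA"):
            break
        prot_seq += codonDict[seqCodon]

    # return translated protein sequence to the caller
    return prot_seq

nuc2prot(seq)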
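
To check this translation logic without the data file, the sketch below runs it on the two example sequences from the problem statement using a small hand-written codon table. The names `MINI_TABLE` and `translate` are illustrative. The table covers only the codons in those examples, and rendering unknown codons as `X` is a choice made for this sketch, not part of the function above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical mini table: only the codons needed for the example sequences.
MINI_TABLE = {
    "ATG": "M", "TCA": "S", "CAC": "H", "CGC": "R",
    "GCC": "A", "AAT": "N", "ACA": "T", "AAC": "N",
}


def translate(seq, table, unknown="X"):
    """Translate seq codon by codon, stopping at the first STOP codon."""
    seq = seq.upper()
    residues = []
    # len(seq) - 2 ensures only complete codons are read
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        # codons missing from the table become the placeholder residue
        residues.append(table.get(codon, unknown))
    return "".join(residues)


print(translate("ATGTCACACCGC", MINI_TABLE))     # MSHR
print(translate("ATGGCCAATACAAAC", MINI_TABLE))  # MANTN
print(translate("ATGTAACAC", MINI_TABLE))        # M (stops at TAA)
```

Collecting residues in a list and joining once avoids repeated string concatenation, which matters mainly for long sequences.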

### Step 3: Read the nucleotide FASTA, translate to protein

In [None]:
# open file handles; the with-block closes both automatically
with open("./data/sequence.fasta", "r") as ifh, \
     open("./data/prot_seq.fasta", "w") as ofh:

    # read the nucleotide FASTA; assumes each sequence sits on a single line
    for line in ifh:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        # copy the sequence identifier line unchanged
        if line.startswith(">"):
            print(line, file=ofh)
        else:
            prot_seq = nuc2prot(line)
            print(prot_seq, file=ofh)
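
The loop above translates every non-header line on its own, so it relies on each sequence occupying exactly one line. FASTA files often wrap a sequence over several lines, and translating those lines separately would break the reading frame. One possible way to handle this is to join the lines of each record before translating. The generator `read_fasta` below is a hypothetical helper written for this sketch, and the wrapped input is made-up sample data based on the example sequences.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining sequences wrapped over lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # a new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # emit the final record
    if header is not None:
        yield header, "".join(chunks)


# seq1 is deliberately wrapped over two lines
wrapped = io.StringIO(">seq1\nATGTCA\nCACCGC\n\n>seq2\nATGGCCAATACAAAC\n")
records = list(read_fasta(wrapped))
print(records)
# [('>seq1', 'ATGTCACACCGC'), ('>seq2', 'ATGGCCAATACAAAC')]
```

Each yielded sequence can then be passed to the translation function as a whole, and the header written to the output file unchanged.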

### Biopython Solution

[Biopython](https://biopython.org/) is a set of freely available tools for biological computation written in Python.

Install it by typing the following in your terminal:  
`pip install biopython`

In [None]:
from Bio import SeqIO

# my files
nuclFile = "./data/sequence.fasta"
protFile = "./data/prot_seq.fasta"

# create empty list of protein records
protRecList = []

# use SeqIO to iterate over every sequence record
# use SeqRecord.translate to translate each record up to the first STOP codon
for rec in SeqIO.parse(nuclFile, "fasta"):
    # id=True and description=True copy the header from the nucleotide record;
    # otherwise the new record gets placeholder values
    prot_rec = rec.translate(to_stop=True, id=True, description=True)
    protRecList.append(prot_rec)

# write the protein records
SeqIO.write(protRecList, protFile, "fasta")