Problem:

After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.

Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.

Return: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)

In [1]:
# Import necessary packages for solution
import re
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

The given substrings are introns that occur in the DNA sequence. Regular expressions (`re`) can be used to search the DNA sequence for these substrings. Each time a substring is found, we remove it from the sequence, and we repeat this until every intron has been removed. The intron-free sequence can then be translated to obtain the protein string.
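The removal step described above can be sketched on a small example using only the standard library. The helper name `remove_introns` and the toy sequences are made up for illustration and are not part of the Rosalind dataset; `re.escape` is added so that each intron is matched as literal text.

```python
import re

def remove_introns(dna, introns):
    """Delete every occurrence of each intron from dna (illustrative helper)."""
    for intron in introns:
        # re.escape treats the intron as literal text rather than a pattern
        dna = re.sub(re.escape(intron), '', dna)
    return dna

# Toy data (made up for illustration): three exons separated by two introns
toy_dna = 'ATGGCT' + 'CCGGTCC' + 'AAA' + 'GTGTGAG' + 'TTTTAA'
toy_introns = ['CCGGTCC', 'GTGTGAG']

spliced = remove_introns(toy_dna, toy_introns)
print(spliced)  # ATGGCTAAATTTTAA
```

Because the introns here are plain strings, `dna.replace(intron, '')` would give the same result; `re.sub` is used only to mirror the approach described in the text.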

The two packages used in this solution are `re` and `Biopython`. Biopython is used to parse FASTA files and translate DNA strings, while `re` is used to match patterns in text. Please see the original documentation for a more in-depth look at the modules ([re](https://docs.python.org/3/library/re.html), [Biopython](https://biopython.org/wiki/Documentation)).
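To make the FASTA handling concrete, here is a rough standard-library stand-in for what the parser provides: one header and one joined sequence per record. The function `read_fasta` and the toy records are hypothetical and written only for illustration; this is not how Biopython is implemented.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text (illustrative)."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)

# Toy input (made up): the first record is the DNA string, the rest are introns
toy_fasta = StringIO('>seq_0\nATGGCTCCGG\nTCCAAATAA\n>intron_1\nCCGGTCC\n')
records = list(read_fasta(toy_fasta))
dna_string = records[0][1]
introns = [seq for _, seq in records[1:]]
```

Sequence lines that wrap across several lines are joined back into a single string, which is why the first record can be searched for introns directly.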

The code splices a DNA sequence, translates the resulting sequence into a protein string, and writes the protein string to a new FASTA file.

In [2]:
fileName = 'rosalind_splc.txt'

with open(fileName,'r') as f:
    
    first_line = True
    regex_list = []
    dna_string = ''
    
    # Assign the first sequence in the FASTA file to a variable and add all other sequences
    # in the FASTA file to a list designated for the substrings (introns)
    for record in SeqIO.parse(f, 'fasta'):
        if first_line:
            dna_string = str(record.seq)
            first_line = False
        else:
            regex_list.append(str(record.seq))

    # Every time an intron is found, replace it with an empty string ''
    # This is equivalent to deleting the introns from the DNA string
    # re.escape makes sure each intron is matched as literal text
    for substring in regex_list:
        dna_string = re.sub(re.escape(substring), '', dna_string)
    
    # Translate the remaining string into a protein string
    # We can translate the DNA string directly, assuming it is the coding strand
    # to_stop=True ends translation at the first stop codon, so no trailing '*' is included
    protein_string = Seq(dna_string).translate(to_stop=True)
    
    # Write the protein sequence to a new FASTA file
    # (a separate handle name avoids shadowing the input handle f)
    protein_record = SeqRecord(protein_string, id='prot_01', description='protein string after splicing')
    with open('RNA_Splicing.fasta', 'w') as out_file:
        SeqIO.write(protein_record, out_file, 'fasta')
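The translation step in the cell above can also be illustrated without Biopython. The sketch below builds the standard genetic code from its conventional `TCAG` ordering and translates a toy coding-strand sequence; the function name `translate_to_stop` and the input string are assumptions made for illustration. It stops at the first stop codon rather than emitting a `*` symbol.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate_to_stop(dna):
    """Translate a coding-strand DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == '*':
            break
        protein.append(amino_acid)
    return ''.join(protein)

# Toy spliced sequence (made up): ATG GCT AAA TTT TAA
protein = translate_to_stop('ATGGCTAAATTTTAA')
print(protein)  # MAKF
```

Reading the DNA coding strand directly gives the same protein as transcribing to RNA first, since the two differ only in `T` versus `U`.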
