# Transcribing DNA into RNA

## Problem

An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.

Given a DNA string _t_ corresponding to a coding strand, its transcribed RNA string _u_ is formed by replacing all occurrences of 'T' in _t_ with 'U' in _u_.

_Given_: A DNA string _t_ having length at most 1000 nt.

_Return_: The transcribed RNA string of _t_.

**Sample Dataset**

    GATGGAACTTGACTACGTAAATT
    
**Sample Output**

    GAUGGAACUUGACUACGUAAAUU
    
_______________

**Solution**: This problem leaves little room for algorithmic choice. After implementing the straightforward solution with Python's built-in string methods, we will take the chance to explore Biopython, a popular module for computational molecular biology used by many researchers and developers worldwide.

As mentioned above, Python offers a built-in 'replace' string method that works perfectly for the task at hand. However, if you do not know this method and are stuck somewhere without internet access, an interesting approach is to first split the original string into a list of substrings at each occurrence of the character we want to replace, using 'split'. For example, the string "GATGG" becomes the list ["GA", "GG"]. Then we simply join the substrings with the replacement character, using 'join', and we are done!

In [1]:

    sample = "GATGGAACTTGACTACGTAAATT"

    # First split the string at the "T"s
    spl_string = sample.split("T")
    # These are the first 5 elements of the resulting list
    print("First 5 elements of split string:", spl_string[:5])
    # Then join the substrings with "U"s
    print("Final result:", 'U'.join(spl_string))

    # Note: You can condense this in a one-liner:
    # print('U'.join(sample.split('T')))

Output:

    First 5 elements of split string: ['GA', 'GGAAC', '', 'GAC', 'ACG']
    Final result: GAUGGAACUUGACUACGUAAAUU
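The empty string in the list above appears because the sample contains two consecutive 'T's; a 'T' at the very end of the string leaves an empty piece behind as well. The following is a minimal sketch of that behaviour (the names `pieces` and `rebuilt` are illustrative and not part of the original notebook):

```python
sample = "GATGGAACTTGACTACGTAAATT"

# Every "T" is a cut point, so 7 "T"s produce 8 pieces.
# Adjacent "T"s (and a "T" at the very end) leave empty strings behind.
pieces = sample.split("T")
print(pieces)

# join() places exactly one "U" between each pair of neighbouring pieces,
# so every original "T" position receives a "U", including the doubled
# and trailing ones.
rebuilt = "U".join(pieces)
print(rebuilt)
```

Because the number of pieces is always one more than the number of 'T's, joining with 'U' restores a string of the original length.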


If you plan on performing this kind of replacement somewhere else in your project, it is useful to write a function. A function simply takes an input, performs a series of transformations and gives back an output.

In [2]:

    def translate_joinsplit(some_string):
        return 'U'.join(some_string.split('T'))

    print(translate_joinsplit(sample))

Output:

    GAUGGAACUUGACUACGUAAAUU


And the same function written with Python's built-in string method 'replace':

In [3]:

    def translate_replace(some_string):
        return some_string.replace('T','U')

    print(translate_replace(sample))

Output:

    GAUGGAACUUGACUACGUAAAUU
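As a quick sanity check, both approaches can be compared against the sample output from the problem statement. The sketch below is only one way to do this: the function names and the `check_dna` helper are hypothetical additions, and the validation assumes that a DNA string contains only 'A', 'C', 'G', and 'T' and respects the 1000 nt limit stated in the problem.

```python
def transcribe_replace(dna_string):
    """Transcribe using str.replace."""
    return dna_string.replace("T", "U")


def transcribe_joinsplit(dna_string):
    """Transcribe using split followed by join."""
    return "U".join(dna_string.split("T"))


def check_dna(dna_string, max_length=1000):
    """Hypothetical helper: raise ValueError if the input breaks the constraints."""
    if len(dna_string) > max_length:
        raise ValueError("DNA string is longer than %d nt" % max_length)
    unexpected = set(dna_string) - set("ACGT")
    if unexpected:
        raise ValueError("Unexpected symbols: %s" % sorted(unexpected))


sample = "GATGGAACTTGACTACGTAAATT"
expected = "GAUGGAACUUGACUACGUAAAUU"  # sample output from the problem statement

check_dna(sample)

# Run both implementations and compare each result with the expected string
results = {
    func.__name__: func(sample)
    for func in (transcribe_replace, transcribe_joinsplit)
}
for name, rna_string in results.items():
    print(name, "matches sample output:", rna_string == expected)
```

Validating first keeps a stray character, such as a 'U' in the input, from passing through the transcription unnoticed.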


On to Biopython. From Biopython's website:

_"Biopython is a set of freely available tools for biological computation written in Python by an international team of developers._

_It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics. The source code is made available under the Biopython License, which is extremely liberal and compatible with almost every license in the world."_

You can find instructions on downloading and installing Biopython [here](http://biopython.org/wiki/Biopython).

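**Please note**: Biopython must be installed for the following cells to run. The cells were written for a Biopython release older than 1.78; that release removed the 'Bio.Alphabet' module, so on newer versions create the sequence with Seq(sample) and leave out the alphabet import.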

Biopython offers a wide array of very useful tools. One of them is the sequence object (Seq). A Seq holds the input data string along with an associated "alphabet". The alphabet categorizes the sequence as DNA, RNA, or protein, for example, unlocking a series of methods designed specifically for each.

Let's start by reading our input data into a Seq and associating the generic_dna alphabet to it:

In [11]:

    from Bio.Seq import Seq
    from Bio.Alphabet import generic_dna

    dna = Seq(sample, generic_dna)

    # Our dna variable is now an object containing our input string and an associated DNA Alphabet:
    dna

Output:

    Seq('GATGGAACTTGACTACGTAAATT', DNAAlphabet())

The Seq object has many associated methods. Some of them, such as 'count' and 'find', are familiar from Python strings; others are domain-specific nucleotide methods, such as 'transcribe', 'translate', and even 'reverse_complement'.
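The string-like methods mentioned above can be tried out with a short sketch. The variable name `demo_seq` is illustrative, and the fallback to a plain string when Biopython is missing is an assumption made only so that the example still runs; it is not how the notebook itself works.

```python
sample = "GATGGAACTTGACTACGTAAATT"

try:
    from Bio.Seq import Seq  # requires Biopython
    demo_seq = Seq(sample)
except ImportError:
    # Assumption for this sketch only: fall back to a plain str,
    # whose count and find behave the same way for this input.
    demo_seq = sample

t_count = demo_seq.count("T")  # number of 'T' nucleotides
first_t = demo_seq.find("T")   # zero-based index of the first 'T'
print("T count:", t_count)
print("First T at index:", first_t)

# reverse_complement exists on Seq objects but not on plain strings
if hasattr(demo_seq, "reverse_complement"):
    print("Reverse complement:", demo_seq.reverse_complement())
```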

Our dna variable has been identified as a chain of DNA nucleotides, so performing a transcription will give us a new Seq, this time made up of RNA nucleotides, exactly the kind of replacement we want!

In [12]:

    rna = dna.transcribe()

    print(rna)

Output:

    GAUGGAACUUGACUACGUAAAUU


Note that the 'transcribe' method returns another Seq, with the correct alphabet associated:

In [13]:

    rna

Output:

    Seq('GAUGGAACUUGACUACGUAAAUU', RNAAlphabet())
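Since Biopython 1.78 the 'Bio.Alphabet' module no longer exists, so the import in the cells above fails on current releases. A minimal sketch that should work across versions simply omits the alphabet argument. The wrapper function `transcribe` and its plain-string fallback are hypothetical additions, included only so that the example runs even where Biopython is not installed.

```python
sample = "GATGGAACTTGACTACGTAAATT"

try:
    from Bio.Seq import Seq  # requires Biopython
except ImportError:
    Seq = None  # Biopython is not installed


def transcribe(dna_string):
    """Transcribe a coding-strand DNA string into an RNA string."""
    if Seq is None:
        # Fallback (assumption for this sketch): plain string replacement
        return dna_string.replace("T", "U")
    # No alphabet argument, so Bio.Alphabet is not needed
    return str(Seq(dna_string).transcribe())


rna_string = transcribe(sample)
print(rna_string)
```

Without an alphabet, a Seq no longer records whether it holds DNA, RNA, or protein, so it is up to the caller to apply 'transcribe' only to DNA sequences.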