<img align="left" style="padding-right:10px;" src="figures/cartel.jpg">
<!--COURSE_INFORMATION-->
## This notebook contains a section of the course [Biology Meets Programming](https://www.coursera.org/learn/bioinformatics/home/welcome) by the University of California San Diego on Coursera


### The content is available [on GitHub](https://github.com/vencejo/Curso_BiologyMeetsProgramming).

<!--NAVIGATION-->
< [4.1 Motif Finding Meets Oliver Cromwell](4.1 Motif Finding Meets Oliver Cromwell.ipynb)| [Contents](Index.ipynb) |[4.3 How Can a Randomized Algorithm Perform So Well? ](4.3 How Can a Randomized Algorithm Perform So Well%3F.ipynb) >


### Rolling dice to find motifs

We will now turn to **randomized algorithms** that flip coins and roll dice in order to search for motifs. Making random algorithmic decisions may sound like a disastrous idea; just imagine a chess game in which every move is decided by rolling a die. However, an 18th-century French mathematician and naturalist, Comte de Buffon, first proved that randomized algorithms are useful by randomly dropping needles onto parallel strips of wood and using the results of this experiment to accurately approximate the constant π (see DETOUR: Buffon’s Needle).
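
As a rough illustration of Buffon's experiment, the sketch below simulates dropping needles and estimates π from the fraction that cross a line. The function name `buffon_pi` and its parameters are hypothetical and not part of the course material. The simulation assumes the needle is no longer than the strip width, in which case the crossing probability is 2l/(πd). It also draws the needle angle using `math.pi`, so it is a demonstration of the idea rather than an independent way to compute π.

```python
import math
import random

def buffon_pi(num_needles, needle_len=1.0, strip_width=1.0, seed=None):
    """Estimate pi by simulating Buffon's needle (assumes needle_len <= strip_width)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_needles):
        # Distance from the needle's centre to the nearest line.
        center = rng.uniform(0.0, strip_width / 2)
        # Acute angle between the needle and the lines.
        angle = rng.uniform(0.0, math.pi / 2)
        # The needle crosses a line if its half-projection reaches that line.
        if center <= (needle_len / 2) * math.sin(angle):
            hits += 1
    # P(cross) = 2*l / (pi*d), so pi is approximately 2*l*n / (d*hits).
    return 2 * needle_len * num_needles / (strip_width * hits)

estimate = buffon_pi(200000, seed=0)
print(estimate)
```

Increasing `num_needles` tightens the estimate, which is the same trade-off between speed and accuracy that Monte Carlo algorithms exploit.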

Randomized algorithms may be nonintuitive because they lack the control of traditional algorithms. Some randomized algorithms are **Las Vegas algorithms**, which deliver solutions that are guaranteed to be exact, despite the fact that they rely on making random decisions. Yet most randomized algorithms, including the motif finding algorithms that we will consider in this chapter, are **Monte Carlo algorithms**. These algorithms are not guaranteed to return exact solutions, but they do quickly find approximate solutions. Because of their speed, they can be run many times, allowing us to choose the best approximation from thousands of runs.

<img align="center" style="padding-right:10px;" src="figures/fig61.png">

<img align="center" style="padding-right:10px;" src="figures/fig62.png">

Sample Input:

    0.8 0.0 0.0 0.2
    0.0 0.6 0.2 0.0
    0.2 0.2 0.8 0.0
    0.0 0.2 0.0 0.8
    TTACCTTAAC
    GATGTCTGTC
    ACGGCGTTAG
    CCCTAACGAG
    CGTCAGAGGT

Sample Output:

    ACCT
    ATGT
    GCGT
    ACGA
    AGGT

```python
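# Input:  A profile matrix Profile and a list of strings Dna
# Output: For each string in Dna, its Profile-most probable k-mer
def Motifs(Profile, Dna):
    # Assumes Profile is a dict mapping each nucleotide to a list of k
    # probabilities, so k can be read from the profile instead of being
    # hard-coded to 4.
    k = len(Profile['A'])
    motifs = []
    for Text in Dna:
        motifs.append(ProfileMostProbablePattern(Text, k, Profile))
    return motifs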
```
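
To check the function against the sample data, here is a minimal self-contained sketch. It assumes the four rows of the sample profile correspond to A, C, G and T in that order, and that the profile is stored as a dictionary of lists. The helpers `Pr` and `ProfileMostProbablePattern` are simple stand-ins written for this example; the versions developed earlier in the course may differ in detail.

```python
def Pr(Text, Profile):
    # Probability that Profile generates Text, column by column.
    p = 1.0
    for j, symbol in enumerate(Text):
        p *= Profile[symbol][j]
    return p

def ProfileMostProbablePattern(Text, k, Profile):
    # Stand-in helper: returns the first k-mer with the highest probability.
    best, best_p = Text[:k], -1.0
    for i in range(len(Text) - k + 1):
        p = Pr(Text[i:i + k], Profile)
        if p > best_p:
            best, best_p = Text[i:i + k], p
    return best

def Motifs(Profile, Dna):
    k = len(Profile["A"])  # motif length is the number of profile columns
    return [ProfileMostProbablePattern(Text, k, Profile) for Text in Dna]

# Sample input, with rows assumed to be in the order A, C, G, T.
Profile = {
    "A": [0.8, 0.0, 0.0, 0.2],
    "C": [0.0, 0.6, 0.2, 0.0],
    "G": [0.2, 0.2, 0.8, 0.0],
    "T": [0.0, 0.2, 0.0, 0.8],
}
Dna = ["TTACCTTAAC", "GATGTCTGTC", "ACGGCGTTAG", "CCCTAACGAG", "CGTCAGAGGT"]

result = Motifs(Profile, Dna)
print(result)  # ['ACCT', 'ATGT', 'GCGT', 'ACGA', 'AGGT']
```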

<img align="center" style="padding-right:10px;" src="figures/fig63.png">



Simulating the process of generating a random integer is more difficult than you might think and requires more mathematics than we would like to describe here. Fortunately, since the task of generating random numbers arises in so many applications, Python provides a module called random for generating them. You can think of a module as a “bundle” of related functions. To use the random module, we place the following statement at the top of our file.

    import random

The random module provides a function randint(a, b) that returns a random integer N with a ≤ N ≤ b, so randint(1, M) generates a random integer between 1 and M, inclusive. To call this function, we use

    random.randint(1, M)
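
Because both endpoints of `random.randint` are inclusive, some care is needed when it is used to pick a 0-based starting position in a string. The short sketch below rolls a six-sided die and then picks a random k-mer; the fixed seed and the variable names are only there to make the illustration repeatable.

```python
import random

random.seed(42)  # fixed seed only so the demonstration is repeatable

# Simulate 1000 rolls of a six-sided die; both 1 and 6 can occur.
rolls = [random.randint(1, 6) for _ in range(1000)]
print(min(rolls), max(rolls))

# Choosing a random k-mer: valid 0-based starts run from 0 to len(Text) - k,
# and randint includes that upper bound, so no "+ 1" is needed.
Text, k = "TTACCTTAAC", 3
start = random.randint(0, len(Text) - k)
kmer = Text[start:start + k]
print(start, kmer)
```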

Code Challenge (3 points): Write a function RandomMotifs(Dna, k, t) that uses random.randint to choose a random k-mer from each of t different strings Dna, and returns a list of t strings. Then add this function to Motifs.py.

Note: The remaining Code Challenges are all based on randomized algorithms. Because the output of these algorithms depends on randomization, there is no unique answer to each problem, and so we will not provide test datasets.

Sample Input:

    3 5
    TTACCTTAAC
    GATGTCTGTC
    ACGGCGTTAG
    CCCTAACGAG
    CGTCAGAGGT

Sample Output:

    TTA
    ATG
    GGC
    GAG
    CGT

```python
# import Python's 'random' module here
import random

# Input:  A list of strings Dna, and integers k and t
# Output: RandomMotifs(Dna, k, t)
# HINT:   You might not actually need to use t since t = len(Dna), but you may find it convenient
def RandomMotifs(Dna, k, t):
    motifs = []
    for i in range(t):
        # randint includes both endpoints, so the last valid start is len(Dna[i]) - k
        start = random.randint(0, len(Dna[i]) - k)
        motifs.append(Dna[i][start:start + k])
    return motifs
```

We are now ready to develop RandomizedMotifSearch. We start by generating a collection of random motifs using the function from the previous step, which we set as the best-scoring collection of motifs.

```python
    M = RandomMotifs(Dna, k, t)
    BestMotifs = M
```

The code below stops running as soon as the score of the motifs that we generate stops improving. It uses the loop “while True”, which iterates until it encounters a return statement. It can be dangerous to use such a loop, since it could lead to an infinite loop in which a program never terminates. However, in this particular case the loop continues only when the score strictly decreases, and a score is a non-negative integer, so the motif score must eventually stop improving and RandomizedMotifSearch must eventually terminate.

```python
    while True:
        Profile = ProfileWithPseudocounts(M)
        M = Motifs(Profile, Dna)
        if Score(M) < Score(BestMotifs):
            BestMotifs = M
        else:
            return BestMotifs
```

Code Challenge (1 point): Put this code into a function RandomizedMotifSearch that takes a list of strings Dna along with integers k and t as input.   Then add this function to Motifs.py.

Sample Input:

    8 5
    CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA
    GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG
    TAGTACCGAGACCGAAAGAAGTATACAGGCGT
    TAGATCAAGTTTCAGGTGCACGTCGGTGAACC
    AATCCACCAGCTCCACGTGCAATGTTGGCCTA

Sample Output:

    AACGGCCA
    AAGTGCCA
    TAGTACCG
    AAGTTTCA
    ACGTGCAA

```python
# Input:  A list of strings Dna, followed by positive integers k and t
# Output: RandomizedMotifSearch(Dna, k, t)
def RandomizedMotifSearch(Dna, k, t):
    M = RandomMotifs(Dna, k, t)
    BestMotifs = M
    while True:
        Profile = ProfileWithPseudocounts(M)
        # Motifs takes only the profile and the list of strings
        M = Motifs(Profile, Dna)
        if Score(M) < Score(BestMotifs):
            BestMotifs = M
        else:
            return BestMotifs
```
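
The function above relies on helpers built in earlier sections (ProfileWithPseudocounts, Score, Motifs, RandomMotifs). For readers who want to run it in isolation, the following is one possible self-contained sketch. The helper bodies here are compact stand-ins written for this example, assuming a dictionary-of-lists profile and a pseudocount of 1 per nucleotide; they are not necessarily identical to the versions in Motifs.py.

```python
import random

def ProfileWithPseudocounts(motifs):
    # Start every count at 1 so that no probability in the profile is zero.
    k = len(motifs[0])
    count = {symbol: [1] * k for symbol in "ACGT"}
    for motif in motifs:
        for j, symbol in enumerate(motif):
            count[symbol][j] += 1
    total = len(motifs) + 4  # t motifs plus one pseudocount per nucleotide
    return {symbol: [c / total for c in row] for symbol, row in count.items()}

def Score(motifs):
    # Number of symbols that differ from the most common symbol in each column.
    score = 0
    for column in zip(*motifs):
        score += len(column) - max(column.count(s) for s in "ACGT")
    return score

def Pr(Text, Profile):
    p = 1.0
    for j, symbol in enumerate(Text):
        p *= Profile[symbol][j]
    return p

def ProfileMostProbablePattern(Text, k, Profile):
    best, best_p = Text[:k], -1.0
    for i in range(len(Text) - k + 1):
        p = Pr(Text[i:i + k], Profile)
        if p > best_p:
            best, best_p = Text[i:i + k], p
    return best

def Motifs(Profile, Dna):
    k = len(Profile["A"])
    return [ProfileMostProbablePattern(Text, k, Profile) for Text in Dna]

def RandomMotifs(Dna, k, t):
    motifs = []
    for i in range(t):
        start = random.randint(0, len(Dna[i]) - k)
        motifs.append(Dna[i][start:start + k])
    return motifs

def RandomizedMotifSearch(Dna, k, t):
    M = RandomMotifs(Dna, k, t)
    BestMotifs = M
    while True:
        Profile = ProfileWithPseudocounts(M)
        M = Motifs(Profile, Dna)
        if Score(M) < Score(BestMotifs):
            BestMotifs = M
        else:
            return BestMotifs

Dna = [
    "CGCCCCTCTCGGGGGTGTTCAGTAAACGGCCA",
    "GGGCGAGGTATGTGTAAGTGCCAAGGTGCCAG",
    "TAGTACCGAGACCGAAAGAAGTATACAGGCGT",
    "TAGATCAAGTTTCAGGTGCACGTCGGTGAACC",
    "AATCCACCAGCTCCACGTGCAATGTTGGCCTA",
]
best = RandomizedMotifSearch(Dna, 8, 5)
print(best, Score(best))
```

Since the starting motifs are random, the printed motifs and score will generally differ from run to run and need not match the sample output.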





Exercise Break (2 points): In practice, we retain the best-scoring set of motifs over many runs of RandomizedMotifSearch. Add an input parameter N representing the number of runs of RandomizedMotifSearch, and then find the best-scoring motifs with k-mer length equal to 15 in the DosR dataset over N runs (click [here](dnas/DosR.txt) to download). Don't forget to use pseudocounts!

Sample Input:

    3 5
    GGCGTTCAGGCA
    AAGAATCAGTCA
    CAAGGAGTTCGC
    CACGTCAATCAC
    CAATAATATTCG

Sample Output:

    ['CAG', 'CAG', 'CAA', 'CAA', 'CAA']
    2

Below is the best set of motifs found in the DosR dataset with k = 15 using RandomizedMotifSearch (with a score of 60):
CCTGGCTTCGATGGC
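
One way to organize the N runs from the Exercise Break is a small wrapper that keeps the lowest-scoring result seen so far. The sketch below is deliberately generic: `BestOfRuns` is a hypothetical name, and `search` and `score` are callables supplied by the caller. In the exercise they would presumably be something like `lambda: RandomizedMotifSearch(Dna, 15, len(Dna))` and the course's `Score`; here a deterministic toy `search` is used so the example runs on its own.

```python
def BestOfRuns(search, score, N):
    # Run `search` N times and keep the result with the lowest score.
    best = search()
    best_score = score(best)
    for _ in range(N - 1):
        candidate = search()
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

def Score(motifs):
    # Stand-in scoring function: mismatches against each column's most common symbol.
    return sum(len(col) - max(col.count(s) for s in "ACGT") for col in zip(*motifs))

# Toy stand-in for a randomized search: each call returns the next fixed candidate.
candidates = iter([
    ["CAG", "CAT", "AAA"],   # score 3
    ["CAG", "CAG", "CAA"],   # score 1
    ["TTT", "CAG", "CAA"],   # score 4
])
best, best_score = BestOfRuns(lambda: next(candidates), Score, 3)
print(best, best_score)  # ['CAG', 'CAG', 'CAA'] 1
```

Passing the search as a callable keeps the wrapper independent of how a single run is implemented, so the same loop can be reused with other randomized searches.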

<img align="center" style="padding-right:10px;" src="figures/fig64.png">

<img align="center" style="padding-right:10px;" src="figures/fig65.png">

<img align="center" style="padding-right:10px;" src="figures/fig66.png">
