<img align="left" style="padding-right:10px;" src="figures/cartel.jpg">
<!--COURSE_INFORMATION-->
## This notebook contains a unit from the course [Biology Meets Programming](https://www.coursera.org/learn/bioinformatics/home/welcome) by the University of California on Coursera


### The content is available [on GitHub](https://github.com/vencejo/Curso_BiologyMeetsProgramming).

<!--NAVIGATION-->
< [2.3 Peculiar Statistics of the Forward and Reverse Half-Strands](2.3 Peculiar Statistics of the Forward and Reverse Half-Strands.ipynb) | [Contents](Index.ipynb) | [2.5 Some Hidden Messages Are More Elusive than Others](2.5 Some Hidden Messages Are More Elusive than Others.ipynb)>



In the table containing nucleotide counts for T. petrophila (reproduced below), we noted that not just C but also G has peculiar statistics on the forward and reverse half-strands.

In practice, scientists use a more accurate approach that accounts for both G and C when searching for ori. As the table below illustrates, the difference between the total amount of guanine and the total amount of cytosine is negative on the reverse half-strand and positive on the forward half-strand.

<img align="center" style="padding-right:10px;" src="figures/fig23.png">
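To make the quantity in the table concrete, here is a minimal sketch that computes the number of G's minus the number of C's for a stretch of DNA. The function name `gc_difference` and the two toy sequences are hypothetical illustrations, not T. petrophila data.

```python
def gc_difference(seq):
    """Return (#G - #C) for a DNA string; A and T do not contribute."""
    return seq.count("G") - seq.count("C")

# Toy sequences, made up for illustration (not real genome data)
forward_like = "GGATGCAGTG"  # G-rich, like a forward half-strand
reverse_like = "CCATCGACTC"  # C-rich, like a reverse half-strand

print(gc_difference(forward_like))  # 4 (positive)
print(gc_difference(reverse_like))  # -4 (negative)
```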

Thus, our idea is to traverse the genome, keeping a running total of the difference between the counts of G and C. If this difference starts increasing, then we guess that we are on the forward half-strand; on the other hand, if this difference starts decreasing, then we guess that we are on the reverse half-strand (see figure below).

<img align="center" style="padding-right:10px;" src="figures/fig26.png">
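The traversal idea can be sketched as a scan that keeps the running G minus C total and, every `window` nucleotides, reports whether that total went up or down. The helper name `half_strand_guesses`, the window size, and the labels are assumptions chosen for illustration; this is only a rough guess at which half-strand a block lies on, not a method given in the course.

```python
def half_strand_guesses(genome, window):
    """Walk the genome keeping a running (#G - #C) total and, for each block
    of `window` nucleotides, guess the half-strand from the total's trend."""
    guesses = []
    running = 0            # running total of #G - #C seen so far
    block_start_value = 0  # running total at the start of the current block
    for i, base in enumerate(genome, start=1):
        if base == "G":
            running += 1
        elif base == "C":
            running -= 1
        # close a block every `window` nucleotides (and at the very end)
        if i % window == 0 or i == len(genome):
            change = running - block_start_value
            if change > 0:
                guesses.append("forward")    # difference increased
            elif change < 0:
                guesses.append("reverse")    # difference decreased
            else:
                guesses.append("undecided")  # no net change in this block
            block_start_value = running
    return guesses

print(half_strand_guesses("CCACTCCAGGAGTGGA", 8))  # ['reverse', 'forward']
```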


We will keep track of the difference between the total number of occurrences of G and the total number of occurrences of C that we have encountered so far in Genome by using a skew array. This array, denoted Skew, is defined by setting Skew[i] equal to the number of occurrences of G minus the number of occurrences of C in the first i nucleotides of Genome (see figure below). We also set Skew[0] equal to zero.

<img align="center" style="padding-right:10px;" src="figures/fig27.png">
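As a quick check of the definition, the skew array can also be built as a Python list of prefix sums: each G contributes +1, each C contributes -1, every other nucleotide contributes 0, and `initial=0` supplies Skew[0] = 0. This list-based helper, `skew_list`, is a hypothetical alternative for illustration; the Code Challenge below asks for a dictionary instead.

```python
from itertools import accumulate

def skew_list(genome):
    """Return [Skew[0], Skew[1], ..., Skew[len(genome)]] as a list."""
    steps = [1 if base == "G" else -1 if base == "C" else 0 for base in genome]
    # accumulate() yields running sums; initial=0 (Python 3.8+) prepends Skew[0]
    return list(accumulate(steps, initial=0))

print(skew_list("GAGCCACCGCGATA"))
# [0, 1, 1, 2, 1, 0, 0, -1, -2, -1, -2, -1, -1, -1, -1]
```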

Code Challenge (3 points): Use this idea to write a function Skew(Genome) that takes a DNA string Genome as input and returns the skew array of Genome in the form of a dictionary mapping each integer i (from 0 to len(Genome)) to Skew[i]. Then add this function to Replication.py.

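```python
# Input:  A DNA string Genome
# Output: Skew(Genome), a dictionary mapping each i (0..len(Genome)) to Skew[i]
def Skew(Genome):
    skew = {0: 0}  # Skew[0] = 0 by definition
    for i in range(1, len(Genome) + 1):
        base = Genome[i - 1]           # the i-th nucleotide (1-based i)
        if base == "G":
            skew[i] = skew[i - 1] + 1  # G increases the skew
        elif base == "C":
            skew[i] = skew[i - 1] - 1  # C decreases the skew
        else:
            skew[i] = skew[i - 1]      # A and T leave it unchanged
    return skew
```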

The skew diagram of Genome is defined by plotting i against Skew[i] as i ranges from 0 to len(Genome). The figure below shows the skew diagram for the genome from the previous step.

<img align="center" style="padding-right:10px;" src="figures/fig28.png">
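A skew diagram like the one above can be drawn in a few lines. The sketch below assumes matplotlib is installed; the helper names `skew_points` and `plot_skew_diagram` are hypothetical. The matplotlib import sits inside the plotting function so that `skew_points` still works on its own.

```python
def skew_points(genome):
    """Return the x values (i) and y values (Skew[i]) of the skew diagram."""
    xs, ys = [0], [0]  # the diagram starts at (0, Skew[0]) = (0, 0)
    for i, base in enumerate(genome, start=1):
        step = 1 if base == "G" else -1 if base == "C" else 0
        xs.append(i)
        ys.append(ys[-1] + step)
    return xs, ys

def plot_skew_diagram(genome):
    """Plot i against Skew[i]; requires matplotlib (an assumption)."""
    import matplotlib.pyplot as plt  # imported lazily on purpose
    xs, ys = skew_points(genome)
    plt.plot(xs, ys)
    plt.xlabel("i")
    plt.ylabel("Skew[i]")
    plt.title("Skew diagram")
    plt.show()
```

For example, `plot_skew_diagram("CATGGGCATCGGCCATACGCC")` would open a window with the diagram for that short string.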

The figure below depicts the skew diagram for a linearized E. coli genome. The pattern is even stronger than the pattern observed when we visualized the symbol array! It turns out that the skew diagram for many bacterial genomes has a similar characteristic shape.

<img align="center" style="padding-right:10px;" src="figures/fig29.png">

STOP and Think: After looking at the E. coli skew diagram (shown above), where do you think that ori is located in E. coli?

Let’s follow the 5' → 3' direction of DNA and walk along the chromosome from ter to ori (along a reverse half-strand), then continue on from ori to ter (along a forward half-strand). In the figure below, we see that the skew is decreasing along the reverse half-strand and increasing along the forward half-strand. Thus, the skew should achieve a minimum at the position where the reverse half-strand ends and the forward half-strand begins, which is exactly the location of ori!

<img align="center" style="padding-right:10px;" src="figures/fig26.png">
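To see this argument on a toy example, the sketch below glues a made-up C-rich "reverse half-strand" to a made-up G-rich "forward half-strand". Under this assumption (a synthetic sequence, not real genome data), the skew falls along the first part, rises along the second, and bottoms out exactly at the junction, which is where ori would sit.

```python
def skew_values(genome):
    """Return the skew array as a list: values[i] = #G - #C in genome[:i]."""
    values = [0]
    for base in genome:
        step = 1 if base == "G" else -1 if base == "C" else 0
        values.append(values[-1] + step)
    return values

reverse_half = "ATC" * 5  # synthetic: one C per 3 nucleotides, no G
forward_half = "GAT" * 5  # synthetic: one G per 3 nucleotides, no C
genome = reverse_half + forward_half

values = skew_values(genome)
lowest = min(values)
print(values.index(lowest), len(reverse_half))  # 15 15
```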




We have just developed an insight for a new algorithm for locating ori: it should be found where the skew array attains a minimum.

Minimum Skew Problem: Find a position in a genome where the skew diagram attains a minimum.

Input: A DNA string Genome.

Output: All integer(s) i minimizing Skew[i] among all values of i (from 0 to len(Genome)).

Code Challenge (3 points): Write a function MinimumSkew taking a DNA string Genome as input and returning all integers i minimizing Skew[i] for Genome. Then add this function to Replication.py. (Hint: make sure to call Skew(Genome) as a subroutine, and keep in mind that Python has a built-in min function in addition to max.)


Sample Input:

TAAAGACTGCCGAGAGGCCAACACGAGTGCTAGAACGAGGGGCGTAAACGCGGGTCCGAT

Sample Output:

11 24

```python
# Input:  A DNA string Genome
# Output: A list containing all integers i minimizing Skew[i] over all values of i (from 0 to len(Genome))
def MinimumSkew(Genome):
    skew = Skew(Genome)            # dictionary mapping each i (0..len(Genome)) to Skew[i]
    minValue = min(skew.values())  # lowest skew value attained anywhere
    # every position where that minimum is attained, in increasing order
    positions = [i for i in range(len(Genome) + 1) if skew[i] == minValue]
    return positions
```
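The MinimumSkew function above stores the whole skew array before scanning it. A possible alternative, sketched below as the hypothetical `minimum_skew_single_pass`, keeps only the running skew and the current minimum, so it needs no extra memory beyond the output list. This is an optional variant, not the course's reference solution.

```python
def minimum_skew_single_pass(genome):
    """Return all i (0..len(genome)) minimizing Skew[i], in a single pass."""
    skew = 0
    min_value = 0
    positions = [0]  # Skew[0] = 0 is the first candidate
    for i, base in enumerate(genome, start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        if skew < min_value:     # found a lower minimum: restart the list
            min_value = skew
            positions = [i]
        elif skew == min_value:  # ties the current minimum
            positions.append(i)
    return positions

print(minimum_skew_single_pass(
    "TAAAGACTGCCGAGAGGCCAACACGAGTGCTAGAACGAGGGGCGTAAACGCGGGTCCGAT"))
# [11, 24]
```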

STOP and Think: Note that the skew diagram (reproduced below for E. coli) changes depending on where we start our walk along the circular chromosome. Does the minimum of the skew diagram point to the same genomic location regardless of where we begin walking to generate the skew diagram?

<img align="center" style="padding-right:10px;" src="figures/fig29.png">
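One way to experiment with this question is to start the walk at different offsets of a circular genome and translate the skew minima back into the original coordinates. The helper names and the synthetic genome below are hypothetical; the toy genome contains equally many G's and C's, so try your own sequences (including unbalanced ones) before drawing a general conclusion.

```python
def minimum_positions(genome):
    """All i in 0..len(genome) minimizing (#G - #C) over genome[:i]."""
    values = [0]
    for base in genome:
        step = 1 if base == "G" else -1 if base == "C" else 0
        values.append(values[-1] + step)
    lowest = min(values)
    return [i for i, v in enumerate(values) if v == lowest]

def minima_in_original_coordinates(genome, start):
    """Begin the walk at offset `start` of a circular genome and report the
    skew minima translated back to the original coordinates (mod len)."""
    n = len(genome)
    rotated = genome[start:] + genome[:start]  # same circle, new starting point
    # positions 0 and n are the same point on a circle, hence the set and % n
    return sorted({(i + start) % n for i in minimum_positions(rotated)})

genome = "ATC" * 5 + "GAT" * 5  # synthetic circular genome
for start in (0, 7, 20):
    print(start, minima_in_original_coordinates(genome, start))
# 0 [15]
# 7 [15]
# 20 [15]
```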


