##  Where in the Genome Does Replication Begin? (Part 1/2) 


**nucleotides**:
- adenine (A) pairs with thymine (T)
- cytosine (C) pairs with guanine (G)


**genome replication**: 
- takes place inside the cell
- a cell must replicate its genome so that each daughter cell inherits a copy
- the **replication origin (ori)** is the genome region where replication begins
- replication is carried out by molecular copy machines called **DNA polymerases**

**DnaA**
- a protein that initiates replication by binding to a short segment (known as a **DnaA box**) within ori


In [1]:
"""code challenge"""

# PatternCount(Text, Pattern)
#     count ← 0
#     for i ← 0 to |Text| − |Pattern|
#         if Text(i, |Pattern|) = Pattern
#             count ← count + 1
#     return count

def pattern_count(
        text: str,
        pattern: str,
) -> int:
    count: int = 0
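    # + 1 so the last window, starting at len(text) - len(pattern), is checked too
    for i in range(len(text) - len(pattern) + 1):
        # compare in place; slicing text[i:] would copy the tail on every step
        if text.startswith(pattern, i):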
            count += 1
    return count

test_text: str = 'TACCCTTCGTACCCTTTATACCCTTTACCCTTCTACCCTTATAGCCTCGTGTACCCTTTACCCTTTGTACCCTTTACCCTTTCTACCCTTTGGATACCCTTACGTTCCAGTATGCTGCATACCCTTAATACCCTTATACCCTTGTCTACCCTTGTATTACCCTTGGTACCCTTTTACCCTTTACCTACCCTTCCCCTACCCTTAGACTACCCTTTACCCTTTTTACCCTTTATACCCTTTCTACCCTTTACCCTTTTACCCTTTTACCCTTCATACCCTTCTAGTTATTACCCTTTTACCCTTATTACCCTTTACCCTTTCCGATTACCCTTTACCCTTTACCCTTTCGACGTGGTTACCCTTTACCCTTGCGTCCCGTTACCCTTGAAGGAAATACCCTTTACCCTTTCTACCCTTTCGTACCCTTTGAATACCCTTTCTACCCTTAGTACCCTTATACCCTTATTCTACCCTTACCTACCCTTTGTTACCCTTTACCCTTTACCCTTAAACGACTACCCTTTACCCTTCAGCTACCCTTCTACCCTTCTACCCTTGGCTACCCTTTACCCTTTACCCTTGTACCCTTTGTACCCTTGCAACAGCGATTTACCCTTTTACCCTTAGCTGTACCCTTTACCCTTGTACCCTTTTACCCTTTTACCCTTCCGTACCCTTAGGGCCGGCGACCTTTACCCTTCGTTACCCTTTGTATTACCCTTTTACCCTTAAATACCCTTTACCCTTTACCCTTTACCCTTGCATACCCTTCCGGGGTGGTTTACCCTTTGTATACCCTTTACCCTTCTAGTCTCCGTTTACCCTTACCTCTACCCTTATACCCTTCAATCTGCTACCCTTTAGTGGGGCTACCCTTCGGTTACCCTTTTCCGGTTACCCTTTACCCTTTCATTAGCTATACCCTTCTTACCCTTTACCCTTTTACCCTTTACCCTTTACCCTTGTTACCCTTATGTTAATACCCTTGATGATACCCTTTACCCTTTACCCTTATACCCTT'
test_pattern: str = 'TACCCTTTA'
print(pattern_count(text=test_text, pattern=test_pattern))


30
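
One detail worth checking on a small input: the pseudocode counts overlapping occurrences, which Python's built-in `str.count` does not. The sketch below uses a hypothetical helper, `count_overlapping` (not part of the notebook), and toy strings chosen purely for illustration.

```python
def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlaps included.

    Hypothetical helper for illustration; it slides a window over every
    start position from 0 to len(text) - len(pattern), inclusive.
    """
    window = len(pattern)
    return sum(
        text[i:i + window] == pattern
        for i in range(len(text) - window + 1)
    )


# 'GCG' starts at positions 0 and 2; the two matches share the middle 'G'.
print(count_overlapping('GCGCG', 'GCG'))  # 2
# str.count skips past each match, so it finds only the first one.
print('GCGCG'.count('GCG'))               # 1
# A match in the very last window is counted as well.
print(count_overlapping('ACGT', 'GT'))    # 1
```

For DnaA boxes the overlapping count is presumably the one that matters, since the question is how many positions the pattern can start at.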


In [4]:
"""code challenge"""

# FrequentWords(Text, k)
#     FrequentPatterns ← an empty set
#     for i ← 0 to |Text| − k
#         Pattern ← the k-mer Text(i, k)
#         Count(i) ← PatternCount(Text, Pattern)
#     maxCount ← maximum value in array Count
#     for i ← 0 to |Text| − k
#         if Count(i) = maxCount
#             add Text(i, k) to FrequentPatterns
#     remove duplicates from FrequentPatterns
#     return FrequentPatterns

from typing import Tuple, Dict

def frequent_words(
        text: str,
        k: int,
) -> Tuple[str, ...]:
    
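    # Single pass over the text: tally every k-mer in a dictionary instead of
    # calling PatternCount once per position as the pseudocode does.
    kmer_counts: Dict[str, int] = {}
    for i in range(len(text) - k + 1):
        kmer = text[i: i + k]
        kmer_counts[kmer] = kmer_counts.get(kmer, 0) + 1

    # default=0 keeps max() from raising when text is shorter than k
    max_count = max(kmer_counts.values(), default=0)
    # dicts preserve insertion order, so k-mers come back in order of first occurrence
    return tuple(kmer for kmer, n in kmer_counts.items() if n == max_count)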

test_text: str = 'AGCCCTGAAACGGAATTCAAAGGGCGCGATACTGTCAAAGGGAACGGAATAACGGAATCGCGATACTGAGCCCTGACGCGATACTGAACGGAATAGCCCTGAAGCCCTGAAACGGAATAGCCCTGAAGCCCTGAAACGGAATTCAAAGGGTCAAAGGGCGCGATACTGGCAGCCAGTTCGCGATACTGAACGGAATGCAGCCAGTTAGCCCTGAAACGGAATCGCGATACTGTCAAAGGGGCAGCCAGTTTCAAAGGGCGCGATACTGAACGGAATAGCCCTGAAGCCCTGAAACGGAATAGCCCTGAGCAGCCAGTTGCAGCCAGTTTCAAAGGGAGCCCTGAAGCCCTGAGCAGCCAGTTAGCCCTGAGCAGCCAGTTCGCGATACTGCGCGATACTGGCAGCCAGTTGCAGCCAGTTTCAAAGGGTCAAAGGGCGCGATACTGAACGGAATAACGGAATAGCCCTGAAACGGAATGCAGCCAGTTCGCGATACTGGCAGCCAGTTTCAAAGGGCGCGATACTGAACGGAATCGCGATACTGAGCCCTGAAACGGAATAACGGAATTCAAAGGGCGCGATACTGAGCCCTGAAGCCCTGAAACGGAATTCAAAGGGAGCCCTGATCAAAGGGGCAGCCAGTTGCAGCCAGTTGCAGCCAGTTCGCGATACTGAGCCCTGACGCGATACTGCGCGATACTGGCAGCCAGTTAGCCCTGACGCGATACTGCGCGATACTGAACGGAATTCAAAGGGCGCGATACTGTCAAAGGGGCAGCCAGTTCGCGATACTGCGCGATACTGGCAGCCAGTTGCAGCCAGTTGCAGCCAGTTTCAAAGGGCGCGATACTGAACGGAATCGCGATACTGAACGGAATAACGGAATGCAGCCAGTTAGCCCTGAAGCCCTGAAACGGAATAGCCCTGA'
test_k: int = 14
print(frequent_words(text=test_text, k=test_k))

('AGCCCTGAAACGGA', 'GCCCTGAAACGGAA', 'CCCTGAAACGGAAT')
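
As a cross-check on a smaller input, the same tally can be written with `collections.Counter` from the standard library. This is a sketch under stated assumptions, not a reference solution: `most_frequent_kmers` is a hypothetical name, and the 30-nucleotide string is a toy input.

```python
from collections import Counter
from typing import List


def most_frequent_kmers(text: str, k: int) -> List[str]:
    """Return the k-mers with the highest count, sorted alphabetically.

    Hypothetical helper for illustration, assuming 1 <= k <= len(text).
    """
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    max_count = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == max_count)


# 'CATG' and 'GCAT' each occur three times; no other 4-mer occurs more than twice.
print(most_frequent_kmers('ACGTTGCATGTCGCATGATGCATGAGAGCT', 4))
# ['CATG', 'GCAT']
```

Sorting makes the result independent of where each k-mer first appears, which is convenient when comparing answers.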
