## How to keep track of the "state" of something

### Homework help

Write a Python program called `run.py` that compresses strings of DNA using [run-length encoding](RLE):
each run of the same base (a homopolymer) is written as the base followed by the number of repetitions, and the number is omitted when a base occurs only once.
The program should accept a single argument that is either a sequence to encode or a file containing one sequence per line.

```
$ cat inputs/sample1.txt
ACCGGGTTTT
```

This is the expected output with this file as the input:

```
$ ./run.py inputs/sample1.txt 
AC2G3T4
```
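
One way to handle the "sequence or file" requirement is to check whether the argument names an existing file. The sketch below is only one possible approach: the helper name `read_sequences` is hypothetical and not part of the assignment, it assumes blank lines in a file should be skipped, and the encoding step itself is left out.

```python
import os


def read_sequences(arg: str) -> list:
    """Return the sequences named by arg (hypothetical helper).

    Assumption: if arg is the path of an existing file, each non-empty
    line of that file is one sequence; otherwise arg is the sequence.
    """
    if os.path.isfile(arg):
        with open(arg) as handle:
            # Strip the trailing newline and skip blank lines
            return [line.strip() for line in handle if line.strip()]
    return [arg]


# Assuming no file named ACCGGGTTTT exists in the working directory
print(read_sequences('ACCGGGTTTT'))
# ['ACCGGGTTTT']
```

Returning a list in both cases lets the rest of the program loop over sequences without caring where they came from.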

### Let's process some Python Poetry!

In [34]:
text = '''
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
'''


In [35]:
def lowercase(text):
    return text.lower()

def removePunctuation(text):
    punctuations = ['.', '-', ',', '*']
    for punctuation in punctuations:
        text = text.replace(punctuation, '')
    return text

def removeNewlines(text):
    text = text.replace('\n', ' ')
    return text

def removeSpaces(text):
    text = text.replace(' ', '')
    return text

def removeShortWords(text):
    return ' '.join([word for word in text.split() if len(word) > 3])

def removeLongWords(text):
    return ' '.join([word for word in text.split() if len(word) < 6])

In [36]:
processingFunctions = [lowercase, removePunctuation, removeNewlines, removeLongWords]

for func in processingFunctions:
    text = func(text)

print(text)

is than ugly is than is than is than flat is than is than dense cases to break the rules beats never pass in the face of the to guess there be one and only one way to do it now is than never never is often than right now if the is hard to it's a bad idea if the is easy to it may be a good idea are one great idea let's do more of
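
The loop above threads `text` through each function in turn, so the order of the list matters. As a side note, the same chaining can be written with `functools.reduce`. This is a minimal sketch with two stand-in functions (`strip_periods` and `apply_all` are hypothetical names used only here), not a replacement for the loop.

```python
from functools import reduce


def lowercase(text):
    return text.lower()


def strip_periods(text):
    # Hypothetical stand-in for a punctuation-removal step
    return text.replace('.', '')


def apply_all(text, funcs):
    """Feed text through each function in funcs, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, text)


print(apply_all('Readability counts.', [lowercase, strip_periods]))
# readability counts
```

In both versions the running value of `text` is the state that carries over from one step to the next.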


In [37]:
words = sorted(text.split())

In [38]:
# Print the sorted words. Next, we'll write a function that counts how many times each word repeats.
print(words)

['a', 'a', 'and', 'are', 'bad', 'be', 'be', 'beats', 'break', 'cases', 'dense', 'do', 'do', 'easy', 'face', 'flat', 'good', 'great', 'guess', 'hard', 'idea', 'idea', 'idea', 'if', 'if', 'in', 'is', 'is', 'is', 'is', 'is', 'is', 'is', 'is', 'is', 'is', 'it', 'it', "it's", "let's", 'may', 'more', 'never', 'never', 'never', 'now', 'now', 'of', 'of', 'often', 'one', 'one', 'one', 'only', 'pass', 'right', 'rules', 'than', 'than', 'than', 'than', 'than', 'than', 'than', 'than', 'the', 'the', 'the', 'the', 'the', 'there', 'to', 'to', 'to', 'to', 'to', 'ugly', 'way']


### Tracking state

1. Create a data structure to hold the words and their counts. I am going to create a list of tuples,
   for example `[('a', 2), ('and', 1)]`. A dictionary would also work for these sorted words, but why wouldn't it work for the homework above?
2. Create a variable to count how many times you have seen the current word. We will reset this count each time we get to a new word.
3. Keep track of the word you are currently on. We will update this when we get to a new word.

In [39]:
def count_repeated_words(words) -> list:
    """Count runs of repeated words; return a list of (word, count) tuples."""

    counts = []
    count = 0
    prev = None
    for word in words:
        # We are at the start
        if prev is None:
            prev = word
            count = 1
        # This word is the same as before
        elif word == prev:
            count += 1
        # This is a new word, so record the count
        # of the previous word and reset the counter
        else:
            counts.append((prev, count))
            count = 1
            prev = word

    # Record the last word after we fall out of the loop,
    # unless the input was empty and there is nothing to record
    if prev is not None:
        counts.append((prev, count))

    return counts




In [40]:
words_repeated = count_repeated_words(words)
print(words_repeated)

[('a', 2), ('and', 1), ('are', 1), ('bad', 1), ('be', 2), ('beats', 1), ('break', 1), ('cases', 1), ('dense', 1), ('do', 2), ('easy', 1), ('face', 1), ('flat', 1), ('good', 1), ('great', 1), ('guess', 1), ('hard', 1), ('idea', 3), ('if', 2), ('in', 1), ('is', 10), ('it', 2), ("it's", 1), ("let's", 1), ('may', 1), ('more', 1), ('never', 3), ('now', 2), ('of', 2), ('often', 1), ('one', 3), ('only', 1), ('pass', 1), ('right', 1), ('rules', 1), ('than', 8), ('the', 5), ('there', 1), ('to', 5), ('ugly', 1), ('way', 1)]
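
As a cross-check, the standard library's `itertools.groupby` groups consecutive equal items, which is the same run-tracking idea. The sketch below (the name `count_runs` is hypothetical) also hints at the dictionary question: a dictionary or `Counter` keeps one total per key, so it matches the run counts only when equal items are adjacent, as they are in sorted input.

```python
from collections import Counter
from itertools import groupby


def count_runs(items):
    """Return (item, run_length) pairs for consecutive equal items."""
    return [(key, len(list(group))) for key, group in groupby(items)]


sorted_words = ['a', 'a', 'and', 'be', 'be']
print(count_runs(sorted_words))
# [('a', 2), ('and', 1), ('be', 2)]

# When the same item starts more than one run, per-key totals lose the runs
unsorted = ['a', 'a', 'b', 'a']
print(count_runs(unsorted))
# [('a', 2), ('b', 1), ('a', 1)]
print(Counter(unsorted))
# Counter({'a': 3, 'b': 1})
```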
