# List algorithms

For each challenge here, first write the algorithm in pseudocode, then implement it.

## Linear search
```python
robbers = ['Tokyo', 'Professor', 'Lisbon', 'Oslo', 'Helsinki', 'Rio', 'Moscow', 'Berlin', 'Palermo']
linear_search(robbers, 'Professor') == 1
linear_search(robbers, 'alon') == -1

```

In [11]:
robbers = ['Tokyo', 'Professor', 'Lisbon', 'Oslo', 'Helsinki', 'Rio', 'Moscow', 'Berlin', 'Palermo']

def linear_search(sequence:list, target)-> int:
    for idx, item in enumerate(sequence):
        if item == target:
            return idx
    return -1

assert linear_search(robbers, 'Lisbon') == 2
len(robbers)

9

## Revisiting the big O notation

>Big O notation describes the **complexity of your code using algebraic terms.**

So what is the big O notation for our linear_search?
1. How many steps would it take in the worst case?
2. How many would it take in the best case?

[Read more about big O](https://www.freecodecamp.org/news/big-o-notation-why-it-matters-and-why-it-doesnt-1674cfa8a23c/)
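
One way to answer both questions is to count comparisons directly. The sketch below is a variant of `linear_search` that also returns how many items it inspected; the name `linear_search_steps` is hypothetical and only used for illustration.

```python
def linear_search_steps(sequence, target):
    """Same search as linear_search, but also count the comparisons made."""
    steps = 0
    for idx, item in enumerate(sequence):
        steps += 1                      # one comparison per item inspected
        if item == target:
            return idx, steps
    return -1, steps

robbers = ['Tokyo', 'Professor', 'Lisbon', 'Oslo', 'Helsinki',
           'Rio', 'Moscow', 'Berlin', 'Palermo']

best = linear_search_steps(robbers, 'Tokyo')   # target is the first item
worst = linear_search_steps(robbers, 'alon')   # target is not in the list
print(best, worst)
```

In the best case the search stops after one comparison; in the worst case it inspects all `len(robbers)` items, which is why linear search is described as O(n).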

In [None]:
O(n)

In [3]:
## Let's tackle a more real-life problem

In [8]:

vocab = ["apple", "boy", "dog", "down",
         "fell", "girl", "grass", "the", "tree"]
book_words = "the apple fell from the tree to the grass".split()


In [16]:
def find_unknown_words(vocab:list, book_words:list):
    # Use linear search
    result = []
    for word in book_words:
        
        if linear_search(vocab, word) < 0: 
            result.append(word)
    return result


In [17]:
assert find_unknown_words(vocab, book_words) == ["from", "to"]
assert find_unknown_words([], book_words) == book_words

How would you implement `find_unknown_words` using the `linear_search` we built before?

In [18]:
# Now let's try with a bigger vocab

def load_words_from_file(filename):
    """ Read words from filename, return list of words. """
    # The with-block closes the file even if reading fails
    with open(filename, "r") as f:
        file_content = f.read()
    wds = file_content.split()
    return wds

bigger_vocab = load_words_from_file("vocab.txt")
len(bigger_vocab)

19455

In [20]:
# For loading words from a book we need some normalization

def text_to_words(the_text):
    """ return a list of words with all punctuation removed,
        and all in lowercase.
    """

    my_substitutions = the_text.maketrans(
      # If you find any of these
      "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
      # Replace them by these
      "abcdefghijklmnopqrstuvwxyz                                          ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds


def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words. """
    # The with-block closes the file even if reading fails
    with open(filename, "r") as f:
        content = f.read()
    wds = text_to_words(content)
    return wds


In [21]:
assert text_to_words('"Well, I never!", said Alice.') ==  ["well", "i", "never", "said", "alice"]
# How can we implement text_to_words?
# Note: you should use split, and probably replace or Python's str.translate

In [22]:
# Getting back to big O
len(get_words_in_book('alice_in_wonderland.txt'))

27336

In [31]:
import time
time.process_time()
vocab.extend(vocab)  # extend() works in place and returns None, so don't reassign

In [37]:
import time
alice_words = get_words_in_book('alice_in_wonderland.txt')
vocab = load_words_from_file('vocab.txt')
vocab.extend(vocab)
vocab.extend(vocab)

start = time.process_time() 
find_unknown_words(vocab, alice_words)
end = time.process_time()
print(f"Delta {end-start}")

Delta 27.70539735899999


## What happened here? 
* Seeing big O notation in action
* What could we do differently?
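
A rough back-of-the-envelope estimate helps explain the timing. Using the counts printed earlier (27336 words in the book, 19455 words in `vocab.txt`, and a vocab doubled twice by `extend`), the sketch below computes an upper bound on the number of comparisons; the variable names are only illustrative.

```python
book_word_count = 27336      # len(alice_words), from the output above
vocab_size = 19455 * 4       # vocab.txt word count, doubled twice by extend()

# Worst case: every book word is compared against every vocab entry.
upper_bound = book_word_count * vocab_size
print(f"At most {upper_bound:,} comparisons")
```

This is only an upper bound, since a word that is found stops its search early, but it shows the cost growing with the product of the two list lengths.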

## The road not taken

*Before we go and implement our own version of search, let's explore other options.*

### What else could we do? 
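
One option, offered here as a sketch rather than the intended answer, is to let Python do the lookup with a `set`, whose membership test takes roughly constant time on average. The function name `find_unknown_words_with_set` is hypothetical.

```python
def find_unknown_words_with_set(vocab, book_words):
    """Return the book words that are not in vocab, preserving book order."""
    known = set(vocab)   # built once, costs O(len(vocab))
    return [word for word in book_words if word not in known]

vocab = ["apple", "boy", "dog", "down",
         "fell", "girl", "grass", "the", "tree"]
book_words = "the apple fell from the tree to the grass".split()
unknown = find_unknown_words_with_set(vocab, book_words)
print(unknown)
```

The trade-off is extra memory for the set in exchange for not scanning the whole vocab for every word.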

Now that we've explored some more "pythonic" solutions, let's go back to implementing a classic algorithmic solution: binary search.

In [None]:
def binary_search(sequence, target):
    pass

# lets try to visualize this kind o
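
A minimal sketch of one possible `binary_search` follows, assuming `sequence` is sorted in ascending order (a list that was doubled with `extend` would need to be re-sorted first). It returns -1 when the target is missing, mirroring `linear_search`.

```python
def binary_search(sequence, target):
    """Return the index of target in a sorted sequence, or -1 if absent."""
    lo, hi = 0, len(sequence) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # middle of the remaining range
        if sequence[mid] == target:
            return mid
        elif sequence[mid] < target:
            lo = mid + 1              # target can only be in the upper half
        else:
            hi = mid - 1              # target can only be in the lower half
    return -1

sorted_vocab = ["apple", "boy", "dog", "down",
                "fell", "girl", "grass", "the", "tree"]
print(binary_search(sorted_vocab, "fell"), binary_search(sorted_vocab, "from"))
```

Each pass halves the remaining range, so the number of steps grows as O(log n). The standard library's `bisect` module offers the same idea ready-made.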

In [1]:
def find_unknown_words(vocab:list, book_words:list):
    # Use binary search (vocab must be sorted)
    result = []
    for word in book_words:
        
        if binary_search(vocab, word) < 0: 
            result.append(word)
    return result

In [None]:
### now let's check the difference between our previous linear algo and the new algo
start = time.process_time() 
find_unknown_words(vocab, alice_words)
end = time.process_time()
print(f"Delta {end-start}")

## Removing adjacent duplicates from a list

Given a list where some adjacent items are the same, remove the adjacent (and only the adjacent) duplicates, keeping one item from each run.



In [None]:
def remove_adjacent(a_list:list) -> list:
    pass

# What is the big O complexity?
# What is the memory usage?
sample = [1,1,2, 4, 2, 2,7,7,8,7,7]
assert remove_adjacent(sample) == [1,2, 4, 2, 7, 8, 7]
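
One possible solution, shown as a sketch: walk the list once and append an item only when it differs from the last item kept.

```python
def remove_adjacent(a_list):
    """Return a new list with runs of equal adjacent items collapsed to one."""
    result = []
    for item in a_list:
        # Keep the item only if it differs from the last one we kept
        if not result or result[-1] != item:
            result.append(item)
    return result

sample = [1, 1, 2, 4, 2, 2, 7, 7, 8, 7, 7]
print(remove_adjacent(sample))
```

This version makes a single pass, so it runs in O(n) time, and it builds a new list, so it uses up to O(n) extra memory.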

In [None]:
### What would happen if the list was sorted? 

In [None]:
def merge_lists(a_sorted_list_1: list, a_sorted_list_2:list)-> list:
    """
    Gets two sorted lists, returns a sorted list
    """
    pass

# What's the problem with the immediate answer? (What is the immediate answer?)
# Can we think of an algorithmic "divide and conquer" solution?
# What is the smallest step possible here?
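
The immediate answer is probably `sorted(list_1 + list_2)`, which ignores the fact that both inputs are already sorted. A sketch of the step-by-step alternative follows: the smallest step is to compare the two front items and take the smaller one.

```python
def merge_lists(a_sorted_list_1, a_sorted_list_2):
    """Merge two sorted lists into one sorted list in a single pass."""
    result = []
    i = j = 0
    while i < len(a_sorted_list_1) and j < len(a_sorted_list_2):
        if a_sorted_list_1[i] <= a_sorted_list_2[j]:
            result.append(a_sorted_list_1[i])
            i += 1
        else:
            result.append(a_sorted_list_2[j])
            j += 1
    # One list is exhausted; whatever remains of the other is already sorted
    result.extend(a_sorted_list_1[i:])
    result.extend(a_sorted_list_2[j:])
    return result

merged = merge_lists([1, 3, 5], [2, 4, 6, 8])
print(merged)
```

Each item is looked at once, so the merge takes O(n + m) steps, compared with O((n + m) log(n + m)) for concatenating and sorting.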

## Merging sorted lists as a general problem

How can we adapt this algorithm to:

* Return only those items that are present in both lists.

* Return only those items that are present in the second list, but not in the first.
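
Both variants can reuse the same two-index walk over the sorted lists. The sketch below assumes each list is sorted and has no duplicates within itself; the function names `items_in_both` and `items_only_in_second` are hypothetical.

```python
def items_in_both(xs, ys):
    """Return items present in both sorted lists."""
    result = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            result.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            i += 1      # xs[i] cannot appear later in ys
        else:
            j += 1      # ys[j] cannot appear later in xs
    return result

def items_only_in_second(xs, ys):
    """Return items of sorted list ys that do not appear in sorted list xs."""
    result = []
    i = j = 0
    while j < len(ys):
        if i >= len(xs) or ys[j] < xs[i]:
            result.append(ys[j])   # nothing left in xs can match ys[j]
            j += 1
        elif xs[i] < ys[j]:
            i += 1
        else:
            j += 1                 # equal: ys[j] is known, skip it
    return result

both = items_in_both([1, 2, 3, 5], [2, 3, 4, 5])
only_second = items_only_in_second([1, 2, 3], [2, 4])
print(both, only_second)
```

With a sorted vocab as the first list and the sorted book words as the second, the second function is another way to find unknown words.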

In [None]:
Back to Alice

# Read and play more with algorithms

https://www.khanacademy.org/computing/computer-science/algorithms

Most of this tutorial was "borrowed" from [here](http://openbookproject.net/thinkcs/python/english3e/list_algorithms.html)