# Before building a search engine

This notebook introduces some concepts and tools in Python before we build a search engine. Together with what you have learned in the previous weeks, you will need them to complete the exercise:

- `namedtuple`
- Keyword arguments
- `sorted()`
- `Counter`

## Named tuples
A tuple is a sequence of values separated by commas and surrounded by parentheses, and its items can be accessed by their index, for example:

In [1]:
example_paper = ("Sparsity-certifying Graph Decompositions", ["Streinu, Ileana", "Theran, Louis"], 2008)
print(example_paper)
print(example_paper[1])

('Sparsity-certifying Graph Decompositions', ['Streinu, Ileana', 'Theran, Louis'], 2008)
['Streinu, Ileana', 'Theran, Louis']


However, it is not always easy to remember the position of a particular item in a tuple. [Named tuples](https://docs.python.org/3.8/library/collections.html#collections.namedtuple) solve this problem by making the items of a tuple accessible by name. Below we define a new named tuple type called `paper` with the names of the fields we want to use, and then convert the example above to the `paper` type.

In [2]:
from collections import namedtuple

# Define a namedtuple type named paper
# with three fields: title, authors and year
paper = namedtuple("paper", ["title", "authors", "year"])

# The asterisk (*) before example_paper
# unpacks its items into individual arguments
# https://stackoverflow.com/a/36908
example_paper = paper(*example_paper)
print(example_paper, "\n")

# Getting elements out of a named tuple
print(example_paper.title)
print(example_paper.authors)
print(example_paper.year)

paper(title='Sparsity-certifying Graph Decompositions', authors=['Streinu, Ileana', 'Theran, Louis'], year=2008) 

Sparsity-certifying Graph Decompositions
['Streinu, Ileana', 'Theran, Louis']
2008
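A named tuple is still a tuple, so the index-based access from the first example keeps working. The short sketch below re-creates the `paper` type so that it runs on its own, and shows indexing, unpacking, creating an instance with field names, and the standard `_asdict()` method.

```python
from collections import namedtuple

# Same fields as the `paper` type defined above
paper = namedtuple("paper", ["title", "authors", "year"])

example_paper = paper(
    "Sparsity-certifying Graph Decompositions",
    ["Streinu, Ileana", "Theran, Louis"],
    2008,
)

# A named tuple is still a tuple: indexing and unpacking keep working
print(example_paper[2])  # 2008
title, authors, year = example_paper
print(title)  # Sparsity-certifying Graph Decompositions

# Fields can also be given by name, in any order, when creating an instance
same_paper = paper(year=2008, title=title, authors=authors)
print(same_paper == example_paper)  # True

# _asdict() converts the named tuple into a dictionary of field -> value
print(example_paper._asdict()["year"])  # 2008
```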


## Keyword arguments

When we covered functions, we saw that a function can have mandatory named arguments, which we pass in the order in which they are defined. These are called positional arguments.

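Keyword arguments are another way of passing arguments: you name each one in the form `key=value`, so their order no longer matters. When calling a function, keyword arguments come after any positional arguments. In a function definition, the same `key=value` form gives a parameter a default value, which makes that argument optional when the function is called; such parameters are placed after the ones without a default.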

For a more thorough introduction, I recommend reading [this excellent page](https://treyhunner.com/2018/04/keyword-arguments-in-python/) (up to the section **Where you see keyword arguments**).
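As a minimal sketch of the difference, the hypothetical function `search_titles` below has two mandatory positional parameters and two parameters with default values. The titles it searches are made up for illustration.

```python
def search_titles(query, titles, limit=3, case_sensitive=False):
    """Hypothetical helper: return up to `limit` titles containing `query`.

    `query` and `titles` are mandatory positional parameters;
    `limit` and `case_sensitive` have defaults, so they are optional.
    """
    if not case_sensitive:
        query = query.lower()
    matches = []
    for title in titles:
        text = title if case_sensitive else title.lower()
        if query in text:
            matches.append(title)
    return matches[:limit]


titles = ["Graph Decompositions", "Graph Theory", "Sparse Graphs", "Witch Hunt"]

# Only positional arguments: limit=3 and case_sensitive=False are used
print(search_titles("graph", titles))
# ['Graph Decompositions', 'Graph Theory', 'Sparse Graphs']

# Override one default by name; the other keeps its default value
print(search_titles("graph", titles, limit=1))  # ['Graph Decompositions']

# Keyword arguments can be given in any order
print(search_titles("graph", titles, case_sensitive=True, limit=2))  # []
```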

### Some more details
A function can also accept any number of arguments: with `*args`, extra positional arguments are collected into a `tuple`, and with `**kwargs`, extra keyword arguments are collected into a `dict`. [This Stack Overflow answer](https://stackoverflow.com/a/1419159) gives a few good examples if the link above does not explain it clearly enough.
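A small illustration, using the hypothetical functions `collect` and `fruit_summary`: the first shows what `*args` and `**kwargs` receive, and the second shows the same `*` and `**` syntax working in the other direction, unpacking a list and a dict into a call (the same trick as `paper(*example_paper)` in the named tuple example).

```python
def collect(*args, **kwargs):
    """Hypothetical helper: return the arguments exactly as they were gathered."""
    # `args` is a tuple of the extra positional arguments,
    # `kwargs` is a dict of the extra keyword arguments
    return args, kwargs


args, kwargs = collect("apple", "banana", limit=5, reverse=True)
print(args)    # ('apple', 'banana')
print(kwargs)  # {'limit': 5, 'reverse': True}


def fruit_summary(first, second, limit=10, reverse=False):
    """Hypothetical helper with two positional and two defaulted parameters."""
    return f"{first}, {second}, limit={limit}, reverse={reverse}"


# * unpacks the list into positional arguments,
# ** unpacks the dict into keyword arguments
words = ["apple", "banana"]
options = {"limit": 5, "reverse": True}
print(fruit_summary(*words, **options))  # apple, banana, limit=5, reverse=True
```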

## `sorted()` and `lambda` functions
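Since we will be ordering results a lot, it is useful to know a bit more about the "advanced" features of [`sorted()`](https://docs.python.org/3/library/functions.html#sorted). [This page](https://docs.python.org/3/howto/sorting.html) (up to **Ascending and Descending**) explains how sorting works. In short, the `key` parameter takes a function that is called on each item, and the items are ordered by the values it returns. A `lambda` expression such as `lambda b: b[1]` is a compact way to write such a function inline; this one returns the second element of a tuple. Below are some examples relevant to the exercise.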

In [12]:
from IPython.display import display, HTML

# A list of New York Times Best Sellers Nonfiction, 2019-10-27
books = [("Blowout", "Rachel Maddow"),
         ("Talking to Strangers", "Malcolm Gladwell"),
         ("Witch Hunt", "Gregg Jarrett"),
         ("The United States of Trump", "Bill O'Reilly"),
         ("Educated", "Tara Westover")
        ]

def print_books(header, books):
    """Render the header and the (title, author) pairs as an HTML list."""
    out = [header, "<ul>"]
    for title, author in books:
        out.append(f"<li>{title}, {author}</li>")
    out.append("</ul>")
    display(HTML("".join(out)))

header = "Top 5 NYT Best Sellers Nonfiction, 2019-10-27"
print_books(header, books)
# Tuples are compared element by element, so this sorts by title
print_books(f"{header} (Sorted by book name)", sorted(books))
# key=lambda b: b[1] sorts by the second element, the author's name
print_books(f"{header} (Sorted by Author name)", sorted(books, key=lambda b: b[1]))
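Search results usually need to be ordered by a score, highest first, with some rule for ties. The sketch below assumes hypothetical results stored in a named tuple `result` with made-up scores, and shows `reverse=True` as well as a `key` that returns a tuple so that items are sorted on two criteria.

```python
from collections import namedtuple

# Hypothetical search results with made-up relevance scores
result = namedtuple("result", ["title", "score"])
results = [
    result("Witch Hunt", 0.4),
    result("Educated", 0.9),
    result("Blowout", 0.4),
    result("Talking to Strangers", 0.7),
]

# Highest score first; the sort is stable, so ties keep their original order
by_score = sorted(results, key=lambda r: r.score, reverse=True)
print([r.title for r in by_score])
# ['Educated', 'Talking to Strangers', 'Witch Hunt', 'Blowout']

# Two criteria via a tuple key: score descending (negated), then title ascending
by_score_then_title = sorted(results, key=lambda r: (-r.score, r.title))
print([r.title for r in by_score_then_title])
# ['Educated', 'Talking to Strangers', 'Blowout', 'Witch Hunt']

# sorted() returns a new list and leaves `results` unchanged
print(results[0].title)  # Witch Hunt
```

Negating the score only works for numbers. For a descending order on strings, one option is to sort twice, starting with the secondary criterion, relying on the stability of Python's sort.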

## Counter

Imagine this problem: given a bag containing an arbitrary number of apples and bananas, count how many of each there are. We could write a loop, or use a list comprehension such as `num_apples = len([n for n in bag if n == "apple"])` for each fruit, but this becomes cumbersome when there are more kinds of fruit to count. And what if we don't know in advance which kinds of fruit are in the bag?
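For comparison, here is one way to handle an unknown set of fruits with only a plain `dict` and a loop. The contents of `bag` are made up for illustration.

```python
# Made-up bag contents; we don't need to know the kinds of fruit in advance
bag = ["apple", "apple", "banana", "orange", "banana", "apple"]

counts = {}
for fruit in bag:
    # dict.get returns the default 0 the first time a fruit is seen
    counts[fruit] = counts.get(fruit, 0) + 1

print(counts)  # {'apple': 3, 'banana': 2, 'orange': 1}
```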

The [`Counter`](https://docs.python.org/3.8/library/collections.html#collections.Counter) class from the `collections` module provides convenient utilities for exactly this problem. Calling `Counter()` with an iterable (e.g. a list) returns a `Counter` object, a subclass of `dict`, whose keys are the distinct items and whose values are their counts. Its `most_common(n)` method returns the `n` most frequent items together with their counts.

In [4]:
from collections import Counter

fruits = ["apple", "apple", "banana", "orange", "banana", "apple"]
counts = Counter(fruits)
print(counts)
print(counts.most_common(1))


Counter({'apple': 3, 'banana': 2, 'orange': 1})
[('apple', 3)]
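Since a search engine often needs to count how many times each word occurs in a document, here is a slightly larger sketch with a made-up sentence. It shows that a missing key counts as zero instead of raising `KeyError`, and uses the `most_common()`, `update()` and `+` operations of `Counter`.

```python
from collections import Counter

# Made-up document text; split() breaks it into words on whitespace
text = "the cat sat on the mat and the cat slept"
word_counts = Counter(text.split())

print(word_counts["the"])          # 3
print(word_counts["dog"])          # 0 (missing keys count as zero, no KeyError)
print(word_counts.most_common(2))  # [('the', 3), ('cat', 2)]

# update() adds the counts from another iterable
word_counts.update(["dog", "cat"])
print(word_counts["cat"])  # 3

# Counters can be added, e.g. to combine the word counts of two documents
combined = Counter("a b a".split()) + Counter("b c".split())
print(combined)  # Counter({'a': 2, 'b': 2, 'c': 1})
```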


Now that we have introduced the concepts, you should be prepared to build a search engine yourself. If there is anything unclear, don't be shy and ask on the Slack group!