# CSS 201.5 - CSS Bootcamp

## Python Programming

### Umberto Mignozzetti (UCSD)

# Strings in Python

## What is a string?

> A **string** is a *sequence* of characters. It belongs to the `str` type in Python.

A string stores characters as text, and is created using either single (`''`) or double (`""`) quotes.

Note that although strings are often used to store *words*, this isn't necessarily the case. A string could be:

In [None]:
"dog\tand\tcat"

In [None]:
"abcdef"

In [None]:
"1 + 4"

There are many more possibilities. Basically, *any* character that you wrap in quotes becomes part of a `str` in Python.

### Multi-line strings

Multi-line strings can be defined using `""" """`, as below.

In [None]:
long_str = """
This string spans multiple lines.
    This is the second line.
\"This is the third line.\"
Umberto's
{} and {}.
"""
print(long_str.format('{}','house'))

### Side note: a `str` is a kind of sequence

> A **sequence** is a collection of items (e.g., numbers, characters, etc.) with some *determined order*.  

A `list` and `str` are both kinds of sequences. 

We'll discuss **sequences** more when we talk about `list`s, but there are a couple of important properties to remember:

- Sequences have a particular *order*.  
- You can **index** into a sequence to obtain the item at a particular position.  

### Checking whether something is a `str`

Recall that you can check the **type** of a variable using `type`.

In [None]:
type("This is a sentence.")

In [None]:
type("1 + 4")

In [None]:
type(1 + 4)

### Check-in

Which of the following variables would evaluate to a `str`?

In [None]:
x1 = 1.5
x2 = True
x3 = "2 * 100"

## Why care about strings?

**Strings** are incredibly useful and versatile, so it's important to understand how they work and how to manipulate them.

Common uses of strings:

- Pretty much all text data is stored as a `str` (e.g., a text corpus, a word, etc.).  
- Storing information that can't be represented as `int` or `bool`, such as a **password**.  
- Declaring **features** of an object in Python that can't be represented as `int` or `bool`. 
- Representing a **filename**.

Strings are so useful that virtually all programming languages have something like a `str` type.

## Working with strings: basic operations

Today, we're going to focus on a few **basic operations** we can use with strings. In a future lecture, we'll talk about more complex operations.

The basic operations include:

1. Getting the length (`len`) of a string.  
2. Indexing into a string (`string_name[0]`).  
3. Looping through a string (`for ch in string_name...`).  

You'll note that each of these operations can also be applied to a `list` type!

### Calculating string length with `len`

> The `len` function returns the number of characters in a `str` (or the number of items in a `list`).  

In [None]:
x1 = "CSS 201 jhdfkjahsdjfh kfjdhsfkjhasdkjhf. fkjdhsaklfjasdf"
print(len(x1))

In [None]:
x2 = "class"
print(len(x2))

#### Check-in

How many characters are in the string `"2 + 2"`?

Try answering before you try typing in the expression.

#### Spaces count as characters!

An empty space (`" "`) counts as a character in Python.

Thus, the `str` `"big dog"` has one more character than the `str` `"bigdog"`. 

In [None]:
len("big dog")

In [None]:
len("bigdog")

#### Check-in

How many characters are in the `str` below?

In [None]:
str_test = "Computational Social Science is fun."

#### Putting quotes into a string

Certain characters, like quotes, require an **escape** character if you want to put them into a string. Otherwise they'll simply *end* the string.

In [None]:
quote_str = "Then he said, \"I love CSS!\""
print(quote_str)
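
As a small sketch of an alternative, a string that contains double quotes can also be wrapped in single quotes, so no escape character is needed. The variable names below are made up for illustration.

```python
# Double quotes inside a single-quoted string need no escape character.
single_wrapped = 'Then he said, "I love CSS!"'

# The same text, written with escaped double quotes.
escaped = "Then he said, \"I love CSS!\""

# Both spellings produce the same str.
print(single_wrapped == escaped)
```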

### Indexing into a `str`

> In programming, **indexing** into a sequence means retrieving the item at a particular position.

Because a `str` is a kind of sequence, we can retrieve the character at a particular position.

We can index into a `str` (or `list`) using the `string_name[...]` notation, where `...` would be replaced with the **index** of the character we want to retrieve.

In [None]:
test_var = "computer"
test_var[0] ### retrieves the first character of test_var

#### Note on indexing

Python uses **zero-indexing**: the first element in a sequence is assigned the index `0`, the second is assigned `1`, and so on.

- This can be hard to get used to at first!  
- But over time, it'll start to seem more natural.  
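
A minimal sketch of zero-indexing, using a made-up example string: the first character sits at index `0`, and the last one sits at index `len(word) - 1`.

```python
word = "python"  # hypothetical example string

first = word[0]             # index 0 is the first character
second = word[1]            # index 1 is the second character
last = word[len(word) - 1]  # the last valid index is len(word) - 1

print(first, second, last)
```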

#### Check-in

Which of the indexing operations below would return the letter `"S"`?

In [None]:
s = "CSS"

#### Check-in

Why does the code below return an **error**?

In [None]:
s = "CSS"
s[4]

### Slicing into a `str`

> **Slicing** is like indexing, but allows you to return a *subset* within a sequence.

For example, rather than getting the *n-th* character of a `str`, you can return the characters between index `0` and index `2`.

- To **slice**, use the syntax `[start_index:end_index]`.  
- `start_index` is the index of the first character you want to return.  
- `end_index` is the index of the final character you want to return, plus one.
   - Like `range`, the final index is not "inclusive".  

In [None]:
s = "programming"
s[0:4]
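
Because the end index is not inclusive, the number of characters a slice returns is `end_index - start_index` (as long as both indices fall inside the string). A short sketch with a made-up string:

```python
word = "notebook"  # hypothetical example string

start, end = 2, 6
piece = word[start:end]  # characters at indices 2, 3, 4, and 5

print(piece)
print(len(piece) == end - start)
```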

#### Check-in

How many characters would the following **slice** return? *Which* characters would they be?

In [None]:
s = "programming"
subset = s[5:7] ## how many characters is this?

#### Check-in

Write a **slice** operation to return the `str` `"humid"` within the string `"dehumidify"`.

In [None]:
original_str = "dehumidify"
### Your code here

### Looping through strings

> **Looping** through a `str` means repeating some piece of code for each (or a subset of) the characters within a string.

We've already discussed [loops in previous lectures](06-loops), so this will be a brief review:

- A `for` loop **iterates** through each item in a sequence (like a `str`), repeating some piece of code.  
- A `while` loop **continues** as long as some condition is met, and can also be used to iterate through a sequence.

#### Looping with a `for` loop

In [None]:
seq = "CSS"
for i in seq:
    print(i)

#### Looping with a `while` loop

In [None]:
i = 0
seq = "CSS"
while i < len(seq):
    print(seq[i])
    i += 1

## Modifying case

Often, you'll need to modify the **case** of a `str` (i.e., make it either *upper* or *lower* case). 

- One use-case for this is needing to *compare* two strings, but not caring about whether they have identical case. 
- E.g., "APplE" is the same *word* as "apple", but these strings wouldn't evaluate as equal.

In [None]:
"appLe" == "apple"

In [None]:
"apple" == "apple"

In [None]:
'2 * 2' == '4'

### `upper` and `lower`

As the names imply, `upper` and `lower` are both *functions* that you can use on a `str`.  

In [None]:
"APPLE".lower()

In [None]:
"apple".upper()

In [None]:
"APPLE".lower() == "apple"

### `title`

The `title` function is a variant of `upper`/`lower`, which just capitalizes the *first* letter of each word.

In [None]:
og_string = "my name is umberto"
og_string.title()

Note that if you have capital letters *after* the first letter of a word, these will now become lowercase!

In [None]:
og_string = "DNA"
og_string.title()

### Evaluating case

Just as you can **modify** the case of these strings, you can also evaluate it:

- `isupper()` 
- `islower()` 
- `istitle()`

These functions all check whether a string conforms to those patterns.

In [None]:
"CSS".isupper()

In [None]:
"CSS".islower()

In [None]:
"I Love Programming".istitle()

### Check-in

If you called `istitle()` on the following string, would it evaluate to `True` or `False`?

In [None]:
test_str = "I love CSS"
### Your answer/code here

### Other helpful evaluation methods

There are a few other helpful methods for **evaluating** properties of a string:

- `isdigit`: checks if the characters are entirely digits (e.g., $0, 1, ..., 9$)  
- `isalpha`: checks if the characters are entirely alphabetic characters (e.g., `abcd...`). 
- `isspace`: checks if the string is entirely space characters (e.g., ` `). 
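
The three methods above can be sketched as follows; the example strings and variable names are made up for illustration. Each call returns `True` only if *every* character in the string matches.

```python
all_digits = "2023".isdigit()     # only digits
with_point = "20.5".isdigit()     # the "." is not a digit
all_letters = "abc".isalpha()     # only letters
with_number = "abc1".isalpha()    # the "1" is not a letter
all_spaces = "   ".isspace()      # only space characters
with_letters = "a b".isspace()    # contains letters too

print(all_digits, with_point)
print(all_letters, with_number)
print(all_spaces, with_letters)
```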

## Replacing characters

Another common operation is [**replacing** elements of a string](https://www.w3schools.com/python/ref_string_replace.asp). 

Examples:

- In a `list` of filenames, replacing every `-` with a `_`. 
- Removing certain words or characters, e.g., replacing every instance of a word with a ` `.  

This can be done with the `replace` function.

In [None]:
## Replace "-" with "_"
og_filename = "css-lecture-06"
og_filename.replace("-", "_")
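
One detail worth sketching: `replace` does not change the original string. It returns a *new* string, so the result has to be stored in a variable if you want to keep it. The name `new_filename` below is made up for illustration.

```python
og_filename = "css-lecture-06"
new_filename = og_filename.replace("-", "_")

print(og_filename)   # the original string is unchanged
print(new_filename)  # the replaced version lives in the new variable
```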

### Replacing the first $N$ instances

`replace` can also be used to replace only the first $N$ instances of a string. 

In [None]:
## Replace only the first instance of "bananas"
og_string = "bananas, apples, bananas, grapes"
og_string.replace("bananas", "oranges", 1)

### Check-in

Use the `replace` function to replace the **first 2 instances** of `-` with `_`.

In [None]:
original_filename = "css-l06-su23-test.py"
### Your code here

### `replace` is case-sensitive

Note that `replace` attempts an **exact match** of the `str` you're looking to replace.

- This includes exact **case match**. 
- `"apple" != "APPLE"`. 

In [None]:
case_mismatch = "I like Apples"
### replace won't do anything here
case_mismatch.replace("apples", "bananas")

In [None]:
case_mismatch = "I like Apples"
### replace will replace it here
case_mismatch.replace("Apples", "bananas")

## Concatenating strings

> String **concatenation** simply means *combining* multiple strings.

Often, you'll need to *combine* the characters in multiple strings.

- Combining the **directory path** and a **filename** to get the full path of a file.
- Combining parts of strings to get a valid **URL**.  
- Combining the first and last name of a client to `print` out the **full name**.

### Approach 1: the `+` operator

The `+` operator can be used to **combine** multiple `str` objects.

In [None]:
"Comput" + "ational"

In [None]:
"css201/" + "lec06/" + "file.py"

#### Check-in

What do you notice about how these strings are combined? Is a space added between each constituent `str` or not?

#### Watch out for spaces (and lack thereof)!

By default, `+` will just combine two different string objects directly.

That is, `"Hello" + "World"` will become `"HelloWorld"`.

If you want to add a space *between* these objects, make sure to add a space character in your concatenation operation.

In [None]:
p1 = "Hello"
p2 = "World"
p1 + " " + p2

#### Check-in

Why does the code below throw an error? 

**Bonus**: What would you need to do to make it *not* throw an error?

In [None]:
2 + " cats"

#### Concatenating an `int` to a `str`

The `+` operator assumes you are concatenating multiple `str` objects. Thus, trying to combine an `int` with a `str` this way will throw an error.

However, you can use **type-casting** to turn the `int` into a `str`, and then combine them.

In [None]:
str(2) + " cats"

#### Check-in

Use the `+` operator to combine the variables below into a single string (in order, i.e., `var1` followed by `var2`, etc.). 
- Add a space between each variable. 
- Watch out for conflicting types!

In [None]:
var1 = "This"
var2 = "Is"
var3 = "CSS"
var4 = 202
#### Your code here

### Approach 2: using `format`

The `format` method can also be used to merge multiple strings together.

- This approach is less intuitive at first, but is very flexible.  
- I use this approach when I'm `print`ing out lots of custom variable values, e.g., as in an output message.

With `format`, you can declare "variables" within a `str` using the `{x}` syntax. 

In [None]:
first = "Smarty"
last = "Student"
print("Hello, {f} {l}".format(f = first, l = last))
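
As a rough comparison, `format` also accepts empty `{}` placeholders, which are filled in by position rather than by name. The variable names `by_position` and `by_name` are made up for this sketch.

```python
first = "Smarty"
last = "Student"

# Empty {} placeholders are filled in order, left to right.
by_position = "Hello, {} {}".format(first, last)

# Named placeholders are filled by keyword, so argument order is free.
by_name = "Hello, {f} {l}".format(l = last, f = first)

print(by_position)
print(by_name)
```

Named placeholders take a little more typing, but they tend to be easier to read when a message contains many values.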

#### Check-in

Use `format` to `print` out a message that reads: 

`"Welcome to CSS 201"`.

In [None]:
department = "CSS"
number = "201"
#### Your code here

### Approach 3: using `join`

Another somewhat common use-case is **joining** strings that are currently stored as elements of a list.

The `join` syntax starts with the *character* (or character*s*) you'll be using to **join** each `str` together.

- This could be a space character, an underscore, or anything you want.  
- It then makes a call to `.join(list_name)`. 
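
A minimal sketch of how the choice of joining string changes the result, using a made-up list of words:

```python
words = ['The', 'quick', 'brown', 'fox']  # hypothetical example list

with_spaces = " ".join(words)       # a space between each word
with_underscores = "_".join(words)  # an underscore between each word
no_separator = "".join(words)       # an empty string: nothing in between

print(with_spaces)
print(with_underscores)
print(no_separator)
```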

In [None]:
separate_str = ['The', 'quick', 'brown', 'fox', 'jumped']
separate_str

In [None]:
" ".join(separate_str)

#### Check-in

Use `join` to turn the following list of directory and sub-directory names into a full file path, connected by the `\` symbol (written as `"\\"` in Python, because a backslash inside a string must itself be escaped). 

In [None]:
dirs = ["css", "201", "lectures", "lec06"]
#### Your code here

### Other approaches

There are a number of [other approaches](https://www.pythontutorial.net/python-string-methods/python-string-concatenation/) to concatenating strings. 

Personally, I primarily use:

- The `format` method when I'm `print`ing out complicated strings. 
- The `+` operator for everything else.  

## `split`ting a string

Just as you can `join` parts of a `list` into a `str`, you can also `split` a `str` into a `list`!

Common use cases:

- Extracting directories and sub-directories of a file path.  
- **Tokenizing** a sentence, i.e., retrieving all the distinct *words* (e.g., in English, written words are typically separated by spaces).  
- Extracting different **hash-tags** from a tweet (e.g., `"#CSS#Programming"`). 

In [None]:
example_sentence = "The quick brown fox jumped over the lazy dog"
example_sentence.split(" ")
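
`split` works with any separator, not just spaces. The sketch below applies it to the hash-tag example mentioned above; the variable names are made up. Note that the text *before* the first `#` is empty, so the first piece is an empty string.

```python
tweet_tags = "#CSS#Programming"
pieces = tweet_tags.split("#")

# Nothing comes before the first "#", so pieces[0] is an empty string.
# Slicing from index 1 onward keeps only the actual tags.
tags = pieces[1:len(pieces)]

print(pieces)
print(tags)
```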

#### Check-in

How many **words** (i.e., character-sequences separated by spaces) are in the sentence below?

Hint: use a combination of `split` and `len` to solve this question.

In [None]:
test_sentence = "This sentence has a number of different words and your goal is to count them"
### Your code here

## Combining lists

Two or more lists can be combined using the `+` operator.

In [None]:
list1 = [1, 2, 3]
list2 = ['4', 5, '6']
list1 + list2

These lists do *not* have to have the same `type` or number of objects.

In [None]:
list3 = ["a", "b"]
list1 + list3

### Check-in

Use the `+` operator to combine the lists below, then use `join` to join the words into a complete sentence (with each word separated by a `" "`).

In [None]:
l1 = ['CSS', '201']
l2 = ['is', 'fun']
### Your code here

## Adding items to a `list`

In addition to using the `+` operator, you can add individual *items* to a list using the `append` function.

- Note that this modifies the list "in place", i.e., it doesn't *return* a value, but rather it mutates the existing `list` object.

In [None]:
fruits = ['apple', 'banana']
fruits.append('orange')
print(fruits)
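
To make the "in place" point concrete, here is a small sketch: storing the result of `append` in a variable (here called `result`, a made-up name) shows that nothing is returned, even though the list itself has changed.

```python
fruits = ['apple', 'banana']
result = fruits.append('orange')  # mutates fruits; returns None

print(result)
print(fruits)
```

A common mistake is writing `fruits = fruits.append('orange')`, which would overwrite the list with `None`.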

### Filling up an empty `list`

The `append` function is often used to **fill up** a `list` with items, such as during a `for` loop.

For example, you might:

- Initialize an *empty* list.  
- Loop through numbers between `1` and `100`.
- Add those numbers to the empty list if they're odd.

In [None]:
new_list = [] ### Initialize empty list
for num in range(1, 101): ### Loop through range
    if num % 2 == 1: ### If number is odd...
        new_list.append(num) ### Append it to list
new_list[0:3] ### Get the first three elements of new list

### Check-in

Add the number `4` to the list below using `append`.

In [None]:
sample_list = [1, 2, 3]
### Your code here

### Check-in

The code cell below contains two lists: one contains a list of foods, the other contains a list of words with the letter "a". 

Using `append` and a `for` loop, add the items from `foods` to `a_words` if:

- they contain the letter "a".
- they don't already appear in `a_words`. 

In [None]:
foods = ['apple', 'banana', 'orange', 'kiwi', 'strawberry', 'mango', 'pineapple', 'berry']
a_words = ['board', 'table', 'apple', 'human']
### Your code here
for f in foods:
    if 'a' in f and f not in a_words:
        a_words.append(f)

print(a_words)

### Using `insert`

- The `append` function always adds items to the **end** of a list.  
- Instead, you can use `insert` to insert items at a specific location, such as the start.
- Syntax: `list_name.insert(position, item)`

In [None]:
sample_list = [2, 3, 4]
sample_list.insert(0, 1) ### insert a 1 at the zero-th position
print(sample_list)

## Removing items from a `list`

There are two primary ways to **remove** an item from a list.

- `pop`: this removes the item at a given index (by default, this is the *last* item), and also **returns** that item. 
- `remove`: this removes the first occurrence of a particular *value* from a `list`.

So, roughly:

- `pop` removes by *position*.  
- `remove` removes by *value*.  

### `pop`ping in action

The syntax for `pop` is straightforward: `list_name.pop()`

In [None]:
sample_list = [1, 2, 5, 7]
sample_list.pop() ### by default, removes and returns the final element

Now, if we look back at `sample_list`, we see that the final element has indeed been removed.

In [None]:
sample_list
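
`pop` can also take an index, in which case it removes and returns the item at that position instead of the last one. A short sketch (the variable name `removed` is made up):

```python
sample_list = [1, 2, 5, 7]
removed = sample_list.pop(0)  # remove and return the item at index 0

print(removed)
print(sample_list)
```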

### Check-in

What do you think would happen if we `pop` from an empty list?

In [None]:
empty_list = []
### what would happen if we call empty_list.pop()

### `remov`ing in action

The syntax for `remove` is also straightforward: `list_name.remove(value)`

- Where `value` is the value that you want to remove.  
- Note that unlike `pop`, `remove` does *not* return a particular value, but it does modify the list in place.

In [None]:
sample_list = [1, 2, 5, 7]
sample_list.remove(5)
print(sample_list)

### Check-in

What would happen to `test_list` if we call `test_list.remove("apple")`?

1. `['bread', 'apple', 'cheese', 'apple']`
2. `['bread', 'cheese', 'apple']`
3. `['bread', 'cheese']`

In [None]:
test_list = ['bread', 'apple', 'cheese', 'apple']
### Your code here

## Finding the index of a particular value

The `index` function allows you to return the index corresponding to the *first occurrence* of a particular value.

**Basic syntax**: `list_name.index(value)`

- Note that you can also (optionally) parameterize the `start` and `end` of this search: 
   - `list_name.index(value, start, end)`

In [None]:
test_list = ['bread', 'apple', 'cheese', 'apple', 'house', 'car', 'yard', 'apple']
test_list.index("bread")

In [None]:
### Returns *first* occurrence of "apple"
test_list.index("apple")

In [None]:
### Returns first occurrence of "apple" between index 4 (inclusive) and index 8 (exclusive)
test_list.index("apple", 4, 8)
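
If the value is not in the list at all, `index` raises a `ValueError`. One way to guard against that, sketched below with a made-up shorter list, is to check with `in` first. Using `None` as the "not found" result is just a placeholder choice for this example.

```python
short_list = ['bread', 'apple', 'cheese']  # hypothetical example list
target = 'milk'

if target in short_list:
    position = short_list.index(target)
else:
    position = None  # placeholder for "not found"

print(position)
```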

### Check-in

Use the `index` function to retrieve the index of the first occurrence of the number `10` in the list below.

In [None]:
number_list = [1, 10, 15, 20, 10, 55]
### Your code here

### Check-in

Use the `index` function to retrieve the index of the first occurrence of the number `10` between the indices `2` and `5` in the list below.

In [None]:
number_list = [1, 10, 15, 20, 10, 55, 10]
### Your code here

## `sort`ing a list

> **Sorting** a `list` means rearranging its elements according to some measure of "least" and "greatest".

There are many different [**algorithms** for sorting a list](https://en.wikipedia.org/wiki/Sorting_algorithm), which we won't cover in detail here.

However, in Python, there are two main *functions*:

- `sorted(list)`: returns a sorted version of a `list`.  
- `list.sort()`: sorts a particular `list` **in place**. 

In [None]:
number_list = [2, 1, 9, 5, 3, 4]
sorted_list = sorted(number_list)
sorted_list

In [None]:
number_list = [2, 1, 9, 5, 3, 4]
number_list.sort()
number_list
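
The difference between the two can be sketched side by side: `sorted` leaves the original list alone and hands back a new one, while `sort` rearranges the original and returns `None`. The names `new_list`, `unchanged`, and `result` are made up for this example.

```python
number_list = [2, 1, 9, 5, 3, 4]

new_list = sorted(number_list)                   # a new, sorted list
unchanged = (number_list == [2, 1, 9, 5, 3, 4])  # original order is kept

result = number_list.sort()                      # sorts in place; returns None

print(new_list)
print(unchanged)
print(result)
print(number_list)
```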

### Ascending vs. descending?

- By default, `sorted` will sort a list in **ascending** order.
- The `reverse` argument allows you to instead sort that list in **descending** order (i.e., largest elements first).



In [None]:
number_list = [2, 1, 9, 5, 3, 4]
sorted_list = sorted(number_list, reverse = True)
sorted_list

### Check-in

The list `names` below is unsorted. Use the `sorted` function to return a new list with the names sorted, in **descending** order.

In [None]:
names = ['Umberto', 'Will', 'Sean', 'Eileen', 'Sam']
sorted(names, reverse = True)

## Nested lists

A `list` can contain many different `type`s of objects: `str`, `int`, and even other `list`s!

- Each **nested list** can contain further nested lists, or other types of objects.  
- Nested lists do not have to be the same length.

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee'],
              'text',
              [1, 2, 3, 4]]
nested_list[4]
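
To reach an item *inside* one of the inner lists, index statements can be chained: the first index picks the inner list, and the second picks an item within it. A minimal sketch with a shortened version of the list (the names `inner` and `item` are made up):

```python
nested_list = [[1, 2, 3],
               ['css', 'poli', 'econ'],
               ['tea', 'coffee']]

inner = nested_list[1]    # the second inner list
item = nested_list[1][0]  # the first item of that inner list

print(inner)
print(item)
```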

### Check-in

What would `len(nested_list)` return?

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee']]
## Your answer here

### Check-in

Write a `for` loop that iterates through each item in `nested_list`, and prints its length.

In [None]:
nested_list = [[1, 2, 3],
              ['css', 'poli', 'econ'],
              ['tea', 'coffee']]
### Your code here

## Lists vs. tuples

So far, we've focused on **lists**.

A **tuple** is another type of ordered sequence. Tuples share several similarities with lists:

- You can index into both a **tuple** and a **list**.  
- You can loop through both a **tuple** and a **list**. 

However, there are also a couple key differences:

- Tuples are declared using `()`, not `[]`.  
- Unlike `list`s, a `tuple` is not mutable (i.e., it can't be changed in place).

In [None]:
example_tuple = (1, 2, 3)
example_tuple
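
The mutability difference can be sketched as follows. Assigning to a position works for a `list` but raises a `TypeError` for a `tuple`. The sketch uses `try`/`except`, which this lecture does not cover, purely so that the cell keeps running after the error; the variable `changed` is made up for illustration.

```python
example_list = [1, 2, 3]
example_list[0] = 100  # allowed: lists are mutable

example_tuple = (1, 2, 3)
try:
    example_tuple[0] = 100  # not allowed: tuples are immutable
    changed = True
except TypeError:
    changed = False

print(example_list)
print(example_tuple)
print(changed)
```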

### Tuples (continued)

- We won't focus *too much* on tuples for now.  
- However, I wanted to highlight some of those similarities and differences. 
- It's likely that at some point in your journey with Python, you'll end up using or encountering tuples.

In [None]:
for i in example_tuple:
    print(i)

## Conclusion

In this lecture we learned:

1. `strings`
1. `lists` 
1. And how to operate with these objects

Next lecture:

1. `dictionaries`
1. `functions`

# Dictionaries

## What is a dictionary?

> In Python, a **dictionary**, or `dict`, is a mutable collection of items, which stores **key/value** pairings.

Key features:

- **Mutable**: dictionaries can be updated.  
- **Collection**: like a `list`, dictionaries can contain multiple *entries*.  
- **Key/value pairings**: unlike a `list`, dictionary entries consist of a *key* (i.e., how you *index* into that entry), and its *value* (i.e., what it maps onto). 

### Simple example of a `dict`

A dictionary is very useful for storing **structured information**. 

In [None]:
person = {'Name': 'Smarty Student',
          'Occupation': 'UCSD Grad Student',
          'Location': 'San Diego'}
print(type(person))
print(person)

They also make it really easy to **access** that information. 

In [None]:
print(person['Name'])
print(person['Occupation'])

### `dict` vs. `list`

We could store the same information in a `list`, but it would be a little harder to work with.

In [None]:
person_list = ['Smarty Student', 
               'UCSD Grad Student',
               'San Diego']

To access the information, we have to remember **where** a particular value was stored. This is harder to do, especially if there's not any intrinsic ordering to the values.

In [None]:
print(person_list[0])

### Rules about keys and values

- A `dict` cannot contain **duplicate keys**. That is, all keys must be unique.  
- However, multiple keys can have the same **value**.

In [None]:
## Different keys, same value
fruits = {'apple': 25, 
         'banana': 25}
fruits
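
For contrast, here is a sketch of what happens if the same key is written twice when creating a dictionary: Python does not raise an error, but only the *last* value for that key is kept. The dictionary name `repeated` is made up for this example.

```python
## Same key written twice: only the last value survives
repeated = {'apple': 25,
            'apple': 30}

print(repeated)
print(len(repeated))
```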

## How do you create a dictionary?

A dictionary (`dict`) can be created with curly brackets `{}`, along with the syntax `{key_name:value}`.

In [None]:
simple_dict = {'a': 1,
              'b': 2,
              'c': 1}
simple_dict

In [None]:
simple_dict['d'] ### will throw an error: 'd' is not a key in simple_dict

### Keys vs. values

**Keys** are your access-point into a dictionary. 

- Must be an immutable type (e.g., a `str` or `int`); they *can't* be a `list`.  
- Not all keys must be of same `type`.

**Values** are what the keys *map onto*.  

- Values can be anything: a `str`, `int`, `list`, or even another `dict`.

In [None]:
allowable_dict = {'a': [1, 2, 3]}
allowable_dict['a']

In [None]:
bad_key = {[1, 2, 3]: 'a'} ### will throw an error: a list can't be used as a key
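
If you do need a sequence as a key, one option is a `tuple`, which is immutable and therefore allowed. A minimal sketch (the names `good_key` and `mixed_keys` are made up), which also shows keys of different types living in the same dictionary:

```python
## A tuple is immutable, so it can serve as a key
good_key = {(1, 2, 3): 'a'}
value = good_key[(1, 2, 3)]

## Keys don't all have to be the same type
mixed_keys = {'name': 'Smarty', 7: 'seven'}

print(value)
print(mixed_keys[7])
```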

### Dictionary length

The `len` of a `dict` is the number of **keys** that it has (*not* the number of values).

In [None]:
allowable_dict = {'a': [1, 2, 3],
                 'b': [2, 3, 4, 5, 6, 8]}
len(allowable_dict)

### Check-in

What would the `len` of the dictionary below be?

In [None]:
test_dict = {'Artist': 'The Beatles',
            'Songs': ['Hey Jude', 'Revolution', 'In My Life']}
### Your code here

### Check-in

What would the `len` of the dictionary below be?

In [None]:
test_dict = {'name': 'John',
            'items': {'food': 'sandwich',
                     'money': '$40'}}
### Your code here

## Indexing into a dictionary

Once you've created a dictionary, you'll want to **access** the items in it.

- An advantage of a `dict` (over a `list`) is that key/value pairings are inherently **structured**.  
- So rather than indexing by *position*, you can index by *key*.

The syntax for indexing is: `dict_name[key_name]`. 

In [None]:
person = {'Name': 'Smarty Student',
          'Occupation': 'UCSD Grad',
          'Location': 'San Diego'}
print(person['Name'])

In [None]:
print(person['Location'])
print(person['Occupation'])

### Check-in

How would you retrieve the value `25` from the dictionary below?

In [None]:
test_dict = {'apple': 25,
            'banana': 37}
### Your code here

### Indexing requires a key

To index into a `dict`, you **need to use the key**.

- The *position* of a value will not work.  
- The *value* itself will also not work.

In [None]:
test_dict[0] ### will throw an error

In [None]:
test_dict[25] ### will throw an error
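
A related sketch: the `in` operator checks whether something is a **key** of a dictionary, which is a handy way to avoid the errors above before indexing. The variable names are made up for illustration.

```python
test_dict = {'apple': 25,
             'banana': 37}

has_apple = 'apple' in test_dict  # 'apple' is a key
has_25 = 25 in test_dict          # 25 is a value, not a key
has_0 = 0 in test_dict            # positions are not keys either

print(has_apple, has_25, has_0)
```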

## Updating a `dict`

Once you've created a `dict`, it's not set in stone: there are multiple ways to **modify** that dictionary.

- Adding new entries.  
- Deleting existing entries.  
- Combining two dictionaries.

### Adding new entries

In [None]:
## First, let's create a new dictionary
registrar = {'Mignozzetti': 'POLI', 
             'Trott': 'COGS'}
print(registrar)

We can add a new entry using the `dict_name[key_name] = new_value` syntax.

In [None]:
## Now we add a new entry to the dictionary
registrar['Styler'] = 'LING'
print(registrar)

### Check-in

Add an entry for the price of `"pasta"` to `prices_dict` below using this new syntax. 

In [None]:
prices_dict = {'rice': 4, 'bananas': 3}
### Your code here

### Check-in

What would the `len` of `prices_dict` be after you've added that entry?

In [None]:
### How long is prices_dict after you've added "pasta"?
len(prices_dict)

### Deleting entries

We can also use the `del` statement to delete specific key/value pairs from a dictionary.

In [None]:
## First, we create a new dictionary.
attendance = {'A1': True, 'A2': False}
print(attendance)

In [None]:
## Then, we delete the entry with the "A2" key.
del attendance['A2']
print(attendance)

### Merging dictionaries using `update`

What if we have **two different dictionaries** that we want to combine or *merge*? 

The `update` function can be used to do this.

In [None]:
## First, we create a new dictionary.
registrar = {'Mignozzetti': 'POLI', 
             'Trott': 'COGS'}
print(registrar)

In [None]:
## Now, we define another dictionary with more info.
registrar_other = {'Styler': 'LING',
                   'Mignozzetti': ['POLI', 'CSS'],
                   'Rangel': 'COGS'}
## Finally, we "update" original registrar
registrar.update(registrar_other)

In [None]:
print(registrar)

### Check-in

Recall that a dictionary cannot contain **duplicate keys**. What do you think would happen to `original_dict` if we ran the code below?

In [None]:
original_dict = {'a': 1, 'b': 3}
new_dict = {'a': 2}
original_dict.update(new_dict)
### What happens to original_dict['a']?
original_dict

#### Updating with duplicate keys

If we `update` a dictionary with another dictionary that contains **overlapping keys**, the **new values** replace the old values.

In [None]:
original_dict = {'a': 1, 'b': 3}
new_dict = {'a': 2}
original_dict.update(new_dict)
print(original_dict['a'])

## Iterating through a `dict`

Dictionaries are **structured** collections of **key/value pairings**.

As such, there are several ways to iterate (i.e., **loop**) through a `dict`:

- Iterating through a `list` of **keys** (`.keys()`).  
- Iterating through a `list` of **values** (`.values()`). 
- Iterating through a `list` of **key/value** `tuples` (`.items()`).

### Looping through keys with `.keys()`

Each dictionary can be thought of as a `list` of **keys**; each key in turn maps onto some **value**.

We can retrieve that `list` of keys using `dict_name.keys()`.

In [None]:
courses = {'CSS 201': 'Introduction to Computational Social Science',
           'CSS 202': 'Computational Social Science Technical Bootcamp',
           'CSS 296': 'Research in Computational Social Science'}
courses.keys()

This `dict_keys` object can be looped through like a `list`, but it can't be indexed by position directly; convert it with `list(courses.keys())` first if you need to index into it.

In [None]:
for abr in courses.keys():
    print(abr)
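
If you need the key at a particular position, a minimal sketch is to convert the keys to a `list` first and index into that. The names `key_list` and `first_key` are made up; the dictionary is repeated so the cell runs on its own.

```python
courses = {'CSS 201': 'Introduction to Computational Social Science',
           'CSS 202': 'Computational Social Science Technical Bootcamp',
           'CSS 296': 'Research in Computational Social Science'}

key_list = list(courses.keys())  # an actual list of the keys
first_key = key_list[0]          # lists can be indexed by position

print(key_list)
print(first_key)
```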

#### Check-in

How could we retrieve each **value** of the `dict` using `keys()`?

In [None]:
### Your code here

#### Retrieving values

Because each key maps onto a **value**, we can simply use it to index into `courses`.

In [None]:
for course in courses.keys():
    ## Index into courses
    name = courses[course]
    print(name)

### Looping through values with `.values()`

We can also retrieve the **values** directly using `dict_name.values()`.

In [None]:
print(courses.values())
### Note: looping over the dict itself goes through its keys, not its values
for abr in courses:
    print(abr)

In [None]:
for course_name in courses.values():
    print(course_name)

### Looping through key/value pairings with `.items()`

Dictionaries are, at their core, a list of **key/value pairings**. 

- We can access each of these using `dict_name.items()`.  
- `items()` returns a list-like `dict_items` object made up of `tuples`:
  - The first element of each `tuple` is the **key**.
  - The second element of each `tuple` is the **value**.

In [None]:
print(list(courses.keys()))
for key, value in courses.items():
    print(value + ' is abbreviated as ' + key)

#### Assignment "unpacking"

- We can access each element of the `tuple` using indexing, e.g., `item[0]` or `item[1]`.  
- However, sometimes it's more convenient to **unpack** these elements directly in the `for` loop itself.

In [None]:
for code, name in courses.items():
    print(code)
    print(name)
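
For comparison, here is a sketch of the same kind of loop *without* unpacking, where each `tuple` is indexed by hand. The dictionary `prices` and the list `lines` are made up for this example.

```python
prices = {'rice': 4, 'bananas': 3}  # hypothetical example dict

lines = []
for item in prices.items():
    key = item[0]    # first element of the tuple: the key
    value = item[1]  # second element of the tuple: the value
    lines.append(key + " -> " + str(value))

print(lines)
```

Both versions do the same work; unpacking in the `for` line simply saves the two indexing steps.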

#### Converting back to a `dict`

We can use the `dict` function to convert a list of **items** back to a `dict`.

In [None]:
items = courses.items()
print(items)

In [None]:
course_dict = dict(items)
print(course_dict)
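
The same conversion works on a plain `list` of `(key, value)` tuples that you write yourself. A small sketch with made-up data:

```python
pairs = [('a', 1), ('b', 2)]  # hypothetical list of (key, value) tuples
pairs_dict = dict(pairs)

print(pairs_dict)
print(pairs_dict['b'])
```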

### Check-in: Looping through values

Use the `.items()` function to loop through `fruits_dict` below. `print` out each item in a formatted string using `format`: 

`{fruit_name}: {price}`. 

In [None]:
fruits_dict = {'apple': 2, 'banana': 3}
### Your code here
for k, v in fruits_dict.items():
    print('{fruit_name}: {price}'.format(fruit_name = k, price = v))

### Check-in: Debug

Suppose someone writes a piece of code (see below) to loop through `fruits_dict`. Ultimately, they want to print out the price of each fruit. 

However, they keep running into an error. Can you figure out what they're doing wrong? And further, could you suggest a way to fix it?

In [None]:
### Note: as written, this loop runs without error, because it indexes into fruits_dict by key
for fruit in fruits_dict.keys():
    print(fruits_dict[fruit])

## Nested dictionaries

> A **nested dictionary** is a dictionary contained inside another dictionary, i.e., as a **value**.  

In principle, there is no limit on how many nested dictionaries can be contained in a `dict` (besides memory capacity on one's computer).

- A nested dictionary is useful when you want to store **complex information** in each entry.  
- So far, we've dealt mostly with very simple key/value entries.  
- But what if we wanted to represent more complicated information?

For example, for each student in CSS (or COGS, etc.), we might store:

- `username`.
- `Name`.  
- `Courses` (a `list`). 
- `College`
- `Major`. 

### Check-in (conceptual)

What would be a useful `dict` structure to represent information about students? For example, say we wanted to represent:

- `username` (e.g., `sstudent`)
- `Name` (e.g., `Smarty Student`)
- `Courses` (e.g., `['CSS 1', ...]`)
- `College` (e.g., `ERC`)
- `Major` (e.g., `Psychology`)

### A possible implementation

One approach is to use **nested dictionaries**.

- At the top level, each student is represented by their `username`.  
- Each `username` then maps onto a nested dictionary, which contains their `Name`, `Courses`, and any other info we need.

In [None]:
student = {
    'sstudent': {'Name': 'Smarty Student',
                 'Courses': ['COGS 14A', 'CSS 1', 'CSS 2'],
                 'College': 'ERC',
                 'Major': 'Psychology'},
    'jdoe': {'Name': 'John Doe',
             'Courses': ['COGS 18', 'CSS 1'],
             'College': 'Revelle',
             'Major': 'Undeclared'},
    'jlopez': {'Name': 'Jane Lopez',
               'Courses': ['LING 6', 'LING 101'],
               'College': 'Revelle',
               'Major': 'Linguistics'},
}

### Indexing our nested `dict`

We can index into this `dict` as we would normally. Note that now, the **value** is itself a `dict`.

In [None]:
student['jlopez']

#### Check-in

How might we index the `College` of a particular student? I.e., what if we wanted to find out the `College` of `jdoe`?

In [None]:
### Your code here

#### Nested indices

Indexing into a **nested dictionary** follows the same logic: we can *chain together* index statements to retrieve a particular value.

In [None]:
student['jdoe']['College']

In [None]:
student['jlopez']['Courses'][1]
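
Chained indexing can also be used to *change* a nested value, or to loop over the inner dictionaries. The sketch below uses a small made-up `pets` dictionary (a hypothetical example, separate from `student`) to illustrate.

```python
# Hypothetical nested dict: each key maps onto another dict
pets = {
    'rex': {'Species': 'dog', 'Toys': ['ball', 'rope']},
    'tom': {'Species': 'cat', 'Toys': ['mouse']},
}

# Chained indexing also works on the left-hand side of an assignment
pets['tom']['Species'] = 'tabby cat'

# Chained indexing followed by a list method
pets['rex']['Toys'].append('frisbee')

# Loop through the outer dict; each value is itself a dict
for name, info in pets.items():
    print(name + ' has ' + str(len(info['Toys'])) + ' toy(s)')
```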

### Check-in

How would you retrieve the list of `username`s (i.e., keys) in this `dict`?

In [None]:
### Your code here

# Functions

## What is a function?

> A **function** is a re-usable piece of code that performs some operation (typically on some *input*), and then typically returns a result (i.e., an *output*). 

Breaking this down:

- **Input**: a variable defined by the user that is *passed into* a function using the `(input)` syntax.
   - Also called an **argument**.
   - Functions can have multiple **arguments**.
- **Output**: the variable **returned** by a function after this operation is performed.  
   - If a `return` value is not specified, a function will return `None`.

### A very simple function

We'll explore the syntax more in a bit, but this will give you a sense for what we're talking about.

In [None]:
def square(x):
    """Returns the square of x."""
    return x**2

In [None]:
square(1)

In [None]:
square(2)
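
Returning to the point that a function without a `return` statement gives back `None`: the sketch below contrasts a function that only `print`s with one that `return`s. The names `greet` and `make_greeting` are hypothetical examples.

```python
def greet(name):
    """Prints a greeting; note there is no return statement."""
    print("Hello, " + name + "!")

def make_greeting(name):
    """Returns the greeting instead of printing it."""
    return "Hello, " + name + "!"

result_1 = greet("Ada")          # prints, but returns None
result_2 = make_greeting("Ada")  # returns a str

print(result_1 is None)
print(result_2)
```

A returned value can be stored and re-used later, whereas printed text is only displayed.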

## Why functions?

In principle, we could just rewrite the same code each time we want to execute that operation. So why bother defining functions at all?

The answer lies in **modular programming**.

- As operations become more and more complex, it becomes unwieldy (and just inefficient) to copy/paste the *same code* again and again.  
- In modular programming, we emphasize building **re-usable chunks of code**.
- Functions (and loops) are ways to re-use chunks of code that solve basic, recurring problems.

Learning to think in a modular way can be hard! But it's a helpful approach to **breaking down a problem into its sub-components**.
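
As a small sketch of this idea, the problem "center a list of numbers around its mean" can be split into two pieces, with the second function re-using the first. The names `mean` and `center` are hypothetical examples, and the input is assumed to be a non-empty list of numbers.

```python
def mean(numbers):
    """Returns the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

def center(numbers):
    """Subtracts the mean from every number, re-using mean()."""
    avg = mean(numbers)
    centered = []
    for n in numbers:
        centered.append(n - avg)
    return centered

scores = [2, 4, 6]
print(mean(scores))
print(center(scores))
```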

### Functions we've encountered

We've already encountered a number of functions in this course.

#### `print`

- Input: something to `print`.  
- Output: technically, `None`.  
- "Side effects": `print`s its input to standard output (by default, the terminal or the output area of the Jupyter cell).

In [None]:
print("Hello!")

#### `sorted`

- Input: a `list` 
- Output: a sorted `list`.

In [None]:
unsorted = [2, 1, 5]
sorted(unsorted)

## Defining a function

In Python, a new function can be created or **defined** using the `def` keyword, followed by the name of the function.

See the `square` function definition below:

- Function name: `square`. 
- Function arguments: `x`.  
- Function `return`: `x ** 2`.  

In [None]:
def square(x):
    """Returns the square of x."""
    return x**2

### Executing a function

To **execute** a function, we can reference the function name (like a variable), followed by the parentheses `()` and any arguments/input for the function.

In [None]:
## Function name = square
## Input = 2
square(2)

In [None]:
## Function name = square
## Input = 4
square(4)

### What type is a function?

A function belongs to a special `type` in Python, called `function`.

In [None]:
type(square)

### A more complex function

What if we wanted a function that did the following:

- `if` the input `x` is **even**, we square it.  
- `if` the input `x` is **odd**, we just `return` that number.

In [None]:
def square_if_even(x):
    """Squares x if x is even; otherwise return x."""
    if x % 2 == 0: ## check if even
        return x ** 2 ## if so, return square
    else: ## otherwise..
        return x ## just return x

In [None]:
## 2 is even, so square it
square_if_even(2)

In [None]:
## 3 is odd, so just return it
square_if_even(3)

### Another more complex function

So far, our functions have only had a **single argument**. But functions can take in *many* arguments. 

Let's define a function with *two inputs*, which just adds those inputs together.

In [None]:
def add_two_numbers(num1, num2):
    """Adds num1 to num2."""
    return num1 + num2

In [None]:
add_two_numbers(1, 2)

In [None]:
add_two_numbers(5, 3)

### Check-in

What would the function below produce if the input `x` was `25`?

More generally: how would you describe what this function *does*? 

In [None]:
def mystery_func(x):
    if x % 5 == 0:
        return True
    return False

### Solution

`mystery_func` can be thought of as a binary "check" for whether a particular number is divisible by `5`. 

In [None]:
mystery_func(25)

In [None]:
mystery_func(28)

### Check-in

Write a function that takes a `name` as input and `return`s the formatted `str`: `"My name is {name}."`

The code below can get you started:

```
def hello(name):
### your code here
```

In [None]:
# Your code here

## Function arguments: the details

Beyond the basics, there are several other important things to know about the **arguments** for a function:

- It's important to be aware of what `type` your function expects as an argument.
- Arguments can have **default values**.  
- Some arguments can be accessed with a **keyword**, while others are **positional** arguments.

### Argument `type`

Some languages, like Java, require that you specify the `type` of each argument (and of each variable, etc.).

Python doesn't require that, but it's still important to be aware of.

- Otherwise, you can run into a `TypeError`.
- If you're interested: Python uses something called [duck typing](https://en.wikipedia.org/wiki/Duck_typing). 
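
The duck typing idea can be sketched as follows: the same function works on several types, as long as each one supports the operation used inside the function. The function name `double` is a hypothetical example.

```python
def double(x):
    """Returns x * 2 for any x that supports the * operator."""
    return x * 2

print(double(4))       # int: multiplication
print(double(1.5))     # float: multiplication
print(double("ab"))    # str: repetition
print(double([1, 2]))  # list: repetition
```

Python never checks the `type` of `x` ahead of time; it only matters whether `x * 2` is a valid operation when the line runs.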

#### Example of a `TypeError`

Here, the `square` function performs an operation with `x` that requires `x` to be a number (an `int` or a `float`).

In [None]:
def square(x):
    return x ** 2
square("two")

#### How to avoid a `TypeError`?

In practice, the best way to avoid a `TypeError` is to **document your code**. 

- In the `docstring` under a function, you can write details about what the function expects, e.g., whether the input is an `int`, a `str`, etc.

In [None]:
def square(x):
    """
    Parameters
    ----------
    x: int or float
      number to be squared
    
    Returns
    -------
    int or float
      square of x
    """
    return x ** 2

square(0.5)
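
Beyond documentation, one optional approach is to check the input explicitly and fail with a clearer message. This is only a sketch: the name `checked_square` is hypothetical, and it relies on `isinstance`, `raise`, and `try`/`except`, which may not have been covered yet.

```python
def checked_square(x):
    """Squares x, but first checks that x is an int or float."""
    if not isinstance(x, (int, float)):
        # Fail early, with a message that names the problem
        raise TypeError("x must be an int or float, got " + type(x).__name__)
    return x ** 2

print(checked_square(3))

try:
    checked_square("two")
except TypeError as err:
    print("Caught an error:", err)
```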

#### Check-in

Will the function below result in an error if you call it on the input `"test"`? Why or why not?

In [None]:
def mystery_func(x):
    return x ** 3

### Default values

> A **default value** is the value taken on by an argument *by default*. If no other value is specified, this is the value assumed by the function.

In the function definition, a default value can be specified by setting: `arg_name = default_value`.

- In the example below, `name` is required.
- But `major` has a default value of `"COGS"`.

In [None]:
def my_info(name, major = "COGS"):
    return "My name is {name}, and my major is {major}.".format(name = name, major = major)

Even if we don't specify a value for `major`, the function will run just fine: it simply uses the default value.

In [None]:
my_info("Mary")

#### Overriding a default value

A default value can be overridden in the call to the function itself. 

- Note that this can reference the argument name (`major`), or just occupy the correct **position** in the series of arguments. (More on this later.)

In [None]:
my_info("Umberto", major = "LIGN")

In [None]:
my_info(major = "LIGN", name = "Sean")

#### Arguments without a default must be referenced!

If an argument *doesn't* have a default, the function will throw an error (a `TypeError`) if you don't pass in a value for it.

In [None]:
my_info()

#### Check-in

Why does the following code not throw an error?

In [None]:
my_info("POLI")

### Positional vs. keyword arguments

An argument to a function can be indicated using either:

- Its **position**, i.e., in the list of possible arguments.
- A **keyword**, i.e., the *name* of that argument.

A **positional argument** uses the relative position of the arguments to determine which is which. 

In [None]:
def exponentiate(num, exp):
    return num ** exp

In [None]:
## Raise 2 ^ 3
exponentiate(2, 3)

In [None]:
## Raise 3 ^ 2
exponentiate(3, 2)

A **keyword argument** uses the *name* of the argument to determine which is which. 

- Even if the positions are swapped, the *keyword* will take priority. 
- (Note that the best practice is to keep the order consistent, however.)

In [None]:
## Raise 2 ^ 3
exponentiate(num = 2, exp = 3)

In [None]:
## Raise 2 ^ 3
exponentiate(exp = 3, num = 2)

#### Position before keyword

- Once you've used a keyword argument, you can't rely on **position** for any arguments coming after that keyword. This will throw a `SyntaxError`.
- However, a **positional argument** can come before a **keyword argument**.


In [None]:
## This is incorrect
exponentiate(num = 2, 3)

In [None]:
## This is fine
exponentiate(2, exp = 3)

## Conclusion

This concludes our initial introduction to **functions**. If there is time, there are also two more challenging practice problems below to work on.

## Practice problems

### Problem 1

Write a function called `fizzbuzz`. It should take in a single argument, `x`, and follow this behavior:

- If `x` is divisible by both `3` and `5`, return the `str` `"fizzbuzz"`. 
- If `x` is divisible by only `3` (and not `5`), return `"fizz"`.
- If `x` is divisible by only `5` (and not `3`), return `"buzz"`.

Note: this is part of a famous problem in **coding interviews**!

In [None]:
def fizzbuzz(x):
    pass

### Problem 2

Write a function called **product**, which takes a `list` (`lst`) as input, and returns the **product** of every item in the list.

In [None]:
L = [2, 3, 4, 5]

def product(lst):
    pass

## Returning multiple values

Functions can `return` multiple values, or even another function. 

This can be useful when:

- The goal of a `function` can't be distilled into a single value.  
- You want to `return` multiple bits of information about something, e.g., its `len`, its value, and so on.  

Multiple values can be separated with a `,`.

### Multiple `return` values: an example

Suppose we wanted a function that takes two numbers as input, and returns both:

- Their sum.  
- Their product.

In [None]:
def sum_product(a, b):
    sumvar = a + b
    prod = a * b
    L = [sumvar, prod]
    return L, sumvar, prod

In [None]:
l, s, p = sum_product(10, 200)
print(l)
print(s)
print(p)

### Check-in

What do you notice about the `type` of the object that gets returned when a function returns *multiple values*?

In [None]:
sum_product(5, 2)

### `return` and `tuple`s

By default, a `function` will package these multiple values into a `tuple`.

- It's possible to return them in another form, e.g., in a structured dictionary. 
- But if you use the `return a, b` syntax, `a` and `b` will be returned as a `tuple`: `(a, b)`.
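
A minimal sketch of both options: returning a `tuple` (and unpacking it), and returning a structured `dict` instead. The function names `min_and_max` and `min_and_max_dict` are hypothetical examples.

```python
def min_and_max(a, b):
    """Returns the smaller and the larger of two numbers, in that order."""
    if a <= b:
        return a, b
    return b, a

result = min_and_max(7, 3)
print(type(result))  # the two values arrive packaged together
print(result)

# Unpack the two values into separate variables
smaller, larger = min_and_max(7, 3)

def min_and_max_dict(a, b):
    """Same information, but returned with labels in a dict."""
    smaller, larger = min_and_max(a, b)
    return {'min': smaller, 'max': larger}

print(min_and_max_dict(7, 3))
```

The `dict` version is longer, but the caller does not need to remember which value comes first.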

## Namespaces

### What is a namespace?

> A [**namespace**](https://realpython.com/python-namespaces-scope/) is the "space" where a given set of variable names have been *declared*.

Python has several types of namespaces:

1. **Built-in**: Built-in objects within Python (e.g., **Exceptions**, **lists**, and more). These can be accessed from anywhere.  
2. **Global**: Any objects defined in the main program. These can be accessed anywhere in the main program once you've defined them, but not in another Jupyter notebook, etc.
3. **Local**: If you define new variables within a *function*, those variables can only be accessed within the "scope" of that function.

### The global namespace

So far, we've mostly been working with variables defined in the **global namespace**.

- I.e., once we define a variable in a notebook (and run that cell), we can reference it in another cell.

In [None]:
## define global variable
my_var = 2

In [None]:
## reference global variable
print(my_var)

### Functions have their own namespace

If you declare a variable **within** a function definition, that variable does *not* persist outside the scope of that function.

In the function below, we declare a new variable called `answer`, which is eventually `return`ed.

- However, the **variable itself** does not exist outside the function.

In [None]:
def exponentiate(num, exp):
    ### "answer" is a new variable 
    answer = num ** exp
    return answer

In [None]:
exponentiate(3, 2)
### This will throw an error
print(answer)

### Global variables *can* be referenced inside a function

If you've defined a variable in the global namespace, you *can* reference it inside a function.

- **Word of caution ⚠️**: this can make for confusing code. 

In [None]:
## define global variable
my_var = 2
## define function
def add_two(x):
    ## references my_var
    return x + my_var

add_two(2)
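
To illustrate why this can be confusing, the sketch below shows the same call giving two different results once the global variable changes. All names here (`increment`, `add_increment`, `add_amount`) are hypothetical examples.

```python
increment = 2

def add_increment(x):
    ## relies on the global variable `increment`
    return x + increment

first = add_increment(10)
increment = 100  # changing the global changes what the function does
second = add_increment(10)
print(first, second)

## Alternative: pass the value in explicitly as an argument
def add_amount(x, amount):
    return x + amount

print(add_amount(10, 2))
```

With the explicit argument, everything the function depends on is visible in the call itself.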

### Check-in

What would the value of `new_var` be after running the code below?

What about `test_var`?

In [None]:
test_var = 2
def test_func(x):
    test_var = x ** 2
    return test_var

new_var = test_func(5)

### Using `whos`

Remember that you can check which variables are defined using `whos`.

**Warning**: `whos` is an IPython "magic" command, so it works in IPython and Jupyter notebooks. It is not available in a plain Python script run on your computer.

In [None]:
whos

## `lambda` functions

So far, we've focused on creating functions using the `def func_name(...)` syntax.

However, Python also has something called [**lambda functions**](https://www.w3schools.com/python/python_lambda.asp). 

- Syntax: `lambda x: ...`. 
- Main advantage: can be written in a single line, best if you want a **simple function**.  
   - Excellent for passing as *arguments* into other functions, such as `sorted`.

In [None]:
square = lambda x: x ** 2
print(square(2))
print(square(4))

`lambda` functions can also have multiple arguments.

In [None]:
exp = lambda x, y: x ** y
print(exp(2, 3))

### Check-in

Convert the function below into a `lambda` function.

In [None]:
def add_one(x):
    ## Adds 1 to x
    return x + 1

### Your code here

### `lambda`: summary

- `lambda` is an easy, efficient way to define a simple function.  
- In practice, `lambda` is most useful when defining functions "on the fly".
   - As **arguments** to pass into another function.
   - As **nested functions** within another function. 
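
As a sketch of the "argument to another function" use case, `sorted` accepts a `key` argument: a function that is applied to each item to decide its sort order. The example data below is made up for illustration.

```python
words = ["banana", "fig", "cherry"]

# Sort by length rather than alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)

# Sort (name, price) pairs by price, i.e., the item at index 1
prices = [("apple", 2), ("kiwi", 1), ("mango", 3)]
by_price = sorted(prices, key=lambda pair: pair[1])
print(by_price)
```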

## Varying number of arguments

So far, we've assumed that we *know* how many arguments will be passed into a function at any given time. But this isn't always the case.

Fortunately, Python gives us two ways to handle an **arbitrary number** of arguments:

- `*args`: allows a `function` to receive an arbitrary number of (positional) arguments, which can be "unpacked" as needed. The function treats them as a `tuple`. 
- `**kwargs`: allows a `function` to receive a `dict` of (keyword) arguments, which can be "unpacked" as needed. 

### `*args` in practice

The `*args` syntax allows you to input an arbitrary number of arguments into a function.

In [None]:
def my_function(*fruits):
    print("The last fruit is " + fruits[-1] + ".")

In [None]:
my_function("strawberry")

In [None]:
my_function("strawberry", "apple")
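
As another illustration of `*args`, the sketch below adds up however many numbers it receives, including none at all. The function name `total` is a hypothetical example.

```python
def total(*numbers):
    """Adds up however many numbers are passed in."""
    result = 0
    for n in numbers:
        result = result + n
    return result

print(total())
print(total(1, 2))
print(total(1, 2, 3, 4))
```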

#### Check-in

How exactly is this working? That is, what is `my_function` treating `*fruits` as? 

Try `print`ing out `fruits` to see what's going on.

In [None]:
### Your code here

### `**kwargs` in practice

The `**kwargs` syntax is similar to `*args`, but allows for an arbitrary number of **keyword arguments**.

- These are treated as a `dict` by the function.

In [None]:
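def my_bad_function(*fruits):
    ## Relies on order: fruits[0] must be the name and fruits[1] the amount
    print('I have ' + str(fruits[1]) + ' ' + str(fruits[0]))

def my_function(**fruits):
    ## fruits is a dict; .get() returns a default if a keyword was not passed
    print('I have ' + str(fruits.get('amount', 'some')) + ' ' + fruits['name'])
    if fruits.get('ripe', False):
        print('And they are ripe!')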

In [None]:
### Keyword and value are automatically placed into dictionary
my_function(amount = 5, name = "apple", ripe = False)
my_bad_function(5, "apple")

In [None]:
### The specific keyword can be altered as needed
my_function(name = "banana", cost = 10)

#### Why use this?

In general, `**kwargs` is useful when you want **flexibility**.

For example, suppose you have a website, in which people can (optionally) fill out the following information:

- `Name`. 
- `Email`. 
- `Phone number`.
- `Location`.

But because not everyone fills out *every field*, the function you use to store this information needs to be flexible about how many arguments it receives.

In [None]:
def store_user(**info):
    ## For now, this is just a placeholder to demonstrate
    for item in info.items():
        print(item)

In [None]:
store_user(Name = "John", Location = "San Diego", Email = 'john@ucsd.edu')
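
Building on this, the sketch below returns a complete record and fills in `None` for any field that was not provided. The function name `build_user` and the field name `Phone` (a keyword cannot contain a space) are assumptions made for illustration.

```python
def build_user(**info):
    """Builds a user record, using None for fields that were not given."""
    fields = ['Name', 'Email', 'Phone', 'Location']
    record = {}
    for field in fields:
        if field in info:
            record[field] = info[field]
        else:
            record[field] = None
    return record

user = build_user(Name="John", Location="San Diego")
print(user)
```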

## Practice problems

One of the best ways to learn a new concept is to actually practice it. Thus, I'm including a number of practice problems at the end of this lecture, which we'll work through.

### Problem 1: find the maximum number of a `list`

Goal: write a function that takes in a `list` of numbers as input, and finds the **maximum** of the `list`.  

The catch: you can't use the built-in function `max`. 

Things to consider:

- If the input `list` is empty, you should return `None`.  
- Since you can't use `max`, you might consider using a `for` loop, checking the value of each number in turn.

In [None]:
### Your code here

### Problem 2: find the maximum number in a set of `*args`

Goal: write a function that takes in an arbitrary number of arguments (i.e., uses `*args`), and finds the maximum.

The catch: you can't use the built-in function `max`. 

Things to consider:

- If there are no arguments, you should return `None`.  
- Since you can't use `max`, you might consider using a `for` loop, checking the value of each number in turn.

In [None]:
### Your code here

### Problem 3: find the even numbers

Goal: write a function that takes in a `list` of numbers, and prints the even ones.

In [None]:
### Your code here

### Problem 4: find the tallest in a dictionary.

Suppose we want a `function` that takes in a `dict` of `Names` and `Heights`. That is, each *key* is a `Name`, and it maps onto a `Height`.

We want the function to return the `Name` of the person with the largest `Height`, *as well as* the `Height` itself.

In [None]:
## Can't just max...that'll return "Sean"
heights = {'Sean': 67, 'Ben': 72, 'Anne': 66}
### Your code here

## Conclusion

So far, we have learned:

- Operators, assignment, flow control (`if-elif-else`, `for`, `while`)
- `strings`, `lists` 
- `dictionaries`, `functions`

Next class we are going to learn:

- `classes` and object-oriented programming
- Reading files
- `numpy` basics