### Strings

*f-strings* provide a simple way to substitute values into strings

In [1]:
first_name = "Tucker"
last_name = "Carney"

In [4]:
full_name1 = first_name + " " + last_name #string addition
full_name2 = "{0} {1}".format(first_name, last_name) #string.format

print(full_name1)
print(full_name2)

Tucker Carney
Tucker Carney


but the f-string way is much less unwieldy

In [5]:
full_name3 = f"{first_name} {last_name}"
print(full_name3)

Tucker Carney
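The braces of an f-string can hold any expression, optionally followed by a format specifier. A minimal sketch; the `score` value is made up for illustration and is not part of the notes above:

```python
first_name = "Tucker"
last_name = "Carney"
score = 0.87654   # hypothetical value, only here to show number formatting

# any expression can go inside the braces
shout = f"{first_name.upper()} {last_name.upper()}"

# a format spec after the colon controls how the value is rendered
rounded = f"{score:.2f}"   # two decimal places
percent = f"{score:.1%}"   # percentage with one decimal place

print(shout, rounded, percent)
```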


### Exceptions

In [6]:
try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")

cannot divide by zero


### Lists

In [9]:
integer_list = [1, 2, 3]
heterogenous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogenous_list, []]

In [10]:
list_length = len(integer_list) # equals 3
list_sum = sum(integer_list) # equals 6

you can get the *n*th element of a list with square brackets

In [11]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

zero = x[0]         # equals 0, lists are 0-indexed
one = x[1]          # equals 1
nine = x[-1]        # equals 9, 'Pythonic' for last element
eight = x[-2]       # equals 8, 'Pythonic' for second to last element
x[0] = -1           # now x is [-1, 1, 2, ..., 9]

You can also use square brackets to *slice* lists 
* The slice i:j means all elements from i (inclusive) to j (not inclusive)
* If you leave off the start of the slice, you'll slice from the beginning of the list
* If you leave off the end of the slice, you'll slice to the end of the list

In [12]:
first_three = x[:3]              # [-1, 1, 2]
three_to_end = x[3:]             # [3, 4, ..., 9]
one_to_four = x[1:5]             # [1, 2, 3, 4]
last_three = x[-3:]              # [7, 8, 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:]                 # [-1, 1, 2, ..., 9]

You can similarly slice strings and other "sequential" types

A slice can take a 3rd argument to indicate its *stride* which can be negative:

In [13]:
every_third = x[::3]          # [-1, 3, 6, 9]
five_to_three = x[5:2:-1]     # [5, 4, 3]
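The same slice syntax, including the stride, applies to strings and tuples. A small sketch with made-up values:

```python
word = "datascience"       # sample string, chosen only for illustration
point = (3, 1, 4, 1, 5)    # sample tuple

first_four = word[:4]        # "data"
last_seven = word[-7:]       # "science"
reversed_word = word[::-1]   # a stride of -1 walks the string backwards
middle = point[1:-1]         # (1, 4, 1), slicing a tuple gives a tuple
```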

Python has an **in** operator to check for list membership:

In [14]:
1 in [1, 2, 3]   # True
0 in [1, 2, 3]   # False

False

If you want to modify a list in place you can use **extend** to add items from another collection

In [15]:
x = [1, 2, 3]
x.extend([4, 5, 6])     # x is now [1, 2, 3, 4, 5, 6]

If you don't want to modify x, you can use list addition

In [16]:
x = [1, 2, 3]
y = x + [4, 5, 6]      # y is [1, 2, 3, 4, 5, 6], x is unchanged

It is often convenient to *unpack* lists when you know how many elements they contain:

In [17]:
x, y = [1, 2]     # now x is 1, y is 2

A common idiom is to use an underscore for a value you're going to throw away:

In [18]:
_, y = [1, 2]
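Unpacking only works when the number of names matches the number of elements; otherwise Python raises a **ValueError**. A short sketch of the failure case:

```python
try:
    x, y = [1, 2, 3]     # three values but only two names
except ValueError as error:
    message = str(error)
    print("unpacking failed:", message)
```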

### Tuples

Tuples are like immutable lists

In [1]:
my_list = [1, 2]
my_tuple = (1, 2)

my_list[1] = 3

try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")

cannot modify a tuple


In [3]:
#multiple assignment
x, y = 1, 2
#pythonic way to swap elements
x, y = y, x

### Dictionaries

In [4]:
empty_dict = {}                  # Pythonic
empty_dict2 = dict()             # less Pythonic
grades = {"Joel": 80, "Tim": 95} # dictionary literal

check for the existence of a key using **in**

In [5]:
joel_has_grade = "Joel" in grades   # True
kate_has_grade = "Kate" in grades   # False

use **get** method to return a default value (instead of raising an exception)

In [6]:
joels_grade = grades.get("Joel", 0)  # equals 80
kates_grade = grades.get("Kate", 0)  # equals 0
no_ones_grade = grades.get("No one") # equals None
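For contrast, looking up a missing key with square brackets raises a **KeyError**, and the same brackets are used to assign values. A minimal sketch that reuses the `grades` dictionary from above:

```python
grades = {"Joel": 80, "Tim": 95}

try:
    kates_grade = grades["Kate"]   # square brackets raise on a missing key
except KeyError:
    kates_grade = None
    print("no grade for Kate!")

grades["Tim"] = 99                 # replaces the old value
grades["Kate"] = 100               # adds a third entry
num_students = len(grades)         # 3
```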

use dictionaries to represent structured data

In [9]:
tweet = {
    "user": "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count": 100,
    "hashtags": ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}

In [11]:
tweet_keys = tweet.keys()      # iterable for the keys
tweet_values = tweet.values()  # iterable for the values
tweet_items = tweet.items()    # iterable for the (key, value) tuples

"user" in tweet_keys           # True, but not Pythonic
"user" in tweet                # Pythonic way for checking for keys
"joelgrus" in tweet_values     # True (slow but the only way to check)

True

### defaultdict

like dictionaries, except that when you try to look up a key it doesn't contain, it first adds a value for that key using the zero-argument function you provided when you created it

In [14]:
from collections import defaultdict
document = []

word_counts = defaultdict(int)      # int() produces 0
for word in document:
    word_counts[word] += 1

can also be useful with **list** or **dict** or custom functions

In [17]:
dd_list = defaultdict(list)              # list() produces an empty list
dd_list[2].append(1)                     # now dd_list contains {2: [1]}

dd_dict = defaultdict(dict)              # dict() produces an empty dict
dd_dict["Joel"]["City"] = "Seattle"      # {"Joel": {"City": "Seattle"}}

dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1                        # now dd_pair contains {2: [0, 1]}
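To see what **defaultdict** saves you, here is the word-counting loop written both ways. This is only a sketch; the `document` contents are a made-up sample:

```python
from collections import defaultdict

document = ["data", "science", "is", "data"]   # made-up sample words

# plain dict: we have to check for the key before incrementing
plain_counts = {}
for word in document:
    if word in plain_counts:
        plain_counts[word] += 1
    else:
        plain_counts[word] = 1

# defaultdict: a missing key is first set to int(), which is 0
default_counts = defaultdict(int)
for word in document:
    default_counts[word] += 1
```

Both loops end with the same counts; the second just drops the explicit membership check.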

### Counters

a **Counter** turns a sequence of values into a **defaultdict(int)**-like object mapping keys to counts

In [18]:
from collections import Counter
c = Counter([0, 1, 2, 0])            # c is (basically) {0: 2, 1: 1, 2: 1}

gives a very simple solution to the word_counts problem:

In [19]:
word_counts = Counter(document)

In [20]:
# print the 10 most common words and their counts:

for word, count in word_counts.most_common(10):
    print(word, count)
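Since `document` is empty in these notes, the loop above prints nothing. A sketch with a made-up sample text shows what **most_common** returns:

```python
from collections import Counter

document = "the cat and the hat and the bat".split()   # made-up sample text
word_counts = Counter(document)

top_two = word_counts.most_common(2)   # list of (word, count) pairs, highest count first
for word, count in top_two:
    print(word, count)

missing = word_counts["dog"]           # like defaultdict(int), unseen keys count as 0
```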

### Sets

represents a collection of *distinct* elements

In [22]:
primes_below_ten = {2, 3, 5, 7}

In [23]:
s = set()     # empty set
s.add(1)      # s is now {1}
s.add(2)      # s is now {1, 2}
s.add(2)      # s is still {1, 2}

* sets have very fast **in** operations
* useful for finding all distinct items in a collection

In [24]:
item_list = [1, 2, 3, 1, 2, 3]
num_items = len(item_list)          # 6
item_set = set(item_list)           # {1, 2, 3}
num_distinct_items = len(item_set)  # 3
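A typical use of the fast **in** test is checking words against a large collection. The sketch below uses a tiny, hypothetical stopword list; in practice it would be much longer, which is where the set pays off:

```python
stopwords_list = ["a", "an", "at", "yet", "you"]   # hypothetical, normally far longer
stopwords_set = set(stopwords_list)

in_list = "zip" in stopwords_list   # False, but has to check every element
in_set = "zip" in stopwords_set     # False, answered by a hash lookup

words = ["you", "zip", "a", "zip"]
distinct_non_stopwords = {w for w in words if w not in stopwords_set}
```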

### Control Flow

In [25]:
#ternary if-then-else one liner
parity = "even" if x % 2 == 0 else "odd"

In [26]:
# for in loops
for x in range(10):                # range(10) is the numbers 0, 1, 2, ..., 9
    print(f"{x} is less than 10")

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


### Truthiness

the following are all "falsy" in Python
* False
* None
* [] (an empty list)
* {} (an empty dict)
* ""
* set()
* 0
* 0.0

In [28]:
all([True, 1, {3}])  # True, all are truthy
all([True, 1, {}])   # False, {} is falsy

any([True, 1, {}])   # True, True is truthy

all([])              # True, no falsy elements in the list
any([])              # False, no truthy elements in the list

False
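Truthiness also drives the **and** and **or** operators, which return one of their operands rather than a strict boolean. A small sketch; the helper names `first_char` and `or_zero` are invented for this example:

```python
def first_char(s):
    # `and` returns s itself when s is falsy, otherwise the second operand
    return s and s[0]

def or_zero(x):
    # `or` returns x when x is truthy, otherwise the second operand
    return x or 0

print(first_char("abc"), or_zero(None))
```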

### Sorting

* **sort** sorts a list in place
* **sorted** returns a new sorted list

In [29]:
x = [4, 1, 3, 2]
y = sorted(x)     # y is [1, 2, 3, 4], x is unchanged
x.sort()          # x is now [1, 2, 3, 4]

In [31]:
# sort the list by absolute value from largest to smallest
x = sorted([-4, 1, -2, 3], key=abs, reverse=True)
print(x)

[-4, 3, -2, 1]


In [32]:
# sort the words and counts from highest count to lowest
wc = sorted(word_counts.items(),
            key=lambda word_and_count: word_and_count[1],
            reverse=True)

### List Comprehensions

In [33]:
even_numbers = [x for x in range(5) if x % 2 == 0]  # [0, 2, 4]
squares      = [x * x for x in range(5)]            # [0, 1, 4, 9, 16]
even_squares = [x * x for x in even_numbers]        # [0, 4, 16]

you can also turn lists into dictionaries or sets

In [34]:
square_dict = {x: x * x for x in range(5)}    # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
square_set = {x * x for x in [1, -1]}         # {1}

In [35]:
zeros = [0 for _ in even_numbers]  # list of zeros with same length as even_numbers

a list comprehension can use multiple **for**s

In [36]:
pairs = [(x, y)
         for x in range(10)
         for y in range(10)]   #100 pairs (0,0), (0, 1), ... (9, 8), (9, 9)

and later **for**s can use the results of earlier ones

In [38]:
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x+1, 10)] # only pairs with x < y, range(lo, hi) equals [lo, lo + 1, ..., hi - 1]
print(increasing_pairs)

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]


### Automated Testing and assert

**assert** statements cause your code to raise an **AssertionError** if your specified condition is not truthy

In [39]:
assert 1 + 1 == 2
assert 1 + 1 == 2, "1 + 1 should equal 2 but didn't"

In [40]:
def smallest_item(xs):
    return min(xs)

assert smallest_item([10, 25, 5, 40]) == 5
assert smallest_item([1, 0, -1, 2]) == -1
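A less common use is to assert things about a function's inputs. The sketch below is one possible variant of `smallest_item`; the error message is made up:

```python
def smallest_item(xs):
    # fail early, with a readable message, instead of letting min() complain
    assert xs, "empty list has no smallest item"
    return min(xs)

try:
    smallest_item([])
except AssertionError as error:
    message = str(error)
    print("assertion failed:", message)
```

Note that assertions are skipped when Python runs with the `-O` flag, so they suit sanity checks rather than validation you depend on.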

### Object Oriented Programming

* like many other languages, Python allows you to define *classes* that encapsulate data and the functions that operate on them
* for this example, we'll construct a class representing a "counting clicker", the sort that is used at the door to track how many people have shown up for the "advanced topics in data science" meetup

* it maintains a **count**
* can be **clicked** to increment the count
* allows you to **read** the count
* can be **reset** back to zero

To define a class, you use the **class** keyword and PascalCase name:

In [45]:
class CountingClicker:
    """ a class can/should have a docstring, just like a function"""
    def __init__(self, count = 0):
        self.count = count

A class contains zero or more *member* functions. By convention, each takes a first parameter, **self**, that refers to the particular class instance

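Normally, a class has a constructor, named **__init__**. It takes whatever parameters you need to construct an instance of your class and does whatever setup you need.

Methods whose names start with an underscore are, by convention, considered "private". Methods whose names start and end with double underscores ("dunder methods", such as **__init__**) are different: they implement special behaviors that Python calls for you.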

We construct instances of the clicker using just the class name:

In [46]:
clicker1 = CountingClicker()     # initialized to 0
clicker2 = CountingClicker(100)  # starts with count=100
clicker3 = CountingClicker(count=100) # more explicit way of doing the same

The **__repr__** method produces the string representation of class instance:

In [47]:
def __repr__(self):
    return f"CountingClicker(count={self.count})"

And finally we need to implement the *public API* of our class:

In [48]:
def click(self, num_times = 1):
    """Click the clicker some number of times"""
    self.count += num_times

def read(self):
    return self.count

def reset(self):
    self.count = 0
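The methods in this section are shown one cell at a time, but they only work when they are indented inside the class body. Assembled into one place, the class might look like this (a sketch that combines the pieces above, followed by a few checks):

```python
class CountingClicker:
    """A clicker that counts how many times it has been clicked."""

    def __init__(self, count=0):
        self.count = count

    def __repr__(self):
        return f"CountingClicker(count={self.count})"

    def click(self, num_times=1):
        """Click the clicker some number of times."""
        self.count += num_times

    def read(self):
        return self.count

    def reset(self):
        self.count = 0


clicker = CountingClicker()
assert clicker.read() == 0, "clicker should start with count 0"
clicker.click()
clicker.click()
assert clicker.read() == 2, "after two clicks, clicker should have count 2"
clicker.reset()
assert clicker.read() == 0, "after reset, clicker should be back to 0"
```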

We'll also occasionally create *subclasses* that *inherit* some of their functionality from a parent class

In [49]:
# A subclass inherits all the behavior of its parent class
class NoResetClicker(CountingClicker):
    #This class has all the same methods as CountingClicker
    
    #Except that it has a reset method that does nothing
    def reset(self):
        pass
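A quick check that the overridden method really does nothing while the inherited ones still work. The parent class here is a trimmed-down copy, included only so the sketch runs on its own:

```python
class CountingClicker:
    """Trimmed-down copy of the parent class for this sketch."""
    def __init__(self, count=0):
        self.count = count

    def click(self, num_times=1):
        self.count += num_times

    def read(self):
        return self.count

    def reset(self):
        self.count = 0


class NoResetClicker(CountingClicker):
    def reset(self):
        pass   # overrides the parent's reset and leaves the count alone


clicker2 = NoResetClicker()
clicker2.click()      # inherited from CountingClicker
clicker2.reset()      # does nothing
print(clicker2.read())
```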

### Iterables and Generators

* a list of a billion numbers takes up a lot of memory
* if you only want the elements, one at a time, there's no good reason to keep them all around
* often all we need is to iterate over the collection using **for** and **in**
* in this case we can create **generators**, which can be iterated over just like lists but generate their values lazily on demand

One way to create generators is with functions and the **yield** keyword

In [50]:
def generate_range(n):
    i = 0
    while i < n:
        yield i     # every call to yield produces a value of the generator
        i += 1

In [51]:
for i in generate_range(10):
    print(f"i: {i}")

i: 0
i: 1
i: 2
i: 3
i: 4
i: 5
i: 6
i: 7
i: 8
i: 9


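* **range** is itself lazy (strictly speaking it is a lazy sequence rather than a generator), so there is no need to write your own. Laziness also means a generator can represent an infinite sequence: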

In [55]:
def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

The flip side of laziness is that you can only iterate through a generator once. If you need to iterate through something multiple times, you'll need to recreate the generator or use a list.
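A short sketch of what "only once" means in practice, reusing `generate_range` from above:

```python
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

numbers = generate_range(3)
first_pass = list(numbers)             # [0, 1, 2]
second_pass = list(numbers)            # [], the generator is already exhausted
fresh_pass = list(generate_range(3))   # calling the function again starts over
```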

A second way to create generators is by using **for** comprehensions wrapped in parentheses:

In [56]:
evens_below_20 = (i for i in generate_range(20) if i % 2 == 0)

In [58]:
# None of these computations *does* anything until we iterate
data = natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)
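Because `natural_numbers()` never ends, you cannot turn this pipeline into a list directly. One way to pull out just a few values is `itertools.islice` from the standard library (not used elsewhere in these notes), sketched here:

```python
from itertools import islice

def natural_numbers():
    i = 0
    while True:
        yield i
        i += 1

data = natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)

# only now does any work happen, and only as much as is needed for three values
first_three = list(islice(even_squares_ending_in_six, 3))
print(first_three)
```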

* **enumerate** lets us iterate over a list/generator with values and their indices

In [61]:
names = ["Alice", "Bob", "Charlie", "Debbie"]
for i, name in enumerate(names):
    print(f"name {i} is {name}")

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


### Randomness

In data science we frequently need to generate random numbers, which we can do with the **random** module:

In [63]:
import random
random.seed(10) # this ensures we get the same results every time

four_uniform_randoms = [random.random() for _ in range(4)]
print(four_uniform_randoms)

[0.5714025946899135, 0.4288890546751146, 0.5780913011344704, 0.20609823213950174]


* the **random** module actually produces *pseudorandom* (that is, deterministic) numbers based on an internal state that you can set with **random.seed** if you want to get reproducible results

In [66]:
random.randrange(10) # choose randomly from range(10) = [0, 1, 2, ..., 9]
random.randrange(3, 6) # choose randomly from range(3,6) = [3, 4, 5]

5

* **random.shuffle** randomly reorders the elements of a list

In [68]:
up_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
random.shuffle(up_to_ten)
print(up_to_ten)

[4, 5, 6, 7, 2, 9, 10, 8, 1, 3]


In [70]:
my_best_friend = random.choice(["Alice", "Bob", "Charlie"])
print(my_best_friend)

Alice


If you need to choose a sample of elements without replacement, use **random.sample**

In [71]:
lottery_numbers = range(60)
winning_numbers = random.sample(lottery_numbers, 6)
print(winning_numbers)

[38, 22, 24, 26, 18, 52]


* To choose a sample of elements *with* replacement, just make multiple calls to **random.choice**
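For example, a sketch that draws four numbers with replacement; the seed value is arbitrary and only there so reruns match:

```python
import random

random.seed(0)   # arbitrary seed for reproducibility

# each call is independent, so the same number can come up more than once
four_with_replacement = [random.choice(range(10)) for _ in range(4)]
print(four_with_replacement)
```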

### Regular Expressions

Examples:

In [77]:
import re

re_examples = [                             # all of these are True, because
    not re.match("a", "cat"),               # 'cat' doesn't start with 'a'
    re.search("a", "cat"),                  # 'cat' has an 'a' in it
    not re.search("c", "dog"),              # 'dog' doesn't have a 'c' in it
    3 == len(re.split("[ab]", "carbs")),    # Split on a or b to ['c', 'r', 's']
    "R-D-" == re.sub("[0-9]", "-", "R2D2")  # Replace digits with dashes
]

In [78]:
assert all(re_examples), "all the regex examples should be True"

### zip and Argument Unpacking

The **zip** function transforms multiple iterables into a single iterable of tuples of corresponding elements:

In [81]:
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]

#zip is lazy, so you have to do something like the following
pairs = [pair for pair in zip(list1, list2)]
print(pairs)

[('a', 1), ('b', 2), ('c', 3)]


* if the lists are different lengths, **zip** stops as soon as the shortest list ends
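A quick sketch of that truncation, with made-up lists:

```python
longer = ['a', 'b', 'c']
shorter = [1, 2]

truncated = list(zip(longer, shorter))   # 'c' has no partner, so it is dropped
print(truncated)
```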

You can also "unzip" a list using a strange trick:

In [82]:
letters, numbers = zip(*pairs)

The asterisk (*) performs *argument unpacking*, which uses the elements of pairs as individual arguments to **zip**. It ends up the same as if you'd called:

In [83]:
letters, numbers = zip(('a', 1), ('b', 2), ('c', 3))
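Argument unpacking works with any function, not just **zip**. A small sketch; the `add` function is made up for illustration:

```python
def add(a, b):
    return a + b

try:
    add([1, 2])                 # passes one argument (a list), so this fails
except TypeError:
    print("add expects two inputs")

total_unpacked = add(*[1, 2])   # same as add(1, 2)

pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)  # each result is a tuple
```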

### args and kwargs

if we want to specify a function that takes arbitrary arguments, we can do this with argument unpacking using **args** and **kwargs**

In [84]:
def magic(*args, **kwargs):
    print("unnamed args:", args)
    print("keyword args:", kwargs)

In [85]:
magic(1, 2, key="word", key2="word2")

unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}


* when we define a function like this, **args** is a tuple of its unnamed arguments, and **kwargs** is a **dict** of its named arguments
* it works the other way too, if you want to use a **list** (or **tuple**) and **dict** to *supply* arguments to a function:

In [86]:
def other_way_magic(x, y, z):
    return x + y + z

In [88]:
x_y_list = [1, 2]
z_dict = {"z": 3}
assert other_way_magic(*x_y_list, **z_dict) == 6

Useful for creating higher-order functions whose inputs can accept arbitrary arguments:

In [89]:
def double_correct(f):
    """ works no matter what kind of input f expects """
    def g(*args, **kwargs):
        """ whatever arguments g is supplied, pass them along to f """
        return 2 * f(*args, **kwargs)
    return g
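To see it in action, here is a sketch that wraps a two-argument function; `f2` is a made-up example function:

```python
def double_correct(f):
    """Works no matter what kind of input f expects."""
    def g(*args, **kwargs):
        return 2 * f(*args, **kwargs)
    return g

def f2(x, y):
    return x + y

g = double_correct(f2)
print(g(1, 2))        # positional arguments are passed through
print(g(x=1, y=2))    # and so are keyword arguments
```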

### Type Annotations

* Python is a *dynamically typed* language. That means in general it doesn't care about the types of objects we use, as long as we use them in valid ways
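* Type hints don't change how the code runs, but they are still useful: they document what a function expects and returns, they let external tools such as mypy catch type errors before you run the code, and they help editors with autocompletion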

For built-in types like **int** and **bool** and **float**, you just use the type itself as the annotation

In [92]:
def sample_annotation(x: int, y: float) -> bool:
    return x < y

For lists where you want the elements to be a specific type, use the **typing** module

In [94]:
from typing import List

def total(xs: List[float]) -> float:
    return sum(xs)

Sometimes we want to use inline type hints when declaring variables

In [95]:
from typing import Optional
values: List[int] = []
best_so_far: Optional[float] = None # allowed to be either a float or None

Other examples:

In [97]:
from typing import Dict, Iterable, Tuple
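The notes stop at the import, so here is a sketch of how these three types might be used. The variable names and values are invented for illustration:

```python
from typing import Dict, Iterable, Tuple

# keys are strings, values are ints
counts: Dict[str, int] = {"data": 1, "science": 2}

# lists and generators are both iterables
evens: Iterable[int] = (x for x in range(10) if x % 2 == 0)

# a tuple annotation specifies the type of each position
triple: Tuple[int, float, int] = (10, 2.3, 5)
```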

Finally, since Python has first-class functions, we need a type to represent those as well:

In [99]:
from typing import Callable

# The type hint says that repeater is a function that takes 
# two arguments, a string and an int, and returns a string.

def twice(repeater: Callable[[str, int], str], s: str) -> str:
    return repeater(s, 2)
          
def comma_repeater(s: str, n: int) -> str:
    n_copies = [s for _ in range(n)]
    return ', '.join(n_copies)

assert twice(comma_repeater, "type hints") == "type hints, type hints"

Since type annotations are just Python objects, we can assign them to variables:

In [101]:
Number = int
Numbers = List[Number]

def total(xs: Numbers) -> Number:
    return sum(xs)