# Lecture 9-1

# Pythonic Features

## Week 9 Monday

## Miles Chen, PhD

"Bunch of random things... see how much we get through."

## Named Tuples

Named tuples are a quick and simple way to define a new class when the class definition only contains values and does not require its own methods.

Recall that we defined a class `Point` with the following definition.

```
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y
    def __str__(self):
        return '(%g, %g)' % (self.x, self.y)
```

A named tuple can be created that functions in a nearly identical fashion. You will need to import `namedtuple` from the `collections` module.

In [1]:
from collections import namedtuple

Once we have imported `namedtuple`, we can create a named tuple.

We'll create a named tuple class called `Point` that contains two values, `x` and `y`.

In [2]:
Point = namedtuple('Point', ['x', 'y'])

In [3]:
Point

__main__.Point

With our namedtuple defined, we can create instances of it like we would any other class.

In [4]:
p = Point(1, 2)

In [5]:
p

Point(x=1, y=2)

Now that we have created an instance of the named tuple, we can access its values using dot notation. Because it is a tuple, we can also access values by index using square brackets.

In [6]:
p.x

1

In [7]:
p.y

2

In [8]:
p[0]

1
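A minimal sketch of a few more things a named tuple supports, assuming Python 3.8 or later (where `_asdict()` returns a plain `dict`). `_replace`, `_asdict`, and `_fields` are part of the standard `namedtuple` API; the variable names are only for illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# A Point is a tuple, so it can be unpacked like one
x, y = p

# Named tuples are immutable: assigning to a field raises AttributeError
try:
    p.x = 10
    modified = True
except AttributeError:
    modified = False

# _replace() returns a NEW Point; p itself is unchanged
p_moved = p._replace(x=10)

# _asdict() maps field names to values
as_dict = p._asdict()

print(x, y)           # 1 2
print(modified)       # False
print(p_moved)        # Point(x=10, y=2)
print(as_dict)        # {'x': 1, 'y': 2}
print(Point._fields)  # ('x', 'y')
```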

Because it is a tuple, it supports tuple operations!

A named tuple inherits all of the methods and operators associated with tuples, such as comparison and "addition".

In [9]:
p1 = Point(0, 1)
p2 = Point(3, 4)
p3 = Point(2, 2)

In [10]:
p1 > p2

False

Tuples are compared element by element, so the `x` components are compared first. Since `0 < 3`, `p1 > p2` is `False`.
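A small sketch of the element-by-element rule: the `y` components only matter when the `x` components tie. The points used here are made up for illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

# The x components differ, so the y components are never examined
x_decides = Point(2, 0) > Point(1, 100)

# The x components tie, so the y components break the tie
y_decides = Point(1, 5) < Point(1, 7)

# Comparing Points gives the same answer as comparing plain tuples
same_as_tuples = (Point(0, 1) > Point(3, 4)) == ((0, 1) > (3, 4))

print(x_decides, y_decides, same_as_tuples)   # True True True
```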

In [11]:
p1 + p2

(0, 1, 3, 4)

`+` concatenates the tuples. Note that the result is a plain tuple, not a `Point`.

In [12]:
l = [p1, p2, p3]
l 

[Point(x=0, y=1), Point(x=3, y=4), Point(x=2, y=2)]

Now you have different Point instances in a list!

In [13]:
sorted(l)

[Point(x=0, y=1), Point(x=2, y=2), Point(x=3, y=4)]

If the class definition needs to become more complicated, you can define a new class that inherits from the named tuple.

In [14]:
class Vector(Point):
    """A class based on the named tuple Point"""
    def __add__(self, other):
        return Vector(x=self.x + other.x, y=self.y + other.y)

This overrides `+` so that it adds the `x` components together and the `y` components together, instead of concatenating the tuples.

In [15]:
v1 = Vector(0, 1)
v2 = Vector(3, 4)
v3 = v1 + v2

In [16]:
v3

Vector(x=3, y=5)

As you can see, `x = 0 + 3` and `y = 1 + 4`.
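A sketch of how a subclass like `Vector` could grow further. The `__mul__` (scalar multiplication) and `length` methods are hypothetical additions for illustration; `__slots__ = ()` is the idiom the Python documentation suggests for named tuple subclasses, since it prevents each instance from getting its own `__dict__`.

```python
import math
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

class Vector(Point):
    """A Point subclass with extra (illustrative) methods."""
    __slots__ = ()   # keep instances as lightweight as the base named tuple

    def __add__(self, other):
        # element-wise addition instead of tuple concatenation
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, k):
        # scale both components by the number k (replaces tuple repetition)
        return Vector(self.x * k, self.y * k)

    def length(self):
        # Euclidean length of the vector
        return math.hypot(self.x, self.y)

v = Vector(3, 4)
print(v + Vector(1, 1))   # Vector(x=4, y=5)
print(v * 2)              # Vector(x=6, y=8)
print(v.length())         # 5.0
```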

## Counters

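A `Counter` is a subclass of `dict` that is useful for quickly tallying elements, and it supports the usual dictionary operations. (The `collections` module also provides `UserDict` and `UserList`, which are convenient base classes for writing your own dictionary-like and list-like classes.)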

In [17]:
from collections import Counter

In [18]:
wordlist = ['red', 'blue', 'red', 'green', 'blue', 'blue']

In [19]:
tally = Counter(wordlist)

This creates a dictionary-like `Counter` whose keys are the unique elements and whose values are the number of times each element appears.

In [20]:
tally

Counter({'red': 2, 'blue': 3, 'green': 1})

In [21]:
tally.most_common(2) # 2 most common things

[('blue', 3), ('red', 2)]

In [22]:
tally['blue'] # how many times was "blue" seen?

3
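A short sketch of three `Counter` behaviors that differ from a plain `dict`: missing keys count as 0, `update()` adds counts rather than overwriting them, and `-` drops counts that fall to zero or below. The extra words are made up for illustration.

```python
from collections import Counter

tally = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])

# Looking up a missing key returns 0 instead of raising KeyError
missing = tally['purple']

# update() ADDS counts (a plain dict's update() would overwrite them)
tally.update(['red', 'purple'])

# Subtraction keeps only the counts that stay positive
other = Counter(['blue', 'blue', 'blue', 'green'])
difference = tally - other

print(missing)                          # 0
print(tally['red'], tally['purple'])    # 3 1
print(difference)                       # Counter({'red': 3, 'purple': 1})
```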

In [23]:
Counter("she sell sea shells by the sea shore".split())

Counter({'she': 1,
         'sell': 1,
         'sea': 2,
         'shells': 1,
         'by': 1,
         'the': 1,
         'shore': 1})

`.split()` splits the string on whitespace, and `Counter` then counts each word.

## List comprehensions

List comprehensions allow us to create new lists concisely based on an existing collection.

They take the form:

`[expr for val in collection if condition]`

This is equivalent to the following loop:

```
result = []
for val in collection:
    if condition:
        result.append(expr)
```
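To make the template concrete, here is a small sketch that fills in `expr`, `collection`, and `condition` (squares of the odd numbers from 1 to 10) and checks that the loop and the comprehension build the same list.

```python
collection = range(1, 11)

# The explicit loop
result = []
for val in collection:
    if val % 2 == 1:              # the condition
        result.append(val ** 2)   # the expression

# The equivalent list comprehension
comp = [val ** 2 for val in collection if val % 2 == 1]

print(result == comp)   # True
print(comp)             # [1, 9, 25, 49, 81]
```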

In [24]:
# make a list of the squares 
[x ** 2 for x in range(1, 11)]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [25]:
import numpy as np
np.array([x**2 for x in range(1, 11)])

array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

Odd squares

In [26]:
# square only the odd numbers
[x**2 for x in range(1, 11) if x % 2 == 1]

[1, 9, 25, 49, 81]

Uppercase version of every word that has more than 2 letters:

In [27]:
# take a list of strings, and write the words that are over 2 characters long in uppercase.
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

You can create a list comprehension from any iterable (list, tuple, string, etc.).

In [28]:
[x.upper() for x in "she sell sea shells by the sea shore".split()]

['SHE', 'SELL', 'SEA', 'SHELLS', 'BY', 'THE', 'SEA', 'SHORE']

In [29]:
# extract the digits from a string
string = "Hello 963257 World"
[int(x) for x in string if x.isdigit()]
# for x in string, will look at each character individually
# if x is a digit, then convert it using int()

[9, 6, 3, 2, 5, 7]

Note the `int(x)`, which makes the output a list of integers (otherwise the output would be a list of single-character strings). You could also write `int(x)**2` to square each digit that meets the condition.
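A quick side-by-side sketch of the three variants just described, using the same string as above.

```python
string = "Hello 963257 World"

as_strings = [x for x in string if x.isdigit()]            # characters, still strings
as_ints = [int(x) for x in string if x.isdigit()]          # converted to integers
as_squares = [int(x) ** 2 for x in string if x.isdigit()]  # converted, then squared

print(as_strings)   # ['9', '6', '3', '2', '5', '7']
print(as_ints)      # [9, 6, 3, 2, 5, 7]
print(as_squares)   # [81, 36, 9, 4, 25, 49]
```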

In [30]:
# iterate over a dictionary's items
d = {'a':'apple', 'b':'banana', 'c':'carrots', 'd':'donut', 'e':'eggs'}

In [31]:
list(d.items())  # dict.items() returns a view of (key, value) tuples; list() turns it into a list

[('a', 'apple'),
 ('b', 'banana'),
 ('c', 'carrots'),
 ('d', 'donut'),
 ('e', 'eggs')]

In [32]:
['{} is for {}'.format(key, value) for key, value in d.items() if key not in ('b', 'd')]

['a is for apple', 'c is for carrots', 'e is for eggs']

### Cartesian Products with List Comprehensions

In [33]:
cardranks = ["A","K","Q","J"]
cardsuits = ["clubs","diamonds","hearts","spades"]

Loop over the ranks first (the outer loop), then the suits (the inner loop):

In [34]:
# cartesian product
[(rank, suit) for rank in cardranks for suit in cardsuits]

[('A', 'clubs'),
 ('A', 'diamonds'),
 ('A', 'hearts'),
 ('A', 'spades'),
 ('K', 'clubs'),
 ('K', 'diamonds'),
 ('K', 'hearts'),
 ('K', 'spades'),
 ('Q', 'clubs'),
 ('Q', 'diamonds'),
 ('Q', 'hearts'),
 ('Q', 'spades'),
 ('J', 'clubs'),
 ('J', 'diamonds'),
 ('J', 'hearts'),
 ('J', 'spades')]

Aces first, then Kings, then Queens, and finally Jacks.

Below, we loop over the suits first (the outer loop), then the ranks (the inner loop).

In [35]:
# cartesian product
[(rank, suit) for suit in cardsuits for rank in cardranks ] # notice the change in order of for clauses

[('A', 'clubs'),
 ('K', 'clubs'),
 ('Q', 'clubs'),
 ('J', 'clubs'),
 ('A', 'diamonds'),
 ('K', 'diamonds'),
 ('Q', 'diamonds'),
 ('J', 'diamonds'),
 ('A', 'hearts'),
 ('K', 'hearts'),
 ('Q', 'hearts'),
 ('J', 'hearts'),
 ('A', 'spades'),
 ('K', 'spades'),
 ('Q', 'spades'),
 ('J', 'spades')]

Now we get the Ace, King, Queen, and Jack for each suit in turn.
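A sketch confirming that the first `for` clause is the outer loop: the comprehension matches an explicit nested loop, and also the standard library's `itertools.product`, which yields the same pairs in the same order.

```python
from itertools import product

cardranks = ["A", "K", "Q", "J"]
cardsuits = ["clubs", "diamonds", "hearts", "spades"]

# The first for clause is the OUTER loop, the second is the INNER loop
nested = []
for rank in cardranks:
    for suit in cardsuits:
        nested.append((rank, suit))

comp = [(rank, suit) for rank in cardranks for suit in cardsuits]

# itertools.product yields the same pairs in the same order
prod = list(product(cardranks, cardsuits))

print(nested == comp == prod)   # True
print(len(comp))                # 16
```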

# Q1: C

## Dictionary Comprehensions

A dict comprehension looks like this:

`dict_comp = {key-expr : value-expr for value in collection if condition}`

In [36]:
# create a dictionary, where the key is the word capitalized, and the value is the length of the word
fruits = ['apple', 'mango', 'banana', 'cherry']
{f.capitalize():len(f) for f in fruits}

{'Apple': 5, 'Mango': 5, 'Banana': 6, 'Cherry': 6}

A dict comprehension is written inside curly braces `{}`, with a `key: value` pair as the expression.
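A small sketch of two more dict comprehension details: an `if` condition filters items just as in a list comprehension, and when two items produce the same key, the later value overwrites the earlier one.

```python
fruits = ['apple', 'mango', 'banana', 'cherry']

# Keep only the fruits with more than 5 letters
long_fruits = {f.capitalize(): len(f) for f in fruits if len(f) > 5}

# If two items produce the same key, the later value overwrites the earlier one
by_length = {len(f): f for f in fruits}

print(long_fruits)   # {'Banana': 6, 'Cherry': 6}
print(by_length)     # {5: 'mango', 6: 'cherry'}
```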

In [37]:
# create a dictionary where the key is the index, and the value is the string in the strings list.
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [38]:
list(enumerate(strings))  # enumerate produces (index, value) tuples

[(0, 'a'), (1, 'as'), (2, 'bat'), (3, 'car'), (4, 'dove'), (5, 'python')]

Recall: `enumerate()` pairs each element of its argument with a counter starting at 0, much like `zip(range(len(strings)), strings)`.

In [39]:
index_map = {index:val for index, val in enumerate(strings)}
index_map

{0: 'a', 1: 'as', 2: 'bat', 3: 'car', 4: 'dove', 5: 'python'}

In [40]:
# note that enumerate returns tuples in the order (index, val)
# in the creation of a dictionary, you can swap those positions
# and even apply functions to them

# We create a dictionary where the key is the string, and the value is the index in the strings list.
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

In [41]:
index_map['a']

KeyError: 'a'

In `index_map`, there's no key `a`, but `loc_mapping` does have that key.

In [42]:
loc_mapping['a']

0

In [43]:
# combine dictionaries with ** unpacking
dd = {**loc_mapping, **index_map}
print(dd)

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5, 0: 'a', 1: 'as', 2: 'bat', 3: 'car', 4: 'dove', 5: 'python'}


In [44]:
# alternatively, use dict.update(). This modifies loc_mapping in place
loc_mapping.update(index_map)
loc_mapping

{'a': 0,
 'as': 1,
 'bat': 2,
 'car': 3,
 'dove': 4,
 'python': 5,
 0: 'a',
 1: 'as',
 2: 'bat',
 3: 'car',
 4: 'dove',
 5: 'python'}
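In the example above, the two dictionaries share no keys. A sketch of what happens when keys do overlap (the dictionaries here are made up): in both approaches the later dictionary wins, but `update()` modifies its target in place and returns `None`.

```python
a = {'x': 1, 'y': 2}
b = {'y': 20, 'z': 30}

# With ** unpacking, when keys overlap the later dictionary wins
merged = {**a, **b}

# update() performs the same merge in place; its return value is None
target = dict(a)          # copy, so that a itself stays unchanged
returned = target.update(b)

print(merged)     # {'x': 1, 'y': 20, 'z': 30}
print(target)     # {'x': 1, 'y': 20, 'z': 30}
print(returned)   # None
print(a)          # {'x': 1, 'y': 2}
```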

## Generator Expressions

Generator expressions are similar to list comprehensions, with the key difference being that they are *lazy*.

You create them with parentheses instead of square brackets.

The result is a generator object. You can access values in the generator using `next()`

In [45]:
g = (n**2 for n in range(12))

In [46]:
g

<generator object <genexpr> at 0x000001CAAAF33660>

The generator object has not computed any values yet; it produces each value only when asked for one.

In [47]:
next(g)

0

In [48]:
next(g)

1

In [49]:
next(g)

4

In [50]:
next(g)

9

In [51]:
for val in g:
    print(val)

16
25
36
49
64
81
100
121


In [52]:
next(g) # calling next after it has run out of iterations will result in an error

StopIteration: 

`g` has been exhausted: there are no values left.
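A short sketch of working with an exhausted generator: the built-in `next()` accepts an optional default, which it returns instead of raising `StopIteration`, and iterating over an exhausted generator simply yields nothing.

```python
g = (n ** 2 for n in range(3))

first_pass = list(g)      # consumes every value
default = next(g, None)   # exhausted: returns the default instead of raising StopIteration
second_pass = list(g)     # nothing left, so this is empty

print(first_pass)    # [0, 1, 4]
print(default)       # None
print(second_pass)   # []
```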

## List Comprehension vs Generator Expressions in Python

The big difference between a list comprehension and a generator is that the generator is **lazy**.

The list comprehension evaluates every element right away and stores all of them in memory. The generator only generates the next value when it is asked to do so.

Depending on the expression that needs to be evaluated, you may prefer to use a generator over the list comprehension.

The following examples are from: https://code-maven.com/list-comprehension-vs-generator-expression

In [53]:
l = [n*2 for n in range(1000)] # List comprehension
g = (n*2 for n in range(1000))  # Generator expression

In [54]:
print(type(l))  # 'list'
print(type(g))  # 'generator'

<class 'list'>
<class 'generator'>


In [55]:
import sys
print(sys.getsizeof(l))  # more space in memory
print(sys.getsizeof(g))  # less space in memory

9016
112


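The generator does not compute anything until a value is requested, while the list comprehension computes and stores every element up front, so the generator uses much less memory. (Note that `sys.getsizeof` reports only the size of the container object itself, not of the elements it refers to.)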
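A sketch showing that a generator's size does not depend on how many values it will eventually produce, while a list's size grows with its length. The helpers `doubled_gen` and `doubled_list` are hypothetical, written only for this comparison; exact byte counts vary by Python version, so only the comparisons are shown.

```python
import sys

def doubled_gen(n):
    # hypothetical helper: a generator expression over range(n)
    return (i * 2 for i in range(n))

def doubled_list(n):
    # hypothetical helper: the matching list comprehension
    return [i * 2 for i in range(n)]

# The generator stores only its current state, so its size does not depend on n
gen_same_size = sys.getsizeof(doubled_gen(10)) == sys.getsizeof(doubled_gen(10_000_000))

# The list holds a reference to every element, so its size grows with n
list_grows = sys.getsizeof(doubled_list(10)) < sys.getsizeof(doubled_list(10_000))

print(gen_same_size, list_grows)   # True True
```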

In [56]:
# cannot access values in a generator by index
print(l[4])   # 8
print(g[4])   # TypeError: 'generator' object is not subscriptable

8


TypeError: 'generator' object is not subscriptable

A generator cannot be indexed: it does not store its values, and the element at index 4 has not been computed yet. You can only step through a generator's values in order.
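If you do need "the value at index 4" from a generator, one option is `itertools.islice` from the standard library. A sketch: note that this consumes the values it skips, so the generator continues from index 5 afterwards.

```python
from itertools import islice

g = (n * 2 for n in range(1000))

# islice(g, 4, None) lazily skips the first 4 values; next() then takes the one at index 4
at_index_4 = next(islice(g, 4, None))

# Those 5 values are now consumed, so g continues from index 5
following = next(g)

print(at_index_4)   # 8
print(following)    # 10
```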

In [57]:
# you can iterate over lists and generators
for item in l:
    print(item)
    if item > 12:
        break

0
2
4
6
8
10
12
14


In [58]:
for item in g:
    print(item)
    if item > 12:
        break

0
2
4
6
8
10
12
14


Both loops stop at the same point.

The generator has produced the values up to 14; it has not yet evaluated 16.

In [59]:
g

<generator object <genexpr> at 0x000001CAAAF33F20>

In [60]:
# sum demands that all remaining elements of g be calculated, so the generator evaluates them and provides the sum
# note that the first 8 values (n = 0 to 7) have already been evaluated, so the sum begins at n = 8
sum(g) 

998944

The generator picks up where it last left off. That is, the sum above does not include the values 0 to 14, which were already produced by the loop two cells earlier.

In [61]:
sum(l) # the list has all of the values in memory ready to be summed

999000

In [62]:
sum(l[8:]) # to get the equivalent sum from the list, start at index 8

998944

In [63]:
# now that the generator has finished running, there are no more values left to evaluate
sum(g)

0

In [64]:
# the list is unaffected by calling sum on it.
sum(l)

999000

Using generators requires planning. A rule of thumb: if building the whole list with a list comprehension would take a long time (or a lot of memory) and you don't need every value stored at once, use a generator.
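A sketch of two common ways generators are used in practice. When a generator expression is the only argument to a function such as `sum()`, the extra parentheses can be dropped. Functions like `any()` stop as soon as they have an answer, so the generator is only partly consumed.

```python
# A generator expression passed as the only argument needs no extra parentheses
total = sum(n * 2 for n in range(1000))

# any() stops at the first True value, so g is only partly consumed
g = (n * 2 for n in range(1000))
found = any(val > 12 for val in g)
next_value = next(g)

print(total)        # 999000
print(found)        # True
print(next_value)   # 16  (14 was the value that made any() stop)
```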

# map and lambda functions

The `map(function, iterable)` function applies a function to each element of an iterable. It returns a map object, which is a lazy iterator: the function is only called when the next value is requested.

A lambda function allows you to create and use a new short function without having to formally define it.
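A minimal sketch of the two ideas together, using a made-up `square` function: `map()` with a function defined by `def`, the same thing with an anonymous `lambda`, and the equivalent list comprehension.

```python
def square(x):
    return x ** 2

nums = [1, 2, 3, 4]

# map() with a named function (list() forces the lazy map object to evaluate)
with_def = list(map(square, nums))

# The same thing with an anonymous lambda function
with_lambda = list(map(lambda x: x ** 2, nums))

# The equivalent list comprehension
with_comp = [x ** 2 for x in nums]

print(with_def)      # [1, 4, 9, 16]
print(with_lambda)   # [1, 4, 9, 16]
print(with_comp)     # [1, 4, 9, 16]
```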

# REGEX

In [65]:
# the module re is used for regular expressions
import re

In [66]:
# re.sub substitutes one pattern of text with another.
# Here we define a function that replaces each run of white space (\s+) with a single space.
# The r'' raw string keeps Python from treating \s as a string escape sequence.
def replace_space(x):
    return re.sub(r'\s+', ' ', x)

Each run of one or more whitespace characters gets replaced with a single space.

In [67]:
replace_space('Hello     Alabama ')

'Hello Alabama '
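Note that `\s` matches any whitespace character, not just spaces, and that a single trailing space is kept. A small sketch with made-up inputs; chaining the string method `.strip()` removes leading and trailing whitespace as well.

```python
import re

def replace_space(x):
    # Collapse each run of whitespace (spaces, tabs, newlines) into one space
    return re.sub(r'\s+', ' ', x)

mixed = replace_space('Hello \t\n  Alabama ')
trimmed = replace_space('  Hello     Alabama ').strip()   # strip() also drops the ends

print(repr(mixed))     # 'Hello Alabama '
print(repr(trimmed))   # 'Hello Alabama'
```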

In [68]:
text = ['Hello     Alabama', 
        'Georgia!',
        'Georgia',
        'georgia', 
        'FlOrIda',
        'south  carolina##',
        'West virginia?']

In [69]:
map(replace_space, text)

<map at 0x1caaaf566a0>

`map` returns a map object. Like a generator, it is lazy and produces values only when asked.

In [70]:
g2 = map(replace_space, text)

Now we can step through the map object with `next()`, just as with a generator.

In [71]:
next(g2)

'Hello Alabama'

In [72]:
next(g2)

'Georgia!'

In [73]:
# we can use the map function to map the replace_space() function to each element of the list text
for item in map(replace_space, text):
    print(item)

Hello Alabama
Georgia!
Georgia
georgia
FlOrIda
south carolina##
West virginia?


To force every value to be evaluated, put the map object inside `list()`:

In [74]:
# we can also put the map results inside a list
list(map(replace_space, text))

['Hello Alabama',
 'Georgia!',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south carolina##',
 'West virginia?']

## Lambda function: function "not worth" defining

Lambda functions are useful for functions so simple that they are not worth defining formally with `def`.

In [75]:
# however, because the code for the function is so short, it might be easier to just create
# a quick function without a formal name. These 'anonymous' functions are also known 
# as lambda functions
list(map(lambda x: re.sub('\\s+',' ', x), text))

['Hello Alabama',
 'Georgia!',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south carolina##',
 'West virginia?']

The lambda takes one input argument, `x`, and returns the result of calling `re.sub()` on it.

In [76]:
# here's a similar function that turns the text into title case.
list(map(lambda string: string.title(), text))

['Hello     Alabama',
 'Georgia!',
 'Georgia',
 'Georgia',
 'Florida',
 'South  Carolina##',
 'West Virginia?']

lambda functions are written in the form:

`lambda argument1, argument2, etc: expression to return`

In [77]:
# lambda functions can also accept multiple arguments
# if you use it with map, you'll need to provide an iterable for each argument
list(map(lambda x, y: x + y, [1, 2, 3], [100, 200, 300]))

[101, 202, 303]

The input arguments are `x` and `y`: `map` takes `x` from `[1, 2, 3]` and `y` from `[100, 200, 300]`, pairing them element by element.
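A sketch of what happens when the iterables have different lengths (the lists here are made up): `map()` stops as soon as the shortest one runs out. The equivalent list comprehension pairs the elements with `zip()`, which has the same stopping behavior.

```python
a = [1, 2, 3]
b = [100, 200, 300, 400]   # one element longer than a

# map() stops as soon as the shortest iterable runs out
with_map = list(map(lambda x, y: x + y, a, b))

# The equivalent list comprehension pairs the elements with zip()
with_zip = [x + y for x, y in zip(a, b)]

print(with_map)   # [101, 202, 303]
print(with_zip)   # [101, 202, 303]
```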

Use `lambda` functions only for simple expressions so the code stays easy to read.

# Q2, 3: A, D

## Sets

Sets are unordered collections that DON'T allow duplicates.

In [78]:
s1 = set("abcdefgabc")

In [79]:
s1

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

Like dictionaries, sets are written with curly braces `{}`, but they contain single elements rather than `key: value` pairs.
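One consequence worth a quick sketch: empty braces `{}` create an empty dictionary, so an empty set must be written `set()`. Sets can also be built with a set comprehension, which looks like a dict comprehension without the `key: value` pair. The example word is arbitrary.

```python
empty_braces = {}      # this is an empty DICTIONARY
empty_set = set()      # the way to write an empty set

# A set comprehension: duplicates are dropped automatically
vowels = {'a', 'e', 'i', 'o', 'u'}
letters = {ch for ch in "banana" if ch not in vowels}

print(type(empty_braces))   # <class 'dict'>
print(type(empty_set))      # <class 'set'>
print(sorted(letters))      # ['b', 'n']
```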

In [80]:
s2 = set("fghij")

In [81]:
s2

{'f', 'g', 'h', 'i', 'j'}

### Set operations

In [82]:
s1 & s2 # intersection

{'f', 'g'}

In [83]:
s1 | s2 # union

{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'}

In [84]:
s1 - s2 # set difference (in s1 but not s2)

{'a', 'b', 'c', 'd', 'e'}

In [85]:
s2 - s1 # set difference (in s2 but not s1)

{'h', 'i', 'j'}

In [86]:
s1 ^ s2 # xor = (in s1 but not s2) union (in s2 but not s1)

{'a', 'b', 'c', 'd', 'e', 'h', 'i', 'j'}

Above: everything in the union except the intersection (`{'f', 'g'}`).
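Each set operator also has a method form (the methods additionally accept any iterable, not only sets). A sketch that checks the equivalences, including the identity described above: symmetric difference equals union minus intersection.

```python
s1 = set("abcdefgabc")
s2 = set("fghij")

# Each operator has an equivalent method
checks = [
    s1 & s2 == s1.intersection(s2),
    s1 | s2 == s1.union(s2),
    s1 - s2 == s1.difference(s2),
    s1 ^ s2 == s1.symmetric_difference(s2),
    # symmetric difference = union minus intersection
    s1 ^ s2 == (s1 | s2) - (s1 & s2),
]

print(all(checks))   # True
```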

In [87]:
"b" in s1

True

In [88]:
"x" in s2

False

In [89]:
s3 = set("abc")

In [90]:
s3 < s1  # is s3 a proper subset of s1?

True
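`<` tests for a *proper* subset, so a set is never `<` itself; `<=` (or the `issubset()` method) tests for an ordinary subset. A small sketch of the distinction:

```python
s1 = set("abcdefg")
s3 = set("abc")

print(s3 < s1)           # True: s3 is a proper subset of s1
print(s1 < s1)           # False: no set is a proper subset of itself
print(s1 <= s1)          # True: every set is a subset of itself
print(s3.issubset(s1))   # True: same test as <=
print(s1 > s3)           # True: s1 is a proper superset of s3
```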

In [91]:
s1

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}

In [92]:
s1.add("z") # add an element

In [93]:
s1

{'a', 'b', 'c', 'd', 'e', 'f', 'g', 'z'}

In [94]:
s1.discard("z") # remove an element (no error if it is not present)

In [95]:
s1

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}
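Sets also have a `remove()` method. A sketch of how it differs from `discard()`: `remove()` raises a `KeyError` when the element is missing, while `discard()` silently does nothing.

```python
s = {'a', 'b'}

s.discard('z')         # 'z' is not in s: discard() silently does nothing

try:
    s.remove('z')      # remove() raises KeyError for a missing element
    raised = False
except KeyError:
    raised = True

s.remove('a')          # remove() works normally when the element is present

print(raised)   # True
print(s)        # {'b'}
```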

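Some of these operations are much faster with sets than with lists. In particular, a membership test (`in`) on a set uses hashing and takes roughly constant time on average, while `in` on a list has to scan the elements one by one.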
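A rough sketch for measuring the difference yourself with the standard `timeit` module. The data size and the repetition count are arbitrary choices, and the exact timings will vary by machine, so no output is shown.

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Look up a value near the end: the worst case for scanning a list
t_list = timeit.timeit(lambda: 99_999 in data_list, number=100)
t_set = timeit.timeit(lambda: 99_999 in data_set, number=100)

print(f"list: {t_list:.5f} s   set: {t_set:.5f} s")
```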

#### use cases of sets

In [96]:
# Filtering values:
data = [1, 2, 3, 4, 5]
exclude = {2, 4}
filtered_data = [x for x in data if x not in exclude]
filtered_data

[1, 3, 5]

Recall that `in` is slower on a list than on a set.

In [98]:
# unique items across multiple lists
list1 = [1, 2, 3]
list2 = [3, 4, 5]
list3 = [5, 6, 7]

# Converting the lists to sets and taking their union is a quick way
# to get the unique items across all of the lists
unique_items = set(list1) | set(list2) | set(list3)
unique_items

{1, 2, 3, 4, 5, 6, 7}
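Two variations, sketched under the assumption of Python 3.7 or later (where dictionaries keep insertion order). `set().union()` accepts several iterables at once, so the lists need not be converted one by one. And since sets are unordered, `dict.fromkeys()` is a common way to deduplicate while keeping the order in which items first appear.

```python
list1 = [1, 2, 3]
list2 = [3, 4, 5]
list3 = [5, 6, 7]

# set.union() accepts several iterables at once
unique_items = set().union(list1, list2, list3)

# Sets are unordered; dict keys keep first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(list1 + list2 + list3))

print(unique_items)     # {1, 2, 3, 4, 5, 6, 7}
print(ordered_unique)   # [1, 2, 3, 4, 5, 6, 7]
```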