We’ll look at some more-advanced Python features that we’ll find useful for
working with data.

# Sorting
Every Python list has a sort method that sorts it in place. If you don’t want to mess
up your list, you can use the sorted function, which returns a new list:

In [1]:
x = [4,1,2,3]
y = sorted(x) # is [1,2,3,4], x is unchanged
x.sort() # now x is [1,2,3,4]
print(x)

# By default, sort (and sorted) sort a list from smallest to largest based on naively comparing the elements to one another.

[1, 2, 3, 4]


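In [2]:
# If you want elements sorted from largest to smallest, you can specify a reverse=True parameter.
# Instead of comparing the elements themselves, you can compare the results of a function you pass as key.

# sort the list by absolute value from largest to smallest
x = sorted([-4,1,-2,3], key=abs, reverse=True) # is [-4,3,-2,1]
print(x)

# sort the words and counts from highest count to lowest
# (word_counts is a hypothetical dict of word -> count, for illustration only)
word_counts = {"data": 3, "science": 1, "python": 2}
# Python 3 removed tuple parameters in lambdas, so index into each (word, count) pair instead
wc = sorted(word_counts.items(), key=lambda word_and_count: word_and_count[1], reverse=True)
print(wc)

[-4, 3, -2, 1]
[('data', 3), ('python', 2), ('science', 1)]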
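
Python 3 no longer accepts a lambda written as `lambda (word, count): count`, so the key function has to index into each (word, count) pair instead. The standard library's `operator.itemgetter` builds exactly this kind of index-picking key. The sketch below shows it. The `word_counts` dict here is a hypothetical sample, not real data.

```python
from operator import itemgetter

# hypothetical sample counts, for illustration only
word_counts = {"data": 3, "science": 1, "python": 2}

# itemgetter(1) returns element 1 (the count) of each (word, count) pair,
# the same as lambda word_and_count: word_and_count[1]
by_count = sorted(word_counts.items(), key=itemgetter(1), reverse=True)
by_count_lambda = sorted(word_counts.items(), key=lambda wc: wc[1], reverse=True)

print(by_count)                     # [('data', 3), ('python', 2), ('science', 1)]
print(by_count == by_count_lambda)  # True
```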

# List Comprehensions
Frequently, you’ll want to transform a list into another list, by choosing only certain
elements, or by transforming elements, or both. The Pythonic way of doing this is list
comprehensions:

For example:

even_numbers = [x for x in range(5) if x % 2 == 0] # [0, 2, 4]

The first x answers the question “What should I add to the list?” Python then works through range(5):

Take x = 0 → even → put x (0) into the list

Take x = 1 → odd → skip

Take x = 2 → even → put x (2) into the list

Take x = 3 → odd → skip

Take x = 4 → even → put x (4) into the list

In [3]:
even_numbers = [x for x in range(5) if x % 2 == 0] # [0, 2, 4]
squares = [x * x for x in range(5)] # [0, 1, 4, 9, 16]
even_squares = [x * x for x in even_numbers] # [0, 4, 16]
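
The walkthrough above can be checked by writing the same comprehension as an ordinary for loop. This is a minimal sketch: the loop version is only for comparison, and the comprehension remains the Pythonic form.

```python
# the comprehension
even_numbers = [x for x in range(5) if x % 2 == 0]

# the same thing spelled out as an ordinary loop
even_numbers_loop = []
for x in range(5):                   # take each x in turn
    if x % 2 == 0:                   # keep it only if it passes the condition
        even_numbers_loop.append(x)  # "what should I add to the list?" -> x

print(even_numbers_loop)                  # [0, 2, 4]
print(even_numbers == even_numbers_loop)  # True
```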

In [4]:
# You can similarly turn lists into dictionaries or sets:
square_dict = { x : x * x for x in range(5) } # { 0:0, 1:1, 2:4, 3:9, 4:16 }
square_set = { x * x for x in [1, -1] } # { 1 }

# Generators and Iterators
A problem with lists is that they can easily grow very big. In Python 2, range(1000000) created an
actual list of 1 million elements. In Python 3, range itself is lazy, but a list such as
list(range(1000000)) still holds all 1 million elements in memory. If you only need to deal with
them one at a time, this can be a huge source of inefficiency (or of running out of memory). If you
potentially only need the first few values, then calculating them all is a waste.

A generator is something that you can iterate over (for us, usually using for) but
whose values are produced only as needed (lazily).

In [5]:
def lazy_range(n):
    """a lazy version of range"""
    i = 0
    while i < n:
        yield i
        i = i + 1

In [6]:
for i in lazy_range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9
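
To see the laziness directly, you can pull values out of a generator one at a time with the built-in `next`. This sketch redefines `lazy_range` so that it runs on its own. It also shows a generator expression, which uses comprehension syntax inside parentheses and is lazy as well.

```python
def lazy_range(n):
    """a lazy version of range (same as above)"""
    i = 0
    while i < n:
        yield i
        i = i + 1

gen = lazy_range(3)
print(next(gen))  # 0: the body runs only up to the first yield, then pauses
print(next(gen))  # 1
print(next(gen))  # 2
try:
    next(gen)
except StopIteration:
    print("exhausted")  # a for loop handles StopIteration for you

# a generator expression: comprehension syntax in parentheses, also lazy
evens_below_20 = (i for i in lazy_range(20) if i % 2 == 0)
print(list(evens_below_20))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```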


# Randomness
As we learn data science, we will frequently need to generate random numbers, which
we can do with the random module:

In [7]:
import random

four_uniform_randoms = [random.random() for _ in range(4)]
print(four_uniform_randoms)

[0.9521946004320879, 0.2289214391372455, 0.2987446107936993, 0.5312719050559703]


In [8]:
random.seed(10) # set the seed to 10
print(random.random()) # 0.57140259469
random.seed(10) # reset the seed to 10
print(random.random()) # 0.57140259469 again

0.5714025946899135
0.5714025946899135


In [9]:
random.randrange(10) # choose randomly from range(10) = [0, 1, ..., 9]
random.randrange(3, 6) # choose randomly from range(3, 6) = [3, 4, 5]

4
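
Seeding fixes more than a single draw: after `random.seed`, the whole sequence of calls is reproducible, including `randrange`. A minimal sketch follows. The seed value 42 is arbitrary.

```python
import random

random.seed(42)
first_run = [random.randrange(10) for _ in range(5)]

random.seed(42)  # same seed again
second_run = [random.randrange(10) for _ in range(5)]

print(first_run == second_run)  # True: the same seed replays the same sequence
```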

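# Object-Oriented Programming

Python lets you define classes that bundle data together with the functions (methods) that operate on it. As an example, here is a homemade Set class. Python already has a built-in set type, so we build our own only to illustrate how classes work: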

In [None]:
class Set:
    # these are the member functions
    # every one takes a first parameter "self" (another convention)
    # that refers to the particular Set object being used
    
    def __init__(self, values=None):
        """ This is the constructor.
        It gets called when you create a new Set.
        You would use it like 
        s1 = Set() # empty set
        s2 = Set([1,2,3,4]) # initialize with values
        """
        # each instance of Set has its own dict property, which we'll use to track memberships
        self.dict = {}
        
        if values is not None:
            for value in values:
                self.add(value)
        
    def __repr__(self):
        """ This is the string representation of a Set object
            if you type it at the Python prompt or pass it to str()
        """
        # list() turns the dict_keys view into a plain list, e.g. "Set: [1, 2, 4]"
        return "Set: " + str(list(self.dict.keys()))

    # we'll represent membership by being a key in self.dict with value True
    def add(self, value):
        self.dict[value] = True

    # value is in the Set if it's a key in the dictionary
    def contains(self, value):
        return value in self.dict

    def remove(self, value):
        del self.dict[value]

In [27]:
# Which we could then use like:
s = Set([1, 2, 3])
s.add(4)
print(s.contains(3))

s.remove(3)
print(s.contains(3))

True
False
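
For comparison, Python's built-in set type already provides these operations. It uses the `in` operator where our class has a `contains` method. The sketch below mirrors the usage of our Set class.

```python
s = {1, 2, 3}   # built-in set literal
s.add(4)
print(3 in s)   # True: membership test with the `in` operator
s.remove(3)
print(3 in s)   # False
print(sorted(s))  # [1, 2, 4]
```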


## Functional Tools
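When passing functions around, we sometimes want to partially apply a function to create a new one. As a simple example, take the two-argument function exp below. Suppose we want to use it to create a function of one variable, two_to_the, whose input is a power and whose output is the result of exp(2, power). We can, of course, do this with def, but this can sometimes get unwieldy: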

In [28]:
def exp(base, power):
    return base ** power

In [29]:
def two_to_the(power):
    return exp(2, power)

A different approach is to use functools.partial:

In [30]:
from functools import partial
two_to_the = partial(exp, 2)
print(two_to_the(3))

8
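
`partial` can also fix an argument by keyword rather than by position. Here is a minimal sketch that reuses `exp`, redefined so the block runs on its own. The name `square_of` is only illustrative.

```python
from functools import partial

def exp(base, power):
    return base ** power

# fix the *second* argument by name instead of the first by position
square_of = partial(exp, power=2)
print(square_of(3))  # 9, i.e. exp(3, power=2)
```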


We will also occasionally use map, reduce, and filter, which provide functional
alternatives to list comprehensions:

In [37]:
def double(x):
    return 2 * x
xs = [1, 2, 3, 4]

twice_xs = [double(x) for x in xs] # [2, 4, 6, 8]
twice_xs = list(map(double, xs)) # same as above (in Python 3, map returns a lazy iterator)

list_doubler = partial(map, double) # *function* that doubles a list (lazily)
twice_xs = list(list_doubler(xs)) # again [2, 4, 6, 8]
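
In Python 3, `map` returns a lazy iterator rather than a list. You need `list()` before you can print its values, and the iterator can be consumed only once. A short sketch:

```python
def double(x):
    return 2 * x

xs = [1, 2, 3, 4]

lazy = map(double, xs)   # a map object: nothing has been computed yet
print(list(lazy))        # [2, 4, 6, 8]
print(list(lazy))        # []: the iterator is already used up

twice_xs = list(map(double, xs))  # materialize once if you need to reuse the values
```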

You can use map with multiple-argument functions if you provide multiple lists:

In [41]:
def multiply(x, y): return x * y

products = list(map(multiply, [1, 2], [4, 5])) # [1 * 4, 2 * 5] = [4, 10]

Similarly, filter does the work of a list comprehension's if clause:

In [47]:
def is_even(x):
    return x % 2 == 0

x_evens = [x for x in xs if is_even(x)] # [2, 4]
x_evens = list(filter(is_even, xs)) # same as above

list_evener = partial(filter, is_even) # *function* that filters a list (lazily)
x_evens = list(list_evener(xs)) # again [2, 4]

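And reduce combines the first two elements of a list, then combines that result with the third, that result with the fourth, and so on, producing a single result:

In [48]:
from functools import reduce
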
x_product = reduce(multiply, xs) # = 1 * 2 * 3 * 4 = 24
list_product = partial(reduce, multiply) # *function* that reduces a list
x_product = list_product(xs) # again = 24
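
`reduce` can be hard to picture. This sketch spells out what `reduce(multiply, xs)` does as an explicit loop. It also shows the optional third argument, an initial value, that `functools.reduce` accepts.

```python
from functools import reduce

def multiply(x, y):
    return x * y

xs = [1, 2, 3, 4]

# reduce(multiply, xs) computes ((1 * 2) * 3) * 4, which is the same as:
result = xs[0]
for x in xs[1:]:
    result = multiply(result, x)

print(result, reduce(multiply, xs))  # 24 24

# with an initial value, reduce starts from it instead of xs[0]
# (which also lets it handle an empty list)
print(reduce(multiply, [], 1))  # 1
```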

## enumerate
Not infrequently, you’ll want to iterate over a list and use both its elements and their
indexes:

In [52]:
documents = ['hi', 'my', 'name', 'is', 'john']

# not Pythonic
for i in range(len(documents)):
    document = documents[i]
    print(i, document)

0 hi
1 my
2 name
3 is
4 john


In [53]:
# The Pythonic solution is enumerate, which produces tuples (index, element):
for i, document in enumerate(documents):
    print(i, document)

0 hi
1 my
2 name
3 is
4 john
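
To see the (index, element) tuples that enumerate produces, you can materialize them with `list()`. The `for i, document in ...` form simply unpacks each tuple, as this sketch shows:

```python
documents = ['hi', 'my', 'name', 'is', 'john']

pairs = list(enumerate(documents))
print(pairs)  # [(0, 'hi'), (1, 'my'), (2, 'name'), (3, 'is'), (4, 'john')]

# `for i, document in enumerate(...)` unpacks each tuple like this
for pair in pairs[:2]:
    i, document = pair
    print(i, document)  # 0 hi, then 1 my
```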


In [54]:
# Similarly, if we just want the indexes:
for i in range(len(documents)): print(i) # not Pythonic
for i, _ in enumerate(documents): print(i) # Pythonic

0
1
2
3
4
0
1
2
3
4


## zip and Argument Unpacking
Often we will need to zip two or more lists together. zip transforms multiple lists
into a single sequence of tuples of corresponding elements. In Python 3, zip returns a lazy
iterator, so wrap it in list() to see the tuples:

In [56]:
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]

list(zip(list1, list2)) # is [('a', 1), ('b', 2), ('c', 3)]

[('a', 1), ('b', 2), ('c', 3)]

If the lists are different lengths, zip stops as soon as the shortest list runs out.
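
When the lengths differ, the leftover elements of the longer list are silently dropped, whichever argument position the shorter list is in. A minimal sketch:

```python
short = [1, 2, 3]
shorter = ['a', 'b']

print(list(zip(short, shorter)))  # [(1, 'a'), (2, 'b')]: 3 has no partner, so it is dropped
print(list(zip(shorter, short)))  # [('a', 1), ('b', 2)]
```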

You can also “unzip” a list using a strange trick:

In [58]:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters)
print(numbers)

('a', 'b', 'c')
(1, 2, 3)
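
The "strange trick" is argument unpacking. The asterisk passes each element of pairs to zip as a separate argument, so `zip(*pairs)` is the same call as `zip(('a', 1), ('b', 2), ('c', 3))`. A short check:

```python
pairs = [('a', 1), ('b', 2), ('c', 3)]

# the * unpacks pairs, so these two calls are identical
unzipped = list(zip(*pairs))
spelled_out = list(zip(('a', 1), ('b', 2), ('c', 3)))

print(unzipped)                 # [('a', 'b', 'c'), (1, 2, 3)]
print(unzipped == spelled_out)  # True
```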


## args and kwargs
Let’s say we want to create a higher-order function that takes as input some function f
and returns a new function that for any input returns twice the value of f:

In [59]:
def doubler(f):
    def g(x):
        return 2 * f(x)
    return g

In [None]:
# This works in some cases:
def f1(x):
    return x + 1

g = doubler(f1)
print(g(3))     # 8 (== ( 3 + 1) * 2)
print(g(-1))    # 0 (== (-1 + 1) * 2)

# However, it breaks down with functions that take more than a single argument:
def f2(x, y):
    return x + y
g = doubler(f2)
print(g(1, 2))      # TypeError: g() takes 1 positional argument but 2 were given

8
0


TypeError: doubler.<locals>.g() takes 1 positional argument but 2 were given

What we need is a way to specify a function that takes arbitrary arguments. We can
do this with argument unpacking and a little bit of magic:

In [69]:

def magic(*args, **kwargs):
    print("unnamed args:", args)
    print("keyword args:", kwargs)
    
magic(1, 2, key="word", key2="word2")
# prints
# unnamed args: (1, 2)
# keyword args: {'key': 'word', 'key2': 'word2'}

unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}


That is, when we define a function like this, args is a tuple of its unnamed arguments
and kwargs is a dict of its named arguments. It works the other way too, if you want
to use a list (or tuple) and dict to supply arguments to a function:

In [72]:
def other_way_magic(x, y, z):
    return x + y + z

x_y_list = [1, 2]
z_dict = { "z" : 3 }
print(other_way_magic(*x_y_list, **z_dict)) # 6

6
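
Putting both directions together fixes the doubler that broke on f2 earlier. The inner function collects any arguments with `*args, **kwargs` and forwards them unchanged to f. This is a minimal sketch, and `doubler_correct` is a hypothetical name chosen for illustration.

```python
def doubler_correct(f):
    """works no matter what kind of inputs f expects"""
    def g(*args, **kwargs):
        # collect whatever was passed to g and forward it unchanged to f
        return 2 * f(*args, **kwargs)
    return g

def f2(x, y):
    return x + y

g = doubler_correct(f2)
print(g(1, 2))    # 6  (== (1 + 2) * 2)
print(g(1, y=5))  # 12 (== (1 + 5) * 2)
```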
