# EECS 486 Discussion 1: Python Tutorial

<small>Adapted from *Python for Data Analysis by Wes McKinney (O’Reilly)*. Copyright 2017 Wes McKinney, 978-1-491-95766-0.</small>

Yumou Wei

yumouwei@umich.edu

M F 10:00 am - 11:30 am, 1695 BBB

## Python Language Basics

### Language Semantics

Python is often called "executable pseudocode" because it is so easy for humans to read.

In [1]:
greetings = "hEllO, WoRLd!"

for letter in greetings:
    if letter.islower():
        print(letter.upper(), end="")
    else:
        print(letter.lower(), end="")

HeLLo, wOrlD!

Readable code is still **not** an excuse for skipping comments!

#### Comments

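Use a triple-quoted string (`"""`) for long, multi-line comments and `#` for short ones. Strictly speaking, the triple-quoted text is a string literal that Python evaluates and discards, not a true comment.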

In [None]:
"""
The code below takes a string of mixed
upper- and lower-case letters and
prints the string with the cases flipped.
"""
greetings = "hEllO, WoRLd!"
for letter in greetings:
    # if the current letter is in lower case
    if letter.islower():
        print(letter.upper(), end="")
    # otherwise (upper case, or not a letter at all)
    else:
        print(letter.lower(), end="")


#### Indentation, not braces

Recommended practice: use four **spaces**, not tabs

In [None]:
greetings = "hEllO, WoRLd!"
for letter in greetings:
    if letter.islower():
        print(letter.upper(), end="")
    else:
        print(letter.lower(), end="")

#### Imports

In [None]:
# import some_module (as shorthand_name)
import pprint as pp
# from some_module import some_function
from copy import deepcopy
# DON'T: from some_module import * (pollutes the namespace and hides where names come from)

### Scalar Types

#### Numeric types

In [None]:
count = 10 # int
pi = 3.14 # float
interest_rate = 4.2e-3 # scientific notation (0.0042)

In [None]:
count += 1 # count = count + 1
count -= 1 # count = count - 1
pi_sq = pi ** 2

In [2]:
3 / 2 # normal division

1.5

In [3]:
3 // 2 # floor division (rounds down to an integer)

1
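A minimal sketch of how `//` behaves beyond the positive case above: it rounds toward negative infinity rather than toward zero, and it pairs with the remainder operator `%` and the built-in `divmod`. The variable names are illustrative only.

```python
# Floor division rounds toward negative infinity, not toward zero.
pos_quotient = 7 // 2    # 3
neg_quotient = -7 // 2   # -4, because -3.5 is rounded down

# The remainder is consistent with floor division:
# a == (a // b) * b + (a % b)
remainder = -7 % 2       # 1

# divmod returns the quotient and the remainder together.
quotient, rest = divmod(7, 2)

print(pos_quotient, neg_quotient, remainder, quotient, rest)  # 3 -4 1 3 1
```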

#### Strings

In [None]:
a = 'one way of writing a string'
b = "another way"
c = """
Yet another way for a longer string 
that spans multiple lines
"""

In [4]:
a = 'this is a string'
d = a # assignment by reference
d is a # check if the same reference

True

Strings are immutable.

In [6]:
a = 'this is a string'
# a[10] = 'f' # not allowed
b = a.replace('string', 'longer string') # returns a modified copy, OK
b

'this is a longer string'

String methods never modify a string in place; they return a **new** string and leave the original unchanged.

In [7]:
greetings = "hEllO, WoRLd!"
greetings_lower = greetings.lower()
greetings is greetings_lower

False
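The sketch below puts the two points about immutability side by side: item assignment on a string fails, while string methods hand back new strings. The names `title` and `stripped` are illustrative.

```python
a = 'this is a string'

# Item assignment on a str raises TypeError because strings are immutable.
try:
    a[10] = 'f'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# String methods return new strings and leave the original untouched.
title = a.title()
stripped = '  padded  '.strip()

print(assignment_failed)  # True
print(a)                  # this is a string
print(title)              # This Is A String
print(repr(stripped))     # 'padded'
```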

In [8]:
a = 'this is the first half '
b = 'and this is the second half'
c = a + b # concatenate
c.split() # simplest tokenization?

['this',
 'is',
 'the',
 'first',
 'half',
 'and',
 'this',
 'is',
 'the',
 'second',
 'half']
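`split()` alone leaves punctuation and mixed case attached to the tokens. As a rough sketch of a slightly more careful tokenizer (one of many possible designs, not a prescribed method for this course), each token can be stripped of surrounding punctuation and lower-cased. The sample sentence is made up for illustration.

```python
import string

sentence = "Hello, World! Hello again."

tokens = []
for raw_token in sentence.split():
    # Strip leading/trailing punctuation, then normalize the case.
    token = raw_token.strip(string.punctuation).lower()
    if token:  # skip tokens that consisted of punctuation only
        tokens.append(token)

print(tokens)            # ['hello', 'world', 'hello', 'again']
print(" ".join(tokens))  # hello world hello again
```

`" ".join(tokens)` is the inverse of `split()`: it glues a list of strings back together with the given separator.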

#### Booleans

In [None]:
True and False  # False
False or True   # True
not True        # False

In [10]:
# any, all
x, y, z = 2, 3, 1
if any((x > 0, y > 0, z > 0)):
    print("At least one coordinate is positive")

if all((x > 0, y > 0, z > 0)):
    print("All coordinates are positive")

At least one coordinate is positive
All coordinates are positive
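`any` and `all` accept any iterable, so a generator expression works as well as a tuple and avoids building the intermediate collection. A small sketch with an illustrative `coordinates` list:

```python
coordinates = [2, -3, 1]

has_positive = any(value > 0 for value in coordinates)
all_positive = all(value > 0 for value in coordinates)
print(has_positive, all_positive)  # True False

# Edge case: on an empty iterable, any() is False and all() is True.
print(any([]), all([]))  # False True
```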


#### None

In [11]:
a = None
a is None
b = 5
b is not None

True

# Built-in Data Structures

## Sequences

### String

Already covered.

### Tuple

In [12]:
tup = 4, 5, 6 # parentheses are optional: tup = (4, 5, 6)
tup

(4, 5, 6)

In [13]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [14]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [15]:
tup[0]

's'

Tuples are immutable. 

In [16]:
tup[0] = "a"

TypeError: 'tuple' object does not support item assignment

#### Unpacking tuples

In [17]:
tup = (4, 5, 6)
a, b, c = tup
b

5

In [18]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

In [19]:
a, b = 1, 2
print(a, b)

1 2


In [20]:
b, a = a, b
print(a, b)

2 1


In [21]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
print(a, b)
print(rest)

1 2
[3, 4, 5]


In [22]:
# throw undesired values to _
a, b, *_ = values
print(a, b)

1 2


### List

In [23]:
a_list = [2, "foo", (3, 4), None]
a_list[1]

'foo'

#### Adding and removing elements

Lists are efficient for adding and removing items at the **back** (the end of the list).

In [24]:
b_list = []
b_list.append('blue')
b_list.append('yellow')
b_list

['blue', 'yellow']

In [25]:
b_list.pop()
b_list

['blue']

Need to add items to both the **front** and the **back**? Use `deque`: https://docs.python.org/3/library/collections.html#collections.deque
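A short sketch of `collections.deque` used from both ends. The colors are placeholder data.

```python
from collections import deque

queue = deque(['blue', 'yellow'])
queue.appendleft('red')    # add at the front
queue.append('green')      # add at the back

first = queue.popleft()    # remove from the front
last = queue.pop()         # remove from the back

print(first, last, list(queue))  # red green ['blue', 'yellow']
```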

Membership tests (`in`) on a `list` scan the items one by one, which takes O(n) time. When you need frequent lookups, use a `set` instead.

In [26]:
'red' in b_list

False

In [27]:
'red' not in b_list

True

#### Assignment by Reference

In [28]:
a = [1, 2, 4]
b = a
b[1] = 7
print(a) # ???

[1, 7, 4]


In [30]:
a = [1, 2, 4]
b = a.copy()
b[1] = 7
print(a) # ???

[1, 2, 4]
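One caveat worth sketching: `list.copy()` makes a *shallow* copy, so nested lists are still shared between the original and the copy. `copy.deepcopy` copies the nested objects as well. The `nested` list is illustrative.

```python
from copy import deepcopy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()    # new outer list, same inner lists
deep = deepcopy(nested)    # new outer list and new inner lists

nested[0][0] = 99

print(shallow[0][0])  # 99, the inner list is shared with `nested`
print(deep[0][0])     # 1, the deep copy is fully independent
```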


#### Concatenating and combining lists

In [None]:
[4, None, 'foo'] + [7, 8, (2, 3)]

In [31]:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

#### Slicing

In [32]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [33]:
seq[:5]

[7, 2, 3, 7, 5]

In [34]:
seq[3:]

[7, 5, 6, 0, 1]

In [35]:
seq[-4:]

[5, 6, 0, 1]

In [36]:
seq[-6:-2]

[3, 7, 5, 6]

In [37]:
seq[::2]

[7, 3, 5, 0]

In [38]:
seq[::-1] # have a guess? 

[1, 0, 6, 5, 7, 3, 2, 7]

### Built-in Sequence Tools

#### Loops

In [39]:
my_list = list("abcde") # takes a string and produces a list

for item in my_list:
    print(item, end=" ")

a b c d e 

In [40]:
for index in range(len(my_list)):
    my_list[index] = my_list[index].upper()
    print(my_list[index], end=" ")

A B C D E 

In [41]:
for index, item in enumerate(my_list):
    print("{} has an index of {}".format(item, index))

A has an index of 0
B has an index of 1
C has an index of 2
D has an index of 3
E has an index of 4


#### sorted

In [42]:
sorted('horse race')

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

In [43]:
sorted([7, 1, 2, 6, 0, 3, 2], reverse=True)

[7, 6, 3, 2, 2, 1, 0]

#### zip

In [44]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
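Two related behaviors, sketched with the same kind of data: `zip(*pairs)` reverses the operation ("unzips"), and `zip` stops as soon as the shortest input is exhausted.

```python
pairs = [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

# Unpacking the pairs with * turns the rows back into columns.
first_items, second_items = zip(*pairs)
print(first_items)   # ('foo', 'bar', 'baz')
print(second_items)  # ('one', 'two', 'three')

# zip silently truncates to the shortest input.
short = list(zip([1, 2, 3], 'ab'))
print(short)         # [(1, 'a'), (2, 'b')]
```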

#### reversed

In [45]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

### set

In [46]:
{2, 2, 2, 1, 3, 3}
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [47]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [48]:
a.union(b)
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [49]:
a.intersection(b)
a & b

{3, 4, 5}
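Beyond union and intersection, sets also support difference, symmetric difference, and subset tests. A brief sketch using the same `a` and `b` (the output is passed through `sorted` because set ordering is not guaranteed):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b          # difference
in_exactly_one = a ^ b     # symmetric difference
is_subset = {3, 4} <= a    # subset test

print(sorted(only_in_a))       # [1, 2]
print(sorted(in_exactly_one))  # [1, 2, 6, 7, 8]
print(is_subset)               # True
```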

Sets are good for fast membership tests (O(1) on average).

In [50]:
stop_words = {"a", "an", "the", "in", "on"}

In [51]:
"that" in stop_words

False

In [52]:
"up" not in stop_words

True
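As a sketch of why this matters in practice, a stop-word set can be used to filter a token list. The sentence is invented for illustration; the stop-word set is the one defined above.

```python
stop_words = {"a", "an", "the", "in", "on"}
tokens = "the cat sat on a mat in the sun".split()

content_words = []
for token in tokens:
    # Each membership test is a hash lookup, not a scan of the whole set.
    if token not in stop_words:
        content_words.append(token)

print(content_words)  # ['cat', 'sat', 'mat', 'sun']
```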

### dict

A `dict` is essentially a hash table: search, insertion, and deletion take O(1) time on average.

In [53]:
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [54]:
'b' in d1 # check key existence

True

In [55]:
d1[5] = 'some value'
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 5: 'some value',
 'dummy': 'another value'}

In [56]:
del d1[5] # delete the key 5 (an int, not the string "5")
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 'dummy': 'another value'}

In [57]:
ret = d1.pop('dummy')
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}
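Indexing, `del`, and `pop` all raise `KeyError` when the key is absent. The sketch below shows the non-raising alternatives, `get` and `pop` with a default. The key `'z'` and the fallback `'n/a'` are arbitrary choices for illustration.

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}

missing = d1.get('z')          # None when the key is absent
fallback = d1.get('z', 'n/a')  # explicit default instead of None
popped = d1.pop('z', None)     # default avoids a KeyError

# Plain indexing raises KeyError for a missing key.
try:
    d1['z']
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(missing, fallback, popped, key_error_raised)  # None n/a None True
```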

#### Loop through dict

In [58]:
for key, value in d1.items():
    print("{}: {}".format(key, value))

a: some value
b: [1, 2, 3, 4]


In [59]:
for key in d1.keys():
    print(key)

a
b


In [60]:
for value in d1.values():
    print(value)

some value
[1, 2, 3, 4]


#### Creating dicts from sequences

In [61]:
mapping = dict(zip("abcde", range(5)))
mapping

{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

#### Default values

Task: count each word in a dataset

In [63]:
dataset = ['apple', 'orange', 'apple', 'banana', 'orange', 'apple']

In [64]:
# Initial idea
word_count = {} # my word counter

for word in dataset:
    # if the word is already encountered
    if word in word_count:
        word_count[word] += 1
    # or this is a new word
    else:
        word_count[word] = 1
print(word_count)

{'apple': 3, 'orange': 2, 'banana': 1}


Better idea: use `defaultdict`: https://docs.python.org/3.7/library/collections.html#collections.defaultdict

In [65]:
from collections import defaultdict
word_count = defaultdict(int) # missing keys now get a default value of int(), i.e. 0
print(word_count["apple"])

0


In [66]:
for word in dataset:
    word_count[word] += 1
print(word_count)

defaultdict(<class 'int'>, {'apple': 3, 'orange': 2, 'banana': 1})
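For this particular counting task, the standard library also offers `collections.Counter`, shown here as an alternative sketch rather than a replacement for the `defaultdict` approach above.

```python
from collections import Counter

dataset = ['apple', 'orange', 'apple', 'banana', 'orange', 'apple']

# Counter tallies the items of any iterable in one call.
word_count = Counter(dataset)

print(word_count['apple'])        # 3
print(word_count['kiwi'])         # 0, missing keys count as zero
print(word_count.most_common(2))  # [('apple', 3), ('orange', 2)]
```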


#### Valid dict key types

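What can be dict keys? Any **hashable** object, which in practice means **immutable** types such as numbers, strings, and tuples of hashable items.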

In [67]:
my_dict = {}
my_dict[(1, 2)] = 5
my_dict["abc"] = 6
print(my_dict)

{(1, 2): 5, 'abc': 6}
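To see the restriction from the other side, here is a sketch of what happens with a mutable key. The exact wording of the error message varies between Python versions, so only the exception type is relied on.

```python
my_dict = {}

# A list is mutable and therefore unhashable, so it cannot be a key.
try:
    my_dict[[1, 2]] = 5
    list_key_rejected = False
except TypeError:
    list_key_rejected = True

# Converting the list to a tuple gives a valid key.
my_dict[tuple([1, 2])] = 5

print(list_key_rejected)  # True
print(my_dict)            # {(1, 2): 5}
```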


### List, Set, and Dict Comprehensions

In [68]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [69]:
{len(x) for x in strings} # unique lengths

{1, 2, 3, 4, 6}

In [70]:
{x: len(x) for x in strings}

{'a': 1, 'as': 2, 'bat': 3, 'car': 3, 'dove': 4, 'python': 6}
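A comprehension is shorthand for a loop that builds a collection. As a sketch, the list comprehension from above is written out as an explicit loop here, which yields the same list.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Explicit loop equivalent to:
# [x.upper() for x in strings if len(x) > 2]
result = []
for x in strings:
    if len(x) > 2:
        result.append(x.upper())

print(result)  # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```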

## Functions

In [71]:
# a function with default arguments
def add(x, y=1):
    return x + y

In [72]:
add(5) # y = 1 by default

6

In [73]:
add(5, 6)

11
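Arguments can also be passed by name, which makes calls self-documenting and lets the order vary. A minimal sketch with the same `add` function:

```python
def add(x, y=1):
    return x + y

print(add(5, y=10))   # 15
print(add(y=2, x=3))  # 5, keyword arguments may appear in any order
```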

### Returning Multiple Values

In [74]:
def linear_transform(x, y):
    """
    This function takes as input a point (x, y)
    and applies a linear transformation to it. 
    """
    new_x = 2 * x + 3 * y
    new_y = 5 * x - 2 * y
    return new_x, new_y

In [75]:
linear_transform(3, 5)

(21, 5)
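The function actually returns a single tuple, so the result can be unpacked directly into separate names, as in this sketch:

```python
def linear_transform(x, y):
    new_x = 2 * x + 3 * y
    new_y = 5 * x - 2 * y
    return new_x, new_y

# Tuple unpacking assigns each returned value to its own name.
new_x, new_y = linear_transform(3, 5)
print(new_x)  # 21
print(new_y)  # 5
```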

### Anonymous (Lambda) Functions

Sometimes we don't care about the name of a function.

In [76]:
def short_function(x):
    return x * 2

In [None]:
equiv_anon = lambda x: x * 2

In [77]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
sorted(strings) # sort by alphabet

['aaaa', 'abab', 'bar', 'card', 'foo']

In [78]:
sorted(strings, key=lambda x: len(x)) # sort by length

['foo', 'bar', 'card', 'aaaa', 'abab']
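In the output above, strings of equal length keep their original relative order because `sorted` is stable. To break ties alphabetically instead, the key function can return a tuple, as in this sketch (the name `by_length_then_alpha` is illustrative):

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

# Tuples compare element by element: length first, then the string itself.
by_length_then_alpha = sorted(strings, key=lambda s: (len(s), s))
print(by_length_then_alpha)  # ['bar', 'foo', 'aaaa', 'abab', 'card']
```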

In [79]:
# Harder example: functions as objects
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [80]:
# Map: applies a function to every item of a sequence
list(map(lambda x: x * 2, ints)) # map(some_function, sequence)

[8, 0, 2, 10, 12]

In [81]:
# Filter: keeps only the items for which the function returns True
list(filter(lambda x: x > 3, ints)) # filter(some_condition_as_function, sequence)

[4, 5, 6]
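The same results can be written as list comprehensions, which many Python programmers find easier to read than `map` and `filter` with a lambda. A sketch of the equivalents:

```python
ints = [4, 0, 1, 5, 6]

doubled = [x * 2 for x in ints]     # same as list(map(lambda x: x * 2, ints))
large = [x for x in ints if x > 3]  # same as list(filter(lambda x: x > 3, ints))

print(doubled)  # [8, 0, 2, 10, 12]
print(large)    # [4, 5, 6]
```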

# Files

## Read

In [82]:
sample_text = []
with open("sample.txt", "r") as file_handle:
    # iterating over the file handle yields one line at a time
    for curr_line in file_handle:
        # do something on each line
        print(curr_line, end="")
        sample_text.append(curr_line)

This course will cover traditional material, as well as recent advances in Information Retrieval (IR), the study of indexing, processing, querying, and classifying data. 
Basic retrieval models, algorithms, and IR system implementations will be covered. 

## Write

In [None]:
with open("output.txt", "w") as file_handle:
    for line in sample_text:
        file_handle.write(line)
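To try reading and writing without a `sample.txt` on disk, the sketch below writes a few lines to a temporary file and reads them back. The file name, the `lines` content, and the explicit `encoding="utf-8"` are assumptions made for this example.

```python
import os
import tempfile

lines = ["first line\n", "second line\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")

    # Write: writelines does not add newlines, so each line carries its own.
    with open(path, "w", encoding="utf-8") as file_handle:
        file_handle.writelines(lines)

    # Read: iterating over the handle yields one line at a time.
    with open(path, "r", encoding="utf-8") as file_handle:
        read_back = [line for line in file_handle]

print(read_back == lines)  # True
```

The `with` statement closes each file automatically, even if an error occurs inside the block.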