# Python Essentials II

Today, we will continue exploring foundational Python concepts:

* Data Structures: Sequences and Collections
* Functions

Friendly reminders:

* DataCamp modules for Functions and Packages; Logic, Control Flow, and Filtering; and Loops are due tonight by 11:59 p.m.

## Data Structures

There are many types of data structures in Python. Each type has some distinction from all others, but there is some overlapping functionality across multiple types. It is helpful to think about the core data structures in terms of two categories:

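* Sequences are ordered data structures, and include types such as lists, tuples, strings, and NumPy arrays (covered later). Lists are mutable, whereas tuples and strings are immutable.
* Collections are data structures accessed by membership or key rather than by position, and include types such as sets and dictionaries. Sets are unordered; dictionaries preserve insertion order as of Python 3.7 but still do not support positional indexing. Both dictionaries and sets are mutable.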

The various types can be nested. For example, you can create lists of lists, lists of tuples, sets of tuples, etc., depending on your need. In addition, they are all (functionally) iterable. There are variations on many of these structures that we will not discuss, but for most applications, these types will be sufficient.
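As a small illustration of nesting, the sketch below builds a list of tuples, a dictionary of lists, and a set of tuples. The variable names (`points`, `grades`, `unique_points`) and values are made up for this example.

```python
# A list of (x, y) tuples
points = [(0, 0), (1, 2), (3, 4)]
# Index the outer list first, then the inner tuple
second_y = points[1][1]

# A dictionary whose values are lists
grades = {'alice': [90, 85], 'bob': [70, 95]}
grades['alice'].append(100)   # the inner list is still mutable

# A set of tuples (tuples of immutable items can be set members)
unique_points = set(points + [(0, 0)])

print(second_y, grades['alice'], len(unique_points))
```

Each level of nesting is accessed with its own pair of brackets, working from the outermost structure inward.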

### Sequences

#### Creation, Assignment, and Deletion

In [1]:
# Lists are created using []; items do not need to be the same type
L = [1,'two',3]
L

[1, 'two', 3]

In [2]:
# Lists are mutable, so we can update with assignment
L[2] = 4
L

[1, 'two', 4]

In [3]:
# Or, delete
del L[2]
print(L)
L = [1, 'two', 4] # Reset L

[1, 'two']


In [4]:
# Tuples are created using ()
T = (1,2,3)
T

(1, 2, 3)

In [5]:
# You can also create tuples using comma-separated values
T = 1,2,3
T

(1, 2, 3)

In [6]:
# You can unpack items from a sequence into individual variables
one, two, three = T
print(one, two, three)

1 2 3


#### Membership

Testing for membership allows you to determine whether a specific item is contained in a sequence or collection. The keyword for testing for membership is **in**.

In [7]:
3 in L

False

In [8]:
3 in T

True

#### Indexing and Slicing

Indexing means to access a specific item in a sequence. You can index into a sequence using [ ].

Remember, **Python is zero-indexed**, meaning that index 0 corresponds to the first item in a sequence, 1 to the second item, and so on. The index of the last item in a sequence of length *n* is *n-1*. Negative indices are also valid: index *-i* is equivalent to index *n-i*, so -1 refers to the last item.
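The relationship between negative and non-negative indices can be checked directly. This is a minimal sketch using a made-up list named `seq`.

```python
seq = ['a', 'b', 'c', 'd']
n = len(seq)

# seq[-i] refers to the same item as seq[n - i]
for i in range(1, n + 1):
    assert seq[-i] == seq[n - i]

print(seq[-1], seq[n - 1])   # last item, accessed two ways
```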

In [9]:
# Indexing
L[1]

'two'

In [10]:
# Negative indices
L[-2]

'two'

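Slicing means to access a range of items in a sequence. Similar to indexing, we use [ ], but for slicing we separate the inputs with colons (:). For any slicing operation, the general expression is [start:stop:step], with defaults start=0, stop=*n*, and step=1, where *n* is the length of the sequence. The item at index stop is not included. Any input (i.e., start, stop, or step) that is not explicitly stated will assume the default value (when step is negative, the default start and stop switch to the end and beginning of the sequence, respectively).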
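To make the slicing defaults concrete, the sketch below compares slices with omitted inputs against their fully written-out equivalents. The tuple `seq` is a made-up example.

```python
seq = (10, 20, 30, 40, 50)
n = len(seq)

# Omitted inputs fall back to the defaults for a positive step
assert seq[:] == seq[0:n:1]
assert seq[2:] == seq[2:n:1]
assert seq[:3] == seq[0:3:1]

# With a negative step, the omitted start and stop flip to the ends,
# so [::-1] walks from the last item back to the first
reversed_seq = seq[::-1]
print(reversed_seq)
```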

In [11]:
# Slicing - All items
T[:]

(1, 2, 3)

In [12]:
# Slicing - All items
T[::]

(1, 2, 3)

In [13]:
# Slicing - Range of items
T[1:3]

(2, 3)

In [14]:
# Slicing - Step size greater than 1
T[::2]

(1, 3)

In [15]:
# Slicing - Negative step size
T[2::-1]

(3, 2, 1)

#### Concatenation and Replication

Similar to strings, sequences support concatenation (+) and replication (*).

In [16]:
L + L

[1, 'two', 4, 1, 'two', 4]

In [17]:
T * 3

(1, 2, 3, 1, 2, 3, 1, 2, 3)

#### Casting

In [18]:
tuple(L)

(1, 'two', 4)

In [19]:
list(T)

[1, 2, 3]

#### Other Common Sequence Methods

In [20]:
# Length
len(L)

3

In [21]:
# Min, max, sum
print('Min:', min(T))
print('Max:', max(T))
print('Sum:', sum(T))

Min: 1
Max: 3
Sum: 6


In [22]:
# Any and all
B = [True, False, False]
print('Any:', any(B))
print('All:', all(B))

Any: True
All: False


#### Sequence Functions and Generators

There are several standard functions for generating or modifying sequences. Many of these functions create **iterators**, which are essentially objects that return each item of a sequence one at a time, instead of storing the entire sequence in memory. As expected, iterators support iteration. They can also be cast to a sequence.

In [23]:
# Range function - Similar syntax as slicing (:) operator, returns a 
# list-like object
ra = range(0,10,2)
ra

range(0, 10, 2)

In [24]:
# List-like behavior of range object
print(len(ra))
print(ra[3])
print(ra[1:4])
print(list(ra))

5
6
range(2, 8, 2)
[0, 2, 4, 6, 8]


In [25]:
# Enumerate - Pairs each item with its index as (index, value) tuples, returns an iterator
en = enumerate(L)
print(en)

<enumerate object at 0x1091d30d8>


In [26]:
# Zip - Pairs up items from multiple sequences as tuples, returns an iterator
zi = zip(L, T)
zi

<zip at 0x1091d5f08>

In [27]:
# Reversed - Returns the items from a sequence in reverse order, returns an iterator
rev = reversed(T)
rev

<reversed at 0x1091d4d30>
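The iterator objects print only as opaque references, so it can help to cast them to a list to see what they produce. The sketch below uses made-up inputs (`letters`, `numbers`) and also shows that an iterator can be consumed only once.

```python
letters = ['a', 'b', 'c']
numbers = (1, 2, 3)

# Casting an iterator to a list shows the items it would produce
print(list(enumerate(letters)))
print(list(zip(letters, numbers)))
print(list(reversed(numbers)))

# An iterator can only be consumed once
pairs = zip(letters, numbers)
first_pass = list(pairs)
second_pass = list(pairs)   # already exhausted, so this is empty
print(first_pass, second_pass)
```

If you need the items more than once, store the cast list (or tuple) in a variable rather than reusing the iterator.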

In [28]:
# Sorted - Returns a sorted copy of the list
import numpy as np
R = list(np.random.randint(0,10,10))
print(R, sorted(R, key=None, reverse=False))

[6, 1, 4, 6, 0, 8, 9, 8, 1, 0] [0, 0, 1, 1, 4, 6, 6, 8, 8, 9]


In [29]:
# All of the previous objects are iterable - range, enumerate, zip, reversed, sorted
iter_obj = ra
for item in iter_obj:
    print(item)

0
2
4
6
8


#### List Methods

The list is probably the most versatile data structure in Python. There are many list-specific methods that we can utilize:

* L.append(x): Appends an object x to the end of L (in place)
* L.extend(M): Appends each element of M to the end of L (in place)
* L.count(x): Count occurrences of x in L
* L.index(x): Returns smallest index i where L[i] == x 
* L.insert(i,x): Inserts x at index i (in place)
* L.pop([i]): Removes and returns the ith element of L (in place); if i is omitted, the last element is popped
* L.remove(x): Removes the first instance of x from L (in place)
* L.reverse(): Reverses items of L (in place)
* L.sort(): Sorts (in ascending order) items of L (in place)

In [30]:
# Append
L.append(6)
L

[1, 'two', 4, 6]

In [31]:
# Extend
L.extend([10,10])
L

[1, 'two', 4, 6, 10, 10]

In [32]:
# Count
L.count(10)

2

In [33]:
# Index
L.index(10)

4

In [34]:
# Insert
L.insert(0,100)
L

[100, 1, 'two', 4, 6, 10, 10]

In [35]:
# Pop
print(L.pop(0), L)

100 [1, 'two', 4, 6, 10, 10]


In [36]:
# Remove
L.remove('two')
L

[1, 4, 6, 10, 10]

In [37]:
# Reverse
L.reverse()
L

[10, 10, 6, 4, 1]

In [38]:
# Sort
L.sort(key=None, reverse=False)
L

[1, 4, 6, 10, 10]
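One practical consequence of "in place" is that methods such as .sort() return None rather than the modified list. The sketch below contrasts this with the **sorted** function; the list `values` is a made-up example.

```python
values = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched
new_list = sorted(values)
unchanged = list(values)    # snapshot to show values has not changed yet

# .sort() modifies the list in place and returns None
result = values.sort()

print(new_list, unchanged, values, result)
```

A common mistake is writing `values = values.sort()`, which replaces the list with None.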

### Collections

#### Sets

We will not use sets very often, but they are useful for evaluating and comparing membership. The **set** casting function is also a quick way to determine the **unique** elements in a sequence.

In [39]:
# Sets are created using {} with items, or set(sequence)
# Note: an empty {} creates a dictionary, so use set() for an empty set
S = {1,4,5,6,6,8} # or set([1,4,5,6,6,8])
S

{1, 4, 5, 6, 8}
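The same behavior makes **set** a quick way to find the unique elements of a sequence. This is a minimal sketch with a made-up list named `votes`.

```python
votes = ['yes', 'no', 'yes', 'yes', 'abstain', 'no']

# Casting to a set drops duplicates (the resulting order is not guaranteed)
unique_votes = set(votes)

# sorted() gives a reproducible, ordered list of the unique values
unique_sorted = sorted(unique_votes)
print(unique_sorted)
print(len(votes), len(unique_votes))
```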

In [40]:
# Sets are unordered, they do not support indexing
S[1]

TypeError: 'set' object is not subscriptable

In [41]:
# Sets are mutable
S.add(9)
S

{1, 4, 5, 6, 8, 9}

In [42]:
# Testing for membership
7 in S

False

In [43]:
# Set operations - .union (|), .intersection (&), .difference (-), .symmetric_difference (^)
op = '^'
print(S,T)
T = set(range(1,10,2))
eval('S' + op + 'T')

{1, 4, 5, 6, 8, 9} (1, 2, 3)


{3, 4, 6, 7, 8}
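The set operations can also be written out directly, without building the expression as a string. The sketch below applies each operator and its equivalent method to two sets, here named `A` and `B` for illustration, with the same values as S and T above.

```python
A = {1, 4, 5, 6, 8, 9}
B = set(range(1, 10, 2))   # {1, 3, 5, 7, 9}

print(A | B, A.union(B))                  # items in either set
print(A & B, A.intersection(B))           # items in both sets
print(A - B, A.difference(B))             # items in A but not in B
print(A ^ B, A.symmetric_difference(B))   # items in exactly one of the sets
```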

In [48]:
# Sets are also iterable, even though they are unordered
for x in S:
    print(x)

1
4
5
6
8
9


#### Dictionaries

In addition to lists, dictionaries are also very commonly used. A dictionary is a collection of key:value pairs (insertion-ordered as of Python 3.7, but not indexable by position). Keys must be hashable, which in practice means immutable, such as a scalar (e.g., int, float, string, date/time) or a tuple of immutable items. Values can be any Python object.

Items in the dictionary are accessed via the keys, as opposed to an index as in a sequence.
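The restriction on keys can be seen by trying a tuple and a list as keys. This is a minimal sketch; the dictionary `grid` and its contents are made up for illustration.

```python
# Tuples are immutable, so they can serve as dictionary keys
grid = {(0, 0): 'origin', (1, 0): 'east'}
print(grid[(1, 0)])

# Lists are mutable, so using one as a key raises a TypeError
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)
```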

In [45]:
# Create dictionary - Comma separated list of key:value pairs in {}
D = {1:'a', 2:'b', 3:'c', 4:'d', 5:'e'}
D

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

In [46]:
# Create dictionary - Using dict and zip functions
D = dict(zip(range(1,6), 'abcde'))
D

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

In [47]:
# Indexing - Input key, return value
D[3]

'c'

In [49]:
# Alternative indexing - .get method
D.get(40, 'There are only 26 letters')

'There are only 26 letters'

In [50]:
# Dictionaries are mutable - Update via assignment
D[6] = 'f'
D

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f'}

In [51]:
# Deletion works too
del D[6]
D

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

In [52]:
# Dictionaries have length
len(D)

5

In [53]:
# Access keys
D.keys()

dict_keys([1, 2, 3, 4, 5])

In [54]:
# Access values
D.values()

dict_values(['a', 'b', 'c', 'd', 'e'])

In [55]:
# Access items
D.items()

dict_items([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')])

In [56]:
# Dictionaries are iterable; iterating over a dictionary yields its keys
for k in D: # D.keys() works too
    print(k,D[k])

1 a
2 b
3 c
4 d
5 e


In [57]:
# Or, the items
for key, item in D.items():
    print(key, item)

1 a
2 b
3 c
4 d
5 e


In [None]:
# Explore dictionary methods using tab completion
D.

### Comprehensions and Generator Expressions

**Comprehensions** and **generator expressions** are convenient ways of generating one iterable from another (they do not have to be the same type). These types of statements are excellent examples of what the text refers to as *syntactic sugar*, which means that it is a very convenient and concise way of writing code (I call this Pythonic). You should consider using a comprehension or generator expression in place of creating an iterable using a **for** loop. They are typically faster than the equivalent loop that builds the iterable one item at a time!

The basic idea is that you can create an iterable by looping through another iterable. The primary syntax follows the same logic as the initial statement in a **for** loop:

*expr* **for** item **in** iterable

where *expr* represents what you want to return for each item in the new iterable. You can return the item as is, or a function of the item (e.g., computation, transformation, etc.). Although you can use comprehensions and generator expressions to perform computation, this is not their primary use. You should use NumPy arrays for computation, as they are designed specifically for that purpose.

You can also add conditions for whether you want to include a specific item in your new iterable. 

*expr* **for** item **in** iterable **if** *cond*

where *cond* is a boolean object, most often returned by a specific comparison (e.g., **if** item < 10).

You can also create nested comprehensions and generator expressions:

*expr* **for** item1 **in** iterable1 **for** item2 **in** iterable2 **if** *cond1* **if** *cond2*

You can also leverage functions such as **range**, **enumerate**, **zip**, **reversed**, and **sorted** to form your iterables.
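As a sketch of the nested form with two conditions, the example below collects integer right triangles (a, b, c) with both legs at most 12, and then uses **enumerate** inside a dictionary comprehension. The names `triples`, `names`, and `positions` are made up for illustration, and `math.isqrt` (the integer square root from the standard library) is used to avoid floating-point comparisons.

```python
import math

# Keep (a, b, c) only when a < b (no mirrored duplicates)
# and a**2 + b**2 is a perfect square
triples = [(a, b, math.isqrt(a * a + b * b))
           for a in range(1, 13)
           for b in range(1, 13)
           if a < b
           if math.isqrt(a * a + b * b) ** 2 == a * a + b * b]
print(triples)

# enumerate supplies (index, value) pairs to a dictionary comprehension
names = ['x', 'y', 'z']
positions = {name: i for i, name in enumerate(names)}
print(positions)
```

The **for** clauses are read left to right, exactly as if they were nested loops, and an item is kept only if every **if** condition is true.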

The type of iterable that you create depends on the specific syntax:

* List comprehensions are created using [ ]
* Set comprehensions are created using { }, where *expr* is anything other than a key:value pair
* Dictionary comprehensions are created using { }, where *expr* is a key:value pair
* Generator expressions are created using ( ), which create generators that produce only one item at a time, on demand, via the built-in **next** function (or implicitly during iteration). Once a generator has been iterated through, it is exhausted (whereas sequences and collections can be iterated through multiple times).

In [58]:
%%time
# Traditional list construction
L = [] # empty list
for x in range(10):
    L.append(x)
print(L)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
CPU times: user 146 µs, sys: 56 µs, total: 202 µs
Wall time: 157 µs


In [59]:
# List comprehension
%time [x for x in range(10)]

CPU times: user 9 µs, sys: 1 µs, total: 10 µs
Wall time: 13.8 µs


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [60]:
# List comprehension with conditional
[x for x in range(10) if x % 2 == 0]

[0, 2, 4, 6, 8]

In [61]:
# Combine list comprehension with ternary expression
[x if x % 2 == 0 else 0 for x in range(10)]

[0, 0, 2, 0, 4, 0, 6, 0, 8, 0]

In [62]:
# Nested list comprehension
[(a, b, (a ** 2 + b ** 2) ** 0.5) for a in range(1,6) for b in range(1,6)]

[(1, 1, 1.4142135623730951),
 (1, 2, 2.23606797749979),
 (1, 3, 3.1622776601683795),
 (1, 4, 4.123105625617661),
 (1, 5, 5.0990195135927845),
 (2, 1, 2.23606797749979),
 (2, 2, 2.8284271247461903),
 (2, 3, 3.605551275463989),
 (2, 4, 4.47213595499958),
 (2, 5, 5.385164807134504),
 (3, 1, 3.1622776601683795),
 (3, 2, 3.605551275463989),
 (3, 3, 4.242640687119285),
 (3, 4, 5.0),
 (3, 5, 5.830951894845301),
 (4, 1, 4.123105625617661),
 (4, 2, 4.47213595499958),
 (4, 3, 5.0),
 (4, 4, 5.656854249492381),
 (4, 5, 6.4031242374328485),
 (5, 1, 5.0990195135927845),
 (5, 2, 5.385164807134504),
 (5, 3, 5.830951894845301),
 (5, 4, 6.4031242374328485),
 (5, 5, 7.0710678118654755)]

In [63]:
# Set comprehension
{x for x in range(5)}

{0, 1, 2, 3, 4}

In [64]:
# Dictionary comprehension
{key:value for key,value in zip(range(1,6), 'abcde')}

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

In [65]:
# Generator expression
G = (x for x in range(100))
G

<generator object <genexpr> at 0x1136a4570>

In [66]:
# next function - Returns the next item from the generator
next(G)

0

In [75]:
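# Generators as iterables - Summing consumes all remaining items
# Note: this cell was run after G had already been exhausted, so the result is 0;
# run immediately after the next(G) call above, it would return 4950 (1 + 2 + ... + 99)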
sum([x for x in G])

0

In [68]:
# Generator exhaustion
next(G)

StopIteration: 

## Functions

Functions are essentially blocks of code that you can reference by name. You should define functions for steps that you anticipate using multiple times. If you only perform a series of steps once or twice, you probably do not need to define a function. Functions may also be used as inputs to other functions (e.g., mapping, sorting).

Functions are created using the **def** statement, following the same indented code structure as a conditional or a loop.

```
def func_name(inputs):
    statements
    [return object(s)] # optional
```

Functions can receive any number of inputs (zero and up) and return any number of outputs (zero and up). Inputs can be entered in order (*positional*) or entered by referencing the specific name of the argument (*keyword*). Inputs can be required (i.e., the function will raise an exception if the input is not supplied) or optional (i.e., the function will still work if the input is not supplied). Default values must be assigned for optional inputs.

Functions that do not return any objects may print results, generate visualizations, save results to a file, or modify objects via reference (see Pass By Reference subsection below).

In [69]:
# Define function that does not return a result
def add(x, y):
    print(x + y)

add(5,10) # positional arguments

15


In [70]:
# Define function that returns a result
def add(x, y):
    return x + y

res = add(y=10,x=5) # keyword arguments
res
print(res)

15


In [71]:
# Define function with default values
def add(x=0, y=0):
    return x + y

# Test cases
print('No inputs:', add())
print('Positional input:', add(1))
print('Keyword input:', add(y=5))
print('All inputs:', add(1,2), add(x=1, y=2), add(y=2, x=1)) # All cases, x = 1, y = 2

No inputs: 0
Positional input: 1
Keyword input: 5
All inputs: 3 3 3


In [72]:
# Define function with multiple outputs
def add(x=0, y=0):
    return (x, y), x + y
add(5,10)

((5, 10), 15)
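Required and optional inputs can be mixed in one function, and multiple outputs can be unpacked into separate variables. The sketch below illustrates both; the functions `scale` and `min_max` are hypothetical examples, not part of any library.

```python
def scale(x, factor=2):
    """Multiply x by factor; x is required, factor is optional."""
    return x * factor

print(scale(5))             # uses the default factor
print(scale(5, factor=3))   # overrides the default

# Omitting a required input raises a TypeError
try:
    scale()
except TypeError as err:
    print('TypeError:', err)

# Multiple outputs come back as a tuple, which can be unpacked
def min_max(values):
    return min(values), max(values)

lo, hi = min_max([4, 1, 9])
print(lo, hi)
```

Note that required inputs must be listed before optional ones in the **def** statement.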

### Functions vs. Methods

For our purposes, functions and object methods are essentially the same...

* One or more bundled steps performed on some input object(s)
* In some cases, there will be a function and an object method that do the same thing (e.g., sum)

...BUT, they sometimes differ in how they are used.

* Functions are called on zero or more objects and may return result(s) that can be assigned to a variable
* Object methods are called by an object (and the calling object is often input to the method), which can either update the calling object or return result(s) that can be assigned to a variable

In Python, most functions you will use belong to a particular module (library) and are called using the same dot notation as methods (e.g., np.sum).

In [73]:
# Function approach to summing an array
import numpy as np
arr = np.arange(1,11,2)
print(arr)
np.sum(arr)

[1 3 5 7 9]


25

In [74]:
# Method approach to summing an array
arr.sum()

25

### Global vs. Local Variables

When working with functions, you should be very aware of which variables are defined within global and local scopes. **Global variables** are defined throughout your notebook or script, and are accessible within functions even if they have not been input or defined within the function. **Local variables** are either input to the function, or defined within the function. They do not exist outside of the function. Be very careful if you use the same variable name globally and locally, because unexpected things can happen!

In [76]:
# Define function with local variable
def add_to_y(x):
    y = 5
    return x + y

In [77]:
# Define function that utilizes global variable
w = 10
def add_to_w(x):
    return x + w

add_to_w(5)

15
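Two of the "unexpected things" that can happen when a name is used both globally and locally are sketched below. The variable `total` and both functions are made up for illustration.

```python
total = 100   # global variable

def shadow():
    total = 5          # creates a new local variable; the global is untouched
    return total

def read_then_assign():
    try:
        # Because total is assigned later in this function, Python treats
        # it as local everywhere in the function, so this read fails
        print(total)
        total = 5
    except UnboundLocalError as err:
        return type(err).__name__

print(shadow(), total)
print(read_then_assign())
```

Passing values in as inputs and returning results, rather than relying on global variables, avoids both situations.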

### Pass By Reference

Objects that are input to functions are **passed by reference** (more precisely, the function receives a reference to the same object rather than a copy), which means that a mutable object can be modified by the function. Be very careful when passing mutable objects to functions, because unexpected things can happen! Use the **copy** module to create a copy of a mutable object if you do not want the original object to be modified by the function.

In [78]:
# Define function to append a number to a list
def append_num(L, num):
    L.append(num)

In [79]:
# Test case
num = 5
N = [1,2,3]
append_num(N, num)
N

[1, 2, 3, 5]

In [80]:
# Use copy module
import copy
def append_num(L, num):
    return copy.copy(L) + [num]

In [81]:
# Test case
num = 5
N = [1,2,3]
print(N, append_num(N, num))

[1, 2, 3] [1, 2, 3, 5]
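Whether the caller's object changes depends on whether the function mutates it or merely reassigns the local name. The sketch below shows both cases, and also that copy.copy is a shallow copy, so nested objects are still shared unless copy.deepcopy is used. All function and variable names here are made up for illustration.

```python
import copy

def add_item_rebind(items, x):
    items = items + [x]    # builds a new list; the caller's list is unchanged
    return items

def add_item_mutate(items, x):
    items.append(x)        # modifies the caller's list in place
    return items

original = [1, 2, 3]
add_item_rebind(original, 4)
after_rebind = list(original)   # snapshot after the first call
add_item_mutate(original, 4)
print(after_rebind, original)

# copy.copy is shallow: the inner lists are shared with the original
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0], deep[0])
```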


### Lambda Functions

Lambda functions are essentially lightweight functions that you can write in a single line. They are very useful when using a simple function as an input to another function. They can be assigned to a variable or input directly.

In [128]:
# Define lambda function
lf = lambda s: s.split(' ')[1] # Split string using ' ' as delimiter, return second item from resultant list
lf

<function __main__.<lambda>(s)>

In [133]:
# Sort list of names by last name
names = ['George Washington', 'John Adams', 'Thomas Jefferson', 'James Madison', 'James Monroe']
sorted(names, key=lf)


['John Adams',
 'Thomas Jefferson',
 'James Madison',
 'James Monroe',
 'George Washington']

In [130]:
# Return last name only
[last for last in map(lf, names)]

['Washington', 'Adams', 'Jefferson', 'Madison', 'Monroe']
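A lambda function can also be written directly in the call, without assigning it to a variable first. This is a minimal sketch with a made-up list named `words`.

```python
words = ['banana', 'fig', 'cherry', 'kiwi']

# Lambda passed directly as the sort key: sort by word length
by_length = sorted(words, key=lambda w: len(w))

# Lambda passed directly to map: uppercase the first letter of each word
capitalized = list(map(lambda w: w[0].upper() + w[1:], words))

print(by_length)
print(capitalized)
```

Because **sorted** is stable, words of equal length keep their original relative order.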

## Python Essentials Wrap Up

This concludes our initial coverage of foundational Python concepts, which will be complemented by what you are learning via DataCamp. For the purposes of this course, you should consider these constructs as tools in your toolbox for processing and analyzing data:

* Comments and **print** statements
* Importing modules
* Variable assignment
* Data structures: scalars, sequences, and collections
* Indexing, selection, and filtering
* Computation and comparisons
* Control flow: conditionals and loops
* Functions

## Next Time: Python Essentials Lab