# Python for Data Science - after lunch

##### In this section, we introduce lists, logic and control flow

### 3. Data types: lists
Ints, floats and strings are the most basic data types (think of them as atoms). Next, we'll look at data structures that combine those atoms. Lists, tuples, sets and dictionaries are compound data structures that group together other items.

|Data structure | Properties| Syntax|
|----|----|----|
|List | Ordered, mutable sequence | `mylist = [1,2,3]` |
|Tuple | Ordered, immutable sequence | `mytuple = (1,2,3)` |
|Set | Unordered collection of unique values | `myset = {1,2,3}` or `myset = set([1,2,3])` |
|Dictionary | Mutable mapping of unique keys to values | `mydict = {'first_value':1, 'second_value':2}` |
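A minimal sketch of the four types in the table, using made-up values, to show what the Properties column means in practice: lists can be changed, tuples cannot, sets drop duplicates, and dictionaries look values up by key.

```python
# ordered, mutable: an item can be replaced in place
mylist = [1, 2, 3]
mylist[0] = 10

# ordered, immutable: assigning to an item raises a TypeError
mytuple = (1, 2, 3)
try:
    mytuple[0] = 10
except TypeError as err:
    print("tuples are immutable:", err)

# unordered, unique values: duplicates are silently dropped
myset = {1, 2, 2, 3, 3, 3}

# mutable mapping: look up a value by its key, or add a new pair
mydict = {'first_value': 1, 'second_value': 2}
mydict['third_value'] = 3

print(mylist)                  # [10, 2, 3]
print(myset)                   # {1, 2, 3}
print(mydict['second_value'])  # 2
```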


#### 3.1 Manipulating lists
Lists are collections of items. To create one, put a series of items in square brackets, separated by commas. Lists can contain items of different types. They are helpful when your data has an order and may need to be changed in place.
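To illustrate the point about mixed types, here is a small sketch with a made-up record: one list holding a string, an int, a float, a Boolean and even another list.

```python
# a list can hold items of different types, even another list
record = ['Ada', 36, 1.75, True, ['python', 'maths']]

print(type(record[0]))   # <class 'str'>
print(type(record[1]))   # <class 'int'>
print(type(record[4]))   # <class 'list'>

# the order is kept, and items can be updated in place
record[1] = 37
print(record)            # ['Ada', 37, 1.75, True, ['python', 'maths']]
```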

In [None]:
# create a list of strings
weekdays = ['monday','tuesday','wednesday','thursday','friday']
weekdays

In [None]:
# Lists are ordered collections; they are indexed starting at zero.
# To get an individual item, use square brackets and the appropriate index value.
weekdays[0]

In [None]:
# select slices (e.g. the first three items, indices 0 to 2) with square bracket notation

weekdays[0:3]

In [None]:
# check the type of items in a list
type(weekdays[3])

In [None]:
# a list is mutable: you can change its contents in place
weekdays[3] = 'thursday - practice Python!'
weekdays

In [None]:
# add an item to a list with append()
weekdays.append('saturday')
weekdays

In [None]:
# test for a value in your list
'saturday' in weekdays

In [None]:
# use .remove() to clean up the weekdays list

weekdays.remove('saturday')
weekdays

In [None]:
# concatenate two lists
odds = [1,3,5]
evens = [2,4,6]
all_nums = odds + evens
all_nums

In [None]:
# the built-in function len() also applies to lists
len(all_nums)

### 4. Logic and control flow

Definition of **control flow**:
* In a simple script, program execution starts at the top and executes each instruction in order. 
* **Control flow** statements can make execution loop (repeat instructions) or skip instructions, based on conditions.

#### 4.1 Loops and iterables
Definition: an **iterable** is an object capable of returning its members one at a time. Strings, lists and dictionaries are all iterables.
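A short sketch of "returning its members one at a time": the built-in `iter()` asks an iterable for an iterator, and `next()` pulls out one member per call. A `for` loop does the same thing behind the scenes. The dictionary here is made-up data; note that looping over a dictionary gives you its keys.

```python
# iter() gets an iterator from an iterable; next() returns one member at a time
letters = iter('abc')
first = next(letters)    # 'a'
second = next(letters)   # 'b'
third = next(letters)    # 'c'
# one more next(letters) would raise StopIteration; a for loop uses
# that signal to know when to stop

# looping over a dictionary returns its keys, one at a time
ages = {'ann': 31, 'bob': 27}
keys_seen = []
for key in ages:
    keys_seen.append(key)
print(keys_seen)         # ['ann', 'bob']
```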

A **for loop** runs a block of code repeatedly, once "for" each item in an iterable. End the `for` line with a colon (`:`) and indent the code that should run inside the loop.

In [None]:
for color in ['red','green','blue']:
    print("I love " + color)

In [5]:
# or loop over the characters in a string
for letter in 'abcd':
    print(letter.upper())

A
B
C
D


In [6]:
# the range() function produces a sequence of integers (0, 1, 2, ...) to loop over
for n in range(5):
    print("I ate {} donuts".format(n + 1))

I ate 1 donuts
I ate 2 donuts
I ate 3 donuts
I ate 4 donuts
I ate 5 donuts
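`range()` also accepts a start and a step. A quick sketch of the three forms (wrapped in `list()` only so the numbers are visible):

```python
# range(stop) counts from 0 up to, but not including, stop
first_five = list(range(5))
print(first_five)        # [0, 1, 2, 3, 4]

# range(start, stop) starts somewhere other than 0
one_to_five = list(range(1, 6))
print(one_to_five)       # [1, 2, 3, 4, 5]

# range(start, stop, step) moves ahead by step each time
every_third = list(range(0, 10, 3))
print(every_third)       # [0, 3, 6, 9]

# with a start of 1, the donut loop no longer needs n + 1
for n in range(1, 4):
    print("I ate {} donuts".format(n))
```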


#### 4.2 Logic operators

We test conditions using logic operators. Combined with Boolean logic, these control flow tools help us build up operations on larger datasets.

The expression `a == b` asks Python to evaluate whether variable `a` equals variable `b`; the interpreter will return `True` or `False`.

Similar expressions are `a > b`, `a >= b` and `a != b`.

| Symbol | Task performed |
|----|---|
| == | True if equal |
| != | True if not equal |
| < | True if less than |
| <= | True if less than or equal to |
| > | True if greater than |
| >= | True if greater than or equal to |
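A minimal sketch that runs every operator in the table on two example numbers (`x` and `y` are arbitrary illustrative values), plus one string comparison, which follows alphabetical order and is case-sensitive.

```python
x = 3
y = 5

results = {
    'x == y': x == y,   # False
    'x != y': x != y,   # True
    'x < y': x < y,     # True
    'x <= 3': x <= 3,   # True
    'x > y': x > y,     # False
    'x >= y': x >= y,   # False
}
for expression, value in results.items():
    print(expression, '->', value)

# comparisons also work on strings, in alphabetical (character code) order
print('apple' < 'banana')   # True
```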

In [None]:
# assign values with '=', but compare them using '=='
# (example values; any numbers work)
a = 3
b = 5
a == b

In [None]:
# Test whether a does not equal b
a != b

In [None]:
# Logic expressions evaluate to True or False (datatype: Boolean)

test = b > a

test

In [None]:
type(test)

#### 4.3 Conditional statements with if

My pet Python is a vegetarian. She will test whether variable 'food' is 'burger', 'chicken' or 'veg', then decide whether to eat.

Do this with 'if', 'elif' (else if), and 'else'.

In [None]:
food = 'veg'

In [None]:
if food == 'veg':
    print('yum')
elif food == 'chicken':
    print('hmm maybe')
elif food == 'burger':
    print('no thanks')
else:
    pass    # any other food: do nothing

NOTE: Here's how the structure works:
* start with an 'if' statement, specifying the logical test to apply
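* make sure your 'if' statement ends with a colon (:)
* **indent the conditional code block.** Whatever code should be executed if the condition is true, indent it consistently. Four spaces is the standard convention, and mixing tabs and spaces can cause a `TabError` in Python 3.
* test additional conditions using 'elif', and handle any remaining cases with 'else'.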
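A small sketch of the structure above, looping over a few made-up dishes so that every branch gets exercised. The second part shows that Python checks the conditions from top to bottom and runs only the first branch whose condition is True, even if later conditions would also be True.

```python
replies = []
for dish in ['veg', 'chicken', 'burger', 'tofu']:
    if dish == 'veg':
        reply = 'yum'
    elif dish == 'chicken':
        reply = 'hmm maybe'
    elif dish == 'burger':
        reply = 'no thanks'
    else:
        # anything not matched above ends up here
        reply = 'not sure what that is'
    replies.append(reply)
    print(dish, '->', reply)

# only the FIRST true branch runs
temperature = 30
if temperature > 25:
    feeling = 'hot'
elif temperature > 15:
    feeling = 'warm'    # also true for 30, but never reached
else:
    feeling = 'cold'
print(feeling)          # hot
```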

The `and`, `or` and `not` operators combine conditions: `and` is True only if both conditions are True, `or` is True if at least one of them is, and `not` flips True to False (and vice versa).

In [1]:
month = 'July'
hour = 14

In [2]:
(month == 'July') and (hour < 12)         # is it a morning in July?

False

In [3]:
(month == 'July') or (hour < 12)          # is it either a morning, or in July?

True

In [4]:
not (hour < 12) and (month == 'July')     # is it not a morning, and in July?

True
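To see every combination at once, a minimal sketch that builds a truth table for `and`, `or` and `not` from the two possible Boolean values:

```python
rows = []
for p in [True, False]:
    for q in [True, False]:
        row = (p, q, p and q, p or q, not p)
        rows.append(row)
        print('p =', p, '| q =', q, '| p and q:', row[2],
              '| p or q:', row[3], '| not p:', row[4])
```

Python applies `not` before `and`, and `and` before `or`; comparisons such as `hour < 12` are evaluated before all three. The parentheses in the cells above are therefore optional, but they make the intent easier to read.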

#### 4.4 Testing conditions inside a loop
Combining loops with logic allows you to build more sophisticated code structures:

In [None]:
days = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

for day in days:
    if day == 'Sat':
        location = '--> Beach!'
    elif day == 'Sun':
        location = '--> My sofa!'
    else:
        location = '--> MC5-215B'
    print(day, location)
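Besides printing, a loop with a condition can collect results as it goes. A small sketch on the same `days` list: gather the weekend days and count the rest.

```python
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

weekend = []
weekday_count = 0
for day in days:
    # 'in' tests membership, here against a tuple of two values
    if day in ('Sat', 'Sun'):
        weekend.append(day)
    else:
        weekday_count += 1

print(weekend)          # ['Sat', 'Sun']
print(weekday_count)    # 5
```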

In [8]:
# EXAMPLE 2: is your pet allowed?

authorized_pets = ['small dog', 'cat', 'hamster','budgerigar']

print("Welcome to Nick's Apartment Block.")
my_pet = input("Type your kind of pet to see if it's accepted: ")

if my_pet in authorized_pets:
    print("Congratulations, your {} is welcome here!".format(my_pet))
else:
    print("\nSorry, your {} is NOT ACCEPTED".format(my_pet))

Welcome to Nick's Apartment Block.
Type your kind of pet to see if it's accepted: bat

Sorry, your bat is NOT ACCEPTED
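Because `input()` waits for a person to type, it is awkward for checking many answers. A sketch of the same logic without `input()`, looping over made-up answers instead. It also tidies each answer with `.strip()` and `.lower()`. This normalisation is an optional addition, not part of the original example: without it, `'Hamster'` would fail the `in` test because string matching is exact.

```python
authorized_pets = ['small dog', 'cat', 'hamster', 'budgerigar']

# made-up answers standing in for what users might type
answers = ['cat', 'bat', 'Hamster', ' small dog ']

accepted = []
for answer in answers:
    # remove surrounding spaces and lowercase, so 'Hamster' matches 'hamster'
    pet = answer.strip().lower()
    if pet in authorized_pets:
        accepted.append(pet)
        print("Congratulations, your {} is welcome here!".format(pet))
    else:
        print("Sorry, your {} is NOT ACCEPTED".format(pet))

print(accepted)   # ['cat', 'hamster', 'small dog']
```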


### 5. Indexing and slicing
Several data types, including strings, lists and tuples, are 'sequences'. They share a common approach to selecting their elements using square bracket notation. This powerful notation also carries over to NumPy arrays and Pandas DataFrames:

In [None]:
# to get one character from a string, put the index number in square brackets directly after the variable name
language = 'Python'
language[0] 

In [None]:
# index values can be negative: -1 is the last character, -2 the second to last, and so on
language[-1]

When slicing, it helps to think of index values as pointing between characters. The left edge of the first character is 0. 'Python' has six characters, so the right edge of the last character is index 6.

In [None]:
#       +---+---+---+---+---+---+
#       | P | y | t | h | o | n |
#       +---+---+---+---+---+---+
#       0   1   2   3   4   5   6
#      -6  -5  -4  -3  -2  -1

# credit: docs.python.org/3/tutorial
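A quick sketch for reading slices off the diagram: a slice `[i:j]` returns the characters between edge `i` and edge `j`. Negative edges work the same way, and a slice may run past the end of the string, whereas indexing a single position outside the string (e.g. `language[6]`) raises an `IndexError`.

```python
language = 'Python'

middle = language[1:4]      # between edges 1 and 4
from_end = language[-3:-1]  # between edges -3 and -1
whole = language[0:6]       # edge 0 to edge 6 covers every character
past_end = language[4:100]  # slicing beyond the end is allowed

print(middle)     # yth
print(from_end)   # ho
print(whole)      # Python
print(past_end)   # on
```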

You can 'slice' strings and other sequences using a start and an end index.

In [None]:
# slices give you all elements from the start index, up to (but not including) the end index

language[0:4]

What happens if you leave out the start or end index while slicing? Python uses default values instead. For a sequence of length `n`, the start position defaults to 0 and the end position defaults to `n`.
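A minimal sketch that checks these defaults directly. It also shows a handy consequence: splitting a sequence at any position `k` and joining the two halves gives back the original. The same defaults apply to lists.

```python
language = 'Python'
n = len(language)   # 6

# an omitted start defaults to 0, an omitted end defaults to n
print(language[:4] == language[0:4])   # True
print(language[2:] == language[2:n])   # True

# leaving out both gives a copy of the whole sequence
print(language[:])                     # Python

# splitting at any position k and joining the halves restores the original
k = 2
rejoined = language[:k] + language[k:]
print(rejoined == language)            # True

# the same defaults work on lists
print([10, 20, 30, 40][:2])            # [10, 20]
```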

In [None]:
# everything up to (but not including) index 4
language[:4]

In [None]:
# from index 4 (the fifth character) onwards
language[4:]

In [None]:
# the last four characters: from index -4 to the end of the string
language[-4:]

Sneak preview: you can use the same index notation with higher-dimensional data structures, e.g. a 3D array (such as a stack of temperature rasters indexed by latitude, longitude and time).

### 6. An aside: extracting data from messy strings

In [None]:
# say you get a large column of ZIP codes, but in a messy format like this
zip_code1 = "Fred: ZIP 20022-0049"
zip_code2 = "Margaret: ZIP 20009-0132"

In [None]:
# we're interested in this part:
zip_code1[10:15]

In [None]:
# how could we systematically pull out the key 5 digits, to create a clean list of ZIPs?

To crack a problem like this, you could:
* Use tab completion to list available string methods
* Search Stack Overflow
* Check the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods)

In [None]:
# Get help on the .split() method (the ? suffix works in Jupyter/IPython; in plain Python use help(str.split))

zip_code1.split?

In [None]:
# a quick solution: split each string twice, first on whitespace, then on '-'

answer = zip_code1.split()[2].split('-')[0]

In [None]:
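# then make it an integer (careful: int() drops leading zeros, e.g. '02139' becomes 2139, so keep ZIPs as strings if any start with 0)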

int(answer)

Operating on data at scale requires more firepower, e.g. list comprehensions and functions.
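As a preview, a sketch of how the two-split trick could scale to a whole column. `extract_zip` is a hypothetical helper name, and the third string is a made-up example with a two-word name. One deliberate change from the cell above: the helper takes the *last* whitespace-separated field (`[-1]`) rather than the third (`[2]`), so names containing spaces still work, assuming the ZIP always comes last. The results stay as strings so leading zeros survive.

```python
messy_zips = [
    "Fred: ZIP 20022-0049",
    "Margaret: ZIP 20009-0132",
    "Mary Ann: ZIP 02139-4307",   # made-up example with a two-word name
]

def extract_zip(text):
    """Return the 5-digit ZIP from a string like 'Name: ZIP 12345-6789'.

    Assumes the ZIP+4 code is always the last whitespace-separated field.
    """
    return text.split()[-1].split('-')[0]

# a list comprehension applies the same steps to every string
clean_zips = [extract_zip(s) for s in messy_zips]
print(clean_zips)   # ['20022', '20009', '02139']
```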

### 7. Classes and methods (applied to lists)

__Sneak preview of classes:__ Python is an object-oriented language, so it lets you define __classes__ of objects. A class bundles data together with built-in functions that work on that data; these functions are called __methods__, and every object created from the class (an __instance__) can use them.
    
Example:
* I define a class `road_network`.
* Each time I type `my_network = road_network(parameter1, parameter2, ...)`, I create a new instance of the class.
* Helpful functionality might be (i) calculating the total length of roads; (ii) calculating the shortest path between two points.
* I write a method `find_shortest_path(start_point, end_point)`. It can then be called on any instance of class `road_network`.
* Like this: `my_network.find_shortest_path(my_start_point, my_end_point)`
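The bullets above describe a hypothetical class; here is one possible minimal sketch of it, not a definitive design. It assumes roads are given as made-up `(town_a, town_b, length)` tuples and are two-way. For the shortest path it uses Dijkstra's algorithm (a standard choice for non-negative road lengths) with the standard-library `heapq` module. The class name keeps the lowercase `road_network` spelling from the text; Python style guides would usually write `RoadNetwork`.

```python
import heapq

class road_network:
    """Hypothetical sketch: a road network built from (town_a, town_b, length) tuples."""

    def __init__(self, roads):
        self.roads = roads
        # adjacency list: town -> list of (neighbour, length); roads are two-way
        self.neighbours = {}
        for a, b, length in roads:
            self.neighbours.setdefault(a, []).append((b, length))
            self.neighbours.setdefault(b, []).append((a, length))

    def total_length(self):
        # method (i): add up the length of every road
        return sum(length for _, _, length in self.roads)

    def find_shortest_path(self, start_point, end_point):
        # method (ii): Dijkstra's algorithm, always expanding the closest town found so far
        queue = [(0, start_point, [start_point])]
        visited = set()
        while queue:
            distance, town, path = heapq.heappop(queue)
            if town == end_point:
                return distance, path
            if town in visited:
                continue
            visited.add(town)
            for neighbour, length in self.neighbours.get(town, []):
                if neighbour not in visited:
                    heapq.heappush(queue, (distance + length, neighbour, path + [neighbour]))
        return None  # no route between the two points

# each call to road_network(...) creates a new instance
my_network = road_network([('A', 'B', 4), ('B', 'C', 1), ('A', 'C', 7), ('C', 'D', 2)])
print(my_network.total_length())                # 14
print(my_network.find_shortest_path('A', 'D'))  # (7, ['A', 'B', 'C', 'D'])
```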

__List methods.__ Lists (and other data types) are implemented as classes, so they come with helpful methods at your disposal.
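A small sketch that makes "lists are classes" concrete: every list is an instance of the class `list`, and calling a method on an instance is shorthand for calling the class's function with the instance as the first argument.

```python
cubes = [1, 8, 27]

print(type(cubes))              # <class 'list'>
print(isinstance(cubes, list))  # True

# calling a method on the instance...
cubes.append(64)
# ...does the same as calling the class's function and passing the instance in
list.append(cubes, 125)

print(cubes)                    # [1, 8, 27, 64, 125]
```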

In [None]:
cubes = [1, 8, 27]
cubes

In [None]:
cubes.insert(0,0)    # insert element at given index

In [None]:
cubes.append(65)     # add element to end of list

In [None]:
cubes.remove(65)    # remove the first occurrence of a value (raises ValueError if it is absent)

In [None]:
cubes.extend([64,125,216])   # add all the elements of an iterable 

In [None]:
cubes

In [None]:
cubes.pop()         # return the last element (and remove it)

In [None]:
cubes.count(64)    # count how many times an element appears

In [None]:
print(cubes)
cubes.clear()       # delete all elements
print(cubes)

Remember: use tab completion (or a trailing question mark, for help) to explore the methods of an object. Other object types that we'll use extensively are NumPy arrays, Pandas Series and Pandas DataFrames. Each has its own (pretty amazing) set of methods.
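Outside Jupyter, where tab completion may not be available, the built-in `dir()` gives a similar list. A sketch that keeps only the public method names (names starting with `_` are internal) and prints one method's documentation; `help(list.pop)` shows the same text interactively.

```python
# dir() lists an object's attributes; names starting with '_' are internal
list_methods = [name for name in dir(list) if not name.startswith('_')]
print(list_methods)

# the docstring of a single method, similar to cubes.pop? in Jupyter
print(list.pop.__doc__)
```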