### Review of Data structures and control flow - towards big data
This notebook goes into more depth on strings, conditionals, and error handling. It also begins to show how to handle data at scale through loops and list comprehensions.

### Strings and escape characters
Strings can be defined in several ways, such as with single quotes or double quotes. Why? Strings themselves may include quotes, and the resulting ambiguity can break your code. Most programming languages have ways around this issue; here are Python's:

In [None]:
# One string, two ways

print('Yes, they said.')
print("Yes, they said.")

In [None]:
# An easy way to trip up: the inner quotes end the string early, so this cell raises a SyntaxError

print(''Yes', they said.')

In [None]:
# If your string contains one type of quote, define it with the other type

print("'Yes', they said")

In [None]:
# backslash is an 'escape character'. Before a quote, it removes the quote's special meaning:

print('\'Yes\', they said')

In [None]:
# ... if followed by n, it creates a new line
print("They said:\nYes")

In [None]:
# ... or a tab if followed by t
print("They said:\tYes")

In [None]:
# careful with unintended escape characters in filenames! (here \U starts a Unicode escape, so this raises a SyntaxError)

string_will_fail = 'C:\Users\charl\Documents\CE\RAM\OneDrive_1_3-6-2019\QXN\RN'

In [None]:
# adding r denotes 'raw string'

string_will_work = r'C:\Users\charl\Documents\CE\RAM\OneDrive_1_3-6-2019\QXN\RN'
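
To see what the `r` prefix changes, the sketch below writes one path three ways. The path `C:\temp\new_data` is a made-up example (it is not from this notebook), chosen because it happens to contain `\t` and `\n`.

```python
# Hypothetical path, chosen because it contains \t and \n
accidental = 'C:\temp\new_data'   # \t becomes a tab, \n becomes a newline
escaped = 'C:\\temp\\new_data'    # each doubled backslash is one literal backslash
raw = r'C:\temp\new_data'         # raw string: backslashes are kept as typed

print(escaped == raw)             # the two safe spellings are the same string
print(len(raw), len(accidental))  # the accidental version is two characters shorter
```

Doubling every backslash and using a raw string give the same result; the raw string is usually easier to read.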

### Control the `print` function
`print()` is a built-in function that echoes objects to the console. When printing strings, use the `.format()` method. Putting this at the end of a string lets you:
* substitute variables into the string;
* control how they're formatted (e.g. decimal places).

In [7]:
print("A string: that was easy")

A string: that was easy


In [8]:
print(42, "that was also easy")

42 that was also easy


In [10]:
print(20000/365, "that's not ideal")

54.794520547945204 that's not ideal


In [11]:
# include variables inside strings with {}, then (after the string), .format()
x = 42

print("Here's a number: {}. It's less than 50".format(x))

Here's a number: 42. It's less than 50


In [15]:
# you can include multiple substitutions, and they don't have to be variables (operations are fine)

print("Is {} really less than 50? Answer: {}.".format(x, x<50))

Is 42 really less than 50? Answer: True.


Note: `.format()` has a formatting mini-language associated with it. For a guide with examples, see [pyformat.info](https://pyformat.info/).

In [17]:
# just memorize this one for now:

print("Daily salary is approximately {:.2f} (two decimal places)".format(20000/365))

Daily salary is approximately 54.79 (two decimal places)
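
A few more format specifications, as a minimal sketch rather than a full tour of the mini-language. The salary figure matches the cell above; the name `Sally` and the figure of 230 working days are invented for illustration.

```python
salary = 20000        # same annual figure as the cell above
daily = salary / 365

with_commas = "Annual salary: {:,}".format(salary)              # thousands separator
two_places = "Daily salary: {:.2f}".format(daily)               # two decimal places
as_percent = "Share of days worked: {:.1%}".format(230 / 365)   # 230 is an invented figure
by_name = "{name} earns about {amount:.0f} a day".format(name="Sally", amount=daily)

for line in (with_commas, two_places, as_percent, by_name):
    print(line)
```

Named fields such as `{name}` can make long templates easier to read than a row of empty `{}` placeholders.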


### Building programs with `if`, `elif` and `else`

We have already met `if` statements. Get the indentation right, and you can build more sophisticated rules that test multiple conditions.

In [None]:
# an if statement executes the indented code only if some condition is true

my_value = 11

if my_value > 10:
    print("Number is greater than 10")

In [None]:
# use Boolean operators to test multiple conditions

if (my_value > 10) and (my_value < 15):
    print("Number is between 10 and 15")
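
Python also accepts chained comparisons, which read like the mathematical notation. The sketch below checks that the chained form gives the same answer as the explicit `and` for a handful of values; the variable names are illustrative only.

```python
my_value = 11

explicit = (my_value > 10) and (my_value < 15)
chained = 10 < my_value < 15   # same test, written as a chain

print(explicit, chained)

# Check the two forms agree over a small range of values
agree = all(((v > 10) and (v < 15)) == (10 < v < 15) for v in range(5, 20))
print(agree)
```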

In [None]:
# if the first if statement evaluates to false, elif executes a code block if its condition is true
# else executes a code block if no preceding if or elif evaluated to true

if (my_value > 0) and (my_value < 10):
    print("Number is positive and less than 10")

elif my_value >= 10:
    print("Number is 10 or greater")

else:
    print("Number is zero or negative.")
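
One way to check that a chain of conditions covers every case is to wrap it in a function and try the boundary values. This is a sketch: `describe_number` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
def describe_number(value):
    """Hypothetical helper: say which branch a number falls into."""
    if 0 < value < 10:
        return "positive and less than 10"
    elif value >= 10:
        return "10 or greater"
    else:
        return "zero or negative"

# Include the boundary values 0 and 10, where mistakes usually hide
for candidate in [-3, 0, 5, 10, 42]:
    print(candidate, "is", describe_number(candidate))
```

Exactly one branch runs for each value, because Python stops at the first condition that is true.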

### Error handling

In [None]:
# What happens if we run the cell above with 'penguin' instead of a number?


In [None]:
# try-except is one method to catch and handle errors

my_list = [5,6,'Sally',10]

for obj in my_list:
    try:
        print('{}'.format(obj + 1))
    except TypeError:  # adding 1 to a string raises TypeError; other errors are not hidden
        print("I am not a number, I am a free woman!")
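
The same pattern is handy when cleaning text input. The sketch below assumes a small invented list of strings and uses `float()`, which raises `ValueError` when the text is not a number. Catching only that exception lets any unexpected error still surface.

```python
raw_values = ['5', '6.5', 'penguin', '10']   # invented sample data

numbers = []
rejected = []
for text in raw_values:
    try:
        numbers.append(float(text))
    except ValueError:
        # float('penguin') raises ValueError, so keep the text for later inspection
        rejected.append(text)

print(numbers)
print(rejected)
```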

### Iterables and range()

In [None]:
# strings and lists are examples of iterables: they can return their members one-by-one.

for meal in ['Breakfast', 'Snack', 'Dinner']:
    print("{} has {} letters".format(meal, len(meal)))
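
Strings work the same way: looping over a string returns its characters one by one. A minimal sketch, reusing meal names as sample text:

```python
meal = 'Tea'
for letter in meal:
    print(letter)

# list() collects the characters of a string into a list
letters = list(meal)
print(letters)

# count the vowels in a longer word by visiting each character
vowel_count = 0
for letter in 'Elevenses':
    if letter.lower() in 'aeiou':
        vowel_count += 1
print(vowel_count)
```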

In [None]:
# range() is another way to generate iterables; it produces an arithmetic sequence of integers

print("NUMBERS AND THEIR CUBES")
for i in range(5):
    print(i, i**3)

In [None]:
# as usual, you get all numbers from start point (included) to end point (not included)

print("NUMBERS AND THEIR CUBES")
for i in range(-5, 5):
    print(i, i**3)

In [None]:
# using range(n+1) often makes sense

print('Give me numbers up to n, where n = 3')

n = 3
for i in range(n+1):
    print(i)

In [None]:
# what happens when you display a range() object?
range(5)

In [None]:
# the list() function turns any iterable into a list (not ideal if you're counting to 1 million!)
list(range(5))
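
`range()` also takes an optional third argument, the step, and a range object can answer questions about itself without being turned into a list. A short sketch:

```python
evens = list(range(0, 10, 2))       # start, stop (excluded), step
countdown = list(range(5, 0, -1))   # a negative step counts down
print(evens)
print(countdown)

# A large range stays cheap because its numbers are produced on demand
big = range(1_000_000)
print(len(big))         # number of items
print(big[-1])          # last item
print(999_999 in big)   # membership test
```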

In [None]:
# enumerate() lets you loop through an iterable, keeping track of where you are

meals = ['Breakfast', 'Post-Breakfast Snack','Elevenses', 'Lunch','Tea','Dinner','Bedtime snack']

for n, meal in enumerate(meals):
    print("Meal {} today: {}".format(n + 1, meal))
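
`enumerate()` accepts a `start` argument, which avoids writing `n + 1` inside the loop. A sketch with a shortened meal list:

```python
meals = ['Breakfast', 'Lunch', 'Dinner']   # shortened list for illustration

numbered = []
for n, meal in enumerate(meals, start=1):
    numbered.append("Meal {} today: {}".format(n, meal))

for line in numbered:
    print(line)

# Without start, counting begins at 0; each item is a (position, value) pair
pairs = list(enumerate(meals))
print(pairs)
```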

### List comprehensions
Return to our zip code example. We have seen many ways to operate on strings or numbers. But how do we scale these operations across several hundred (or thousand) examples?

List comprehensions are a concise way to build lists using rules. They apply an operation to a series of items, and package the result in a list.

In [None]:
# first a squares example
[x**2 for x in range(10)]

Steps:
* First write the expression to evaluate.
* Then add a `for` clause with a loop variable.
* Finally, name the sequence to perform the operation on.
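
The steps above can be sketched with a small example that puts a `for` loop and the equivalent comprehension side by side. The meal list is sample data, and the trailing `if` filter is an optional extra not covered in the steps.

```python
meals = ['Breakfast', 'Lunch', 'Tea', 'Dinner']   # sample data

# As a for loop
lengths_loop = []
for meal in meals:
    lengths_loop.append(len(meal))

# As a list comprehension: expression, then for, then the sequence
lengths = [len(meal) for meal in meals]
print(lengths == lengths_loop)

# Optional extra: a trailing if keeps only the items that pass a test
short_meals = [meal.upper() for meal in meals if len(meal) <= 5]
print(short_meals)
```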

In [None]:
# re-write the following as a list comprehension

absolute_cubes = []
for n in range(-100, 101):
    absolute_cubes.append(abs(n**3))

In [None]:
# code here:


__Data wrangling example__

In [None]:
# here's our messy input data
input_data = ["Alex: ZIP 20022-0049", "Margaret: ZIP 20009-0132", "Hermione: ZIP 10009-3214"]

In [None]:
# as a for loop

clean_list = []
for i in input_data:
    clean_list.append(i.split()[2].split('-')[0])

In [None]:
[x.split()[2].split('-')[0] for x in input_data]
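
When the expression inside a comprehension gets long, one option is to move it into a small function. The sketch below assumes every entry has the same `Name: ZIP 12345-6789` layout as the sample data; `parse_entry` is a hypothetical helper, not part of the original example.

```python
input_data = ["Alex: ZIP 20022-0049", "Margaret: ZIP 20009-0132", "Hermione: ZIP 10009-3214"]

def parse_entry(entry):
    """Hypothetical helper: split 'Name: ZIP 12345-6789' into (name, five-digit ZIP)."""
    name, _, zip_plus_four = entry.split()   # assumes exactly three space-separated parts
    return name.rstrip(':'), zip_plus_four.split('-')[0]

parsed = [parse_entry(entry) for entry in input_data]
print(parsed)

# The ZIP codes alone, matching the comprehension above
zip_codes = [zip_code for _, zip_code in parsed]
print(zip_codes)
```

Naming the step also gives a single place to add checks later if the real data turns out to be messier than the sample.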