----
# Python Basics Review
----
Although this course requires familiarity with foundational programming concepts, it's useful to review those concepts in Python specifically. Even experienced programmers will benefit from a refresher on Python's data types, control flow syntax, package handling, and other basic concepts. 

**(Still, if these concepts are completely foreign to you, this course is probably not for you.)**

We'll start with the most basic function of any programming language: 

In [1]:
print("Hello world")

Hello world


---
## Variables
---

In Python, variables are declared using the **`=`** sign. You don't need to write **`def`** or **`dim`** or specify the datatype.

In [2]:
#By the way, comments are preceded by the pound sign (aka hashtag) in Python
my_number = 6
my_fruit = "apple"
my_boolean = True

In [3]:
print(my_number)
print(my_fruit)
print(my_boolean)

6
apple
True


Some rules about variable names:
- There's no limit to how long a variable's name can be. 
- Variable names can contain uppercase letters, lowercase letters, and numbers, **but** they cannot _start_ with a number.
- The underscore character (\_) can also appear in a variable name and is often used in names with multiple words, such as **`my_name`** or **`airspeed_of_unladen_swallow`**. 

Some examples of variable names that violate these rules:

In [4]:
# 76trombones is illegal because it begins with a number. 
76trombones = 'big parade'

SyntaxError: invalid syntax (<ipython-input-4-aa09822d0ee5>, line 2)

In [5]:
# more@ is illegal because it contains an illegal character
more@ = 1000000

SyntaxError: invalid syntax (<ipython-input-5-55c60701b68f>, line 2)

In [6]:
class = 'Introduction to Programming'

SyntaxError: invalid syntax (<ipython-input-6-e4f934285711>, line 1)

What's wrong with the last example? It turns out that **`class`** is one of Python’s **reserved keywords**. 
Python 3 has roughly three dozen reserved keywords (35 as of Python 3.7; the exact number varies slightly between versions), which are [listed in the Python manual](https://docs.python.org/3/reference/lexical_analysis.html#keywords). It's a very common mistake for beginners to use a reserved keyword as a variable name, so make sure you're familiar with them. The Jupyter IDE will help you, as reserved keywords will be colored in green.
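
Rather than memorizing the keywords, you can ask Python for them. The sketch below uses the standard library's `keyword` module; the variable name `all_keywords` is only illustrative, and the count it prints depends on your Python version.

```python
import keyword

# kwlist holds every reserved keyword for the running Python version
all_keywords = keyword.kwlist
print(len(all_keywords))
print(all_keywords)

# iskeyword() checks a single candidate name
print(keyword.iskeyword("class"))     # True: cannot be used as a variable name
print(keyword.iskeyword("my_class"))  # False: safe to use
```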

----
## Primitive Data Types
----
The three primitive data types in Python are **strings**, **numbers**, and **booleans**. Let's start with **strings.**

## Strings
**Strings** are the data type we use to store textual data, which can range from single words to entire news articles or even HTML pages.

In [7]:
string1 = 'You can declare strings with single quotes...'
string2 = "...or double quotes"
print(string1)
print(string2)

You can declare strings with single quotes...
...or double quotes


In declaring and printing strings, some characters take on a special meaning:
- **`\n`** adds a new line

In [8]:
print("This is the first line \n and this is the second line")

This is the first line 
 and this is the second line


- **`\t`** makes an indentation

In [9]:
print("Daily Total:\t$5,000 \nMonthly Total:\t$123,000")

Daily Total:	$5,000 
Monthly Total:	$123,000


- The backslash (**`\`**) _escapes_ the following character, meaning that the character won't perform its normal function and will instead be treated as normal text

In [10]:
print("For example, to print the backslash, use two backslashes: \\")

For example, to print the backslash, use two backslashes: \


In [11]:
print("Or maybe I want to write \\t or \\n instead of adding a tab or new line")

Or maybe I want to write \t or \n instead of adding a tab or new line


In [12]:
print("Or maybe I want to use \"quotes\" within the string itself")

Or maybe I want to use "quotes" within the string itself


- Prefixing a string with **`r`** makes it a **raw** string, meaning that backslashes are treated as ordinary characters rather than as the start of an escape sequence. This is very useful for declaring filepaths and using **regular expressions**, which we'll encounter later in this course.

In [13]:
print(r"C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Anaconda3 (64-bit)")

C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Anaconda3 (64-bit)


In [14]:
#An example of a regular expression:
print(r"(?:(?:(\s*\(?([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\*)|([2-9]1[02-9]|[29][02-8]1|[2-9][02-8][02-9]))\)?\s*(?:[-]\s*)?)([2-9]1[02-]|[2-9][02-9]1|[2-9[02-9]{2})\s*(?:[.-\s*)?([0-9]{4})")
#(This regular expression is used to identify phone numbers)

(?:(?:(\s*\(?([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\*)|([2-9]1[02-9]|[29][02-8]1|[2-9][02-8][02-9]))\)?\s*(?:[-]\s*)?)([2-9]1[02-]|[2-9][02-9]1|[2-9[02-9]{2})\s*(?:[.-\s*)?([0-9]{4})


To write a string over multiple lines, use **triple quotes**.

This can be useful in storing large strings of text, like articles or speeches:

In [15]:
string1 = '''
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. 

We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
'''
print(string1)


Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. 

We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.



There are many useful **functions** and **operations** that we can perform on strings, even in base Python, without any language-processing libraries:
- You can **add** (or "concatenate") strings together, the same way you would with numbers

In [16]:
string1 = "hello"
string2 = "world"
print(string1+string2)

helloworld


In [17]:
#Let's fix that...
print(string1 + " " + string2)

hello world


- Using brackets **`[]`** you can **index** a string to find particular letters or sets of letters

In [18]:
string1 = 'apple'
#Since Python's index begins at zero, this returns the first letter of the string:
string1[0]

'a'

The **index operator** also allows other, more dynamic uses:

In [19]:
print("First character: " + string1[0])
print("Last character: " + string1[-1])
print("First three characters: " + string1[:3])
print("Last three characters: " + string1[-3:])
print("All characters from the third character onward: " + string1[2:])
print("You get the idea...")

First character: a
Last character: e
First three characters: app
Last three characters: ple
All characters from the third character onward: ple
You get the idea...
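
Slices follow the pattern `[start:stop]`, where the character at `start` is included and the character at `stop` is not. A short sketch of that rule, reusing the word "apple" (the variable name `word` is just for illustration):

```python
word = "apple"

middle = word[1:4]       # indexes 1, 2, and 3 -> 'ppl'
from_fourth = word[3:]   # index 3 to the end -> 'le'
every_other = word[::2]  # an optional third value sets the step -> 'ape'
backwards = word[::-1]   # a step of -1 reverses the string -> 'elppa'

print(middle, from_fourth, every_other, backwards)
```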


- The **`len`** function counts the number of characters in the string

In [20]:
len("sesquipedalian")

14

- The **`upper`** and **`lower`** methods change the case of the string

In [21]:
print(string1.upper())
print("ALLCAPS".lower())

APPLE
allcaps


### Exercise
Using the **`len`** function and the **`string1`** variable, print the sentence: 
> "apple" is 5 characters long

In [22]:
#Don't forget to convert the resulting number to a string!
print("\""+ string1 + "\"" + " is " + str(len(string1)) + " characters long.")

"apple" is 5 characters long.


- The **`startswith`** and **`endswith`** methods assess whether or not a string starts or ends with a particular character or sequence of characters:

In [23]:
string1.startswith('a')

True

In [24]:
string1.endswith('q')

False

- When combining strings and numbers, the **`.format`** method can help format the number

In [25]:
number = 2334.4324
print("The raw number is {num}".format(num=number))
print("And the rounded number is {num:.0f}".format(num=number))

The raw number is 2334.4324
And the rounded number is 2334
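
In Python 3.6 and later, the same formatting codes also work inside f-strings, which place the variable directly in the braces. A brief sketch (the variable `amount` is hypothetical):

```python
amount = 2334.4324

raw_text = f"The raw number is {amount}"
rounded_text = f"And the rounded number is {amount:.0f}"
two_decimals = f"With a thousands separator and two decimals: {amount:,.2f}"

print(raw_text)
print(rounded_text)
print(two_decimals)
```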


This about covers the basics of strings and textual data in Python. Of course, there are far more complicated functions, enabled by libraries, that we can perform on textual data, but we'll save that for later in the course. For more on string operations in base Python, here are some useful resources:

- [Full List of String Operators in Base Python](https://docs.python.org/2/library/string.html)
- [Number Formatting in Python](https://mkaz.tech/code/python-string-format-cookbook/)

## Numeric
The second data type, **numeric**, is used to store numbers and quantities. Numeric variables in Python fall into two main categories:
- **Integers**: Whole numbers like 1, 2, 3, -5, 1000, 100000000, etc.
- **Floats**: Numbers with a decimal point, like 0.1, 1.1, -0.0004, 23432.2, etc.

The distinction between the two is important because certain operations, such as indexing, accept integers but not floats. Python infers the datatype of a number from the way it's written, but it's easy to convert between the two:

In [170]:
#The type() function returns the datatype of a variable
my_int = 6
type(my_int)

int

In [171]:
my_float = 2.3
type(my_float)

float

In [172]:
my_int = float(my_int)
my_int

6.0

In [29]:
#In changing floats to integers, Python drops the decimal part (it truncates rather than rounds)
my_float = int(my_float)
my_float

2
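
The difference between truncating and rounding is easiest to see side by side. A small sketch using the built-in `int()` and `round()` functions (the sample values are arbitrary):

```python
# int() drops the fractional part (truncates toward zero)
truncated_positive = int(2.9)    # 2
truncated_negative = int(-2.9)   # -2

# round() returns the nearest whole number
rounded_positive = round(2.9)    # 3
rounded_negative = round(-2.9)   # -3

print(truncated_positive, truncated_negative)
print(rounded_positive, rounded_negative)
```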

Whether integer or float, though, we can perform basic arithmetic on numeric datatypes:

In [174]:
2 + 2

4

In [175]:
4 - 1

3

In [176]:
3*3.5

10.5

In [182]:
-4*23

-92

In [183]:
x = 3 * 100 / 24

In [184]:
print(x)

12.5


Don't forget [PEMDAS](https://en.wikipedia.org/wiki/Order_of_operations) - it applies in Python too:

In [178]:
24*8/(2-4)

-96.0

Don't make the mistake of using commas in declaring numeric variables:

In [180]:
1,000,000

(1, 0, 0)
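
Python reads the commas as separators between three values, `1`, `000`, and `000`, and packs them into a tuple. If you want a visual separator in a large number, Python 3.6 and later accept underscores inside numeric literals. A minimal sketch (the variable name is illustrative):

```python
one_million = 1_000_000   # the underscores are ignored by Python
print(one_million)
print(one_million == 1000000)

# To display a number with commas, format it as a string
with_commas = f"{one_million:,}"
print(with_commas)
```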

You can also use **`%`** to find the **remainder** in division.

In [181]:
25%4

1

In [37]:
301%100

1
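
The remainder operator pairs naturally with floor division (`//`), which returns the whole-number part of a quotient. A short sketch; the even/odd check is one common use, and the variable names are illustrative:

```python
quotient = 25 // 4        # whole-number part of the division: 6
remainder = 25 % 4        # what is left over: 1
both = divmod(25, 4)      # built-in that returns (quotient, remainder)

print(quotient, remainder, both)

# A number is even when dividing it by 2 leaves no remainder
is_even = 10 % 2 == 0
print(is_even)
```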

In case you ever need to use **imaginary or complex numbers** in Python, you can declare them by writing **`j`** after them.

In [185]:
1j

1j

In [186]:
1j*1j

(-1+0j)

Many mathematical functions, however, aren't built into base Python. To use trigonometric functions, logarithms, or factorials, we'll need to import the **`math`** library (exponents can also be computed without it, using the **`**`** operator):

In [188]:
import math

print("12! = " + str(math.factorial(12)))
print("12^2 = " + str(math.pow(12,2)))
print("log(12) = " + str(math.log(12)))

12! = 479001600
12^2 = 144.0
log(12) = 2.4849066497880004


(More on this in the [math library documentation](https://docs.python.org/2/library/math.html).)
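
A few more functions from the same library, as a quick sketch. The variable names are illustrative; note that the trigonometric functions expect angles in radians.

```python
import math

square_root = math.sqrt(144)       # 12.0
power = 12 ** 2                    # the ** operator needs no import: 144
cosine_of_zero = math.cos(0)       # 1.0
log_base_10 = math.log10(1000)     # base-10 logarithm
natural_log = math.log(math.e)     # math.log defaults to the natural logarithm

print(square_root, power, cosine_of_zero, log_base_10, natural_log)
```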

We can also assess **equality** and **inequality** between numbers, using some basic operators:
+ **`==`**: equality of values
+ **`<`**: less than
+ **`<=`**: less than or equal to
+ **`>`**: greater than
+ **`>=`**: greater than or equal to
+ **`!=`**: not equal to, different than

In [189]:
3 == 3

True

In [191]:
3 == 4

False

In [193]:
3 != 4

True

In [44]:
500 > 1000

False

In [45]:
-3 < 3

True

### Exercise
Determine which is greater: 10 factorial or 12 to the 4.5th power.

In [46]:
math.factorial(10) > math.pow(12,4.5)

True

In [47]:
math.factorial(10)

3628800

In [48]:
math.pow(12,4.5)

71831.61109149648

In fact, the topic of numeric inequalities segues perfectly into the third data type.

## Booleans
The last data type, **booleans**, simply stores **`True`** and **`False`** values. Booleans are simple, but they're integral to the functioning of the language.

Of course, we can declare variables as such, using **`True`** or **`False`** (again, some of Python's reserved keywords):

In [49]:
my_boolean = True
my_boolean

True

But more often, booleans will be generated by comparisons like the ones discussed above.

In [50]:
my_boolean = 3 == 3
my_boolean

True

In [51]:
my_boolean = 3 + 3 == 25
my_boolean

False

We can evaluate equality between strings, too:

In [52]:
"apple" == "orange"

False

In [194]:
"apple" == "apple"

True

In [196]:
'cat' == 'CAT'

False

There are a few important operators that can be used on booleans:
- **`not`** returns the _opposite_ of the boolean value
- **`or`** assesses whether _any_ of several booleans are **`True`**
- **`and`** assesses whether _all_ of several booleans are **`True`**

In [197]:
not False

True

In [198]:
not not True

True

In [199]:
not not not False

True

In [200]:
True and False and True

False

In [58]:
True or False and False and False

True

Remember, **`and`** is evaluated before **`or`**, so parentheses matter:

In [204]:
False and True or True

True

In [205]:
True or True and False

True

In [60]:
False and (True or True)

False
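
Python evaluates `not` first, then `and`, then `or`, unless parentheses say otherwise. The sketch below evaluates the same three values with and without parentheses to show the difference (the variable names are illustrative):

```python
# Without parentheses, "and" is evaluated before "or":
# True or (True and False) -> True or False -> True
without_parentheses = True or True and False

# Parentheses force the "or" to be evaluated first:
# (True or True) and False -> True and False -> False
with_parentheses = (True or True) and False

print(without_parentheses)
print(with_parentheses)
```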

Since using **comparison operators** like **`==`**, **`!=`**, **`>`**, etc. produces boolean values, we can use them in conjunction with these operators to assess compound logic:

In [61]:
"cat" == "dog" or 3 == 3

True

In [206]:
not("cat" == "dog") and 4 != 3

True

In [208]:
'cat' != 'dog' and 4 != 3

True

In [209]:
4 > 3 or 3 < 4

True

**Note:** As in many other programming languages, booleans can also be treated as numbers, with **`True = 1`** and **`False = 0`**

In [210]:
True + 3

4

In [211]:
False == 0

True

----
## Data Structures
----
The three **primitive** data types store _single_ values - 12, "dog", `True`, `False`, etc.

Storing and organizing _multiple_ values requires [**data structures**](http://docs.python.org/2/tutorial/datastructures.html), which hold collections of values and follow certain rules, depending on the particular data structure.

There are four main data structures in Python - **lists**, **sets**, **tuples**, and **dictionaries**. Each follows slightly different rules and serves a slightly different purpose.

## Lists
A **list** (aka **array** or **vector**) is a collection of values, following these rules:
1. Lists **can** contain duplicate values.
2. Lists **can** contain multiple data types.
3. Lists **are** ordered.
4. Lists are **mutable**, meaning that they can be changed, updated, shortened, and elongated.

Lists are declared using brackets: **`[ ]`** or using the **`list()`** function.

In [66]:
my_list = [0,1,2]
print(my_list)

[0, 1, 2]


In [67]:
my_list = [1, 2, 2, 3, 3, 3]
my_list

[1, 2, 2, 3, 3, 3]

In [213]:
my_list = ["apple", "banana", 3, 4, 5]
my_list

['apple', 'banana', 3, 4, 5]

Each value in a list has a unique **index**, which can be accessed in much the same way that we index letters in strings.

In [216]:
my_list[1]

'banana'

In [215]:
my_list[1:]

['banana', 3, 4, 5]

In [214]:
my_list[-2:]

[4, 5]

Like other datatypes, there are several functions and operators that are unique to lists:
- The **`.sort()`** operator sorts the list

In [220]:
#With lists of numbers, the operator automatically sorts least to greatest
my_list = [1,2,3,124,-45,-1000,-14]
my_list.sort()
my_list

[-1000, -45, -14, 1, 2, 3, 124]

In [218]:
#With lists of strings, the operator automatically sorts A to Z
my_list = ['banana','cat','apple']
my_list.sort()
my_list

['apple', 'banana', 'cat']

In [219]:
#But we can't sort a list with both datatypes!
my_list = [1,2,3,124,-45, 'banana','cat','apple']
my_list.sort()
my_list

TypeError: '<' not supported between instances of 'str' and 'int'

- The **`.reverse()`** operator reverses the order of the list

In [221]:
my_list = [1,2,3,124,-45, 'banana','cat','apple']
print(my_list)
my_list.reverse()
print(my_list)

[1, 2, 3, 124, -45, 'banana', 'cat', 'apple']
['apple', 'cat', 'banana', -45, 124, 3, 2, 1]


- As with strings, the **`len()`** function returns the length of the list (i.e. the number of individual values it contains)

In [222]:
print(my_list)
print("This list is " + str(len(my_list)) + " datapoints long.")

['apple', 'cat', 'banana', -45, 124, 3, 2, 1]
This list is 8 datapoints long.


- We can **add** and **multiply** lists

In [77]:
my_list = ['a','b','c']
my_list * 5

['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']

In [78]:
list_A = [1, 2, 3]
list_B = ['a','b','c']
list_A + list_B

[1, 2, 3, 'a', 'b', 'c']

In [79]:
(list_A + list_B) * 3

[1, 2, 3, 'a', 'b', 'c', 1, 2, 3, 'a', 'b', 'c', 1, 2, 3, 'a', 'b', 'c']

- The **`.index()`** operator finds the index of a particular value within a list

In [80]:
my_list = ['a','b','c']
my_list.index('b')

1

In [225]:
#If a value appears multiple times in a list, the .index() operator returns the first instance
my_list = ['dog','cat','cat','fish']
my_list.index('cat')

1

In [5]:
['a','a','a'].index('a')

0

- Similarly, the **`.count()`** operator counts the number of instances of a particular value within a list

In [6]:
my_list.count('cat')

2

In [227]:
[1,1,1,2,3,4,5].count(1)

3

- The **`.insert(index, x)`** operator inserts an element **x** into the list at the specified **index**. _(Elements to the right of this index are shifted over.)_

In [85]:
my_list = [1,2,3]
my_list.insert(2, "dog")
my_list

[1, 2, 'dog', 3]
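
Because lists are mutable, `.insert()` is only one of several ways to change a list in place. A brief sketch of a few other built-in list operations (the list contents are arbitrary):

```python
pets = ["dog", "cat"]

pets.append("fish")     # add one element to the end
pets[0] = "puppy"       # overwrite the element at index 0
pets.remove("cat")      # delete the first element equal to "cat"
last_pet = pets.pop()   # remove and return the last element

print(pets)
print(last_pet)
```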

There are a variety of basic statistical functions that apply to lists, too:

- **`sum()`** adds up all the (numeric) elements of a list
- **`max()`** returns the maximum element of a list
- **`min()`** returns the minimum element of a list

In [228]:
print(max([1,2,3,4,5]))
print(min([1,2,3,4,5]))
print(sum([1,2,3,4,5]))

5
1
15


### Exercise
Write code that computes the **average** and the **median** value of the list of numbers below.

In [7]:
values = [248,3012,986,100,12]

#### Average:

In [11]:
sum(values) / len(values)

871.6

#### Median:

In [10]:
import statistics
statistics.median(values)

248

## Sets
A **set** is a collection of values following these rules:
1. Sets **cannot** contain duplicate values.
2. Sets **can** contain multiple data types.
3. Sets **are not** ordered.
4. Sets are **mutable**, meaning that elements can be added or removed, but because sets are unordered, their elements cannot be indexed or sorted in place.

Sets are declared using curly braces: **`{ }`** or the **`set()`** function.

Some of the functions and operators that apply to lists apply to sets, too. The **`len()`** function, for example, applies to both:

In [90]:
my_set = {1,2,3, "banana", "cat", "dog"}
len(my_set)

6

However, many of the functions and operators that work on lists no longer apply to sets. Because sets are **unordered**, we can't index them, sort them, or concatenate them with **`+`**:

In [91]:
my_set[1]

TypeError: 'set' object does not support indexing

In [92]:
my_set.sort()

AttributeError: 'set' object has no attribute 'sort'

In [93]:
set_A = {1,2,3}
set_B = {3,4,5}
set_A + set_B

TypeError: unsupported operand type(s) for +: 'set' and 'set'
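
Although sets can't be indexed or sorted, they can still be changed after they're created, and they're a convenient way to remove duplicates. A minimal sketch using built-in set methods (the values are arbitrary):

```python
colors = {"red", "green"}

colors.add("blue")       # add a new element
colors.add("red")        # adding a duplicate has no effect
colors.discard("green")  # remove an element if it is present

print(len(colors))
print("blue" in colors)

# Converting a list to a set drops the duplicate values
unique_numbers = set([1, 2, 2, 3, 3, 3])
print(unique_numbers)
```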

However, there are a handful of useful functions that can _only_ be used on sets. For example:
- We can assess the **union** of two sets, which is all the unique elements in _either_ set, using the **`.union()`** operator

In [94]:
set_A = {1,2,3}
set_B = {3,4,5}
set_A.union(set_B)

{1, 2, 3, 4, 5}

In [95]:
#The | character is shorthand for Union
{1,2,3} | {2,3,4}

{1, 2, 3, 4}

- We can assess the **intersection** of two sets, which is all of the elements found in _both_ sets, using the **`.intersection()`** operator

In [96]:
set_A = {1,2,3}
set_B = {3,4,5}
set_A.intersection(set_B)

{3}

In [97]:
#The & character is shorthand for Intersection
{1,2,3} & {2,3,4}

{2, 3}

Other, similar operators include:
- **Subset**: Tests whether every element of _s_ is in _t_
- **Superset**: Tests whether every element of _t_ is in _s_
- **Difference**: Returns elements in _s_ but not in _t_
- **Symmetric Difference**: Returns elements in either set, but not both

In [98]:
set_A = {1,2,3,4,5}
set_B = {1,2,3}

In [99]:
#Subset:
set_A.issubset(set_B)

False

In [100]:
#Subset Shorthand:
set_A <= set_B

False

In [101]:
#Superset:
set_A.issuperset(set_B)

True

In [102]:
#Superset Shorthand:
set_A >= set_B

True

In [103]:
#Difference:
set_A.difference(set_B)

{4, 5}

In [104]:
#Difference Shorthand:
set_A - set_B

{4, 5}

In [105]:
#Symmetric Difference:
set_A.symmetric_difference(set_B)

{4, 5}

In [106]:
#Symmetric Difference Shorthand:
set_A ^ set_B

{4, 5}

More on set operators [here](https://docs.python.org/2/library/sets.html).

### Exercise
Write code that finds the [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index) between the following two sets:

In [107]:
set_A = {1,2,3,4,"cat","dog","banana"}
set_B = {3,4,5,6,"banana","apple","orange"}

In [108]:
len(set_A & set_B) / len(set_A | set_B)

0.2727272727272727

## Tuples
A **tuple** (aka **sequence**) is a collection of values following these rules:
1. Tuples **can** contain duplicate values.
2. Tuples **can** contain multiple data types.
3. Tuples **are** ordered.
4. Tuples are **immutable**, meaning that their values cannot change without re-declaring the variable.

Tuples are declared using parentheses: **`( )`** 

In [109]:
my_tuple = (1,2,3,4)

Since they are **ordered** but **immutable**, some of the functions that apply to lists also apply to tuples, but others don't.

For example, we can index tuples...

In [110]:
my_tuple[1]

2

... but we can't append, sort, or reverse them ...

In [111]:
my_tuple.sort()

AttributeError: 'tuple' object has no attribute 'sort'

... or assess the union between them.

In [112]:
(1,2,3) | (3,4,5)

TypeError: unsupported operand type(s) for |: 'tuple' and 'tuple'

We can, however, add (concatenate) and multiply (repeat) tuples, the way we would with lists.

In [113]:
(1,2,3) + (3,4,5)

(1, 2, 3, 3, 4, 5)

In [114]:
(1,2,3) * 3

(1, 2, 3, 1, 2, 3, 1, 2, 3)
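
Although tuples have no `.sort()` method, the built-in `sorted()` function accepts any collection and returns a new list, leaving the original untouched. A short sketch (the tuple contents are arbitrary):

```python
scores = (42, 7, 19)

sorted_scores = sorted(scores)             # returns a new list
sorted_as_tuple = tuple(sorted(scores))    # convert back if a tuple is needed

print(sorted_scores)
print(sorted_as_tuple)
print(scores)   # the original tuple is unchanged
```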

Before we go on to the final data structure - dictionaries, which operate very differently from the other three - let's review the differences between lists, sets, and tuples.
- **[ Lists ]** are **mutable**, **ordered** collections of **non-unique** values
- **{ Sets }** are **mutable**, **un-ordered** collections of **unique** values
- **( Tuples )** are **immutable**, **ordered** collections of **non-unique** values

## Dictionaries
The fourth data structure, **dictionaries** (aka maps or hashes), is unique in that it involves two different sets of values: the **keys** and then the **values themselves**. Unlike sets, which are unordered, or lists and tuples, which are indexed by a range of **numbers**, dictionaries are indexed by **keys**, which can be any immutable datatype, most commonly strings.

Here's an example:

In [115]:
my_dict = {"a": 1,
          "b": 2,
          "c": 3,
          "d": 4}
my_dict

{'a': 1, 'b': 2, 'c': 3, 'd': 4}

Instead of indexing with numbers, we index dictionaries with _keys_, like this:

In [116]:
my_dict["a"]

1

In [117]:
my_dict["b"]

2

There are a few operators that are specific to dictionaries:
- **`.keys()`** returns a view of all of the keys of a dictionary, which can be converted to a list with **`list()`**

In [118]:
my_dict.keys()

dict_keys(['a', 'b', 'c', 'd'])

In [229]:
#Wrap the result in list() to get an ordinary list
list(my_dict.keys())

['a', 'b', 'c', 'd']

- **`.values()`** returns a view of all the values in a dictionary

In [119]:
my_dict.values()

dict_values([1, 2, 3, 4])

- **`.pop()`** removes a particular key from the dictionary and returns its value

In [120]:
my_dict.pop("a")
my_dict

{'b': 2, 'c': 3, 'd': 4}

Dictionaries are **mutable**, so we can also add new keys and values to a dictionary like so:

In [121]:
my_dict["e"] = 5
my_dict

{'b': 2, 'c': 3, 'd': 4, 'e': 5}
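
Indexing a dictionary with a key that doesn't exist raises a `KeyError`. Two built-in ways to guard against that are the `in` keyword and the `.get()` method. A minimal sketch with a hypothetical `inventory` dictionary:

```python
inventory = {"apples": 3, "bananas": 0}

# "in" checks whether a key exists
has_apples = "apples" in inventory        # True
has_cherries = "cherries" in inventory    # False

# .get() returns a fallback value instead of raising a KeyError
cherry_count = inventory.get("cherries", 0)

# Assigning to an existing key overwrites its value
inventory["bananas"] = 12

print(has_apples, has_cherries, cherry_count)
print(inventory)
```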

Generally speaking, the _keys_ of a dictionary describe a variable, and the _values_ contain the measurement of the variable itself. To put it another way, think of a dictionary's keys as the _fields_ or _columns_ of a data table and the dictionary's values as the data that occupy the _rows_ of the data table.

In [122]:
my_movie = {"Title": "Star Wars",
            "Box Office Gross": 307263857,
            "Genre" : "Sci-Fi",
            "Release Date" : "May 25 1977",
            "Director" : "George Lucas"}

Moreover, dictionaries are often _nested_ into other data structures, like lists. To extend the metaphor, each dictionary in a list will behave like a new _row_ in the data table. For example:

In [123]:
my_movies = [
            {"Title": "Star Wars",
            "Box Office Gross": 307263857,
            "Genre" : "Sci-Fi",
            "Release Date" : "May 25 1977",
            "Director" : "George Lucas"},
            {"Title": "Jaws",
            "Box Office Gross": 260000000,
            "Genre" : "Thriller",
            "Release Date" : "June 20 1975",
            "Director" : "Steven Spielberg"},
            {"Title": "The Godfather",
            "Box Office Gross": 133698921,
            "Genre" : "Drama",
            "Release Date" : "March 15 1972",
            "Director" : "Francis Ford Coppola"}
            ]

Alternatively, we might store the data above using a _dictionary_ of dictionaries:

In [124]:
my_movies = {"Star Wars": {
                "Box Office Gross": 307263857,
                "Genre" : "Sci-Fi",
                "Release Date" : "May 25 1977",
                "Director" : "George Lucas"},
            "Jaws": {
                "Box Office Gross": 260000000,
                "Genre" : "Thriller",
                "Release Date" : "June 20 1975",
                "Director" : "Steven Spielberg"
                    },
            "The Godfather": {
                "Box Office Gross": 133698921,
                "Genre" : "Drama",
                "Release Date" : "March 15 1972",
                "Director" : "Francis Ford Coppola"
                    }
            }

This way, we can more intuitively index the data structure to find particular values:

In [125]:
my_movies["The Godfather"]["Director"]

'Francis Ford Coppola'

There are advantages and disadvantages to either data structure, and you should think carefully about both before structuring your data.
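
To make the trade-off concrete, the sketch below stores the same two films both ways, trimmed to a single field. The names `movies_list` and `movies_dict` are introduced here only for illustration.

```python
movies_list = [
    {"Title": "Star Wars", "Director": "George Lucas"},
    {"Title": "The Godfather", "Director": "Francis Ford Coppola"},
]
movies_dict = {
    "Star Wars": {"Director": "George Lucas"},
    "The Godfather": {"Director": "Francis Ford Coppola"},
}

# List of dictionaries: rows keep their order, but we look them up by position
director_from_list = movies_list[1]["Director"]

# Dictionary of dictionaries: rows are looked up by name, so titles must be unique
director_from_dict = movies_dict["The Godfather"]["Director"]

print(director_from_list)
print(director_from_dict)
```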

### Exercise
Create a dictionary that contains data on several EY employees and index it to find a particular employee's rank.

In [126]:
EY_employees = {
    "Lavinia Seow":{
        "Rank":"Manager",
        "Office":"FSO",
        "Location":"NYC"
    },
    "Carl Case":{
        "Rank":"Senior Manager",
        "Office":"FSO",
        "Location":"NYC"
    },
    "Suryaa Ramaswamy":{
        "Rank":"Senior Manager",
        "Office":"FSO",
        "Location":"NYC"
    }
}

In [127]:
EY_employees["Carl Case"]["Rank"]

'Senior Manager'

----
## Mathematical Operations
----
One of Python's most distinguishing features is its ability to support mathematical and statistical operations. Two popular add-ons for performing such numerical and scientific operations in Python are "Numpy" and "Scipy." We will go through the basics of Numpy below. Scipy, or Scientific Python, is commonly used for statistical calculations, providing algorithms like regression. Since it covers a wide gamut of functions, we will only mention it here and save a more detailed application for later in the course.


## Numpy
Numpy, or Numerical Python, helps users manipulate matrices and arrays, generate random numbers, and perform basic numerical operations. 

Arrays are similar to lists in Python, except that every element of an array must be of the same type, typically a numeric type like **float** or **int**.

In [128]:
import numpy as np
firstarray = np.array([1, 2, 3, 4, 5], float)
firstarray

array([ 1.,  2.,  3.,  4.,  5.])

In [129]:
type(firstarray)

numpy.ndarray

The **`array`** function takes two arguments: the list to be converted into the array and, optionally, the type of each member of the list. Array elements are accessed, sliced, and manipulated just like lists.

In [130]:
firstarray[:2]

array([ 1.,  2.])

In [131]:
firstarray[3]

4.0

In [132]:
firstarray[0] = 6

In [133]:
firstarray

array([ 6.,  2.,  3.,  4.,  5.])

You can also use numpy to set and manipulate matrices and use **in** to see if a value is present in the array/matrix.

In [134]:
twodarray = np.array([[1, 2, 3], [4, 5, 6]], float)
twodarray

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [135]:
5 in twodarray

True

We can also reshape arrays into matrices with **.reshape**.

In [136]:
secondarray = np.array(range(12), float)
secondarray

array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.])

In [137]:
secondarray = secondarray.reshape((6, 2))
secondarray

array([[  0.,   1.],
       [  2.,   3.],
       [  4.,   5.],
       [  6.,   7.],
       [  8.,   9.],
       [ 10.,  11.]])

Converting arrays back into lists is done with **.tolist**.

In [138]:
firstarray.tolist()

[6.0, 2.0, 3.0, 4.0, 5.0]

### Exercise
Create a 5x5 matrix of ints that are all zeros.

In [139]:
# Example Solution
matrix = np.array([[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]], int)
matrix

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

Another useful feature of numpy is the ability to create all zero arrays/matrices.

In [140]:
shortcut = np.zeros((5,5), int)
shortcut

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

List of Useful Numpy Features:
* Copy an array/matrix - **.copy**
* Concatenate multiple arrays - **.concatenate**
* Add, subtract, multiply, divide **within** and **across** arrays/matrices (must be valid operations)
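
A brief sketch of these features, assuming Numpy is installed and imported as `np` as above (the array names are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([10, 20, 30], float)

a_copy = a.copy()                  # an independent copy of a
a_copy[0] = 99                     # changing the copy leaves a untouched

joined = np.concatenate((a, b))    # one array holding all six values

total = a + b                      # arithmetic is applied element by element
scaled = a * 2                     # a single number is applied to every element

print(a)
print(joined)
print(total)
print(scaled)
```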

Numpy is also used to generate **random numbers** with the random module.

Some Common **np.random** Functions:
* **rand(d0, d1, ..., dn)**	- Random values in a given shape.
* **randn(d0, d1, ..., dn)** - Return a sample (or samples) from the “standard normal” distribution.
* **randint(low[, high, size, dtype])**	- Return random integers from low (inclusive) to high (exclusive).

In [141]:
from numpy import random
np.random.rand(3,2)

array([[ 0.53676762,  0.44640937],
       [ 0.07071364,  0.36662902],
       [ 0.85426471,  0.61085082]])

In [142]:
np.random.randint(3, 9)

4

In [12]:
import numpy as np

#Each run prints 12 random integers from 0 through 9
for i in range(12):
    print(np.random.randint(0,10))

For more information on Numpy, visit: https://docs.scipy.org/doc/numpy-1.14.0/

----
## Control Flow Statements & Functions
----

So far, we've written only code that flows linearly and without interruption from left to right, top to bottom - code that carries out one and only one set of instructions. **Control flow statements**, however, allow us to repeat, withhold, or alter instructions based on data that we encounter. 

## If/Then Statements
The most common and essential control flow statement is the simple **if/then** statement, which is exactly what it sounds like. (Python has no `then` keyword; a colon and an indented block take its place.) It works like this:

In [144]:
value_A = 4
value_B = 3
if value_A > value_B:
    print("Value A is greater!")

Value A is greater!


An **if/then** statement assesses a boolean variable or value and then runs the indented code beneath if and only if the boolean is true.

To run different code when the boolean isn't true, we can add an **else** clause.

In [145]:
value_A = 3
value_B = 4
if value_A > value_B:
    print("Value A is greater!")
else:
    print("Value A is not greater.")

Value A is not greater.


Or, instead of relying only on a catch-all else clause, we can introduce an **elif** clause to assess a different boolean before defaulting to the **else** clause.

In [146]:
value_A = 4
value_B = 4
if value_A > value_B:
    print("Value A is greater!")
elif value_A == value_B:
    print("The two values are equal.")
else:
    print("Value A is not greater.")

The two values are equal.


Remember, Python creates boolean values in a variety of different ways using a variety of different data types, so there are many different expressions we can use in if/then statements.

In [147]:
my_grocery_list = ("apples", "bananas", "cranberries")

if "apples" in my_grocery_list:
    print("Don't forget apples!")
else:
    print("Don't buy any apples.")

Don't forget apples!


In [148]:
quarterly_revenue = 10000
quarterly_cost = 50000
if quarterly_revenue > quarterly_cost:
    print("We made money this quarter.")
else:
    print("We lost money this quarter.")

We lost money this quarter.
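
Values that aren't booleans can also be used directly as conditions. In the sketch below (the `shopping_cart` variable is a made-up example), an empty list is treated as false; zero, empty strings, and `None` behave the same way, while most other values are treated as true:

```python
shopping_cart = []  # hypothetical empty list

# An empty list counts as False, so the else branch runs
if shopping_cart:
    cart_message = "The cart has items."
else:
    cart_message = "The cart is empty."

print(cart_message)  # prints The cart is empty.
```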


### Exercise
Write an if/then statement that, using your **EY Employees** variable from the last exercise, tells us whether two employees work in the same office. 

In [151]:
if EY_employees["Carl Case"]["Office"] == EY_employees["Suryaa Ramaswamy"]["Office"]:
    print('Carl and Suryaa work in the same office!')
else:
    print('Carl and Suryaa don\'t work in the same office.')

Carl and Suryaa work in the same office!


## Loops
Another essential control flow statement is the **loop**, the most common form of which is the **for loop**. A **loop** instructs the program to carry out a set of instructions repeatedly. More specifically, a **for loop** instructs the program to carry out the instructions once _for_ each element in a data structure. 

In [152]:
my_numbers = [1,2,3,4,5]
for number in my_numbers:
    print(number * 5)

5
10
15
20
25


For loops can be used on any _iterable_ data structure, such as the ones we've learned about so far:

In [153]:
for i in (1,2,3):
    print(i)

1
2
3


In [154]:
for i in {1,2,3}:
    print(i)

1
2
3


In [155]:
for i in [1,2,3]:
    print(i)

1
2
3
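
Dictionaries and strings are iterable too. The sketch below uses a made-up dictionary, `office_by_employee`, to show what each kind of loop visits. (Keep in mind that sets, unlike lists and tuples, don't guarantee any particular order.)

```python
# Hypothetical dictionary for illustration
office_by_employee = {"Ana": "Boston", "Raj": "Chicago"}

# Looping over a dictionary visits its keys
visited_keys = []
for name in office_by_employee:
    visited_keys.append(name)

# .items() yields (key, value) pairs
pairs = []
for name, office in office_by_employee.items():
    pairs.append(name + " works in " + office)

# Looping over a string visits one character at a time
letters = []
for letter in "abc":
    letters.append(letter)

print(visited_keys)  # ['Ana', 'Raj']
print(pairs)         # ['Ana works in Boston', 'Raj works in Chicago']
print(letters)       # ['a', 'b', 'c']
```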


The **`range()`** function is very useful in constructing for loops. It takes up to three arguments - the **start**, **stop**, and **increment** - and returns a **range object** containing the integers from the start up to, but not including, the stop, stepping by the increment.

For example, if we wanted to print all of the numbers from 0 up to (but not including) 100, going by increments of 5:

In [156]:
for i in range(0,100,5):
    print(i)

0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
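
`range()` can also be called with only one or two arguments, and the increment can be negative. A short sketch of the different forms, wrapped in `list()` so the contents are visible:

```python
only_stop = list(range(5))           # one argument: stop only, start defaults to 0
start_stop = list(range(2, 6))       # two arguments: start and stop
with_step = list(range(0, 10, 3))    # three arguments: start, stop, increment
countdown = list(range(10, 0, -2))   # a negative increment counts down

print(only_stop)   # [0, 1, 2, 3, 4]
print(start_stop)  # [2, 3, 4, 5]
print(with_step)   # [0, 3, 6, 9]
print(countdown)   # [10, 8, 6, 4, 2]
```

In every case the stop value itself is left out of the result.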


The second type of loop is the **while loop**, which assesses whether or not a condition is true before executing the code. Before every iteration of a while loop, the program will stop and assess whether a certain boolean value is true. If it is, then it will execute the indented code. If it isn't, then it will exit the loop and continue with the rest of the program. 

It works like this:

In [157]:
index = 0

while index < 10:
    print(str(index) + " is less than 10, so I'll keep going.")
    index += 1
    
print(str(index), "isn't less than 10 so I need to stop.")

0 is less than 10, so I'll keep going.
1 is less than 10, so I'll keep going.
2 is less than 10, so I'll keep going.
3 is less than 10, so I'll keep going.
4 is less than 10, so I'll keep going.
5 is less than 10, so I'll keep going.
6 is less than 10, so I'll keep going.
7 is less than 10, so I'll keep going.
8 is less than 10, so I'll keep going.
9 is less than 10, so I'll keep going.
10 isn't less than 10 so I need to stop.
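
While loops are most useful when we don't know in advance how many iterations we'll need. As a minimal sketch (the starting `balance` is a made-up value), this loop counts how many times a number can be halved before it drops to 1 or below:

```python
balance = 100  # hypothetical starting value
halvings = 0

# Keep halving until the balance is no longer greater than 1
while balance > 1:
    balance = balance / 2
    halvings += 1

print(halvings)  # prints 7
```

If the condition never becomes false, a while loop runs forever, so make sure something inside the loop moves it toward stopping.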


### Exercise
Write code that finds all of the unique, unordered combinations of three distinct digits (e.g. 0-1-2, 0-1-3, 0-1-4, etc.). 

In [158]:
possibilities = []
nums = range(10)
for i in nums:
    for j in nums:
        for k in nums:
            addition = {i,j,k}
            if addition not in possibilities and len(addition) == 3:
                possibilities.append(addition)
                
possibilities

[{0, 1, 2},
 {0, 1, 3},
 {0, 1, 4},
 {0, 1, 5},
 {0, 1, 6},
 {0, 1, 7},
 {0, 1, 8},
 {0, 1, 9},
 {0, 2, 3},
 {0, 2, 4},
 {0, 2, 5},
 {0, 2, 6},
 {0, 2, 7},
 {0, 2, 8},
 {0, 2, 9},
 {0, 3, 4},
 {0, 3, 5},
 {0, 3, 6},
 {0, 3, 7},
 {0, 3, 8},
 {0, 3, 9},
 {0, 4, 5},
 {0, 4, 6},
 {0, 4, 7},
 {0, 4, 8},
 {0, 4, 9},
 {0, 5, 6},
 {0, 5, 7},
 {0, 5, 8},
 {0, 5, 9},
 {0, 6, 7},
 {0, 6, 8},
 {0, 6, 9},
 {0, 7, 8},
 {0, 7, 9},
 {0, 8, 9},
 {1, 2, 3},
 {1, 2, 4},
 {1, 2, 5},
 {1, 2, 6},
 {1, 2, 7},
 {1, 2, 8},
 {1, 2, 9},
 {1, 3, 4},
 {1, 3, 5},
 {1, 3, 6},
 {1, 3, 7},
 {1, 3, 8},
 {1, 3, 9},
 {1, 4, 5},
 {1, 4, 6},
 {1, 4, 7},
 {1, 4, 8},
 {1, 4, 9},
 {1, 5, 6},
 {1, 5, 7},
 {1, 5, 8},
 {1, 5, 9},
 {1, 6, 7},
 {1, 6, 8},
 {1, 6, 9},
 {1, 7, 8},
 {1, 7, 9},
 {1, 8, 9},
 {2, 3, 4},
 {2, 3, 5},
 {2, 3, 6},
 {2, 3, 7},
 {2, 3, 8},
 {2, 3, 9},
 {2, 4, 5},
 {2, 4, 6},
 {2, 4, 7},
 {2, 4, 8},
 {2, 4, 9},
 {2, 5, 6},
 {2, 5, 7},
 {2, 5, 8},
 {2, 5, 9},
 {2, 6, 7},
 {2, 6, 8},
 {2, 6, 9},
 {2, 7, 8},
 {2,
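
As a cross-check on the nested-loop approach, the standard library's `itertools.combinations` produces the same kind of result directly. This is an alternative sketch, not the workbook's solution, and it returns tuples rather than sets:

```python
from itertools import combinations

# Each result is a tuple of three distinct digits in increasing order
digit_combinations = list(combinations(range(10), 3))

print(len(digit_combinations))  # prints 120
print(digit_combinations[:3])   # [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
```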

## Functions
The final concept that this review will cover is the **function**. Functions are a hugely important programming concept, as they allow us to save entire chunks of code and re-use them based on predefined inputs.

In fact, we've already encountered a number of built-in Python functions, like `print()`, `min()`, `max()`, and `len()`. These functions come pre-loaded with Python and are unchangeable, but we can also write our own **user-defined functions** that will execute a pre-written block of code based on inputs.

For example, if we wanted to write a function that returned the **square** of a number, we'd do it like this:

In [159]:
def get_square(x):
    square = x * x
    return square

We can now use the function on whatever inputs we want:

In [160]:
get_square(5)

25

In [161]:
get_square(12)

144

In defining functions, we use the **`def`** keyword to name the function; then we specify which arguments the function will take in; and finally we tell the function what to `return`.

In the case above, the function takes only one argument - `x`, or the number we want to square. It then performs a simple calculation, multiplying the input by itself, and returns the result.
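
Functions can take more than one argument, and an argument can be given a default value that is used when the caller leaves it out. A minimal sketch, using a made-up function named `get_power`:

```python
def get_power(base, exponent=2):
    # exponent defaults to 2 when the caller doesn't supply it
    return base ** exponent

print(get_power(5))      # prints 25
print(get_power(2, 10))  # prints 1024
```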

Very commonly, functions will internally employ control flow statements. For example, if we wanted to compute the [factorial](https://en.wikipedia.org/wiki/Factorial) of a number, we might need a for loop within our function:

In [162]:
def get_factorial(x):
    factorial = 1
    # Start at 1 to exclude zero, and stop at x + 1 so that x itself is included
    for i in range(1, x + 1):
        factorial = factorial * i
    return factorial

In [163]:
get_factorial(10)

3628800
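
A quick way to sanity-check a hand-written function is to compare it against a trusted implementation. The sketch below defines its own loop-based helper (the name `factorial_with_loop` is made up for this example) and compares it with `math.factorial` from the standard library:

```python
import math

def factorial_with_loop(n):
    # Hypothetical helper: multiply together every integer from 1 through n
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result

print(factorial_with_loop(5))                       # prints 120
print(factorial_with_loop(5) == math.factorial(5))  # prints True
```

Because `range()` leaves out its stop value, the loop has to run to `n + 1` for `n` itself to be included in the product.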

If/else statements are also extremely common in functions. Try using them in the following exercise.

### Exercise
Write a function `is_divisible` that returns a boolean indicating whether one number, x, is divisible by another, y. The function should take x and y as inputs.

In [164]:
def is_divisible(x,y):
    if x % y == 0:
        return True
    else:
        return False

In [165]:
is_divisible(10,5)

True

In [166]:
is_divisible(13,3)

False

_Note: Actually, you could write this function without an if/else statement. Give it a try:_

In [167]:
def is_divisible(x,y):
    return x % y == 0

In [168]:
is_divisible(10,5)

True

In [169]:
is_divisible(13,3)

False
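
One edge case worth knowing: `x % 0` raises a `ZeroDivisionError`, so calling the function with `y = 0` will fail. The sketch below shows one possible way to guard against that, using a made-up variant named `is_divisible_safe`. Treating division by zero as "not divisible" is an assumption here; raising a clearer error would be an equally reasonable choice.

```python
def is_divisible_safe(x, y):
    # Hypothetical variant: report False for y == 0
    # instead of letting x % 0 raise ZeroDivisionError
    if y == 0:
        return False
    return x % y == 0

print(is_divisible_safe(10, 5))  # prints True
print(is_divisible_safe(10, 0))  # prints False
```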

This concludes our review of the basics in Python. Even experienced programmers forget the nuanced differences between the datatypes and the various built-in functions and operators, so we encourage you to consult this workbook and the cheatsheets below whenever you're confused. 

But again, if these concepts seemed unfamiliar, or if you're not quite sure about how programming languages work or interpret code, it's a good idea to brush up on your programming fundamentals, using the resources below.

### Other Useful Resources
- [Simon Allardice - Programming Fundamentals](https://www.lynda.com/Programming-Foundations-tutorials/Foundations-Programming-Fundamentals/83603-2.html)
- [Python Basics Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/PythonForDataScience.pdf)
- [Python for Data Science Cheatsheet](http://datacamp-community.s3.amazonaws.com/50d31142-3de0-4159-89b9-18b718a728ef)