# Lesson 2: Experimental design and pandas
## Starter code for guided practice & demos

## Has anyone not installed the prerequisites?
- Go to tmpnb.org.
- Select New > Python 2 to create a Python notebook.
```
>>> import pandas
>>> import sklearn
>>> import matplotlib
```
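
If you would rather check all three libraries in one go, a small sketch like the one below reports what is installed. The helper name `check_prerequisites` is made up for this illustration; it relies only on the standard-library `importlib` module and is written to run under both Python 2 and Python 3.

```python
import importlib


def check_prerequisites(names=("pandas", "sklearn", "matplotlib")):
    """Try to import each library; record its version, or None if missing.

    `check_prerequisites` is a hypothetical helper written for this lesson.
    """
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Most libraries expose their version as __version__
            found[name] = getattr(module, "__version__", "unknown version")
        except ImportError:
            # The library is not installed in this environment
            found[name] = None
    return found


for name, version in sorted(check_prerequisites().items()):
    print("{0}: {1}".format(name, version if version else "NOT INSTALLED"))
```

Any library reported as not installed needs to be set up before the demos later in this lesson will run.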

## Intro to Anaconda, IPython and notebooks

### What is Anaconda?

Anaconda is "Completely free enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing." What this really means:

* It's a prepackaged version of Python
* It includes (mostly) everything that a data scientist would need from Python
* It's free!

### What are IPython and Notebook?
Anaconda comes with two interactive Python environments (REPLs): the IPython Qt console (pronounced "cutie") and Jupyter Notebook (the successor to IPython Notebook). We'll primarily use Notebook today; it is a great tool for exploring and analysing data and for keeping your thoughts organized.

#### Starting Notebook

Find Anaconda's Launcher app, and open it. It'll open something like the following:

<img src='https://s3.amazonaws.com/f.cl.ly/items/2Y051r3v3a3Y2l0J3u0c/Screen%20Shot%202015-04-12%20at%202.11.32%20AM.png' style='width: 100%' />

Click on the ipython-notebook "launch button," which will open a shell window that will then start the notebook. Please leave this window open until you are done with the notebook!

<img src='https://s3.amazonaws.com/f.cl.ly/items/0z0e0L0R1m1M191j013M/Screen%20Shot%202015-04-12%20at%202.13.57%20AM.png' style='width: 100%' />

Notebook runs as a _server_ on your computer, so it will open a window in your internet browser. Under the "new" button, click "Python 2" and you're good to go!

**Note: if you installed Python 3, that should be okay, but some of today's material may not work as written, because the code cells use Python 2 syntax such as the `print` statement.**

Finally, you'll come to your notebook for today:

<img src='https://s3.amazonaws.com/f.cl.ly/items/0u1b3D24281K1R0V3j1k/Screen%20Shot%202015-04-12%20at%202.16.45%20AM.png' style='width: 100%' />

We'll be using the following practices to take advantage of the various parts of notebook:

* the "code" cell allows us to run python code. It'll allow us to write multiple lines of code at a time.
* the "markdown" cell allows us to save text, add images, etc. 

With these two cell types we'll take notes for the workshop today, using the following steps:

1. Always include a markdown cell above each code cell. Write your notes there as you usually would, describing the code below it.
2. For those completely fresh to programming, we'll be "commenting through the code." That just means each line of code gets a Python comment explaining what that line does.
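
As an illustration of step 2, "commenting through the code" looks roughly like the cell below. The temperature readings are made-up values chosen for the example.

```python
# Store a few temperature readings (made-up values) in a list
temperatures = [18.5, 21.0, 19.5]

# Add up all the readings
total = sum(temperatures)

# Count how many readings there are
count = len(temperatures)

# Divide the total by the count to get the average
average = total / count

# Show the result
print(average)
```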

#### Check to see if you're ready to go!
1. Run each block of code
2. Check for errors
3. When you think you're error free, flag down a teaching team member to confirm

In [None]:
# This is what an error looks like: `a` has not been defined, so Python 2 raises a NameError
print a

## Objectives
1. How to start Jupyter Notebook
2. Get comfortable with Jupyter Notebook
3. How to read data into pandas
4. How to do simple manipulations on pandas dataframes


## Start a notebook
For each class, we'll be using a set of common data science libraries and tools, like the Jupyter Notebook. You can start a Jupyter notebook from Anaconda Navigator, or by running the following command in a terminal:

```
jupyter notebook
```

## Try it yourself!
Read and run the block of code below by:
1. Clicking on it and pressing the play button above, or
2. Using a keyboard shortcut (Help > Keyboard Shortcuts)

In [None]:
#import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

## Basic Data Types: Strings, Lists, Tuples
Try the same thing on your own to learn about the following python objects: strings, lists, and tuples.

In [None]:
some_string1 = 'apples'
some_string2 = 'and'
some_string3 = 'bananas'
print some_string1, some_string2, some_string3
print some_string1 + some_string2 + some_string3
print some_string1[0:5]

In [None]:
a = [3 for i in range(10)]
print a
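
The cells above cover strings and lists but not tuples, so here is a minimal sketch of the third type. The variable names and values are illustrative, and the code is written so that it runs under both Python 2 and Python 3.

```python
# A tuple is an ordered collection, like a list, but it cannot be changed
some_tuple = (51.52018, -0.07078)

# Indexing works the same way as for strings and lists
print(some_tuple[0])

# Tuples can be "unpacked" into separate variables
latitude, longitude = some_tuple
print(longitude)

# Trying to change an element raises a TypeError
try:
    some_tuple[0] = 0.0
except TypeError as error:
    print(error)
```

Use a tuple when the values belong together and should not change, and a list when you expect to add or remove items.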

## Dictionaries

In [None]:
some_dict = {
    'title': 'Data Science',
    'start_time': 18,
    'instructor': {'name': 'John', 'location': 'London'},
    'coordinates': [51.52018, -0.07078]
}
some_dict

In [None]:
some_dict['coordinates'][0]

In [None]:
some_dict['campus'] = 'Shoreditch'

In [None]:
some_dict
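
A few more dictionary operations that tend to come up, sketched on a cut-down copy of the dictionary above. The `'room'` key is an assumption made for the example and deliberately does not exist.

```python
some_dict = {
    'title': 'Data Science',
    'instructor': {'name': 'John', 'location': 'London'},
}

# Chain keys to reach a value inside a nested dictionary
print(some_dict['instructor']['name'])

# .get() returns a default instead of raising KeyError for a missing key
print(some_dict.get('room', 'not set'))

# Loop over the keys in alphabetical order
for key in sorted(some_dict):
    print(key)
```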

## If-else

In [None]:
x = False
y = True

if x:
    print 'apple'
elif y:
    print 'banana'
else:
    print 'sandwich'

## For loops

In [None]:
for k in range(4):
    print k**2

## While loops

In [None]:
x = 0
while True:
    print "Hello!"
    x += 1
    if x >= 3:
        break

## Functions

In [None]:
def add(x, y):
    """Returns x + y"""
    return x + y


In [None]:
add(1, 3)

In [None]:
?add
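
`?add` is IPython/Jupyter syntax and will not work in a plain Python script. As a rough equivalent, the built-in `help()` function and the `__doc__` attribute expose the same docstring:

```python
def add(x, y):
    """Returns x + y"""
    return x + y


# The docstring is stored on the function itself
print(add.__doc__)

# help() prints the function signature together with the docstring
help(add)
```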

## Review Python Basics
Test your skills by answering the following questions:

#### Question 1.  Divide 10 by 20 and set the result to a variable named "A"

In [None]:
### Insert your code here and then uncomment | print A | when you are ready to test it. 

#print A

In [None]:
#### If you did not get a float (decimals), alter your equation to get the desired result (0.5) 


#### Question 2. Create a function called division that divides any two numbers and prints the result (with decimals).
Call your function and confirm that the results are as expected.

In [None]:
#add your function here

#### Question 3. Using .split(), split my_string into separate words and store them in a variable named words

In [None]:
my_string = "the cow jumped over the moon"
# Put your code here; words should equal ['the', 'cow', 'jumped', 'over', 'the', 'moon']


#print words

#### Question 4. How many words are in my_string?

#### Question 5. Use a list comprehension to find the length of each word

result: [3, 3, 6, 4, 3, 4]

#### Question 6. Put the words back together in a variable called sentence using .join()
result:
the cow jumped over the moon

#### Bonus question: Add a "||" between each word
result: 
the||cow||jumped||over||the||moon

## Demo: 311 Service Requests
Check first that you have the CSV in the same folder as the .ipynb notebook.
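
One way to check this from inside the notebook is sketched below, using only the standard-library `os` module. The variable names `csv_name` and `csv_found` are illustrative.

```python
import os

# File name used by the read_csv call in this demo
csv_name = '311-service-requests.csv'

# os.getcwd() is the folder the notebook is running from
print(os.getcwd())

# True if the CSV sits in that folder, False otherwise
csv_found = os.path.exists(csv_name)
print(csv_found)
```

If this prints `False`, move the CSV into the folder printed on the first line before running the next cell.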

In [None]:
# Read the data
data = pd.read_csv('311-service-requests.csv', parse_dates=['Created Date'], low_memory=False)

In [None]:
# Take a look
data.head()

In [None]:
# Let's view all cols
pd.set_option('display.max_columns', 100)
data.head()

In [None]:
# Visualise data geographically using lon/lat
pd.set_option('display.width', 4000)
data.plot(x='Longitude',
          y='Latitude',
          kind='scatter',
          s=1)

## Graph the number of noise complaints each hour in New York

In [None]:
complaints = data[['Created Date', 'Complaint Type']]
complaints.head()

In [None]:
noise_complaints = complaints[complaints['Complaint Type'] == 'Noise - Street/Sidewalk']
noise_complaints.head()

In [None]:
# This is a bit fancy: index by timestamp, sort by time, group into hourly bins, count rows per bin, then plot
noise_complaints.set_index('Created Date').sort_index().resample('H').count().plot()
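
To see what each step of that chain does, here is the same idea taken one step at a time on a tiny, made-up DataFrame (the `toy` data and intermediate variable names are assumptions for illustration). The sketch uses `'60min'` for the bin size, which gives the same hourly bins as `'H'`; recent pandas releases deprecate the `'H'` alias.

```python
import pandas as pd

# A tiny, made-up stand-in for the noise_complaints DataFrame
toy = pd.DataFrame({
    'Created Date': pd.to_datetime([
        '2013-10-31 01:50:00',
        '2013-10-31 00:10:00',
        '2013-10-31 01:05:00',
        '2013-10-31 03:30:00',
    ]),
    'Complaint Type': ['Noise - Street/Sidewalk'] * 4,
})

# 1. Use the timestamp as the index so pandas can group rows by time
indexed = toy.set_index('Created Date')

# 2. Put the rows in chronological order
ordered = indexed.sort_index()

# 3. Bucket rows into 60-minute bins and count the rows in each bin
hourly = ordered.resample('60min').count()

print(hourly)
```

Hours with no complaints still appear in the result with a count of zero, which is why the plotted line drops to the axis during quiet periods.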