# 03. Making the most of a Jupyter Notebook

### Objectives

+ Know where Jupyter Notebooks excel and where they fail
+ Know the most common keyboard shortcuts
+ Use the five step process when doing a data exploration
+ Limit the amount of code you write in a single cell
+ Continuously verify that each line of code is working as you would expect
+ Use **alt + enter** to execute cells at the end of the notebook

## Minimal proficiency with Jupyter Notebooks
To get the most out of this course, you need a minimum level of proficiency with the Jupyter Notebook programming environment. All programming environments have tools and commands that provide a much better and more efficient experience. Even something as simple as your email client has many tools available to ease the process of composing an email.

## No programming environment is best for all situations for every single person
One of the first things you need to be aware of is that the Jupyter Notebook has limitations just like any tool. It is not the best tool for every situation. There are certain situations that Jupyter Notebooks excel at and there are others where it fails. Depending on what you are doing, Jupyter may or may not be the appropriate tool.

## Where Jupyter Notebooks Excel
Jupyter Notebooks are excellent for constructing data explorations that incorporate code, visualizations, text, images, videos, and math text. Essentially, they are great at building a report that someone can read and get information from. They are great for blog posts, teaching, formal reports, and interview assignments.

These notebooks are pretty good at prototyping solutions or writing short snippets of code. There are also many projects and additional tools that have been built to support the notebooks.

### Iterative and interactive workflow
One of the best features of the notebook is that you can run one line of code, get output, and then use this output to run the next line of code. For many scientists, it will be absolutely necessary to process their thinking one line at a time. Writing multiple lines of code in a row just doesn't make sense because there needs to be some analysis or decision on the output of the previous line before continuing.

This type of workflow is **iterative**, because we are constantly running code, looking at output, modifying it and then repeating this process. It is **interactive** because we are getting results immediately upon running the code.

### Google Colaboratory
One of the most popular supporters of Jupyter Notebooks is Google. They have built a product called Google Colaboratory where you can create Jupyter Notebooks in the cloud for free. This makes them easy to collaborate on with others. You can also open any Jupyter Notebook stored in a public GitHub repository in Google Colaboratory.

### Opening up a public GitHub Jupyter Notebook on Google Colaboratory
Navigate to the [Google Colaboratory][1] home page. You should see an orange pop-up window. Click on the **GitHub** tab, type **dunderdata** in the top line, and press enter. You should see a list of all the public notebooks for the DunderData organization. Open up one of the notebooks now.

### Using Jupyter as a light-weight Interactive Dashboard
There are a number of [interactive widgets][2] that you can embed into the Notebook to give a user control. There are also interactive visualization libraries such as [bokeh][3] and [plotly][4] that both provide powerful tools for dashboards.
 
## Where Jupyter Notebooks Fail
You might not like notebooks. Don't fear, you will not be alone. There is no requirement to use Jupyter Notebooks to do data science. Jupyter Notebooks are not a good choice for traditional software development, such as building an application. Other programming environments, such as PyCharm or Visual Studio Code, provide much better tools to organize, test, write, refactor, debug, and version control code.

### Notebooks suffer from messiness
As you have probably already noticed, it is incredibly easy to make a complete mess of your Jupyter Notebook. They can become very difficult to follow especially if you do not document what is happening. There will be lots and lots of scratch work and cells that are broken.

### Out of order code execution
One built-in feature/problem with notebooks is that you can run any cell at any place at any time. In a normal computer program, control flows from one line to the next. With notebooks, you control which cell gets executed, so it is very easy to execute cells in a different order than their natural ordering. Furthermore, you can easily lose track of variable values.
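
As a minimal sketch of how this goes wrong, the plain Python below uses comments to stand in for notebook cells. The variable name `total` is made up for illustration and does not come from the course data.

```python
# Cell 1: define a starting value
total = 10

# Cell 2: update the value in place
total = total + 5

# Re-running Cell 2 (without re-running Cell 1) applies the update again
total = total + 5

# The notebook now holds a value that a single top-to-bottom run would not produce
print(total)  # 20, whereas running each cell once in order gives 15
```

Nothing on the page records that the second cell ran twice, which is why the value can be surprising when you come back to the notebook later.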

## Know where Jupyter Notebook development ends and traditional software development begins
I use Jupyter Notebooks only when I am exploring data or need to write a formal report for someone. I typically use two notebooks, one for scratch work and another for my formal report. I ensure the formal notebook has the same behavior as a normal program by executing all the cells in order from the top. You can do this by using the option **Restart & Run All** from the **Kernel** menu or with the **Run All** option in the **Cell** menu. Note that **Run All** keeps any variables already in memory, so **Restart & Run All** is the stricter check.

# Practice the commands
In this notebook we will practice some commands that will make your life programming in this environment much smoother.

## The three most common keyboard shortcuts A (above), B (below), DD (delete)
There are three keyboard shortcuts that you will use often and that save you from reaching for your mouse. They only work while in **command mode**. Tap (don't hold) **ESC** to enter command mode. You will know you are in command mode when the outline of the cell is blue and there is no blinking cursor inside the cell.

* Press **A** to add a cell above
* Press **B** to add a cell below
* Press **DD** to delete a cell

## Press Enter - the simplest keyboard shortcut
While in command mode, simply press **enter**. This will switch you into edit mode in the current cell. The outer highlight will now turn **green**.

## Press h to see all the keyboard shortcuts
While in command mode, press **h** to pop up the menu of keyboard shortcuts. They are categorized by what mode they are used in - command or edit.
    
## Practice these shortcuts now 
Add and delete cells, and press **enter** to switch into edit mode in them.
    
[1]: https://colab.research.google.com/
[2]: http://jupyter.org/widgets
[3]: https://bokeh.pydata.org/en/latest/
[4]: https://plot.ly/products/dash/

In [2]:
import pandas as pd

In [5]:
bikes = pd.read_csv('../data/bikes.csv')
bikes.head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


In [7]:
bikes = bikes.set_index

AttributeError: 'function' object has no attribute 'set_index'

# The five step process when doing data exploration
To help increase your understanding of data and lessen your frustration fighting with Jupyter Notebooks, I recommend the following five step process:
1. Write and execute a single line of code to explore your data. Usually you are doing something to a DataFrame or a Series.
1. Verify that this line of code works by inspecting the output.
1. Assign the result to a variable.
1. Within the same cell, on a second line, output the head of the DataFrame or Series.
1. Continue to the next cell. Do not add more lines of code to the cell.

A major pain point for beginners is writing too many lines of code in a single cell. When you are learning, you need to get feedback on every single line of code that you write and verify that it is in fact correct. Only once you have verified the result should you move on to the next line of code.

# An example of this in action
We will do three separate tasks below:
* Read in the data
* Set the index
* Select the gender and tripduration columns

In [8]:
import pandas as pd 

In [10]:
bikes = pd.read_csv('../data/bikes.csv')

In [11]:
bikes.head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


In [13]:
bikes.set_index('trip_id')

trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.881050,-87.616970,11.0,Michigan Ave & Oak St,41.900960,-87.623777,15.0,73.9,10.0,12.7,-9999.00,mostlycloudy
7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.883380,-87.641170,31.0,Wells St & Walton St,41.899930,-87.634430,19.0,69.1,10.0,6.9,-9999.00,partlycloudy
10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.881320,-87.629521,23.0,73.0,10.0,16.1,-9999.00,mostlycloudy
12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.631890,31.0,72.0,10.0,16.1,-9999.00,mostlycloudy
13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.00,partlycloudy
13595,Subscriber,Male,2013-07-01 12:37:00,2013-07-01 12:48:00,660,California Ave & 21st St,41.854016,-87.695445,15.0,Clark St & Wrightwood Ave,41.929546,-87.643118,15.0,73.0,10.0,17.3,-9999.00,mostlycloudy
18880,Subscriber,Male,2013-07-02 17:47:00,2013-07-02 17:56:00,565,Clark St & Randolph St,41.884576,-87.631890,31.0,Ravenswood Ave & Irving Park Rd,41.954690,-87.673930,19.0,66.0,10.0,15.0,-9999.00,cloudy
19689,Subscriber,Male,2013-07-03 09:07:00,2013-07-03 09:16:00,505,State St & Van Buren St,41.877181,-87.627844,27.0,Franklin St & Jackson Blvd,41.877708,-87.635321,27.0,64.0,7.0,5.8,-9999.00,cloudy
21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.883380,-87.641170,31.0,Wood St & Division St,41.903320,-87.672730,15.0,71.1,8.0,0.0,-9999.00,cloudy
23558,Subscriber,Female,2013-07-04 15:00:00,2013-07-04 15:16:00,922,Lakeview Ave & Fullerton Pkwy,41.925858,-87.638973,19.0,Racine Ave & Congress Pkwy,41.874640,-87.657030,19.0,81.0,10.0,12.7,-9999.00,mostlycloudy


In [14]:
bikes.head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


# Details
When coding this live, you would not assign the result to a variable immediately. For instance, in the second step, you might first write `bikes.set_index('trip_id')`. This produces output that you then verify. Only once it is verified would you turn the line into an assignment statement. Finally, you would add a second line that outputs the head of the DataFrame.
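
A rough sketch of those two passes follows. Because `../data/bikes.csv` may not be available, it builds a tiny stand-in DataFrame from a few values shown in the output above; in a notebook the two passes would be successive versions of the same cell rather than consecutive lines.

```python
import pandas as pd

# A tiny stand-in for the bikes data (values copied from the output above)
bikes = pd.DataFrame({
    'trip_id': [7147, 7524, 10927],
    'gender': ['Male', 'Male', 'Male'],
    'tripduration': [993, 623, 1040],
})

# First pass: run the operation on its own and inspect the output
bikes.set_index('trip_id')

# Second pass, once the output looks right: assign, then output the head
bikes = bikes.set_index('trip_id')
bikes.head()
```

The first pass leaves `bikes` unchanged, which is why the `bikes.head()` output above still shows `trip_id` as an ordinary column.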

# When to assign the result to a variable
Not all operations on our data will need to be assigned to a variable. We might just be interested in seeing the results. But, for many operations, you will want to continue with the new transformed data. By assigning the result to a variable, you have immediate access to the previous result.

# When to create a new variable name
In the second cell, `bikes` was reassigned to itself. We did this because we no longer needed the original DataFrame. In the third cell, we created an entirely new variable. This was done because we wanted to keep the `bikes` DataFrame. Creating new variables also makes it easier to trace the flow of work. Debugging is easier as well, since we will have preserved the result of the cell in its own variable (assuming we did not overwrite it in a later cell).
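
The two cases might look like the sketch below. It again uses a small stand-in DataFrame, and the name `bikes_gt` is a hypothetical choice for the new variable, not one taken from the course.

```python
import pandas as pd

# A tiny stand-in for the bikes data (values copied from the output above)
bikes = pd.DataFrame({
    'trip_id': [7147, 7524, 10927],
    'gender': ['Male', 'Male', 'Male'],
    'tripduration': [993, 623, 1040],
    'temperature': [73.9, 69.1, 73.0],
})

# Reassign to the same name: the un-indexed version is no longer needed
bikes = bikes.set_index('trip_id')

# New name: keep the full `bikes` DataFrame and store the selection separately
bikes_gt = bikes[['gender', 'tripduration']]
bikes_gt.head()
```

After this runs, `bikes` still holds every column, so later cells can go back to it without re-reading the file.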

# Continuously verifying results
Regardless of how adept you become at doing data explorations, it is good practice to verify each line of code. Data science is difficult and it is easy to make mistakes. Data is also messy and it is good to be skeptical while proceeding through an analysis. Getting a visual verification that each line of code is producing the desired result is important. Doing this also provides feedback to help you think about what avenues to explore next.
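
Beyond looking at the head, a few quick checks can help with this verification. The sketch below is one possible set of checks on a stand-in DataFrame. Treating `-9999` in the `precipitation` column as suspicious is an assumption based only on how odd the value looks in the output above, not something the data source confirms.

```python
import pandas as pd

# A tiny stand-in for the bikes data (values copied from the output above)
bikes = pd.DataFrame({
    'trip_id': [7147, 7524, 10927],
    'tripduration': [993, 623, 1040],
    'precipitation': [-9999.0, -9999.0, -9999.0],
})

# How many rows and columns came back?
print(bikes.shape)

# Are there missing values hiding in any column?
print(bikes.isna().sum())

# Count rows where precipitation holds the suspicious value -9999 (assumed placeholder)
n_suspicious = (bikes['precipitation'] == -9999).sum()
print(n_suspicious)
```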

# Execute cells with alt + enter to insert a new cell when doing exercises
The exercises at the bottom of each notebook only have a single cell between them. This is usually not enough room to complete an exercise. Normally we execute cells with **shift + enter**. This moves you to the next cell, even if it already has content, and puts you in command mode. Since the next cell will be another exercise problem, this won't be the position you want.

Instead, execute the cell by pressing **alt + enter** or **option + enter**. This will execute the cell and insert a new cell after it. It will also keep you in edit mode. This is exactly what you want when doing the exercises.

Alternatively, you can press **shift + enter**, then press **A** to insert a new cell above, and finally press **enter** to go into edit mode. Either way will work.

## Practice alt + enter below

### Problem 1
<span  style="color:green; font-size:16px">Assign some number to `a` in one cell. In the next 5 cells, add some number to it and assign the result to a new variable. Make sure to output the value of the new variable.</span>

In [None]:
# your code here

### Problem 2
<span  style="color:green; font-size:16px">Some problem here that is in the way!</span>