# Tutorial 1.2: IPython & Jupyter Notebooks
Python for Data Analytics
Module 1

In order to write code you are going to need what's called a "development environment". Practically speaking, a development environment is just where a human being (that's you!) writes and executes their code.

What differentiates various development environments is the number of features that they have. Some people prefer to use nothing more than a text editor, while others prefer to use fairly complex programs that offer all sorts of tools to speed up certain aspects of the program development workflow.

In this course we are going to use probably the most popular development environment for data science, [**Jupyter Notebooks**](http://jupyter.org).  The Jupyter project is an amazing piece of software that allows you to write and execute code inside of a web browser. 

You can combine your code with HTML, JavaScript, YouTube videos, and pretty much anything else that works in a web browser. Once you are finished, sharing your work is as simple as sharing a link to your notebook.

The Jupyter Notebook project grew out of another project called [IPython](http://ipython.org/). IPython, short for *Interactive Python*, is an "enhanced" interpreter that provides a number of useful additions to the language to speed up development work.

<div class="alert alert-block alert-info">
An interpreter is a program that runs Python commands. Python's standard interpreter can be invoked from the command line with `python`. Experiment with both the `ipython` interpreter and the standard interpreter to see the differences.
</div>

IPython is embedded inside of Jupyter notebooks and is responsible for actually running your code. For most practical purposes, you can think of IPython and Jupyter notebooks as synonymous terms at this stage of your learning.

## Jupyter Notebook Server(s)
In order to create Jupyter notebooks, you must have a program that will make them accessible to your web browser. Such a program is often called a server/daemon.

This program could run on your local machine, on a machine at your company, or on machines operated by a third-party provider like Amazon or Google.

There are *many* options on this front, with new ones appearing almost monthly.

In our particular case, we will be using Vocareum's machines to create & execute our notebooks.

## Handy Features of IPython inside Jupyter
In this section, we will quickly cover a few of the many features of IPython that are available to you inside of your Jupyter notebooks.

### Notebooks are Comprised of "Cells"
Jupyter notebooks are built by combining "cells" together. *A cell is nothing more than a container for different types of content.*

For our purposes there are two types of cells: 
* Text/HTML
* Code

Currently, you are reading a Text/HTML cell. 

You can always tell what cell is currently selected because it will be given a slight box shadow effect in the browser.

In [0]:
# The second type of cell contains code that you can execute.
# This is an example of that type of cell. 

You can always tell the difference between a code cell and a text/HTML cell quickly because code cells will have an `In [some number]` prefix on their left-hand side which shows the order in which the code cells have been executed. 

Just take a look at the cell above this one to see an example of this.

### You can Execute Cells in any Order
To execute a given code cell, you press `shift+enter` while it is selected. Once you do, a number will be inserted inside of the `In [ ]:` cell prefix showing the order in which that cell was executed.

**There is no requirement that you execute cells in the order they appear.** If you wanted to, you could execute the cells at the bottom of the page and then come back and execute the cells at the top of the page. The only thing to keep in mind is that if the later cells require the output of the earlier cells, you'll generate errors.
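As a minimal sketch of what goes wrong when cells run out of order, the plain-Python snippet below imitates running a "later" cell before the "earlier" cell it depends on. The variable names (`prices`, `total`) are hypothetical and chosen only for illustration.

```python
# Imitate running a "later" cell before the "earlier" cell it depends on.
try:
    # "Later" cell: uses `prices`, which has not been defined yet.
    total = sum(prices)
except NameError as error:
    message = str(error)
    print("Error:", message)

# "Earlier" cell: defines the variable.
prices = [3, 4, 5]

# Running the "later" cell again now succeeds.
total = sum(prices)
print("Total:", total)
```

In a real notebook you would not wrap the code in `try`/`except`; you would simply see the `NameError` under the cell and then go back and run the earlier cell first.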

**Likewise, you can execute a given cell multiple times.** Almost always, you will find programming in a Jupyter notebook to be an iterative process, where you will make multiple small changes to your cells. When you do, you'll find yourself running the same cell over and over again. 

### The `?` operator
Whenever you want to access some basic information for a given Python object in `IPython/Jupyter`, you can simply add a `?` before or after its name. In the following two cells, I will create a simple variable and then use the `?` operator to demonstrate its basic usage.

In [1]:
# This creates what is called a list. It holds three strings.
a_simple_list = ['Hi', 'Budding', 'Pythonista!']

In [2]:
# Now run this cell to see what the `?` does.
# It will pop up a little dialogue box at the bottom of the 
# screen with some information about the object.
a_simple_list?

While we just used the `?` operator on a variable, we can also use it on functions/methods. Let's say that we wanted to see what the `str.partition` method does. We'd just type `str.partition?` into IPython and would get the following output.

```
In [1]: str.partition?
Docstring:
S.partition(sep) -> (head, sep, tail)

Search for the separator sep in S, and return the part before it,
the separator itself, and the part after it. If the separator is not
found, return S and two empty strings.
Type:      method_descriptor
```

You can see here that you get both the *method signature* (don't worry about not knowing what this is right now) and a description of what the method does.
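To make the `str.partition` docstring a little more concrete, here is a small sketch of the two cases it describes. The example strings are made up for illustration.

```python
# Separator found: you get back (head, sep, tail).
head, sep, tail = "name=Ada".partition("=")
print(head, sep, tail)

# Separator not found: you get back the original string and two empty strings.
missing = "name".partition("=")
print(missing)
```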

You might be asking yourself: where does this helpful information come from?
The answer is that Python has something called *docstrings* that programmers 
assign to the objects that they write. It is these docstrings that are returned
when you use the `?` operator.

One note of importance here is that if a programmer does not write docstrings
for their objects, IPython will have little information to provide to a user.
So, remember to always write docstrings in your objects!
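Here is a minimal sketch of writing a docstring and reading it back in plain Python. The function `fahrenheit_to_celsius` is a hypothetical example; the text that `inspect.getdoc` returns is what you would expect to see in the `Docstring:` field when using `?`.

```python
import inspect


def fahrenheit_to_celsius(degrees_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (degrees_f - 32) * 5 / 9


# The docstring is stored on the function object itself.
doc = inspect.getdoc(fahrenheit_to_celsius)
print(doc)
```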

### The `??` operator
After you start to gain some experience with Python, you might want to get more details about how something works. The `??` operator works in a similar fashion to the `?` operator, but actually displays the source code of the object you are interested in so you can see exactly how it works.

<div class="alert alert-block alert-info">
"Source code" just means the code that makes up the object you are inspecting.
</div>

This is *very* helpful when you are trying to learn the details of something you are not familiar with, or when you are interacting with code that the original author didn't add docstrings to (and this, unfortunately, is pretty common).

As an example, let's say that a certain lazy Python instructor created a function called `big_or_small` with no docstring. When you use the `?` operator you only get the following: 

```
In [28]: big_or_small?
Signature: big_or_small(number)
Docstring: <no docstring>
File:      ~/workspace/class-materials/<ipython-input-27-f74533bc17d5>
Type:      function
```

**The big `<no docstring>` isn't very helpful, is it?** Well, you've got `??` as your backup plan. Let's try it now.

```python
In [29]: big_or_small??
Signature: big_or_small(number)
Source:   
def big_or_small(number):
    if number > 7:
        print("big")
    else:
        print("small")
File:      ~/workspace/class-materials/<ipython-input-27-f74533bc17d5>
Type:      function
```

**Bingo.** Now you can see exactly how this function works. This example is, of course, a little contrived, but this technique is quite valuable in the real world.
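If you ever need something similar outside of IPython, the standard library's `inspect.getsource` can retrieve source code in plain Python. The sketch below is only an approximation of what `??` shows, not a description of how IPython implements it. It assumes the standard library's `.py` files are installed alongside your interpreter, and it uses `textwrap.dedent` purely as a convenient example of a function written in Python.

```python
import inspect
import textwrap

# Fetch the source code of a standard library function written in Python.
source = inspect.getsource(textwrap.dedent)

# Show just the first line, which is the function's `def` statement.
first_line = source.splitlines()[0]
print(first_line)
```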

### Tab Completion
One of the most valuable features available to you inside of Jupyter notebooks is *tab completion*. Essentially, when you don't remember something fully, you can use the `tab` key on your keyboard, and IPython will try to provide you with some helpful options.

This works in a variety of contexts. The next couple cells will demonstrate some examples.

In [5]:
# Scenario 1: You have an object and you want to see what 
# attributes/methods are available.

# For instance, what methods are available on a `str` object?
# Just hit tab after moving your cursor just after the period below
str.

SyntaxError: ignored

In [0]:
# Scenario 2: You know the first part of the name of the attribute/method
# you are looking for, but still need some help.

# Hit tab after the end of the code below and it will show 
# all attributes/methods on the `str` object that begin with "is"
str.is

In [0]:
# Scenario 3: You would like to know what keyword arguments are
# available for a given method.

# Hit tab inside the parentheses '()' of a function/method
# invocation to see what keyword arguments,
# if any, are available for that function/method.
print()

As we continue in our course, I will demonstrate additional contexts in which you can use tab completion.
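Tab completion is interactive, so it can't be shown in a static listing, but you can get a rough feel for Scenario 2 with the built-in `dir` function. This is only an approximation of what the completer offers, not how IPython implements it.

```python
# Roughly what tab completion on `str.is` offers: every attribute of `str`
# whose name begins with "is".
matches = [name for name in dir(str) if name.startswith("is")]
print(matches)
```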

## IPython Magic Commands
Let's discuss some of the additional enhancements, known as *magic commands*, that IPython adds on top of the normal Python syntax.

Magic commands come in two flavors: 
* *Line magics*, which are denoted by a single `%` prefix and operate on a single line of input
* *Cell magics*, which are denoted by a double `%%` prefix and operate on multiple lines of input. When using this type of command, the "magic statement" will usually appear as the first thing in the cell.

Let's demonstrate and discuss a few brief examples here.

### Running External Code: ``%run``
It won't take long for you to want to save the scripts that you are writing so that you don't have to type things over and over again. Once you do this, you'll need a way to run them inside your Jupyter notebook. This is where the ``%run`` magic command comes in.

Let's imagine you've created a ``myscript.py`` Python file with the following contents:

```python
#-------------------------------------
# file: myscript.py

def square(num):
    """square a number"""
    return num ** 2

for number in range(1, 4):
    print(number, "squared is", square(number))
```

<div class="alert alert-block alert-info">
<h5>For Python Newbies: What does myscript.py do?</h5>
This example script creates a function and executes a `for` loop.

<p>In Python, functions (and something similar called methods) are declared with the `def` keyword.
In this case the `square` function is being defined and you can see it takes a single parameter
called `num`.</p>

<p>The line that says `for number in range(1, 4)` means "for each number in the range from 1 to 4",
and the next line says what to do with that number. In this case, the 'what to do' is to print the original value and its value after it has been squared.</p>

<p>A couple of important notes:</p>
<ul>
<li>The name given to the `square` function parameter, `num`, is just a placeholder. You don't have to pass a variable called `num` into the function when you call it. Whatever you pass in will become `num` as far as the function is concerned. That is why we are able to pass in our variable `number` to the `square` function on the last line of the script and it still works.</li>
<li>Close observers may have noticed that there are only three lines of output for our script, where you might have expected four because of the `range(1, 4)` statement. The `range` function is a little bit surprising in its behavior to new programmers in that the first parameter given (1 in our case) is included in our results, but the last parameter is not (4 in our case). It only delivers integer numbers up to - but not including - the second parameter.</li>
</ul>
</div>
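The two notes in the box above can be checked with a few lines of Python. This is just a quick sketch; `side_length` is a hypothetical variable name picked to show that the argument doesn't have to be called `num`.

```python
def square(num):
    """square a number"""
    return num ** 2


# range(1, 4) includes 1 but stops before 4.
numbers = list(range(1, 4))
print(numbers)  # [1, 2, 3]

# The variable passed in does not need to be named `num`.
side_length = 6
print(square(side_length))  # 36
```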

You can execute `myscript.py` from your IPython session as follows:

```ipython
In [6]: %run myscript.py
1 squared is 1
2 squared is 4
3 squared is 9
```

Note also that after you've run this script, any functions defined within it are available for use in your IPython session:

```ipython
In [7]: square(5)
Out[7]: 25
```
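For the curious, plain Python has a loosely comparable tool in the standard library's `runpy.run_path`, which executes a file and returns the names it defined. The sketch below is an analogy, not a description of how `%run` is implemented. It writes a copy of `myscript.py` to a temporary folder so that the example is self-contained.

```python
import os
import runpy
import tempfile

# A copy of the myscript.py contents shown earlier.
script_source = '''
def square(num):
    """square a number"""
    return num ** 2

for number in range(1, 4):
    print(number, "squared is", square(number))
'''

with tempfile.TemporaryDirectory() as folder:
    script_path = os.path.join(folder, "myscript.py")
    with open(script_path, "w") as script_file:
        script_file.write(script_source)
    # Execute the file as a script and collect the names it defined.
    namespace = runpy.run_path(script_path, run_name="__main__")

# The function defined by the script is available in the returned dictionary.
square = namespace["square"]
print(square(5))
```

One caveat: the `runpy` documentation warns that functions defined by the executed code are not guaranteed to keep working after the call returns. A function as simple as `square` is unaffected, but inside a notebook `%run` remains the more convenient choice.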

There are several options to fine-tune how your code is run; you can see the documentation in the normal way, by typing **``%run?``** in the IPython interpreter.

### Timing Code Execution: `%timeit` & `%%timeit`
In my opinion, this is the most valuable magic command. 

When dealing with large amounts of data, you are going to want to learn how to make your code run fast. To be able to make it faster, you have to be able to see how long the various parts of your code take to execute.

IPython makes this extremely easy to do with the `%timeit` magic command.
Check out how easy it is to time the execution of a one-line Python statement (called a *list comprehension* if you are curious):

In [4]:
%timeit example_list = [n ** 2 for n in range(1000)]

1000 loops, best of 3: 310 µs per loop


You'll notice that one of the benefits of ``%timeit`` is that for short commands it will automatically perform multiple runs in order to attain more robust results.

For multi-line statements, adding a second ``%`` sign before `timeit` will turn this into a cell magic that can handle multiple lines of input.
For example, in the following code cell, we will create an identical list of numbers using a different approach and then compare our timing results to see which approach is faster.

In [3]:
%%timeit
# Cell magics belong on the first line of the cell.
# After you execute this, you'll be able to see that 
# this approach is slower than the first approach.
example_list_2 = []
for n in range(1000):
    example_list_2.append(n ** 2)

1000 loops, best of 3: 374 µs per loop
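Outside of IPython, the same comparison can be sketched with the standard library's `timeit` module. This is a rough equivalent, not what the magic command does internally. The function names `with_comprehension` and `with_loop` are made up for this example, and the `number` and `repeat` values are arbitrary choices. Your timings will differ from machine to machine, so no expected numbers are shown.

```python
import timeit


def with_comprehension():
    return [n ** 2 for n in range(1000)]


def with_loop():
    result = []
    for n in range(1000):
        result.append(n ** 2)
    return result


# Call each function 200 times, repeat that measurement 3 times,
# and keep the best (smallest) total time for each approach.
runs = 200
best_comprehension = min(timeit.repeat(with_comprehension, number=runs, repeat=3))
best_loop = min(timeit.repeat(with_loop, number=runs, repeat=3))

print(f"comprehension: {best_comprehension / runs * 1e6:.0f} microseconds per loop")
print(f"for loop:      {best_loop / runs * 1e6:.0f} microseconds per loop")
```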


### <a name="magicHelp"></a> Help on Magic Functions: ``?``, ``%magic``, and ``%lsmagic``

IPython magic functions have docstrings, just like any well written object in Python.
You can, therefore, access this documentation in the standard manner:

In [0]:
# Obtain help documentation on the %timeit magic command
%timeit?

To access a general description of available magic functions, including some examples, you can type this:

In [0]:
%magic

For a quick and simple list of all available magic functions, type this:

In [9]:
# `ls` is a Linux command which is used to "list directory contents". 
# It is used all the time when working from a command line and the 
# IPython developers therefore adopted it here knowing that most users
# will immediately understand its meaning.
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %shell  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%bigquery  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  