# Introduction to Jupyter notebooks and Pandas

## 1. Opening and Navigating the IPython Notebook

We will start today with the interactive environment that we will be using in the tutorial: the [Jupyter Notebook](http://jupyter.org).

We will walk through the following steps together:

1. If you don't yet have Anaconda, download [Miniconda](https://conda.io/miniconda.html) (be sure to get the Python 3.7 version) and install it on your system. Ideally you have done this before coming to class.

2. Use the ``conda`` command-line tool to update ``conda`` and install the Jupyter notebook:

    Update ``conda`` itself, then install the notebook packages:
    >```
    conda update conda
    conda install jupyter notebook
    ```
   
3. Use git to clone our repo: 

    >```
    git clone https://github.com/scottyhq/dssg2018-geopandasSQL-tutorial
    ```
   
4. Navigate to the directory containing the tutorial material

    >```
    cd ~/dssg2018-geopandasSQL-tutorial/
    git checkout 2019-update
    ```

    You should see a number of files in the directory (0-Jupyter-notebooks.ipynb, 1-intro-geopandas.ipynb...):
    >```
    ls
    ```

5. Type ``jupyter notebook`` in the terminal to start the notebook

    >```
    jupyter notebook
    ```

    If everything has worked correctly, this should automatically open the notebook interface in your default browser.

   
6. Click on ``0-Jupyter-notebooks.ipynb`` to open the notebook containing the content for this lecture.

With that, you're set up to use the Jupyter notebook on your own computer!
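
As a quick sanity check of the installation, the sketch below asks Python whether a few packages can be found on the import path. The helper name `is_installed` and the list of package names are illustrative assumptions, not part of the tutorial material.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# These package names are assumptions about what this tutorial needs.
for name in ["notebook", "pandas"]:
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

If a package is reported as missing, installing it with ``conda install <package>`` in the same environment is the usual next step.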

If it didn't work, go ahead and open an interactive notebook on the [mybinder.org service](https://gke.mybinder.org/):
https://mybinder.org/v2/gh/scottyhq/dssg2018-geopandasSQL-tutorial/2019-update?urlpath=lab

## About Jupyter

- Combines the features of an IPython terminal with a web interface
- Every notebook has its own Python instance (the kernel) running behind it
- Also supports Markdown and HTML embedding
- Easy to insert $\LaTeX$ formulas
- Beware of Vim-style keyboard shortcuts

## Python refresher
- Components with the same capabilities are of the same *type*. 
  - For example, the numbers 2 and 200 are both integers.
  - Python cares about what an object can do rather than what it is declared to be. This is called "duck typing" (if it looks like a duck and quacks like a duck, it is a duck).
  
- A type can be defined recursively, in terms of other types. Some examples:
  - A list is a collection of objects that can be indexed by position.
  - A list of integers contains an integer at each position.
  
- A type has a set of supported operations. For example:
  - Integers can be added.
  - Strings can be concatenated.
  - A table can report the names of its columns.
    
- In Python, members (components and operations) are accessed with a `.`
  - If `a` is a list, then `a.append(1)` adds `1` to the list.
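
A minimal sketch of these ideas in plain Python. The variable names and the `describe` helper are made up for illustration.

```python
# Values with the same capabilities share a type.
print(type(2) is type(200))        # both are int

# Each type supports its own operations.
print(2 + 200)                     # integers can be added
print("geo" + "pandas")            # strings can be concatenated

# Members are reached with a dot.
a = []
a.append(1)                        # append is a member of every list
print(a)


def describe(container):
    """Duck typing: accept anything that supports len() and iteration."""
    return f"{len(container)} items: {', '.join(str(x) for x in container)}"


print(describe([1, 2, 3]))         # a list works
print(describe("abc"))             # so does a string
```

`describe` never checks the type of its argument; it only relies on the operations the argument supports.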

In [4]:
a = 1

In [6]:
# Python lists can store data of different types at the same time
a_list = [1, 'a', [1,2]]

In [7]:
# What if we forgot what functions a list has?
a_list.pop?

In [8]:
a_list.append(2)

In [10]:
1

1

In [14]:
a_list

[1, 'a', [1, 2], 2]

In [15]:
# What is the full set of attributes and methods available on our list?
# (In the notebook you can also type `a_list.` and press Tab.)
dir(a_list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
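
Most of the names above are "dunder" (double-underscore) methods that Python calls behind the scenes. A small sketch for listing only the public methods, assuming the same `a_list` as in the earlier cells:

```python
a_list = [1, 'a', [1, 2], 2]

# Keep only the names that do not start with an underscore.
public_methods = [name for name in dir(a_list) if not name.startswith('_')]
print(public_methods)
```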

In [None]:
# Count how many times the integer 1 occurs in the list
a_list.count(1)

## The IPython kernel/client model


In [18]:
%connect_info

{
  "shell_port": 57896,
  "iopub_port": 57897,
  "stdin_port": 57898,
  "control_port": 57899,
  "hb_port": 57900,
  "ip": "127.0.0.1",
  "key": "d35b7ce7-38529646d0d2edcc15288ed9",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}

Paste the above JSON into a file, and connect with:
    $> jupyter <app> --existing <file>
or, if you are local, you can connect with just:
    $> jupyter <app> --existing kernel-71ec3129-90c4-4637-af36-85e0771b93d4.json
or even just:
    $> jupyter <app> --existing
if this is the most recent Jupyter kernel you have started.
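
The connection information is plain JSON, so it can be inspected with the standard library. The sketch below parses a copy of the output shown above and lists the address of each port the kernel listens on. The variable names are illustrative, and the port numbers will differ on your machine.

```python
import json

# A copy of the %connect_info output shown above.
connection_info = """
{
  "shell_port": 57896,
  "iopub_port": 57897,
  "stdin_port": 57898,
  "control_port": 57899,
  "hb_port": 57900,
  "ip": "127.0.0.1",
  "key": "d35b7ce7-38529646d0d2edcc15288ed9",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": ""
}
"""

info = json.loads(connection_info)

# Collect every entry whose name ends in "_port".
ports = {name: value for name, value in info.items() if name.endswith("_port")}

for name, port in sorted(ports.items()):
    print(f"{info['transport']}://{info['ip']}:{port}  ({name})")
```

Each client (notebook, console, Qt console) talks to the same kernel over these ports, which is why several front ends can share one running Python instance.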


We can automatically connect a Qt Console to the currently running kernel with the `%qtconsole` magic, or attach a terminal console by typing `jupyter console --existing <kernel-UUID>` in any terminal:

In [3]:
%qtconsole

## Python's Data Science Ecosystem

There are also many often-used third-party modules that are core tools for doing data science with Python.
Some of the most important ones are:

#### [``numpy``](http://numpy.org/): Numerical Python

Numpy is short for "Numerical Python", and contains tools for efficient manipulation of arrays of data.
If you have used other computational tools like IDL or MATLAB, Numpy should feel very familiar.

#### [``scipy``](http://scipy.org/): Scientific Python

Scipy is short for "Scientific Python", and contains a wide range of functionality for accomplishing common scientific tasks, such as optimization/minimization, numerical integration, interpolation, and much more.

#### [``pandas``](http://pandas.pydata.org/): Labeled Data Manipulation in Python

The name Pandas is derived from "Panel Data". The library contains tools for doing more advanced manipulation of labeled data in Python, in particular with a columnar data structure called a *DataFrame*.
If you've used the [R](http://rstats.org) statistical language, much of the functionality in Pandas should feel very familiar.

#### [``matplotlib``](http://matplotlib.org): Visualization in Python

Matplotlib started out as a MATLAB plotting clone in Python, and has grown from there in the 15 years since its creation. It is the most popular data visualization tool currently in the Python data world (though other recent packages are starting to encroach on its monopoly).

In [4]:
import pandas as pd


In [5]:
pd.read_csv?

In [6]:
df = pd.read_csv('data/Places_Full.csv')

In [7]:
df

| Unnamed: 0 | name | address | city | lat | lng | place_id | rating | class | type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Trader Joe's | 1700 East Madison Street | Seattle | 47.615866 | -122.309913 | ChIJx0M1ztNqkFQRtgspEllQxk8 | 4.5 | supermarket | |
| 1 | Hillcrest Market | 110 Summit Avenue East | Seattle | 47.618850 | -122.325005 | ChIJt5emCTMVkFQR-BCgFDTvu9o | 3.5 | supermarket | |
| 2 | Uwajimaya | 600 5th Avenue South | Seattle | 47.596843 | -122.326929 | ChIJq9nX27xqkFQRu05rxkrN7f4 | 4.5 | supermarket | |
| 3 | Kress IGA Supermarket | 1427 3rd Avenue | Seattle | 47.609396 | -122.337822 | ChIJbdIhoLNqkFQRMTiHCN6W4nU | 3.9 | supermarket | |
| 4 | Double Dorjee | 1501 Pike Street # 511 | Seattle | 47.608822 | -122.339570 | ChIJkX3s-LJqkFQRQRexqCN0jQY | 5.0 | supermarket | |
| 5 | Whole Foods Market | 2210 Westlake Avenue | Seattle | 47.618344 | -122.338097 | ChIJyz9g9EkVkFQRutvXu56_BLk | 4.3 | supermarket | |
| 6 | Grocery Outlet Bargain Market | 1702 4th Avenue South | Seattle | 47.587870 | -122.328625 | ChIJIWM_pp5qkFQRsh-E4oUVcIM | 4.3 | supermarket | |
| 7 | Metropolitan Market Uptown | 100 Mercer Street | Seattle | 47.624805 | -122.354842 | ChIJsY2q3UMVkFQRBAbqlwfTqdE | 4.5 | supermarket | |
| 8 | Trader Joe's | 1916 Queen Anne Avenue North | Seattle | 47.636582 | -122.356759 | ChIJn-z5HhMVkFQR0APZKI1twvM | 4.6 | supermarket | |
| 9 | Foulee Market | 2050 South Columbian Way | Seattle | 47.559965 | -122.305132 | ChIJmbwZA99BkFQRIACApE5YiTQ | 4.2 | supermarket | |
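
Once the data is in a DataFrame, a few one-liners give a quick overview. The sketch below is self-contained: instead of reading `data/Places_Full.csv`, it builds a small frame from three of the rows shown above (with fewer columns), so the numbers apply only to that sample. The names `sample` and `top_rated` are illustrative.

```python
import io

import pandas as pd

# Three rows copied from the table above, standing in for the full CSV file.
sample_csv = io.StringIO(
    "name,city,lat,lng,rating,class\n"
    "Trader Joe's,Seattle,47.615866,-122.309913,4.5,supermarket\n"
    "Hillcrest Market,Seattle,47.618850,-122.325005,3.5,supermarket\n"
    "Uwajimaya,Seattle,47.596843,-122.326929,4.5,supermarket\n"
)
sample = pd.read_csv(sample_csv)

print(sample.columns.tolist())      # column names
print(sample.dtypes)                # type of each column
print(sample["rating"].mean())      # average rating in the sample

# Boolean indexing: keep only the places rated 4.5 or higher.
top_rated = sample[sample["rating"] >= 4.5]
print(top_rated["name"].tolist())
```

The same calls work unchanged on the full `df` loaded with `pd.read_csv('data/Places_Full.csv')`.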


### What to do next?

Now that we've loaded our .csv file into a pandas data frame, we have some powerful tools available to us. In the next segment we'll see a sample of operations we can do with pandas and geopandas.