# Introduction to Jupyter notebooks and Pandas

## 1. Opening and Navigating the Jupyter Notebook

We will start today with the interactive environment that we will be using in the tutorial: the [Jupyter Notebook](http://jupyter.org).

We will walk through the following steps together:

1. (If you don't yet have Anaconda) Download [miniconda](https://conda.io/miniconda.html) (be sure to get the Python 3.6 version) and install it on your system (hopefully you have done this before coming to class).

2. Use the ``conda`` command-line tool to update your package listing and install the Jupyter notebook:

   Update ``conda``'s listing of packages for your system:
   ```
   $ conda update conda
   ```
   
   Install the Jupyter notebook and all its requirements:
   ```
   $ conda install jupyter notebook
   ```
   
3. Use git to clone our repo:

   ```
   $ git clone https://github.com/uwescience/dssg2018-geopandasSQL-tutorial
   ```

4. Navigate to the directory containing the tutorial material:

   ```
   $ cd ~/dssg2018-geopandasSQL-tutorial/
   ```

   You should see a number of files in the directory, including these:

   ```
   $ ls
   ...
   0-Jupyter-notebooks.ipynb
   1-intro-geopandas.ipynb
   ...
   ```

5. Type ``jupyter notebook`` in the terminal to start the notebook server:

   ```
   $ jupyter notebook
   ```

   If everything has worked correctly, it should automatically launch your default browser.

6. Click on ``0-Jupyter-notebooks.ipynb`` to open the notebook containing the content for this lecture.

With that, you're set up to use the Jupyter notebook!
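
If you want to double-check the installation from Python itself, something like the following sketch may help. It only uses the standard library; the helper name `is_installed` is made up for this example, and the list of modules to check is an assumption based on the packages used in this tutorial.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python version is running (major, minor).
print("Python version:", sys.version_info[:2])

# Report whether each tutorial dependency can be found.
for name in ("notebook", "pandas"):
    print(name, "installed:", is_installed(name))
```

If one of the lines reports `False`, re-running the ``conda install`` step for that package is a reasonable first thing to try.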

## About Jupyter

- Combines the features of an IPython terminal with a fancy web interface
- Every notebook has one Python instance (the kernel) running with it
- Also supports Markdown and HTML embedding
- Easy to insert $\LaTeX$ formulas
- Beware of Vim-style keyboard shortcuts

## Python refresher
- Components with the same capabilities are of the same *type*. 
  - For example, the numbers 2 and 200 are both integers.
  
- A type can be defined recursively, in terms of other types. Some examples:
  - A list is a collection of objects that can be indexed by position.
  - A list of integers contains an integer at each position.
  
- A type has a set of supported operations. For example:
  - Integers can be added
  - Strings can be concatenated
  - A table can report the names of its columns
    - What type is returned from the operation?
    
- In Python, members (components and operations) are accessed with a '.'
  - If `a` is a list, then `a.append(1)` adds `1` to the end of the list.
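
The points above can be tried out in a few lines. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Values with the same capabilities share a type: 2 and 200 are both int.
same_type = type(2) is type(200)

# Each type supports its own operations.
total = 2 + 200                 # integers can be added
word = "geo" + "pandas"         # strings can be concatenated

# A list of integers holds an integer at each position.
numbers = [10, 20, 30]
element_types = [type(n).__name__ for n in numbers]

# Members are reached with '.'; append modifies the list in place.
numbers.append(1)

print(same_type, total, word)
print(element_types, numbers)
```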

In [None]:
a = 1

In [None]:
# Python lists can store data of different types at the same time
a_list = [1, 'a', [1,2]]

In [None]:
# What if we forgot what a list method does? A trailing ? shows its help
a_list.append?

In [None]:
a_list.append(2)

In [None]:
a_list

In [None]:
# What is the full set of attributes and methods available for our list?
dir(a_list)

In [None]:
# Count how many times the integer '1' occurs
a_list.count(1)

## The IPython kernel/client model


In [None]:
%connect_info

We can automatically connect a Qt Console to the currently running kernel with the `%qtconsole` magic, or attach a terminal console by typing `jupyter console --existing <kernel-UUID>` in any terminal:

In [None]:
%qtconsole
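
To get a feel for what the connection information contains, here is a rough sketch that parses a connection description of the kind `%connect_info` prints. The JSON below is a hypothetical stand-in: the field names follow the usual Jupyter connection file layout, but every value is made up for illustration.

```python
import json

# Hypothetical connection info; all ports and the key are made-up values.
sample = """
{
  "shell_port": 50001,
  "iopub_port": 50002,
  "stdin_port": 50003,
  "control_port": 50004,
  "hb_port": 50005,
  "ip": "127.0.0.1",
  "key": "hypothetical-key",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "python3"
}
"""
info = json.loads(sample)

# The kernel listens on several channels, each with its own port.
ports = {name: value for name, value in info.items() if name.endswith("_port")}

# Address a client would use for the shell channel (code execution requests).
shell_address = f"{info['transport']}://{info['ip']}:{info['shell_port']}"

print(sorted(ports))
print(shell_address)
```

Any client that knows these details (the notebook, a Qt Console, or a terminal console) can talk to the same running kernel, which is why they all share its variables.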

## Python's Data Science Ecosystem

Many widely used third-party modules are core tools for doing data science with Python.
Some of the most important ones are:

#### [``numpy``](http://numpy.org/): Numerical Python

Numpy is short for "Numerical Python", and contains tools for efficient manipulation of arrays of data.
If you have used other computational tools like IDL or MATLAB, Numpy should feel very familiar.
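
As a small taste of array manipulation, the sketch below assumes `numpy` is installed in your environment; the array values are arbitrary.

```python
import numpy as np

# Create a one-dimensional array of floats.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies to every element at once, with no explicit loop.
doubled = values * 2

# Arrays come with common summary operations built in.
mean_value = values.mean()

# The same data can be viewed as a 2 x 2 grid.
grid = values.reshape(2, 2)

print(doubled)
print(mean_value)
print(grid)
```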

#### [``scipy``](http://scipy.org/): Scientific Python

Scipy is short for "Scientific Python", and contains a wide range of functionality for accomplishing common scientific tasks, such as optimization/minimization, numerical integration, interpolation, and much more.

#### [``pandas``](http://pandas.pydata.org/): Labeled Data Manipulation in Python

Pandas is short for "Panel Data", and contains tools for doing more advanced manipulation of labeled data in Python, in particular with a columnar data structure called a *DataFrame*.
If you've used the [R](http://rstats.org) statistical language, much of the functionality in Pandas should feel very familiar.

#### [``matplotlib``](http://matplotlib.org): Visualization in Python

Matplotlib started out as a MATLAB plotting clone in Python, and has grown from there in the 15 years since its creation. It is the most popular data visualization tool currently in the Python data world (though other recent packages are starting to encroach on its monopoly).

In [None]:
import pandas as pd
df = pd.read_csv('data/Places_Full.csv')

In [None]:
df
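
Once a table is loaded, a few attributes and methods give a quick overview of it. The sketch below uses a tiny in-memory CSV rather than the tutorial's data file, so the column names and values are hypothetical and chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; names and numbers are made up.
csv_text = """name,latitude,longitude
Place A,47.61,-122.33
Place B,47.65,-122.31
Place C,47.56,-122.39
"""
small_df = pd.read_csv(io.StringIO(csv_text))

first_rows = small_df.head(2)            # the first two rows
column_names = list(small_df.columns)    # names of the columns
n_rows, n_cols = small_df.shape          # table dimensions

print(first_rows)
print(column_names, n_rows, n_cols)
print(small_df.dtypes)                   # the type of each column
```

The same calls (`head`, `columns`, `shape`, `dtypes`) work on `df` above.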

### What to do next?

Now that we've loaded our .csv file into a pandas data frame, we have some powerful tools available to us. In the next segment we'll see a sample of operations we can do with pandas and geopandas.