# Welcome to Lab 0!

In this lab, we'll:
- Explore the jupyter notebook environment
- Understand the different types of cells
- Run a cell and edit some code
- See some handy jupyter shortcuts

## Important!
**Always save your work in Jupyter,** either via `File` > `Save and Checkpoint` or with `Command` + `S`.

**Jupyter cells need to run in sequence.** Running cells out of order (i.e., running a cell at the bottom of the page first) may cause errors.

## Why do we use Jupyter?
The main reason we are using Jupyter notebooks is that they offer interactivity with code, without forcing you to learn to code right away. You can run pre-written blocks of code and then, as you gain confidence, modify those blocks, eventually working up to writing whole new blocks of code.

In addition, Jupyter offers a unified environment to work in, and the possibility of easy collaboration. 

### Code cells
Instead of writing a long script for the computer to compile and run, jupyter has **cells** which allow you to run parts of your code, edit and fix your code quickly, and see results immediately from a cell.

In Jupyter, a "cell" is a single block of code or text. You can edit a cell by double-clicking it, making your changes, and then choosing `Cell` > `Run Cells` from the top menu.

Test this out. Double click _this_ cell, change the below text to say "jupyter is an interactive coding environment" and then run the cell. 

jupyter is a coding environment

Now try running the previous cell again, but instead of clicking on `Cell` in the top menu, try running the cell using a keyboard shortcut: `Shift + Enter`

### _Markdown_ cells and _R_ cells
Every cell in a notebook is either a code cell or a text cell. The text cells (including all the ones you have read so far) are written in a simple format called Markdown, which lets you enter nicely formatted text.

**This is a Markdown cell.**

You can think of Markdown cells as text, where you can type freely. Markdown is not code, but there are little things you have to do in order to make things **bold** or *italic* or 
- bulleted 
- like 
- this

Markdown is frequently used, and you might see files that have a `.md` ending to them. Those are markdown files. 

If you double click on this cell, can you see/guess how the word **bold** is made bold?

Now try running the cell below. What happens?

This is my map. This is for assignment 1

To have a cell run as a Markdown cell, you need to tell the notebook that it is a Markdown cell. By default because this notebook is an R notebook, all new cells created are `code` cells and are interpreted by the notebook as R (a programming language) code.

To turn a cell into a Markdown cell, go to `Cell` > `Cell Type` > `Markdown`.
The shortcut to doing this is hitting `Esc` to enter command mode, and typing `m`.

Try this for the above cell.

## Finally some R code

OK, so nicely formatted text is all very well, but we could do that in a word processor. Now we come to the real point of using a notebook, which is to run blocks of code. Below is an R code cell.

In [None]:
# Anything with a # in front of it is not executed by the computer, it's a comment

# Let's first make a variable called a
a <- 8

The above cell set the value of a variable called `a` to be 8. We can check that the notebook remembers this by asking for the value of `a`

In [None]:
a

In [None]:
# Now let's make another variable called b
b <- 10

In [None]:
# What's a + b?

a + b

We can perform other arithmetic operations easily in R. For example, according to the R documentation, we can do multiplication using `x * y`. Add a cell below and try multiplying variables `a` and `b`.
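
Multiplication is left for you to try. As a rough sketch of a few other operators, here is a short example. The names `x` and `y` are made up for illustration, so that `a` and `b` are left alone.

```r
# Hypothetical variables, named x and y so they do not overwrite a and b
x <- 7
y <- 2

x - y    # subtraction
x / y    # division
x ^ y    # exponentiation (x to the power y)
x %% y   # remainder after dividing x by y
```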

In [None]:
# An important thing to note is that variables are "global", that is, 
# their values are carried over from one cell to another. For example:

d <- a+b

In [None]:
d + b 
#Should be 28
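
One consequence of variables carrying over between cells is worth a small sketch. A variable keeps the value it was given when its cell last ran, and it is not updated when the variables it was built from change later. The names `p`, `q` and `total` below are made up for illustration.

```r
# Hypothetical names, chosen so they do not disturb a, b and d
p <- 8
q <- 10
total <- p + q   # total is computed once, from the values p and q have right now

p <- 100         # changing p afterwards does not update total
total            # still the old sum
```

This is one reason why running cells out of order can give surprising results: to refresh `total`, the line that computes it has to be run again.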

## Some quick map plotting

Let's try reading a simple shapefile. For those who haven't run into shapefiles before, they are a popular file format for storing geographical data. First, we need to load an R package that will help us do that.

You can think of R packages as bundles of commands that enable us to do particular specialized things. They aren't part of the basic functionality of R because including commands for all the things that you might want to do in R would make it a very large and unwieldy platform.

In [None]:
library(rgdal)

In [None]:
auckland <- readOGR("data/tb_0006_bycau06.shp")

The result tells us that we successfully read a file that contains 103 features (i.e. geographical things), and that each of those features has 19 'fields' of information associated with it.

We can see a list of the field names using the `names` function.

In [None]:
names(auckland)

We can see the first six rows of the data table with the `head` command.

In [None]:
head(auckland)
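
If you want a different number of rows, `head` accepts an `n` argument. The sketch below uses `mtcars`, a small data set that ships with R, as a stand-in for `auckland` so that it runs on its own.

```r
# mtcars is a built-in data set; it stands in for auckland here
head(mtcars, n = 3)   # first three rows only
tail(mtcars, n = 3)   # last three rows
nrow(head(mtcars))    # number of rows head() returns when n is not given
```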

Or, we can see the data nicely formatted by viewing it as a `data.frame`.

In [None]:
#Here's the expanded view
auckland_df <- data.frame(auckland)
auckland_df

We can use the plot function to plot the data, and, since these data are geographical, we will get a map.

In [None]:
plot(auckland, col='grey', lwd=0.25)

Note how we specify a color for the regions (`col='grey'`) and a line width (`lwd=0.25`). You don't have to specify these, or you can change them. Try modifying the code and making a different map.

# Choropleth maps

We can also make simple choropleth maps using `plot`. Choropleth maps are those where regions are colored according to underlying data values (think of the election results maps from 2016). There's a column in our data called `TB_RATE`, or tuberculosis rate, so let's make a simple choropleth map of that.

## Exploring the data
Since choropleth maps are maps of data, it is worth first familiarizing ourselves with the data in question, independent of the geography.

Since we are concerned with the `TB_RATE` variable, let's see what it looks like in terms of the distribution across the 103 areas in the map.

In [None]:
summary(auckland$TB_RATE)



**Question**:

- What's the lowest TB_RATE?
- What's the highest TB_RATE?

From this result, you can see that the data are skewed, with only a small number of higher values: the median is 88, meaning that half the rates are at that level or lower. More visually, we can make a histogram:

In [None]:
hist(auckland$TB_RATE)
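
To see what this kind of skew looks like in numbers, here is a minimal sketch with made-up rates (not the Auckland data). When a few values are much larger than the rest, the mean is pulled above the median and most values pile up in the lowest histogram bin.

```r
# Made-up rates for illustration only: mostly small values plus a few large ones
rates <- c(10, 20, 30, 40, 50, 60, 70, 80, 300, 450)

median(rates)   # middle value; half the rates are at or below this
mean(rates)     # pulled upward by the few large values

# Count how many values fall in each of three equal-width bins, without drawing
h <- hist(rates, breaks = c(0, 150, 300, 450), plot = FALSE)
h$counts
```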

It gets tedious typing `auckland$TB_RATE`, so we can use the `attach` command to save us the trouble, and view the same data a different way.

In [None]:
attach(auckland_df)
boxplot(TB_RATE)
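
`attach` is convenient, but the attached column names stay available for the rest of the session, which can get confusing in longer notebooks. One alternative is `with`, which looks up column names only for a single expression. The sketch below uses a tiny made-up data frame; `toy` and its `AREA` column are assumptions for illustration, not part of the Auckland data.

```r
# A tiny made-up data frame standing in for auckland_df
toy <- data.frame(TB_RATE = c(12, 40, 95, 310), AREA = c("A", "B", "C", "D"))

# with() evaluates the expression using the columns of toy, without attach()
with(toy, max(TB_RATE))               # largest rate
with(toy, AREA[which.max(TB_RATE)])   # area that has the largest rate
```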

## Mapping the data
First, we will load a couple of libraries useful for this purpose. `RColorBrewer` gives us access to nice color schemes from [ColorBrewer](http://colorbrewer.org) and `classInt` helps with partitioning data into classes (or intervals or categories) using a number of popular methods.

In [None]:
library(RColorBrewer)
library(classInt)

### Colors and classes
We'll make a map with four shades of green. To do this, first we need a color palette.

In [None]:
n <- 4
palette <- brewer.pal(n, "Greens")
palette

If you have done any graphical work you might recognize those values as a series of hexadecimal RGB color codes. If not, don't worry about it. The important thing is that the `RColorBrewer` command `brewer.pal` allows us to make nice sets of colors according to specifications as described in the [detailed documentation](https://www.rdocumentation.org/packages/RColorBrewer/versions/1.1-2/topics/RColorBrewer).

Note how we have put the number of colors in a variable `n` which will make it easier to change the code to make different maps later.
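
If you are curious about what one of those color codes means, base R can decode it. This is only a sketch: the code `"#EDF8E9"` is assumed here to be the lightest of the four greens, so substitute whichever value your own `palette` printed.

```r
# Assumed example value: taken here to be the lightest of the four greens
lightest <- "#EDF8E9"

# col2rgb() splits a hex code into red, green and blue values on a 0-255 scale
col2rgb(lightest)

# rgb() goes the other way, from the three numbers back to a hex code
rgb(237, 248, 233, maxColorValue = 255)
```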

To accompany these colors we need a way to assign data to different classes, which will be colored differently. This is what the `classIntervals` function (from the `classInt` package) provides.

In [None]:
classes <- classIntervals(TB_RATE, n, style="equal")
classes

The resulting table shows us that classes consists of 4 classes, with data ranges 0-112.5, 112.5-225, 225-337.5 and 337.5-450. The list of numbers below is how many of the 103 areas will be assigned to each class when we make a map using these classes.  It is no coincidence that each of these data ranges is the same size (112.5 units), because we specified `style="equal"` when we called `classIntervals`. Other **classification schemes** are possible, as described in the [documentation for `classInt`](https://www.rdocumentation.org/packages/classInt/versions/0.1-24).
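
The equal-interval breaks can be reproduced by hand, which may help show where 112.5, 225 and 337.5 come from. This is a base-R sketch with made-up values spanning 0 to 450, not a description of how `classIntervals` works internally.

```r
# Made-up values spanning 0 to 450, standing in for TB_RATE
x <- c(0, 15, 60, 88, 120, 200, 340, 450)
n <- 4

# Equal intervals: n + 1 evenly spaced break points from the minimum to the maximum
breaks <- seq(min(x), max(x), length.out = n + 1)
breaks

# Count how many values land in each class
counts <- table(cut(x, breaks, include.lowest = TRUE))
counts
```

Notice that with skewed data an equal-interval scheme can leave some classes nearly or completely empty.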

### Putting it all together in a map
To make a map, we will want to call the `plot` function on the `auckland` dataset, but this time specify a list of colors for the regions. We use another function from the `classInt` package to do this:

In [None]:
colors <- findColours(classes, palette)
colors
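
Conceptually, this step looks up which class each value belongs to and then picks the color for that class. The base-R sketch below imitates that idea with made-up values, breaks and colors. It is an approximation for illustration and not necessarily how `findColours` is implemented.

```r
# Made-up values, break points and colors for illustration
x <- c(5, 130, 300, 450)
breaks <- c(0, 112.5, 225, 337.5, 450)
shades <- c("#EDF8E9", "#BAE4B3", "#74C476", "#238B45")

# findInterval() returns the class number that each value falls in;
# all.inside = TRUE keeps the maximum value in the last class
class_id <- findInterval(x, breaks, all.inside = TRUE)
class_id

# Indexing the colors by class number gives one color per value
shades[class_id]
```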

And we are ready to make a map.

In [None]:
plot(auckland, col=colors)

A map like this is not much use without a title and a legend to tell us what we are looking at, so a more complete recipe is below.

In [None]:
plot(auckland, col=colors)
title(main="TB Rates in Auckland, per 100,000 population")
legend('topleft', legend=names(attr(colors, "table")),
    fill=attr(colors, "palette"), cex=0.8, bty="n")

## Explore some more
The above sequence of cells (starting from **Colors and classes**) is worth spending some time with. Experiment with what happens when you change `n` (the number of classes), the color palette, or the classification `style`. Keep in mind that you will need to re-run the cells in order, starting from the one where `n` and the color palette are defined.

## The power of programming
In case you are interested, in the next cell we put all the operations required to make a map into a single function, which makes it easier to make different maps. It is not necessary for you to understand everything going on here. Think of it as furthering your appreciation of what you can do in a mapping environment that is 'driven' using code.

In [None]:
## Definition of a function to automate a series of commands and make a choropleth map
choro <- function(sf, varname, nclasses=5, pal='Reds', sty='equal', ttl='') {
    palette <- brewer.pal(nclasses, pal)
    classes <- classIntervals(sf[[varname]], nclasses, style=sty)
    colors <- findColours(classes, palette)
    plot(sf, col=colors, lwd=0.35)
    legend('top', ncol=3, legend=names(attr(colors, 'table')), fill=attr(colors, 'palette'), cex=0.8, bty='n')
    title(ttl)
}
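
The part of `choro` that looks like `nclasses=5` or `pal='Reds'` sets a default value, which is used whenever you leave that argument out. Here is a much smaller sketch of the same idea. The function `describe` and its arguments are made up for illustration.

```r
# A made-up function with one required argument and two arguments with defaults
describe <- function(x, digits = 1, label = "value") {
    paste(label, "mean:", round(mean(x), digits))
}

describe(c(1, 2, 4))                               # uses both defaults
describe(c(1, 2, 4), digits = 3)                   # overrides one default
describe(c(1, 2, 4), label = "rate", digits = 0)   # named arguments can go in any order
```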

With the `choro` function defined it becomes easier to make a variety of different maps. Experiment by changing the cell below.

In [None]:
choro(auckland, 'TB_RATE', pal='YlOrRd', nclasses=5, sty='quantile', ttl='Auckland, TB rates, per 100,000')
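
The `sty='quantile'` option chooses break points so that each class holds a similar number of areas, rather than covering a similar range of values. The base-R sketch below compares the two ideas on made-up skewed data. It illustrates the concept only and may not match the exact breaks `classIntervals` would produce.

```r
# Made-up skewed values standing in for TB_RATE
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 300, 450)
n <- 5

# Quantile classes: break points chosen so each class holds a similar number of values
q_breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
q_counts <- table(cut(x, q_breaks, include.lowest = TRUE))
q_counts

# Equal-interval classes on the same data, for comparison
e_breaks <- seq(min(x), max(x), length.out = n + 1)
e_counts <- table(cut(x, e_breaks, include.lowest = TRUE))
e_counts
```

On skewed data the quantile scheme spreads the areas evenly across the colors, while the equal-interval scheme puts most of them in the lowest class.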

### Keyboard shortcuts in jupyter

Keyboard shortcuts in Jupyter are helpful so that you don't have to click on the top menu all the time. A short list of helpful keyboard shortcuts is located here:

http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html

Some of the most helpful ones for us will be in `Command` Mode. You can tell when a cell is in command mode by the vertical color bar on the side of the cell -- it will be blue when it's in command mode.

- `shift` + `enter` run cell, select below
- `ctrl` + `enter` run cell
- `option` + `enter` run cell, insert below
- `d` , `d` delete selected cell
- the `?` before a function will give you helpful information about it

In [None]:
#Try running this cell. 
?plot

**Question**:
- What's the `Usage` of plot?

**Try deleting this cell using the `d`,`d` shortcut!**