# Plotting with Data!

In [None]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# Python dictionary

Before we talk about working with data, we're going to learn a new Python type: the dictionary!

A dictionary in Python is kind of like a dictionary in the real world: it is an object that associates a key with a value. This is similar to how a real dictionary associates a word with a definition.

In Python, we declare a dictionary with curly brackets {}. Inside the curly brackets, we write out keys and values, like so:

In [None]:
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}

Dictionaries have two types of things in them, <b>keys</b> and <b>values</b>. Each <b>value</b> is associated with a <b>key</b>. When defining a dictionary, the key is on the left of the colon, and the value is on the right. In the example above, both our keys and our values are strings, but values can be any type (ints, floats, or even lists). Keys are a little more restricted: they must be an unchangeable type such as a string, int, or float, so a list cannot be a key.
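
As a small illustration of mixing types, here is a sketch with a made-up dictionary (`MixedDictionary` is a hypothetical name that is not used elsewhere in this notebook). Its keys are an int and a string, and its values are a list and a float:

```python
# Hypothetical dictionary, for illustration only
MixedDictionary = {
    1: [10, 20, 30],   # int key, list value
    "pi": 3.14159,     # string key, float value
}

print(MixedDictionary[1])     # index with the int key
print(MixedDictionary["pi"])  # index with the string key
```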

You can access a value by "indexing" into the dictionary with the key that is associated with that value. For example, to see the definition for "Bear" we type:
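
A cell for this step might look like the sketch below. It redefines `SampleDictionary` so that it runs on its own:

```python
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}

# Index into the dictionary with the key "Bear" to get its value
print(SampleDictionary["Bear"])
```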

You can add a key-value pair to a dictionary by assigning the value to the key as if it were a variable, like so:

In [None]:
SampleDictionary["Fish"] = "A swimming not-mammal"
print(SampleDictionary)

In [None]:
animalKey = "Lion"
SampleDictionary[animalKey] = "King of the jungle"

print(SampleDictionary)

Finally, you can see all of the keys or values in a dictionary by running `SampleDictionary.keys()` or `SampleDictionary.values()`.
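
For instance, here is a minimal sketch. Wrapping the results in `list()` is a choice made here so that they print as plain lists, and the names `keys` and `values` are just for illustration:

```python
SampleDictionary = {"Aardvark": "A type of animal", "Bear": "A kind of scary furry mammal"}

keys = list(SampleDictionary.keys())      # all of the keys, in the order they were added
values = list(SampleDictionary.values())  # all of the values, in the same order

print(keys)
print(values)
```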

## Example 1: Making your own dictionary

A common code that people will use is to assign a number to each letter of the alphabet, such that A = 1, B = 2, etc. 

In the cell below, make your own dictionary where the keys are the first 5 letters of the alphabet, and each value is that letter's corresponding numerical value. Print out the dictionary, and access some of the values in the dictionary using the letters as keys.

## Example 2: How much is a word?

Let's use the dictionary that you created to calculate the numerical value of the word CAB. The numerical value of the word will be the total value of each of its letters, based on the value those letters have in the dictionary you defined (e.g., A = 1, B = 2, etc.).

First, write a ```for``` loop that will iterate over the string "CAB" and print each letter in the string.

Next, change your ```for``` loop from above so that instead of printing each letter in the string, you index into the dictionary with that letter.

Finally, change your ```for``` loop from above so that you add the result of indexing into the dictionary to a variable called ```wordValue```, which is set to zero before the ```for``` loop.

In [None]:
wordValue = 0

# Working with data and pandas

`pandas` is a package for managing data that builds heavily upon the concept of a dictionary.

Run the cell below to import it. A common way to abbreviate `pandas` (like `numpy` as `np`) is as `pd`.

In [None]:
import pandas as pd

## The pandas dataframe

`pandas` introduces a new type called a `DataFrame` (written "dataframe" in the rest of this notebook). If it helps, you can kind of think of it like an Excel sheet, because it organizes data into columns and each column has a name. Thinking of it like a dictionary, the column name is the key and the column is the value.

We can load a dataframe from a text file using the `read_csv` function in pandas. Run the cell below to load the `PlanetEvolution.csv` file into a pandas dataframe.
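
To make the dictionary comparison concrete, the sketch below builds a tiny dataframe directly from a dictionary. The name `letters` and its contents are made up for illustration.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's data
letters = pd.DataFrame({"Letter": ["A", "B", "C"], "Value": [1, 2, 3]})

print(letters)
print(letters["Value"])  # index by column name, just like a dictionary key
```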

(Quick note here: a .csv file is a way to organize data, and csv stands for comma-separated values. This means that the columns in the data are separated by commas. The `read_csv` function knows this and can tell which column is which by looking for those commas.)
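
As an illustration of the format, the sketch below reads a small comma-separated table from a string instead of a file. The text in `csv_text` is made up, and `io.StringIO` is used here only so that `read_csv` can treat the string like a file.

```python
import io
import pandas as pd

# The first line holds the column names; every later line is one row of data
csv_text = "Letter,Value\nA,1\nB,2\nC,3\n"

table = pd.read_csv(io.StringIO(csv_text))
print(table)
```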

In [None]:
data = pd.read_csv("PlanetEvolution.csv")

So now we have a variable, `data`, that contains the `pandas` dataframe holding the data from "PlanetEvolution.csv". Whenever we load a dataframe, we want to get a feel for the data. The best way to do this is with the `.head()` method that every dataframe has. This gives us the first 5 rows of the data. Try running the cell below to see what it does.

In [None]:
data.head()

Another helpful way to get familiar with the data (especially if there are a lot of columns) is the `.columns` attribute that every dataframe has. This returns a pandas `Index` object (similar to an array) that lists all the column names in that dataframe. Run the cell below to see what it does.

In [None]:
print(data.columns)

This is data from my own research. I run simulations of how planets change over time and keep track of a few important values as they evolve. Here's what each value means:

`Time`: An array that gives the time at each index, in units of billions of years.

`SurfWaterMass`: An array where every element represents the amount of water in the atmosphere of the modeled planet at each time, in units of terrestrial oceans (1 TO = 1.39e21 kg of water).

`MantleTemp`: An array where every element represents the temperature of the modeled planet's mantle at each time, in units of Kelvin.

`EruptionRate`: An array where every element represents the rate at which magma is erupted at each time, in units of kilograms per second.

Finally, let's get data from this dataframe. Pandas dataframes work very similarly to a Python dictionary: you get the data in a column by "indexing" into the dataframe with the name of the column, like below:
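
With the dataframe loaded above, this would be `data["Time"]`. The sketch below shows the same indexing on a small stand-in dataframe so that it runs without the file. The name `demo` and its numbers are placeholders, not values from `PlanetEvolution.csv`.

```python
import pandas as pd

# Placeholder numbers for illustration only (not real simulation output)
demo = pd.DataFrame({"Time": [0.0, 1.5, 3.0, 4.5],
                     "Value": [8.0, 4.0, 2.0, 1.0]})

# Index into the dataframe with the column name to get that column
time_column = demo["Time"]
print(time_column)
```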

Notice that this looks a little different from a numpy array. It is actually a `pandas` <b>Series</b> (which you can read more about <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.html">here</a> and <a href="https://towardsdatascience.com/a-practical-introduction-to-pandas-series-9915521cdc69">here</a>). A Series works a little differently than a numpy array does, but it's easy to get the numpy array from the Series: just use the `.values` attribute that is part of every Series.

In [None]:
print(data["Time"].values)

## Example 2: Getting data from a pandas dataframe

In the code cell below, print the Time column of the data, and the SurfWaterMass column.

## Example 3: Plotting data

In the code cell below, plot the SurfWaterMass column against the Time column. Use ```plt.ylabel()``` and ```plt.xlabel()``` to label the axes. Each of these functions takes one argument: a string that contains the name of the variable plotted on that axis.

## Example 4: Log scaling data

Looking at the plot that you've made, what does it look like the amount of water in the atmosphere is at 4.5 billion years?

The amount of water in the atmosphere at 4.5 billion years is the same thing as the amount of water in the atmosphere at the end of the simulation, because the simulation models 4.5 billion years of evolution.

In the cell below, print the value of the last element of the `SurfWaterMass` array. Is it close to the estimate you made by looking at the plot?
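
One thing to watch for: negative indexing such as `[-1]` works on a numpy array, but on a Series the square brackets look things up by label, so `[-1]` generally does not mean "last element". Here is a sketch with a placeholder Series (the name `series` and its numbers are made up):

```python
import pandas as pd

# Placeholder numbers for illustration only
series = pd.Series([4.0, 2.0, 1.0, 0.5])

last_from_array = series.values[-1]  # numpy array: position -1 is the last element
last_from_iloc = series.iloc[-1]     # .iloc selects by position on the Series itself

print(last_from_array, last_from_iloc)
```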

The issue here is that the y axis uses <b>linear spacing</b>, which means that the ticks are evenly spaced (e.g., 1.5 -> 1.25 -> 1.0). This is great for viewing data that is spread out evenly, but when the data goes from large to small numbers, it can be hard to tell what the data is doing once it reaches those small values.

Using linear spacing on the y axis in our plot makes it look like the final amount of water in the atmosphere is 0 terrestrial oceans, when it's actually 1.23e-05 TO. That is close to the amount of water in Lake Superior: not a lot for a whole atmosphere, but it is different from 0.

When you want to have small numbers and large numbers visible on the same axis, you should use <b>logarithmic spacing</b> not <b>linear spacing</b>. Logarithmic spacing means that the spaces between ticks are in powers of 10 (e.g., 1 -> 10 -> 100).
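
To see the difference, the sketch below plots the same made-up, exponentially decaying curve twice, once on a linear y axis and once on a logarithmic y axis. The data are synthetic (not from the simulation), and `ax.semilogy` is the Axes-level counterpart of `plt.semilogy`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic curve that falls from 1.5 to a very small (but nonzero) number
t = np.linspace(0, 4.5, 100)
y = 1.5 * np.exp(-2.5 * t)

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

ax_linear.plot(t, y)       # ordinary plot: linear y axis
ax_linear.set_title("Linear y axis")

ax_log.semilogy(t, y)      # same data, logarithmic y axis
ax_log.set_title("Logarithmic y axis")

for ax in (ax_linear, ax_log):
    ax.set_xlabel("t")
    ax.set_ylabel("y")
```

On the linear axis the curve appears to flatten out at zero, while on the logarithmic axis the small final values stay readable.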

In the cell below, copy and paste your code plotting `SurfWaterMass` over `Time` with the axes labeled. Then, edit your code to change the y axis to a log scale using ```plt.semilogy()``` (you can call this function with no arguments). What does the final value of `SurfWaterMass` look like when the figure is plotted on a log scale?