# Day 2. File I/O and plotting
---------

In [None]:
#import statements we will need

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import netCDF4 as nc

In [None]:
# whoops! the above might not work, because we haven't installed
# all of these modules yet.
# if it fails, uncomment the following line and run this cell
# (you may need to restart the kernel afterwards, then re-run the imports):
# %conda install netcdf4

## 0. Direct Input/output


A very simple way to get input is the built-in input() function. It prompts the user to type a value and returns what was typed, which we can assign to a variable.

Note that input() always returns a string, so you need to convert it (for example with int() or float()) before doing calculations with it.
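
To make the conversion step concrete, here is a minimal sketch. The helper name `to_number` is hypothetical (it is not part of Python), and the sample strings stand in for whatever a user might type at the input() prompt.

```python
def to_number(text):
    """Convert a typed string to a float; return None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

# these strings stand in for values returned by input()
print(to_number('3.5') * 2)
print(to_number('not a number'))
```

Returning None for bad input is just one possible choice; you could equally let the ValueError propagate, or ask the user again.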

In [None]:
x=input('enter input:')
print('the input was', x)
print('the type of input was', type(x))

----------
## 1. File I/O


#### The basics
Basic file I/O is done using the built-in open() function, which returns a file object, and that object's close() method.

The basic file I/O methods are file.write(str), file.writelines([list of str]), and [list of str] = file.readlines(); alternatively, simply iterate over the lines using a for loop.

Note that the 'mode' for opening a file is important - it should generally be 'r' (read) or 'w' (write, which overwrites any existing file). In special cases we can also use 'a' to append to the end of a file, or 'r+' to read and write the same file in place (not covered here).
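
As a rough illustration of how the modes behave, the sketch below writes, appends to, and reads back a small file. It uses a `with` block, which closes the file automatically at the end of the block; this is an alternative to calling close() by hand and is not otherwise covered in this notebook. The file name and its contents are made up for the example.

```python
import os
import tempfile

# write into a temporary directory so no existing file is touched
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or overwrites an existing one)
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' appends to the end instead of overwriting
with open(path, 'a') as f:
    f.writelines(['second line\n', 'third line\n'])

# 'r' reads; readlines() returns a list of strings, newlines included
with open(path, 'r') as f:
    lines = f.readlines()

print(lines)
```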

In [None]:
myfile = open('simple_file.txt','w')
myfile.write('Here is a line\n')
myfile.write('Whoops, I forgot a newline here')
myfile.write('Will this appear on a new line?\n')
myfile.write('Last line.\n')
myfile.close()

In [None]:
myfile = open('simple_file.txt','r')
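for line in myfile:
    # each line still ends with its own '\n', so a plain print(line) would
    # add a blank line after it; end='' suppresses print's extra newline
    print(line, end='')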
myfile.close()

#### Pandas

Pandas is a useful library for data analysis, with features that make reading and writing tabular data very easy (among many other things).

In [None]:
# read in some data from a csv file
cats=pd.read_csv('cats_over_time.csv')
print(cats)

In [None]:
# use the 'display' function instead of print() to get a nicer table
display(cats)

In [None]:
# to access individual columns, use the column name as a string, as the index:
print(cats['faculty1'])

In [None]:
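# to get a single value from a column, we can index again.
# Note that [5] looks up the index *label* 5, which matches the row number
# only because this dataframe has the default 0, 1, 2, ... index.
# Use .iloc[5] to select by position regardless of the index:
print(cats['faculty1'][5])
print(cats['faculty1'].iloc[5])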
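
The difference between looking up a value by index label and by row position only shows up when the index is not the default 0, 1, 2, ... The sketch below uses a small made-up series (not the cats data) to show it.

```python
import pandas as pd

# made-up data, with an index that does not start at 0
s = pd.Series([10, 20, 30], index=[5, 6, 7])

print(s[5])        # label-based: the row whose index label is 5
print(s.loc[5])    # the same lookup, explicitly by label
print(s.iloc[0])   # position-based: the first row
```

All three lines print the same value here, because label 5 happens to be the first row.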

In [None]:
# the result of extracting a column from a pandas dataframe
# is a pandas object called a series. Note how it appears:
print(type(cats['faculty1']))

In [None]:
# Dataframes and series act mostly like arrays, but sometimes we might want to
# convert them explicitly to a numpy array, so we can do more numpy-type
# operations on them. We can do this easily:

cats_array = np.array(cats)
print(cats_array)
print(type(cats_array))

# note that the column names have been lost!

In [None]:
# suppose our file is not comma-separated, but has whitespace (spaces or tabs)
# between the columns. Here's how to read such a file:

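# sep=r'\s+' is a regular expression meaning "one or more whitespace characters"
# (the older delim_whitespace=True option is deprecated in recent pandas versions)
mydata = pd.read_csv('myfile.txt', sep=r'\s+')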
display(mydata)

In [None]:
# suppose our file has some missing data. Pandas doesn't care -
# it will fill in the missing values with "NaN" (not a number).

data1 = pd.read_csv('missing_data.csv')
display(data1)
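
Here is a minimal sketch of what happens with missing values, and of writing data back out. It uses a small in-memory CSV with made-up numbers instead of a file on disk, so the column names and values are assumptions for illustration only.

```python
import io
import pandas as pd

# a small in-memory CSV standing in for a file on disk (made-up numbers)
csv_text = "time,values\n0,1.0\n1,\n2,3.0\n"
demo = pd.read_csv(io.StringIO(csv_text))

# the empty field in the second data row is read as NaN
n_missing = int(demo['values'].isna().sum())
print(n_missing)

# drop rows that contain NaN, then write the result back out as CSV text
cleaned = demo.dropna()
out_text = cleaned.to_csv(index=False)
print(out_text)
```

Passing a file name to to_csv() writes to disk instead of returning a string. Dropping rows is only one way to deal with NaN; whether it is appropriate depends on the data.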

-----
## 2. Making plots with matplotlib


Matplotlib is a MATLAB-style plotting library for Python. It is very popular and easy to use for quick plots. There is too much to cover here to do it justice, so only a few examples are provided; check the documentation for more examples, tutorials, and detailed lists of options: https://matplotlib.org/stable/index.html

In [None]:
# before we start, if you have a high-resolution screen, first run this
# mystery line to help jupyter figure out that your screen is high-resolution,
# so that the figures don't appear all blurry.

%config InlineBackend.figure_format = 'retina'

In [None]:
# the simplest type of plot is just with the command plt.plot():
data1 = pd.read_csv('missing_data.csv')
plt.plot(data1['time'],data1['values'],'k.')
plt.show()

In the above, notice I set the marker to black dots with 'k.'. The syntax here is a color combined with a marker and/or a line style, just like MATLAB. The single-letter colors are: b, g, r, c, m, y, k, w. The line styles are: -, --, -., and : (dotted). Marker options include: ., o, x, +, *, s, d, ^, v, etc. Check out the whole list of markers here: https://matplotlib.org/stable/api/markers_api.html

Try changing the options above to see a few different plot types!
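
As a rough illustration of how format strings combine color, marker, and line style, the sketch below plots the same made-up data with several of them. The particular strings chosen are just examples.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 11)

# each format string combines a color, a marker and/or a line style;
# all parts are optional
styles = ['k.', 'r--', 'g-o', 'b:', 'mx']

fig, ax = plt.subplots()
for i, fmt in enumerate(styles):
    # offset each curve vertically so they do not overlap
    ax.plot(x, x + i, fmt, label=fmt)
ax.legend()
plt.show()
```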

In [None]:
# now a more complicated plot - two lines in one!
# just repeat the plot() command each time.

# use np.linspace(start,stop,N) to create a linearly spaced set of N values
t=np.linspace(0, 2*np.pi, 100)

# create some data - sine and cosine
y1=np.cos(t)
y2=np.sin(t)

# plot the data - if we don't specify the marker type, it defaults
# to a blue line, then an orange line, etc.
plt.plot(t,y1)
plt.plot(t,y2)

# add a grid
plt.grid()

# last command to finish the plot - after this, nothing can be added to it.
plt.show()

In [None]:
# another demo, setting figure size, plot style, legend, and subplots
plt.figure(figsize=(9,4))

ax1 = plt.subplot(2,1,1)
ax1.plot(t,y1,'--',label='cosine')
ax1.grid()
ax1.legend(loc='lower left')
ax1.set_ylabel('amplitude')
ax1.set_xlabel('time')

ax2 = plt.subplot(2,1,2)
ax2.plot(t,y2,'rx',label='sine')
ax2.grid()
ax2.legend(loc='lower left')
ax2.set_ylabel('amplitude')
ax2.set_xlabel('time')

# add a line at zero on the plot
ax2.axhline(y=0, color ='black')

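# adjust the spacing so the top panel's x-label does not overlap the bottom panel
plt.tight_layout()
plt.show()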

#### Your turn:
Play around with the examples above and make them nicer!

Next, find an example you like on the matplotlib gallery here: https://matplotlib.org/stable/gallery/

Paste the code into the cell below, spend some time understanding what it does, then tweak the figure to your liking. Have fun!

#### Interactive plots

Because we are using Jupyter (which runs on IPython), we have access to special interactive features that are not exactly built into Python, but are very legal and very cool.

To create an interactive object, we must define a function that contains all the code we need to run inside it (e.g. creates our plot), with input arguments as the options we want to be adjustable, then pass the name of that function to the interactive() command.

This is pretty neat - we can put almost anything we like inside our function, including a complicated numerical model, or a function that does different operations on a dataset, and then simply drag the slider to see the effects. In fact, we don't even have to put a plot inside the function - we can just put some print statements, and they will also be updated dynamically.

In [None]:
from ipywidgets import interactive

#define our plotting function, with inputs that we want to be variable.
# note that we give the inputs default values by specifying them here in the first line:
def my_special_plot(power=2,numpts=100,xmin=-10,xmax=10):
    # create the data:
    x=np.linspace(xmin,xmax,numpts)
    y=x**power

    # make a plot:
    plt.axhline(y=0, color ='black')
    plt.axvline(x=0, color ='black')
    plt.plot(x,y,'-b.')
    # use dollar signs to format strings as math equations, using LaTeX;
    # the braces keep multi-digit powers (e.g. 10) inside the exponent
    plt.title('$y=x^{%d}$'%power)

# now, we create the interactive plot with interactive() and display().
w = interactive(my_special_plot, power=(0,10), numpts=(1,200), xmin=(-25.,0.),xmax=(1,25.))
display(w)


## 3. Raster data

Go to https://downloads.psl.noaa.gov/Datasets/COBE/ and download the file 'sst.mon.ltm.1981-2010.nc'.

This is a NetCDF file that contains 12 monthly averages of global Sea Surface Temperature (SST), averaged over the period from 1981 to 2010, at 1x1 degree resolution. The SST data is a kind of "3D matrix" with dimensions 12 x 180 x 360 (time x latitude x longitude). NetCDF files are very commonly used for this type of data storage in earth sciences, so it's good to be familiar with them.
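
To get a feel for slicing a 3D array like this before opening the real file, here is a minimal sketch using a zero-filled numpy array as a stand-in. It assumes the indices are ordered (time, lat, lon); the name `fake_sst` and the chosen grid cell are made up for illustration.

```python
import numpy as np

# a zero-filled stand-in for the SST data, indexed as (time, lat, lon)
fake_sst = np.zeros((12, 180, 360), dtype=np.float32)

jan = fake_sst[0, :, :]              # one month: a 2D (lat, lon) grid
series = fake_sst[:, 90, 180]        # one grid cell: 12 monthly values
annual_mean = fake_sst.mean(axis=0)  # average over the time axis

print(jan.shape, series.shape, annual_mean.shape)
```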

Here is some example code that reads in the file, prints out the header information, and makes a simple plot of one slice of the data.

Note that the netCDF4 package may not be installed in your Anaconda environment. If you get an import error, you can add it by running the single command `conda install netCDF4` in a separate cell. Alternatively, install it by running the same command on the command line (if you are using Miniconda) or through your Anaconda manager (if you are using the full Anaconda distribution).


In [None]:
# here is some code to get you started viewing and interacting with NetCDF files

import netCDF4 as nc
filename = 'sst.mon.ltm.1981-2010.nc'
dataset = nc.Dataset(filename)

# print some information about the dataset:
print(dataset)
# in the last line, we can read that the array 'sst' is stored as a float32 array, with indices in the order (time,lat,lon).
# there is lots of other data in this file too!

# get the grid in January and plot it.
sst_jan=dataset['sst'][0,:,:]
plt.figure(figsize=(10,4))
plt.imshow(sst_jan)
plt.colorbar()
plt.title('Mean SST in January, 1981-2010')
plt.show()