# P03 - Time series analysis with Pandas

This tutorial introduces the time-series analysis features of Python.  It starts with the `datetime` object, which handles calculations involving dates and times.  The Pandas module adds two new objects: DataFrames, which are tables of data, and Series, which represent a single column or row.  We take advantage of Pandas' datetime-based indexing to process time-series data.

### Setting up

In [None]:
# Import modules
import datetime as dt
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


# Show plots within notebooks
%matplotlib inline

# Show module versions
print('Python: {}'.format(sys.version))
print('Pandas: {}'.format(pd.__version__))
print('Numpy: {}'.format(np.__version__))
from matplotlib import __version__ as mplv
print('Matplotlib: {}'.format(mplv))

# Setup working directory
wdir = os.getcwd()  # Change this if required
os.chdir(wdir)

### Creating datetime objects

In [None]:
#  Directly enter year, month, day, hour etc.
dt.datetime(1955, 11, 12, 22, 4, 0)

In [None]:
#  Read from formatted text string (See strftime.org for codes)
dt.datetime.strptime('26 October 1985, 01:21:59 am', '%d %B %Y, %I:%M:%S %p')
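
The format string has to match the text exactly, otherwise `strptime` raises a `ValueError`. The following is a small sketch of both cases; it assumes an English locale, because `%B` and `%p` depend on the locale.

```python
import datetime as dt

fmt = '%d %B %Y, %I:%M:%S %p'
text = '26 October 1985, 01:21:59 am'

# Parse the text into a datetime object
parsed = dt.datetime.strptime(text, fmt)
print(parsed.isoformat())

# A string in a different layout does not match the format
try:
    dt.datetime.strptime('1985-10-26', fmt)
    matched = True
except ValueError as err:
    matched = False
    print('Could not parse:', err)
```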

In [None]:
#  Get current time from function
dt.datetime.now()

### Timedelta objects represent time spans

In [None]:
#  Timedelta objects represent the difference between two datetimes.
departure = dt.datetime.strptime('26 October 1985, 01:21:59 am', '%d %B %Y, %I:%M:%S %p')
delta = dt.datetime.now() - departure
print(delta)

In [None]:
#  You can define timedelta based on number of days, seconds and microseconds.  (Why not months or years?)
delta = dt.timedelta(1, 1, 1)
print(delta)
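
Whatever units you pass in, a timedelta is stored internally as days, seconds and microseconds only. These units have fixed lengths, whereas months and years do not. A short sketch of that normalisation (the values are arbitrary):

```python
import datetime as dt

# Weeks, hours and minutes are accepted as input...
delta = dt.timedelta(weeks=1, hours=25, minutes=1)
print(delta)  # 8 days, 1:01:00

# ...but only days, seconds and microseconds are stored
print(delta.days)          # 8
print(delta.seconds)       # 3660
print(delta.microseconds)  # 0
```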

In [None]:
#  You can specify them with keyword arguments
delta = dt.timedelta(seconds=864000)
print(delta)

In [None]:
#  The total_seconds() method converts the whole timedelta into seconds.  (Available since Python 2.7)
delta = dt.timedelta(1, 1, 1)
print(delta.total_seconds())
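
A common slip is to read the `.seconds` attribute when the total duration is wanted. As a brief illustration, `.seconds` holds only the part left over after whole days are removed:

```python
import datetime as dt

delta = dt.timedelta(days=1, seconds=1, microseconds=1)

# Only the seconds component, excluding whole days and microseconds
print(delta.seconds)          # 1

# The full duration expressed in seconds
print(delta.total_seconds())  # 86401.000001
```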

### Extracting information from datetime objects

In [None]:
#  Year, month, day etc. are attributes of the object
arrival = dt.datetime(1955, 11, 12, 22, 4, 0)
print(arrival.year)
print(arrival.hour)

In [None]:
#  There are methods to calculate the day number, counting from Jan 01, 1 A.D. as day 1
print(arrival.toordinal())

In [None]:
#  Or the day of the week (Monday = 0)
print(arrival.weekday())
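
The standard library offers two numbering schemes for weekdays, which are easy to mix up. A quick comparison using the same arrival time:

```python
import datetime as dt

arrival = dt.datetime(1955, 11, 12, 22, 4, 0)

# weekday() counts from Monday = 0, isoweekday() from Monday = 1
print(arrival.weekday())     # 5
print(arrival.isoweekday())  # 6

# %A gives the day name (in the current locale)
print(arrival.strftime('%A'))
```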

In [None]:
#  The .strftime method writes a string in the specified format
arrival.strftime('%Y-%m-%d %H:%M:%S')

### Datetime Exercises

1. Was Marty McFly's journey Back to the Future (departure and arrival times are defined above)
   longer or shorter than if he had travelled to now?
2. When will you be (or were you) 1 billion seconds old?
3. Change the arrival.strftime() string to print the arrival date as "04 minutes past 10 on 12 November 1955".

### Loading time series data in Pandas

This example uses temperatures of steam vents (fumaroles) on the crater of Volcán de Colima, Mexico, as measured by infrared camera during a night in 2006.

In [None]:
#  Create a Pandas dataframe reading data from a .csv file.  It can translate dates into datetime objects
infraredData = pd.read_csv('InfraredCameraData.csv', parse_dates=[0])
infraredData.set_index('DateTime', inplace=True)  # Set the datetime column as the index
infraredData.head(10)  # Print the first 10 values
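
If you do not have the data file to hand, the same loading steps can be tried on an in-memory stand-in. In this sketch the rows are made up and the layout (a `DateTime` column followed by temperatures) is an assumption about the real file; `index_col` sets the index during the read instead of calling `set_index` afterwards.

```python
import io
import pandas as pd

# Made-up rows in an assumed layout; the real file may differ
csv_text = """DateTime,CraterMax
2006-05-23 04:00:00,30.1
2006-05-23 04:00:05,30.4
2006-05-23 04:00:10,29.8
"""

# Parse the DateTime column as dates and use it as the index in one step
sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['DateTime'],
                     index_col='DateTime')
print(type(sample.index))
print(sample)
```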

In [None]:
#  Plot the time series.
infraredData.plot()
plt.savefig('graph.png')

In [None]:
#  Extract a column as a data series with dictionary-like notation
e_flank = infraredData['EFlankAvg']
e_flank.head()

In [None]:
#  Extract 1 minute worth of rows by slicing the index with datetimes
rows = infraredData[dt.datetime(2006, 5, 23, 4, 0):dt.datetime(2006, 5, 23, 4, 1)]
rows
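
Unlike ordinary Python slices, slices on a datetime index include both end points. The sketch below shows this on synthetic data at 5-second spacing (the values are invented, not the Colima data), using `.loc` to make the label-based selection explicit:

```python
import datetime as dt
import numpy as np
import pandas as pd

# Synthetic 5-second series standing in for the camera data
index = pd.date_range('2006-05-23 04:00:00', periods=60,
                      freq=pd.Timedelta(seconds=5))
demo = pd.DataFrame({'CraterMax': np.linspace(30.0, 35.9, 60)}, index=index)

# Label-based slice: both 04:00:00 and 04:01:00 are included
one_minute = demo.loc[dt.datetime(2006, 5, 23, 4, 0):dt.datetime(2006, 5, 23, 4, 1)]
print(len(one_minute))  # 13
```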

### When was the explosion?

In [None]:
#  Find the row corresponding to the explosion (where temperature is max temperature)
explosion_status = infraredData['CraterMax'] == infraredData['CraterMax'].max()
explosion_status.head()  # This Series has False for all rows, except for the explosion

In [None]:
#  Extract the timestamp
explosion_time = infraredData.index[explosion_status]  # Get index values where explosion_status is True
explosion_time[0]  # Extract the first (only) value
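
The boolean-mask approach generalises to any condition. When only the position of the maximum is needed, `Series.idxmax()` is a shorter alternative. A sketch on invented values shows that both give the same timestamp:

```python
import pandas as pd

# Invented temperatures with one obvious spike
index = pd.date_range('2006-05-23 04:00:00', periods=6,
                      freq=pd.Timedelta(seconds=5))
temps = pd.Series([30.0, 31.0, 95.0, 40.0, 33.0, 31.0],
                  index=index, name='CraterMax')

# Boolean mask, then look up the matching index value
mask = temps == temps.max()
via_mask = temps.index[mask][0]

# idxmax returns the index label of the first maximum directly
via_idxmax = temps.idxmax()

print(via_mask, via_idxmax)
```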

In [None]:
#  Add a column for the 2-minute rolling mean of the CraterMax temperature
rolling = infraredData['CraterMax'].rolling(window=24, center=False)  # 24 x 5 second intervals
infraredData['CraterMaxRolling'] = rolling.mean()  # The window object needs an aggregation to give values
infraredData.plot()
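
A rolling window only yields numbers once an aggregation such as `.mean()` is applied. With a datetime index, the window can also be given as a duration. The sketch below compares the two on synthetic 5-second data (random values, not the real measurements); it assumes evenly spaced samples, which is what makes 24 rows equal to 2 minutes.

```python
import numpy as np
import pandas as pd

# Synthetic, evenly spaced 5-second series
index = pd.date_range('2006-05-23 04:00:00', periods=120,
                      freq=pd.Timedelta(seconds=5))
rng = np.random.default_rng(0)
temps = pd.Series(30 + rng.normal(0, 1, 120), index=index)

# Window defined by row count: NaN until 24 rows are available
by_count = temps.rolling(window=24).mean()

# Window defined by duration: uses whatever rows fall in the last 2 minutes
by_time = temps.rolling('2min').mean()

print(by_count.isna().sum())  # 23
print(by_time.isna().sum())   # 0
```

Once the count-based window is full, the two versions agree. The time-based form is the safer choice if samples are missing or unevenly spaced.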

### Calculating the mean fumarole temperatures

#### You can't trust daytime data because the sun heats the rocks

In [None]:
#  Use time index slicing to select the period when the data are unaffected by sunlight or explosions (02:50 to 05:50).
night = infraredData['2006-05-23 02:50':'2006-05-23 05:50']
night.plot()

#### Clouds passing into view cause the temperature to drop and vary rapidly

In [None]:
#  Drop data where clouds obscure the crater (and max temperature appears below 28°C)
no_clouds = night[night['CraterMax'] > 28]
no_clouds.plot()

In [None]:
#  Resample to take 2 minute maximum values, dropping other cloud noise
max_2mins = no_clouds.resample('2min').max()
max_2mins.plot()
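
To see what `resample` does, it helps to run it on a series where the answer is obvious. In this sketch the values are simply 0 to 47 at 5-second spacing, so each 2-minute bin holds 24 samples:

```python
import numpy as np
import pandas as pd

# 48 samples x 5 seconds = 4 minutes of synthetic data
index = pd.date_range('2006-05-23 03:00:00', periods=48,
                      freq=pd.Timedelta(seconds=5))
temps = pd.Series(np.arange(48, dtype=float), index=index)

# One row per 2-minute bin, labelled by the start of the bin
max_per_bin = temps.resample('2min').max()
print(max_per_bin)
```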

In [None]:
# The following are the fumarole temperature statistics for one night, unaffected by explosions, solar heating or clouds.
# A time series of these results shows long-term changes at the volcano.
max_2mins.describe()

### Pandas Exercises

1. What percentage of the original data have we used in the max_2mins data?
2. Drop data within 10 minutes of explosion_time from infraredData

### _Footnote_
This analysis was the reason that I learned programming in the first place.  The time series datasets were just too large for Excel.  I learned Matlab to do it.  It took months and was hundreds of lines of code.  The simplicity of the Python version really highlights for me how scientific computing has come on in a decade.  For full results, see:
+ Stevenson, J. A., and N. Varley (2008), Fumarole monitoring with a handheld infrared camera: Volcán de Colima, Mexico, 2006-2007, Journal of Volcanology and Geothermal Research, 177(4), 911-924, doi:[10.1016/j.jvolgeores.2008.07.003](http://dx.doi.org/10.1016/j.jvolgeores.2008.07.003).