# Analyzing River Thames Water Levels
Time series data is everywhere, from watching your stock portfolio to monitoring climate change, and even live-tracking as local cases of a virus become a global pandemic. In this project, you’ll work with a time series that tracks the tide levels of the River Thames. You’ll first load the data and inspect it visually, and then perform calculations on the dataset to generate some summary statistics. You’ll end by reducing the time series to its component attributes and analyzing them.

The original dataset is available from the British Oceanographic Data Centre.

Here's a map of the locations of the tidal meters along the River Thames in London.

![](locations.png)

The provided datasets are in the `data` folder in this workspace. For this project, you will work with one of these files, `10-11_London_Bridge.txt`, which contains comma-separated values for water levels in the River Thames at London Bridge. After you've finished the project, you can reuse your code to analyze data from the other files (at other spots in the UK where tidal data is collected) if you'd like.

The TXT file contains data for three variables, described in the table below. 

| Variable Name | Description | Format |
| ------------- | ----------- | ------ |
| Date and time | Date and time of measurement to GMT. Note the tide gauge is accurate to one minute. | dd/mm/yyyy hh:mm:ss |
| Water level | High or low water level measured by tide meter. Tide gauges are accurate to 1 centimetre. | metres (Admiralty Chart Datum (CD), Ordnance Datum Newlyn (ODN), or Trinity High Water (THW)) |
| Flag | High water flag = 1, low water flag = 0 | Categorical (0 or 1) |
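
Because the timestamps are written day-first (dd/mm/yyyy), they need to be parsed with that order made explicit. The snippet below is a minimal sketch using two made-up sample strings in the documented format; the `raw` and `parsed` names are illustrative only. It passes an explicit `format` string, which is one alternative to the `dayfirst=True` option.

```python
import pandas as pd

# Hypothetical sample values written in the dd/mm/yyyy hh:mm:ss format from the table
raw = pd.Series(["01/05/1911 15:40:00", "02/05/1911 11:25:00"])

# An explicit format removes any ambiguity: 02/05/1911 is 2 May, not 5 February
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

print(parsed.dt.day.tolist())    # day of month for each timestamp
print(parsed.dt.month.tolist())  # month number for each timestamp
```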



In [1]:
# We've imported your first Python package for you, along with a function you will need called IQR
import pandas as pd               

def IQR(column): 
    """ Calculates the interquartile range (IQR) for a given DataFrame column using the quantile method """
    q25, q75 = column.quantile([0.25, 0.75])
    return q75-q25
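
To see what `IQR` returns, it can help to run it on a tiny Series first. This is a sketch on made-up numbers, not the Thames data; `levels` and `spread` are illustrative names, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def IQR(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

# Made-up values; with pandas' default linear interpolation the
# 25th percentile is 2.0 and the 75th percentile is 4.0
levels = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
spread = IQR(levels)
print(spread)
```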

In [2]:
thames = pd.read_csv("data/10-11_London_Bridge.txt")

thames = thames.drop(columns = " HW=1 or LW=0")
thames = thames.rename(columns = {" water level (m ODN)" : "Water Level", " flag" : "Flag"})

thames["Date and time"] = pd.to_datetime(thames["Date and time"], dayfirst = True)
thames["Month"] = thames["Date and time"].dt.month
thames["Year"] = thames["Date and time"].dt.year

thames["Water Level"] = thames["Water Level"].astype(float)

thames["Flag"] = thames["Flag"].replace({1 : "High", 0 : "Low"})

thames = thames[["Date and time", "Year", "Month", "Water Level", "Flag"]]
thames.head()

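| | Date and time | Year | Month | Water Level | Flag |
| - | ------------- | ---- | ----- | ----------- | ---- |
| 0 | 1911-05-01 15:40:00 | 1911 | 5 | 3.713 | High |
| 1 | 1911-05-02 11:25:00 | 1911 | 5 | -2.9415 | Low |
| 2 | 1911-05-02 16:05:00 | 1911 | 5 | 3.3828 | High |
| 3 | 1911-05-03 11:50:00 | 1911 | 5 | -2.6367 | Low |
| 4 | 1911-05-03 16:55:00 | 1911 | 5 | 2.9256 | High |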


In [8]:
high = thames[thames["Flag"] == "High"]
high_statistics = high.groupby("Flag").agg({"Water Level" : ["mean", "median", IQR]})

low = thames[thames["Flag"] == "Low"]
low_statistics = low.groupby("Flag").agg({"Water Level" : ["mean", "median", IQR]})

display(high_statistics)
display(low_statistics)

| Flag | Water Level (mean) | Water Level (median) | Water Level (IQR) |
| ---- | ------------------ | -------------------- | ----------------- |
| High | 3.318373 | 3.3526 | 0.7436 |


| Flag | Water Level (mean) | Water Level (median) | Water Level (IQR) |
| ---- | ------------------ | -------------------- | ----------------- |
| Low | -2.383737 | -2.4129 | 0.5382 |
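
The high and low summaries can also be produced in a single pass by grouping on `Flag` without filtering first. The sketch below uses a small made-up DataFrame (`toy`) rather than the Thames data. Note that selecting the `"Water Level"` column before calling `agg` gives flat column names, so the result is shaped differently from the `high_statistics` and `low_statistics` objects above.

```python
import pandas as pd

def IQR(column):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q25, q75 = column.quantile([0.25, 0.75])
    return q75 - q25

# Made-up readings, three per tide type
toy = pd.DataFrame({
    "Flag": ["High", "Low", "High", "Low", "High", "Low"],
    "Water Level": [3.0, -2.0, 3.5, -2.5, 4.0, -3.0],
})

# One row per flag, one column per statistic
summary = toy.groupby("Flag")["Water Level"].agg(["mean", "median", IQR])
print(summary)
```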


In [4]:
high_tide_days = thames[thames["Flag"] == "High"]
low_tide_days = thames[thames["Flag"] == "Low"]

In [5]:
all_high = high_tide_days.groupby("Year")["Water Level"].count()
high_threshold = high_tide_days["Water Level"].quantile(0.9)
very_high_days = high_tide_days[high_tide_days["Water Level"] > high_threshold].groupby("Year")["Water Level"].count()
very_high_ratio = (very_high_days / all_high).reset_index()
very_high_ratio

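| | Year | Water Level |
| - | ---- | ----------- |
| 0 | 1911 | 0.004098 |
| 1 | 1912 | 0.032316 |
| 2 | 1913 | 0.082212 |
| 3 | 1914 | 0.055313 |
| 4 | 1915 | 0.045045 |
| ... | ... | ... |
| 80 | 1991 | 0.096317 |
| 81 | 1992 | 0.103253 |
| 82 | 1993 | 0.145923 |
| 83 | 1994 | 0.150355 |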
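
The ratio computed above is the share of each year's high-tide readings that exceed the 90th percentile of all high-tide readings. The sketch below works through the same idea on ten made-up values; `toy`, `threshold`, and `ratio` are illustrative names. It takes the mean of a boolean mask per year instead of dividing two counts, which is a swapped-in technique: a year with no exceedances comes out as 0.0, whereas dividing counts would leave it as `NaN`.

```python
import pandas as pd

# Made-up readings: five per year
toy = pd.DataFrame({
    "Year": [2000] * 5 + [2001] * 5,
    "Water Level": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# 90th percentile across all readings (9.1 for these values)
threshold = toy["Water Level"].quantile(0.9)

# True where a reading exceeds the threshold
is_very_high = toy["Water Level"] > threshold

# The mean of a boolean mask is the fraction of True values in each year
ratio = is_very_high.groupby(toy["Year"]).mean()
print(ratio)
```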


In [6]:
all_low = low_tide_days.groupby("Year")["Water Level"].count()
low_threshold = low_tide_days["Water Level"].quantile(0.1)
very_low_days = low_tide_days[low_tide_days["Water Level"] < low_threshold].groupby("Year")["Water Level"].count()
very_low_ratio = (very_low_days / all_low).reset_index()
very_low_ratio

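| | Year | Water Level |
| - | ---- | ----------- |
| 0 | 1911 | 0.060606 |
| 1 | 1912 | 0.066667 |
| 2 | 1913 | 0.022388 |
| 3 | 1914 | 0.039017 |
| 4 | 1915 | 0.033435 |
| ... | ... | ... |
| 80 | 1991 | 0.150355 |
| 81 | 1992 | 0.107496 |
| 82 | 1993 | 0.112696 |
| 83 | 1994 | 0.106383 |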


In [None]:
solution = {"high_statistics": high_statistics, 
            "low_statistics": low_statistics, 
            "very_high_ratio": very_high_ratio, 
            "very_low_ratio": very_low_ratio}
solution

{'high_statistics':      Water Level                
             mean  median     IQR
 Flag                            
 High    3.318373  3.3526  0.7436,
 'low_statistics':      Water Level                
             mean  median     IQR
 Flag                            
 Low    -2.383737 -2.4129  0.5382,
 'very_high_ratio':     Year  Water Level
 0   1911     0.004098
 1   1912     0.032316
 2   1913     0.082212
 3   1914     0.055313
 4   1915     0.045045
 ..   ...          ...
 80  1991     0.096317
 81  1992     0.103253
 82  1993     0.145923
 83  1994     0.150355
 84  1995     0.170213
 
 [85 rows x 2 columns],
 'very_low_ratio':     Year  Water Level
 0   1911     0.060606
 1   1912     0.066667
 2   1913     0.022388
 3   1914     0.039017
 4   1915     0.033435
 ..   ...          ...
 80  1991     0.150355
 81  1992     0.107496
 82  1993     0.112696
 83  1994     0.106383
 84  1995     0.107801
 
 [85 rows x 2 columns]}