# Project 2 - Programming for Data Analysis
> ## An Analysis of paleo-present climate data

To keep this project focused, I am breaking it into 5 steps based on Faraz Mubeen's July 2023 article on Medium.com. In ['*6 Steps of any data analytics project*'](https://medium.com/@farazmubeen902/6-steps-of-any-data-analytics-project-bde8d8072f89), he describes the "six fundamental steps involved in any data analytics project". These are:

1. **Ask** - *Formulating the right questions*
2. **Prepare** - *Collecting and organising relevant data*
3. **Process** - *Cleaning and preparing data for analysis*
4. **Analyse** - *Uncovering patterns and insights*
5. **Share** - *Communicating insights through data visualisations*
6. **Act** - *Translating insights into actionable plans*

As the "**Ask**" has already been set by the assignment, I will be following a modified set of steps - *Prepare*, *Process*, *Analyse*, **Predict** and *Share*. 

> ### Academic References
>
> - Mubeen, F. (2023). 6 Steps of any Data Analytics Project. [online] Available at: https://medium.com/@farazmubeen902/6-steps-of-any-data-analytics-project-bde8d8072f89 [Accessed 11 Dec. 2023]. 
>

> ### Technical References
>
> - Javiya, R. (n.d.). How to Read Text Files with Pandas. [online] GeeksforGeeks. Available at: https://www.geeksforgeeks.org/how-to-read-text-files-with-pandas/ [Accessed 14 Dec. 2023].


> ### **The Ask**: 
>
> • Analyse CO2 vs Temperature Anomaly from 800kyrs – present.
>
> • Examine one other (paleo/modern) feature (e.g. CH4 or polar ice-coverage)
>
> • Examine Irish context:
>
> • Climate change signals: (see Maynooth study: The emergence of a climate change signal in long-term Irish meteorological observations - ScienceDirect)
>
> • Fuse and analyse data from various data sources and format fused data set as a pandas dataframe and export to csv and json formats
>
> • For all of the above variables, analyse the data, the trends and the relationships between them (temporal leads/lags/frequency analysis).
>
> • Predict global temperature anomaly over next few decades (synthesise data) and compare to published climate models if atmospheric CO2 trends continue
>
> • Comment on accelerated warming based on very latest features (e.g. temperature/polar ice-coverage)
>
>Use a Jupyter notebook for your analysis and track your progress using GitHub.
>
>Use an academic referencing style

### 1. **Prepare** - *Collecting and organising relevant data*

#### EPICA Dome C - 800KYr Deuterium Data and Temperature Estimates
The first file I will create a DataFrame for is the [EPICA Dome C - 800KYr Deuterium Data and Temperature Estimates](https://www.ncei.noaa.gov/pub/data/paleo/icecore/antarctica/epica_domec/edc3deuttemp2007.txt) data. 

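I based the `sep` (separator) argument on an answer to a [Stack Overflow question similar to my own](https://stackoverflow.com/a/55473279). The regular expression `\s+` treats any run of whitespace as a single delimiter, and `skiprows=91` skips the descriptive header text at the top of the file.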

In [33]:
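# Library imports
import pandas as pd

# Path to EDC file
file = 'files/edc3deuttemp2007.txt'

# Create EDC DataFrame: whitespace-separated columns, skipping the 91 lines of header notes
# (raw string so that \s is passed to the regex engine unchanged)
edc = pd.read_csv(file, sep=r'\s+', skiprows=91)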

edc

Unnamed: 0,Bag,ztop,Age,Deuterium,Temperature
0,1,0.00,-50.00000,,
1,2,0.55,-43.54769,,
2,3,1.10,-37.41829,,
3,4,1.65,-31.61153,,
4,5,2.20,-24.51395,,
...,...,...,...,...,...
5795,5796,3187.25,797408.00000,-440.20,-8.73
5796,5797,3187.80,798443.00000,-439.00,-8.54
5797,5798,3188.35,799501.00000,-441.10,-8.88
5798,5799,3188.90,800589.00000,-441.42,-8.92
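
To check my understanding of the `sep` and `skiprows` arguments, here is a minimal sketch that runs without the real file. The text in `sample` is a hypothetical stand-in for the NOAA file layout (two made-up note lines instead of 91), and the data rows are copied from the output above. It assumes that rows with fewer fields than the header are padded with `NaN`, which matches what the first rows of the real output show.

```python
import io

import pandas as pd

# Hypothetical stand-in for the NOAA text file: two note lines to skip,
# then a column-name row and whitespace-aligned data rows.
sample = io.StringIO(
    "Descriptive header line (skipped)\n"
    "Second descriptive line (skipped)\n"
    "Bag   ztop      Age          Deuterium  Temperature\n"
    "1     0.00      -50.00000\n"
    "2     0.55      -43.54769\n"
    "5798  3188.35   799501.00000  -441.10   -8.88\n"
    "5799  3188.90   800589.00000  -441.42   -8.92\n"
)

# r'\s+' treats any run of spaces or tabs as one delimiter;
# skiprows=2 jumps over the two note lines so the third line becomes the header.
df = pd.read_csv(sample, sep=r"\s+", skiprows=2)

print(df)
# Count the missing values per column (the short rows have no Deuterium/Temperature)
print(df.isna().sum())
```
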


In [36]:
# Save the DataFrame as a CSV so later cells can work from it instead of the .txt file
edc.to_csv('files/edc3deuttemp.csv')
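
One thing worth noting about `to_csv` is that it writes the DataFrame index as an unnamed first column by default, which then comes back as `Unnamed: 0` when the file is read again. The sketch below shows this on a tiny in-memory example (the two-row `df` is illustrative only, using values from the output above), along with `index=False` as one way to avoid it. Since the assignment also asks for JSON export, it includes a `to_json` call as well; `orient="records"` is my own choice here, not a requirement.

```python
import io
import json

import pandas as pd

# Tiny illustrative frame, not the full EDC data
df = pd.DataFrame({"Bag": [1, 2], "ztop": [0.00, 0.55]})

# Default behaviour: the integer index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)
print(list(reloaded.columns))  # ['Unnamed: 0', 'Bag', 'ztop']

# index=False leaves the index out of the file entirely
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
clean = pd.read_csv(no_index)
print(list(clean.columns))  # ['Bag', 'ztop']

# JSON export: one object per row
json_text = df.to_json(orient="records")
records = json.loads(json_text)
print(len(records))  # 2
```
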


Next, I want to replace the default integer index with the *Bag* column (based on [this example](https://pythonexamples.org/pandas-set-column-as-index/)).

In [38]:
edc = pd.read_csv('files/edc3deuttemp.csv', index_col='Bag', usecols=['Bag', 'ztop', 'Age', 'Deuterium', 'Temperature'])
edc

Bag,ztop,Age,Deuterium,Temperature
1,0.00,-50.00000,,
2,0.55,-43.54769,,
3,1.10,-37.41829,,
4,1.65,-31.61153,,
5,2.20,-24.51395,,
...,...,...,...,...
5796,3187.25,797408.00000,-440.20,-8.73
5797,3187.80,798443.00000,-439.00,-8.54
5798,3188.35,799501.00000,-441.10,-8.88
5799,3188.90,800589.00000,-441.42,-8.92
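
As an alternative to re-reading the CSV, the same result could probably be reached directly with `DataFrame.set_index`. This is only a sketch on a small sample built from the first rows shown above (`edc_sample` and `edc_indexed` are names I made up for illustration), not the cell I actually ran.

```python
import pandas as pd

# Small sample built from the first three rows of the EDC output
edc_sample = pd.DataFrame(
    {
        "Bag": [1, 2, 3],
        "ztop": [0.00, 0.55, 1.10],
        "Age": [-50.00000, -43.54769, -37.41829],
    }
)

# set_index returns a new DataFrame with 'Bag' as the index;
# the original integer index is discarded and 'Bag' is no longer a column
edc_indexed = edc_sample.set_index("Bag")

print(edc_indexed)
# Rows can now be looked up by bag number
print(edc_indexed.loc[2, "ztop"])  # 0.55
```
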


I will now try to create a DataFrame for the IPCC data sourced from [this link](https://vlegalwaymayo.atu.ie/mod/url/view.php?id=874743) provided by the lecturer.

***
#### IPCC Data (Composite)

In [42]:
# Path to IPCC file
file_path = 'files/grl52461-sup-0003-supplementary.xls'

# Read the 'CO2 Composite' sheet, skipping the first 14 rows so the column names come from row 15
ipcc = pd.read_excel(file_path, sheet_name='CO2 Composite', skiprows=14, index_col=None)
ipcc

Unnamed: 0,Gasage (yr BP),CO2 (ppmv),sigma mean CO2 (ppmv)
0,-51.030000,368.022488,0.060442
1,-48.000000,361.780737,0.370000
2,-46.279272,359.647793,0.098000
3,-44.405642,357.106740,0.159923
4,-43.080000,353.946685,0.043007
...,...,...,...
1896,803925.284376,202.921723,2.064488
1897,804009.870607,207.498645,0.915083
1898,804522.674630,204.861938,1.642851
1899,805132.442334,202.226839,0.689587
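
The column names in this sheet contain spaces and brackets, which makes them awkward to type in later cells. A possible tidy-up, sketched here on the first two rows of the output above, is to rename them with `DataFrame.rename`. The short names in `short_names` are my own suggestion and are not part of the source data.

```python
import pandas as pd

# First two rows of the 'CO2 Composite' output, used as a small sample
ipcc_sample = pd.DataFrame(
    {
        "Gasage (yr BP)": [-51.030000, -48.000000],
        "CO2 (ppmv)": [368.022488, 361.780737],
        "sigma mean CO2 (ppmv)": [0.060442, 0.370000],
    }
)

# Hypothetical shorter names (my own choice) that keep the units in the name
short_names = {
    "Gasage (yr BP)": "gas_age_yr_bp",
    "CO2 (ppmv)": "co2_ppmv",
    "sigma mean CO2 (ppmv)": "co2_sigma_ppmv",
}

# rename returns a new DataFrame; the values themselves are unchanged
ipcc_renamed = ipcc_sample.rename(columns=short_names)

print(ipcc_renamed.columns.tolist())
```
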
