# Climate Change Visualization Notebook
## Goal
The goal of this project is to visualize temperature change on Earth over a significant time period of at least 500 years. 
I chose this project because climate change, as a contemporary point of debate, is a very important issue. Although I am staunchly of the belief that climate change is a serious issue, I recognize that there are people who do not share this sentiment or do not consider climate change legitimate. I believe an easily digested visual representation of climate change, with as little bias as possible, will allow people to form their own opinions on the change, or lack thereof, in the Earth's climate.
## Characteristics of Desired Data
For this project, I will need to collect data that spans (preferably) over 500 years.
Through research, I have learned that there are several different techniques for estimating the average temperature for a certain year in the past. I hope to gather several data sets that are the results of different techniques.
I want to gather historical temperature data for as long a time period as possible while still being able to verify its legitimacy.

## Gathering the Data
### Changing from a Solar System to a Global visualization
I initially wanted to present an animated visualization comparing the variance of atmospheric gases, temperature, or some other metric of climate change of Earth to other planets in the solar system. I liked this idea because it provided the other planets as independent variables to compare Earth to. 

I found [this visualization](https://codepen.io/juliangarnier/pen/idhuG/) which I thought would be perfect for showcasing this data. The graphics had been mostly completed for me; it even has a label showcasing one of Earth's traits. I planned to speed up the revolutions of the planets to show hundreds, or thousands, of years in roughly a minute. I also planned to add an auxiliary time-series chart below the solar system that would be drawn as the planets rotated.

However, after thirty minutes of searching for climate data from other planets, I could not find a reputable data source that did not clearly label its data as rough estimates. I concluded that accurate historical climate data (for the time range I desired) did not exist.&ast; It seemed unfair to compare the fairly accurate historical data for Earth to the inaccurate estimates for other planets.

&ast;I contemplated using data from the past ten or so years, which was available. I believed this was not a meaningful enough time period for a visualization relating to how planets change.

### Finding the CDIAC datasets
After filtering through several short articles and papers that only included graphics, I found a [repository of historical climate information](http://cdiac.ess-dive.lbl.gov/climate/temp/temp_table.html) that seemed fairly official. I googled the organization that claimed ownership of the data, the [CDIAC](https://en.wikipedia.org/wiki/Carbon_Dioxide_Information_Analysis_Center), or Carbon Dioxide Information Analysis Center, and found that it was a United States Department of Energy organization that operated from 1982 until September 30, 2017.

### Finding an API for CDIAC data
I found a [dataset](http://cdiac.ess-dive.lbl.gov/ftp/trends/temp/vostok/vostok.1999.temp.dat) about the Vostok Ice Core that spans over 422,767 years. This dataset seemed to be in a `.dat` format, which I researched. While researching, it occurred to me that this information might be readily available through an API. I could not easily find a CDIAC API. However, it appeared that CDIAC had collaborated with [NOAA](http://www.noaa.gov/), the National Oceanic and Atmospheric Administration, in collecting this data. I quickly found a [NOAA API](https://www.ncdc.noaa.gov/cdo-web/webservices/v2).
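
For reference, a plain-text `.dat` file like the one mentioned above can often be read without special tooling. The sketch below is only an illustration: it assumes the file is descriptive header text followed by whitespace-separated numeric columns, which I have not verified for the Vostok file. The function name `read_numeric_rows`, the four-column layout, and the sample values are all made up.

```python
import io

def read_numeric_rows(lines, n_columns):
    """Return rows of floats, skipping any line that is not purely numeric.

    Assumption: data rows have exactly `n_columns` whitespace-separated numbers.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != n_columns:
            continue  # header prose or blank line
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # column titles or other non-numeric text
    return rows

# Made-up stand-in for a .dat file; these are NOT real Vostok values.
sample = io.StringIO(
    "Example header text describing the file\n"
    "depth age value temp\n"
    "10 100 -1.5 0.5\n"
    "20 250 -2.0 0.25\n"
)
rows = read_numeric_rows(sample, n_columns=4)
print(rows)
```

The same function should work on a real file by passing an open file handle instead of the `io.StringIO` sample, provided the layout assumption holds.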

After having significant trouble simply loading the datasets using the NOAA API, I decided to search for the error I was getting (`Token parameter is required.`). I quickly found [this blog post](http://emilkirkegaard.dk/en/?p=5904), which described the exact same issue. I realized I had not been placing my URL in quotes, and I had been appending my token directly to the command rather than passing it as a `token:` header with `-H`. I had used this format for the command:
`curl https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets 123456`
instead of with the proper format:
`curl 'https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets' -H 'token: 123456'`.
The equivalent request, made with Python's `requests` library, is shown below.
    

In [None]:
import requests
url = 'https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets'
# I learned how to use a token with requests at:
# http://docs.python-requests.org/en/master/user/quickstart/
headers = {'token': 'your_token'}
r = requests.get(url, headers=headers)
# r.json()
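
To see why the first `curl` command above produced `Token parameter is required.`, it helps to look at how a shell splits each command into arguments. The sketch below uses Python's `shlex` module for that; the helper `headers_from_curl` is a hypothetical name of my own, and it only understands curl's `-H`/`--header` flag, nothing else.

```python
import shlex

def headers_from_curl(command):
    """Collect the headers a curl command line would send via -H/--header."""
    args = shlex.split(command)  # split the way a POSIX shell would
    headers = {}
    for flag, value in zip(args, args[1:]):
        if flag in ('-H', '--header'):
            name, _, content = value.partition(':')
            headers[name.strip()] = content.strip()
    return headers

wrong = "curl https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets 123456"
right = "curl 'https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets' -H 'token: 123456'"

print(headers_from_curl(wrong))  # {} : the bare 123456 is not a header
print(headers_from_curl(right))  # {'token': '123456'}
```

In the first command the token is just another positional argument, so no `token` header is ever sent. The `headers` dictionary passed to `requests.get` plays the same role as `-H`.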

### Berkeley Data
After encountering some challenges turning the CDIAC data into a usable format such as CSV or GeoJSON, I decided to search again for usable data. I quickly found [Berkeley Earth](http://berkeleyearth.org/) which makes [its data](http://berkeleyearth.org/source-files/) readily available in a format that is easily turned into a CSV.

I took a liking to this data because, although it did not cover the date range I had hoped for, it was gathered by expert scientists and is very detailed.

I turned each `data.txt` file into a CSV by:
1. Opening `data.txt` in Excel
2. Deleting the textual rows before the data started
3. Deleting the "Series Number", "Uncertainty (C)", "Observations", and "Time of Observation" columns. For each of these columns, the data values were identical in every row and not relevant to the visualization.
4. Adding labels to each column. (`station_id`, `date`, and `temperature_c`)
5. Saving the file as a CSV.
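
The Excel steps above could also be scripted. The following is a minimal sketch, not what I actually ran. It assumes that the textual rows start with `%`, that the columns are tab-separated, and that they appear in the order Station ID, Series Number, Date, Temperature (C), Uncertainty (C), Observations, Time of Observation. The function name `data_txt_to_csv` and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed column positions in data.txt -> labels used in the CSV.
KEEP = {0: 'station_id', 2: 'date', 3: 'temperature_c'}

def data_txt_to_csv(src, dst, comment_prefix='%', delimiter='\t'):
    """Copy the kept columns from a data.txt-style file into a labeled CSV."""
    writer = csv.writer(dst, lineterminator='\n')
    writer.writerow(KEEP.values())            # step 4: add column labels
    for line in src:
        if not line.strip() or line.startswith(comment_prefix):
            continue                          # step 2: drop the textual rows
        fields = line.rstrip('\n').split(delimiter)
        # step 3: keep only station, date, and temperature
        writer.writerow(fields[i].strip() for i in KEEP)

# Made-up sample in the assumed layout; not real Berkeley Earth values.
sample = io.StringIO(
    "% descriptive header line\n"
    "1\t1\t1850.042\t-3.2\t0.05\t-99\t-99\n"
    "1\t1\t1850.125\t-1.1\t0.05\t-99\t-99\n"
)
out = io.StringIO()
data_txt_to_csv(sample, out)
print(out.getvalue())
```

Passing real file handles (for example `open('data.txt')` and `open('colonial.csv', 'w', newline='')`) in place of the `io.StringIO` objects would cover step 5.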

### Creating a JSON Object from the CSV
I used [this Stack Overflow page](https://stackoverflow.com/questions/19697846/python-csv-to-json) for guidance.

I also consulted [this page on CSV parsing](https://courses.cs.washington.edu/courses/cse140/13wi/csv-parsing.html).

In [None]:
import csv
import json
import pandas as pd
from math import isnan
from pprint import pprint

In [6]:
def createDataDict(csvVar, siteDetailFileCSV):
    reader = csv.DictReader(csvVar)
    data = {}
    yearIterator = {}
    for row in reader:
        data = {row['station_id']: {row['date'][:4]:float(row['temperature_c'])}}
        yearIterator = {'id':row['station_id'],'year':row['date'][:4],'sum':1}
        break

    for row in reader:
        # get the year from the first four characters of row['date']
        year = row['date'][:4]

        # if changing to a new station_id
        if row['station_id'] not in data:
            # finish the previous group: turn its running sum into an average
            data[yearIterator['id']][yearIterator['year']] /= yearIterator['sum']

            data[row['station_id']] = {year: float(row['temperature_c'])}
            yearIterator = {'id':row['station_id'],'year':year, 'sum':1}
        # if changing to a new year with the same station_id
        else:
            if yearIterator['year'] != year:
                data[yearIterator['id']][yearIterator['year']] /= yearIterator['sum']

                data[row['station_id']][year] = float(row['temperature_c'])
                yearIterator = {'id':row['station_id'],'year':year, 'sum':1}
            # if iterating in the same station and the same year
            else:
                data[row['station_id']][year] += float(row['temperature_c'])
                yearIterator['sum'] += 1

    # the loop only averages a group when the next one starts,
    # so the final station/year group still has to be averaged here
    if yearIterator:
        data[yearIterator['id']][yearIterator['year']] /= yearIterator['sum']
    
    site_df = pd.read_csv(siteDetailFileCSV)
    # keep only the first six columns and drop any extras
    site_df = site_df.iloc[:, :6].copy()
    site_df['station_name']=site_df['station_name'].str.strip()
    site_df = site_df.loc[site_df['latitude'] != -999.00]

    for i in list(data.keys()):
        if float(i) not in list(site_df['station_id']):
            del data[i]
        else:
            temp = data[i]
            del data[i]
            data[i] = {}
            data[i]['years'] = temp

    for index, row in site_df.iterrows():
        if str(row['station_id']) in data.keys():
            data[str(row['station_id'])]['station_name'] = row['station_name']
            data[str(row['station_id'])]['latitude'] = row['latitude']
            data[str(row['station_id'])]['longitude'] = row['longitude']
            data[str(row['station_id'])]['elevation_m'] = row['elevation_m']

    d = [{'station_id':key,"data":value} for key,value in data.items()] 
    return d
    

In [7]:
csv_names = [{"data": "colonial", "site_detail": "colonial_site_detail", "outfile": "colonialdata"},
            {"data": "monthly_climatic_data", "site_detail":"monthly_climatic_data_site_detail", "outfile": "monthlyclimaticdata"},
            {"data": "ghcn_daily", "site_detail":"ghcn_site_detail", "outfile": "ghcndata"},
            {"data":"world_monthly_surface_station_climatology", "site_detail":"world_monthly_surface_station_climatology_site_detail", "outfile":"wmssc"}]

for i in csv_names:
    dcf = "data_csv_files/"
    ccd = "cleaned_climate_data/"
    # open both files in one `with` so the input CSV is closed as well
    with open(dcf + i["data"] + ".csv", 'r') as data_csv, \
         open(ccd + i["outfile"] + '.json', 'w') as outfile:
        json.dump(createDataDict(data_csv,dcf + i['site_detail'] + '.csv'), outfile)
# colonial = open('colonial.csv', 'r')
# with open('colonialdata.json', 'w') as outfile:  
#     json.dump(createDataDict(colonial,'colonial_site_detail.csv'), outfile)
    
# mcd = open('monthly_climatic_data.csv')
# with open('monthlyclimaticdata.json', 'w') as outfile:  
#     json.dump(createDataDict(mcd,'monthly_climatic_data_site_detail.csv'), outfile)
    
# ghcn = open('ghcn_daily.csv', 'r')
# with open('ghcndata.json', 'w') as outfile:  
#     json.dump(createDataDict(ghcn,'ghcn_site_detail.csv'), outfile)
    
# wmssc = open('world_monthly_surface_station_climatology.csv', 'r')
# with open ('wmssc.json', 'w') as outfile:
#     json.dump(createDataDict(wmssc,'world_monthly_surface_station_climatology_site_detail.csv'), outfile)

#### Making the Code Above
I had some trouble writing the code that made the `surfacedata.json` file small enough. Averaging the temperatures for each year was particularly difficult.
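
In hindsight, the yearly averaging can be written more simply by keeping a running total and count for every (station, year) pair and dividing at the end. The sketch below is an alternative to the function above, not the code I used. The name `yearly_means` and the sample rows are made up, and it assumes the same `station_id`, `date`, and `temperature_c` columns.

```python
import csv
import io
from collections import defaultdict

def yearly_means(csv_file):
    """Return {station_id: {year: mean temperature}} from a cleaned CSV."""
    totals = defaultdict(lambda: [0.0, 0])   # (station, year) -> [sum, count]
    for row in csv.DictReader(csv_file):
        key = (row['station_id'], row['date'][:4])  # year = first four characters
        totals[key][0] += float(row['temperature_c'])
        totals[key][1] += 1

    means = {}
    for (station, year), (total, count) in totals.items():
        means.setdefault(station, {})[year] = total / count
    return means

# Made-up sample rows; not real station data.
sample = io.StringIO(
    "station_id,date,temperature_c\n"
    "1,1850.042,1.0\n"
    "1,1850.125,3.0\n"
    "1,1851.042,5.0\n"
    "2,1850.042,2.0\n"
)
means = yearly_means(sample)
print(means)
```

Because every group is divided in the final loop, this version does not depend on the rows being sorted by station and year, and the last group needs no special handling.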
### Connecting Stations To Physical Locations
To show the locations on a physical map, I realized I had to connect the stations from `data.txt` to the corresponding stations in `site_detail.txt`. To do this, I cleaned `site_detail.txt` in Excel in a similar fashion to `data.txt`. For my purposes, the following values for each station were not relevant:
* Latitude Uncertainty
* Longitude Uncertainty
* Elevation Uncertainty
* State/Province Code
* County
* Time Zone
* WMO ID
* Coop ID
* WBAN ID
* ICAO ID
* Number of Relocations
* Number of Suggested Relocations
* Number of Sources
* Hash

I chose to only keep the Station ID, Station Name, Latitude, Longitude, Elevation, and Country. I now need to create a JSON object that connects each station to its information provided in the `site_detail.txt` file. I will also filter out the stations where the latitude and longitude are not provided, as they cannot be dynamically located on a global map.
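
The join described above can be sketched with the standard library alone. This is an illustration under assumptions rather than the exact code in this notebook. It assumes the cleaned site file has `station_id`, `station_name`, `latitude`, `longitude`, and `elevation_m` columns, and that a missing coordinate is stored as `-999`. The function name `attach_site_details` and the sample rows are made up.

```python
import csv
import io

MISSING = -999.0  # assumed sentinel for an unknown latitude or longitude

def attach_site_details(yearly, site_csv):
    """Pair each station's yearly data with its location, skipping unlocatable ones."""
    stations = []
    for row in csv.DictReader(site_csv):
        station_id = row['station_id'].strip()
        lat, lon = float(row['latitude']), float(row['longitude'])
        if station_id not in yearly or lat == MISSING or lon == MISSING:
            continue  # no temperature data, or cannot be placed on a map
        stations.append({'station_id': station_id, 'data': {
            'years': yearly[station_id],
            'station_name': row['station_name'].strip(),
            'latitude': lat,
            'longitude': lon,
            'elevation_m': float(row['elevation_m']),
        }})
    return stations

# Made-up inputs for illustration only.
yearly = {'1': {'1850': 2.0}, '2': {'1850': 4.0}, '3': {'1850': 6.0}}
site_csv = io.StringIO(
    "station_id,station_name,latitude,longitude,elevation_m,country\n"
    "1, Example A ,10.5,20.25,100.0,Nowhere\n"
    "2,Example B,-999.0,-999.0,50.0,Nowhere\n"
    "4,Example C,1.0,2.0,3.0,Nowhere\n"
)
stations = attach_site_details(yearly, site_csv)
print(stations)
```

Only station `1` survives here: station `2` has no coordinates, station `3` has no site record, and station `4` has no temperature data.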

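I used [this Stack Overflow page](https://stackoverflow.com/questions/41998624/how-to-convert-pandas-dataframe-to-nested-dictionary) for guidance on converting a pandas DataFrame to a nested dictionary.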