# Project 2 - Earthquake Dataset Download 

*Authors: João Victor Barboza, Thais Lins, Vanessa Uchida, Yuri Martins*

# 1 - Description 

> This notebook requests earthquake data from the [USGS - Earthquake Catalog](https://earthquake.usgs.gov/earthquakes/search/) so that events of magnitude 4.5 or greater from January 1980 through October 2018 can be collected for later analysis. The website returns at most 20,000 events per request, so the download is paged: each successive request advances the `offset` parameter by 20,000 until every event has been retrieved. Finally, the pages are merged into a single dataset, which will be used later on.

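The paging arithmetic can be sketched as follows. This assumes, as the download loop does, that the API's `offset` parameter counts from 1 and that each request returns at most 20,000 events. `page_offsets` is a hypothetical helper, not part of any library.

```python
import math


def page_offsets(total: int, page_size: int = 20_000) -> list[int]:
    """Return the 1-based offset of each page needed to cover `total` events.

    Hypothetical helper; assumes the API counts offsets from 1.
    """
    n_pages = math.ceil(total / page_size)
    return [1 + k * page_size for k in range(n_pages)]


offsets = page_offsets(209_456)
print(len(offsets))  # 11 pages
print(offsets[-1])   # 200001: the last page starts at event 200,001
```

Rounding the page count up with `math.ceil` guarantees that a partial final page is still requested.
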
In [0]:
# Import packages
import requests
import pandas as pd
import io

In [0]:
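# Base URL for the USGS event query (CSV output, magnitude >= 4.5, 20,000 events per page);
# the {} placeholder receives each page's offset
total_results = 209_456  # number of matching events to page through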
base_url = 'https://earthquake.usgs.gov/fdsnws/event/1/query.csv?starttime=1980-01-01%2000:00:00&endtime=2018-10-26%2023:59:59&minmagnitude=4.5&orderby=time&limit=20000&offset={}'

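Instead of hand-encoding the query string, the same URL can be assembled from a dictionary of parameters. The sketch below uses the standard library's `urllib.parse.urlencode`. The parameter names and values are copied from `base_url`, and `build_query_url` is a hypothetical helper. (`requests.get` also accepts a `params=` dictionary and performs the encoding itself.)

```python
from urllib.parse import quote, urlencode

USGS_QUERY = "https://earthquake.usgs.gov/fdsnws/event/1/query.csv"


def build_query_url(offset: int, limit: int = 20_000) -> str:
    """Build one page's request URL (hypothetical helper mirroring base_url)."""
    params = {
        "starttime": "1980-01-01 00:00:00",
        "endtime": "2018-10-26 23:59:59",
        "minmagnitude": 4.5,
        "orderby": "time",
        "limit": limit,
        "offset": offset,
    }
    # quote_via=quote encodes spaces as %20 (as in base_url) and colons as %3A
    return USGS_QUERY + "?" + urlencode(params, quote_via=quote)


print(build_query_url(20_001))
```
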
In [0]:
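# Download data one page at a time; the API's offset is 1-based
page_size = 20_000  # must match limit=20000 in base_url
data = []
for offset in range(1, total_results + 1, page_size):
    response = requests.get(base_url.format(offset))
    response.raise_for_status()  # fail fast if a page request is rejected
    data.append(response)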

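The download loop requests each page once, without retrying. If a page fails transiently, one option is to retry with exponential backoff. The following sketch is illustrative only: `fetch_with_retry` is a hypothetical helper, and it takes the fetching function as a parameter so that the retry logic can be exercised without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[str], T], url: str,
                     attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fetch(url), retrying on any exception with exponential backoff.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between attempts, and re-raises the last error once attempts run out.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In the notebook, `fetch` could be a small function that calls `requests.get(url)` and then `raise_for_status()` on the response, so that HTTP errors also trigger a retry.
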
In [0]:
# Inspect the responses: each page should have returned HTTP 200
data

[<Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>,
 <Response [200]>]

In [0]:
data[0].text[:100]

'time,latitude,longitude,depth,mag,magType,nst,gap,dmin,rms,net,id,updated,place,type,horizontalError'

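Each page arrives with its own header row, as the output above shows for the first page. A quick sanity check before concatenating is that all pages share the same header. The sketch below assumes the responses are available as text (in the notebook, `[r.text for r in data]`); `headers_match` is a hypothetical helper, and the sample pages are made up.

```python
def headers_match(pages: list[str]) -> bool:
    """Return True if every CSV page starts with the same header line."""
    headers = {page.split("\n", 1)[0].strip() for page in pages}
    return len(headers) <= 1


pages = [
    "time,latitude,longitude,mag\n2018-10-26T23:00:00Z,1.0,2.0,4.6\n",
    "time,latitude,longitude,mag\n2018-10-25T10:00:00Z,3.0,4.0,5.1\n",
]
print(headers_match(pages))  # True
```
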
In [0]:
# Parse each page's CSV text into its own DataFrame
dfs = [pd.read_csv(io.StringIO(d.content.decode('utf-8'))) for d in data]

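`pd.read_csv` leaves the `time` and `updated` columns as strings (they appear as `object` in the `info()` output further down). The following sketch parses them as timestamps while reading. It runs on a two-row sample in the USGS column layout; the sample values and ids are made up.

```python
import io

import pandas as pd

# Hypothetical two-row sample using a subset of the USGS CSV columns
sample = (
    "time,latitude,longitude,depth,mag,id,updated\n"
    "2018-10-26T23:45:12.340Z,37.5,-122.1,10.0,4.6,ex0001,2018-10-27T01:02:03.000Z\n"
    "2018-10-26T20:11:05.120Z,-17.3,-70.7,33.0,5.2,ex0002,2018-10-27T02:00:00.000Z\n"
)

# parse_dates converts the ISO 8601 strings into datetime64 columns
df = pd.read_csv(io.StringIO(sample), parse_dates=["time", "updated"])
print(df.dtypes)
```
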
In [0]:
# Merge collected dataframes
# (DataFrame.append was removed in pandas 2.0; pd.concat joins all pages in one call)
merged_df = pd.concat(dfs, ignore_index=True)
merged_df.describe()

| | latitude | longitude | depth | mag | nst | gap | dmin | rms | horizontalError | depthError | magError | magNst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 209457.0 | 209457.0 | 209457.0 | 209457.0 | 87625.0 | 110710.0 | 37466.0 | 200202.0 | 30235.0 | 95990.0 | 35817.0 | 164656.0 |
| mean | 3.878726 | 45.582278 | 72.683005 | 4.879487 | 84.96105 | 95.211482 | 4.481166 | 0.9784 | 8.411724 | 8.566884 | 0.102116 | 27.747231 |
| std | 28.925058 | 120.435723 | 118.293309 | 0.411393 | 99.297185 | 47.488716 | 5.941718 | 0.302074 | 5.517162 | 8.375248 | 0.061556 | 46.814892 |
| min | -84.133 | -179.999 | -2.54 | 4.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | -17.356 | -70.726 | 10.45 | 4.6 | 24.0 | 59.0 | 1.286 | 0.82 | 6.1 | 3.6 | 0.061 | 5.0 |
| 50% | -0.066 | 103.386 | 33.0 | 4.8 | 48.0 | 90.4 | 2.687 | 0.98 | 7.9 | 6.4 | 0.088 | 13.0 |
| 75% | 28.109 | 142.712 | 64.2 | 5.0 | 103.0 | 124.2 | 5.047 | 1.12 | 10.1 | 10.7 | 0.128 | 30.0 |
| max | 87.221 | 180.0 | 700.9 | 9.1 | 934.0 | 353.0 | 62.626 | 69.32 | 99.0 | 791.3 | 1.642 | 941.0 |


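The `count` row reports 209,457 events, one more than `total_results`. The pages are requested one at a time from a catalog that can be updated between requests, so checking for repeated event `id`s is a cheap way to detect overlap between pages. Here is a sketch on two small made-up pages:

```python
import pandas as pd

# Two hypothetical pages whose boundary overlaps by one event ("b")
page1 = pd.DataFrame({"id": ["a", "b"], "mag": [4.6, 5.0]})
page2 = pd.DataFrame({"id": ["b", "c"], "mag": [5.0, 4.8]})

merged = pd.concat([page1, page2], ignore_index=True)
print(merged["id"].duplicated().sum())  # 1 repeated id

# Keep the first occurrence of each event id
deduped = merged.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
print(len(deduped))  # 3
```

Whether to drop duplicates depends on what the check finds on the real data. If it reports zero repeated ids, the extra row has some other cause.
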
In [0]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 209457 entries, 0 to 209456
Data columns (total 22 columns):
time               209457 non-null object
latitude           209457 non-null float64
longitude          209457 non-null float64
depth              209457 non-null float64
mag                209457 non-null float64
magType            209456 non-null object
nst                87625 non-null float64
gap                110710 non-null float64
dmin               37466 non-null float64
rms                200202 non-null float64
net                209457 non-null object
id                 209457 non-null object
updated            209457 non-null object
place              209452 non-null object
type               209457 non-null object
horizontalError    30235 non-null float64
depthError         95990 non-null float64
magError           35817 non-null float64
magNst             164656 non-null float64
status             209457 non-null object
locationSource     209457 non-null object


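The `info()` output shows that several columns are sparsely filled. For example, `horizontalError` has 30,235 non-null values out of 209,457. A compact way to rank columns by completeness is to compute the fraction of missing values per column. Here is a sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mag": [4.6, 5.0, 4.8, 5.5],
    "nst": [12.0, np.nan, 30.0, np.nan],
    "dmin": [np.nan, np.nan, np.nan, 1.2],
})

# isna() marks missing cells; mean() over booleans gives the missing fraction
missing = df.isna().mean().sort_values(ascending=False)
print(missing)
```
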
In [0]:
# Export merged dataset to CSV (index=False avoids writing the row index as an extra column)
merged_df.to_csv('earthquakes.csv', index=False)
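
By default, `to_csv` also writes the DataFrame index as an unnamed first column. When the file is read back, that column comes back as `Unnamed: 0`. The following round-trip sketch uses in-memory buffers instead of a file:

```python
import io

import pandas as pd

df = pd.DataFrame({"id": ["a", "b"], "mag": [4.6, 5.0]})

with_index = io.StringIO()
df.to_csv(with_index)  # default: the index is written as an extra column
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())  # ['Unnamed: 0', 'id', 'mag']

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the data columns are written
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())  # ['id', 'mag']
```
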