# Bicycle Data

This notebook downloads the open trip data published by Citi Bike, New York City's bike-share program:
https://ride.citibikenyc.com/system-data

In [1]:
import requests
import zipfile
from io import BytesIO
import os
import pandas as pd

In [2]:
# Project root (the notebook is assumed to be run from the project directory)
project_file = os.getcwd()

In [5]:
# Download the raw files

year = '2019'
for month in range(1, 13):
    url = f"https://s3.amazonaws.com/tripdata/{year}{month:02d}-citibike-tripdata.csv.zip"
    print("Downloading", url)
    req = requests.get(url, timeout=120)
    req.raise_for_status()  # stop on HTTP errors instead of trying to unzip an error page
    with zipfile.ZipFile(BytesIO(req.content)) as zf:
        zf.extractall(os.path.join(project_file, "Dados", "CitiBikeRaw"))

Downloading https://s3.amazonaws.com/tripdata/201901-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201902-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201903-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201904-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201905-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201906-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201907-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201908-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201909-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201910-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.com/tripdata/201911-citibike-tripdata.csv.zip
Downloading https://s3.amazonaws.co
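The download loop above processes the months one after another and aborts at the first download that fails, and it cannot be exercised without network access. A minimal sketch of a more defensive variant follows, assuming a hypothetical helper `download_month` and the same S3 URL pattern used in the loop: HTTP errors are reported and skipped, the archive is closed after extraction, and the HTTP call can be replaced by a fake `fetch` function for offline testing. The 120-second timeout is an arbitrary choice.

```python
import functools
import io
import os
import zipfile

BASE_URL = "https://s3.amazonaws.com/tripdata"  # same pattern as the loop above


def download_month(year, month, dest_dir, fetch=None):
    """Download one monthly Citi Bike archive and extract it into dest_dir.

    `fetch` is any callable that takes a URL and returns an object with
    `status_code` and `content` attributes (a requests.Response fits).
    Returns the extracted member names, or None when the server answers
    with an HTTP error, e.g. for a month that has not been published.
    """
    if fetch is None:
        import requests  # imported lazily so the function can be tested offline
        fetch = functools.partial(requests.get, timeout=120)

    url = f"{BASE_URL}/{year}{month:02d}-citibike-tripdata.csv.zip"
    resp = fetch(url)
    if resp.status_code != 200:
        print(f"Skipping {url}: HTTP {resp.status_code}")
        return None

    os.makedirs(dest_dir, exist_ok=True)
    # Open the archive from memory; `with` closes it once extraction is done.
    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

In the notebook it would be called inside the same month loop, for example as `download_month(2019, month, os.path.join(project_file, "Dados", "CitiBikeRaw"))`.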

In [3]:
from os import listdir
from os.path import isfile, join

citibikeraw_path = join(project_file, "Dados", "CitiBikeRaw")

# Keep only regular files, sorted so that the months are processed in order
file_paths = sorted(f for f in listdir(citibikeraw_path) if isfile(join(citibikeraw_path, f)))
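The listing above picks up every regular file in the folder, so any other file that ends up there would also be read as trip data. A minimal sketch of a stricter alternative, assuming a hypothetical helper `list_trip_files` and that all monthly files follow the `YYYYMM-citibike-tripdata.csv` naming seen in the output below, uses `pathlib` with a glob pattern:

```python
from pathlib import Path


def list_trip_files(raw_dir):
    """Return the monthly trip CSVs in raw_dir, sorted by name (and therefore by month).

    Assumes the naming pattern YYYYMM-citibike-tripdata.csv; anything else
    in the folder (notes, subfolders, other extracts) is ignored.
    """
    return sorted(p for p in Path(raw_dir).glob("*-citibike-tripdata.csv") if p.is_file())
```

Because the file names start with the year and month, lexicographic order equals chronological order, which keeps the concatenated DataFrame ordered by month.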

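We load every raw file, draw a random sample from each one, and combine the samples into a single DataFrame.

Sampling is necessary because the full dataset is too large for us to process. Each sample keeps 70% of a month's trips, drawn uniformly at random with a fixed seed; since every month contains hundreds of thousands of trips or more, a sample of this size should preserve the distributions of the variables very closely.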
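The claim that a 70% random sample preserves the distributions can be checked numerically. The sketch below is an illustration on synthetic data, not on the Citi Bike files: it assumes log-normal "trip durations" with arbitrary parameters (chosen only because trip durations are right-skewed) and compares a few quantiles of the full series with those of a 70% sample drawn the same way as in the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one month of trip durations in seconds (log-normal,
# right-skewed); the parameters are arbitrary, not fitted to Citi Bike data.
rng = np.random.default_rng(0)
durations = pd.Series(rng.lognormal(mean=6.5, sigma=0.8, size=500_000))

# Same kind of draw as in the notebook: 70% of the rows, without replacement, fixed seed.
sample = durations.sample(frac=0.7, random_state=42)

for q in (0.1, 0.5, 0.9, 0.99):
    full_q, sample_q = durations.quantile(q), sample.quantile(q)
    rel = abs(sample_q - full_q) / full_q
    print(f"q={q:.2f}  full={full_q:8.1f}  sample={sample_q:8.1f}  rel. diff={rel:.3%}")
```

With samples this large the relative differences stay well below one percent, even in the tail; the same comparison could be run on a real month by replacing `durations` with its `tripduration` column.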

In [4]:
dfs = []
for datapath in file_paths:
    original_df = pd.read_csv(join(citibikeraw_path, datapath))
    # Overwrite the header by position so that files whose headers are
    # spelled differently all end up with the same column names
    original_df.columns = ["tripduration", "starttime", "stoptime",
                           "start station id", "start station name",
                           "start station latitude", "start station longitude",
                           "end station id", "end station name",
                           "end station latitude", "end station longitude",
                           "bikeid", "usertype", "birth year", "gender"]

    # 70% of the month's trips, truncated to an integer
    quantidade_sample = int(len(original_df) * 0.7)

    print(datapath, "- sample", quantidade_sample)

    # Fixed seed so that the sample is reproducible
    sample_df = original_df.sample(quantidade_sample, random_state=42)
    dfs.append(sample_df)

201701-citibike-tripdata.csv - sample 508673
201702-citibike-tripdata.csv - sample 554152
201703-citibike-tripdata.csv - sample 509365
201704-citibike-tripdata.csv - sample 920782
201705-citibike-tripdata.csv - sample 1066287
201706-citibike-tripdata.csv - sample 1212115
201707-citibike-tripdata.csv - sample 1214919
201708-citibike-tripdata.csv - sample 1271548
201709-citibike-tripdata.csv - sample 1314668
201710-citibike-tripdata.csv - sample 1328314
201711-citibike-tripdata.csv - sample 931454
201712-citibike-tripdata.csv - sample 622976
201801-citibike-tripdata.csv - sample 503295
201802-citibike-tripdata.csv - sample 590179
201803-citibike-tripdata.csv - sample 683670
201804-citibike-tripdata.csv - sample 915280
201805-citibike-tripdata.csv - sample 1277297
201806-citibike-tripdata.csv - sample 1367172
201807-citibike-tripdata.csv - sample 1339537
201808-citibike-tripdata.csv - sample 1384023
201809-citibike-tripdata.csv - sample 1314518
201810-citibike-tripdata.csv - sample 131505
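The loop above reads each month completely before sampling it, so the full month and its sample are in memory at the same time. A minimal sketch of a chunked alternative, assuming a hypothetical helper `sample_csv_in_chunks` and an arbitrary chunk size of 500,000 rows, uses `pd.read_csv(..., chunksize=...)` and samples each chunk separately. This is a sample stratified by chunk, so it will not select the same rows as `original_df.sample(...)`, and because pandas rounds `frac * len` per chunk the row counts can differ slightly from the output above. Since 70% of the rows are kept anyway, the memory saving is moderate.

```python
import pandas as pd

# Same normalised header as in the sampling loop above
COLUMNS = [
    "tripduration", "starttime", "stoptime",
    "start station id", "start station name",
    "start station latitude", "start station longitude",
    "end station id", "end station name",
    "end station latitude", "end station longitude",
    "bikeid", "usertype", "birth year", "gender",
]


def sample_csv_in_chunks(path_or_buffer, frac=0.7, chunksize=500_000, random_state=42):
    """Read a large trip CSV chunk by chunk and keep a random `frac` of each chunk."""
    parts = []
    with pd.read_csv(path_or_buffer, chunksize=chunksize) as reader:
        for i, chunk in enumerate(reader):
            chunk.columns = COLUMNS  # positional rename, like the original loop
            # A different seed per chunk keeps the result reproducible without
            # picking the same row positions in every equally sized chunk.
            parts.append(chunk.sample(frac=frac, random_state=random_state + i))
    return pd.concat(parts, ignore_index=True)
```

In the notebook this would replace the body of the loop, e.g. `dfs = [sample_csv_in_chunks(join(citibikeraw_path, f)) for f in file_paths]`.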

In [5]:
df_complete = pd.concat(dfs, axis=0, ignore_index=True)

In [6]:
df_complete

| | tripduration | starttime | stoptime | start station id | start station name | start station latitude | start station longitude | end station id | end station name | end station latitude | end station longitude | bikeid | usertype | birth year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 393 | 2017-01-21 23:04:08 | 2017-01-21 23:10:41 | 441.0 | E 52 St & 2 Ave | 40.756014 | -73.967416 | 527.0 | E 33 St & 2 Ave | 40.744023 | -73.976056 | 26621 | Subscriber | 1985.0 | 1 |
| 1 | 726 | 2017-01-19 08:50:17 | 2017-01-19 09:02:24 | 2008.0 | Little West St & 1 Pl | 40.705693 | -74.016777 | 330.0 | Reade St & Broadway | 40.714505 | -74.005628 | 18557 | Subscriber | 1979.0 | 1 |
| 2 | 608 | 2017-01-31 07:45:54 | 2017-01-31 07:56:03 | 523.0 | W 38 St & 8 Ave | 40.754666 | -73.991382 | 491.0 | E 24 St & Park Ave S | 40.740964 | -73.986022 | 19212 | Subscriber | 1974.0 | 1 |
| 3 | 208 | 2017-01-30 15:16:52 | 2017-01-30 15:20:21 | 3382.0 | Carroll St & Smith St | 40.680611 | -73.994758 | 3321.0 | Clinton St & Union St | 40.683116 | -73.997853 | 19092 | Subscriber | 1973.0 | 1 |
| 4 | 311 | 2017-01-27 16:02:02 | 2017-01-27 16:07:14 | 3389.0 | Carroll St & Columbia St | 40.683046 | -74.003486 | 3344.0 | Pioneer St & Van Brunt St | 40.679043 | -74.011169 | 16971 | Subscriber | 1985.0 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 38125261 | 1201 | 2019-12-11 16:52:18.5060 | 2019-12-11 17:12:20.0610 | 3733.0 | Avenue C & E 18 St | 40.730563 | -73.973984 | 168.0 | W 18 St & 6 Ave | 40.739713 | -73.994564 | 30702 | Subscriber | 1987.0 | 2 |
| 38125262 | 497 | 2019-12-18 14:55:54.7220 | 2019-12-18 15:04:11.7890 | 3561.0 | 37 Ave & 35 St | 40.753111 | -73.927992 | 3597.0 | 43 St & Broadway | 40.757728 | -73.916637 | 16094 | Subscriber | 1966.0 | 1 |
| 38125263 | 557 | 2019-12-02 07:49:13.8970 | 2019-12-02 07:58:31.8290 | 72.0 | W 52 St & 11 Ave | 40.767272 | -73.993929 | 3233.0 | E 48 St & 5 Ave | 40.757246 | -73.978059 | 39569 | Subscriber | 1989.0 | 1 |
| 38125264 | 503 | 2019-12-12 09:09:37.3970 | 2019-12-12 09:18:00.4740 | 505.0 | 6 Ave & W 33 St | 40.749013 | -73.988484 | 477.0 | W 41 St & 8 Ave | 40.756405 | -73.990026 | 19352 | Subscriber | 1987.0 | 2 |
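The `starttime` column mixes two string layouts: the 2017 rows have whole seconds, while the 2019 rows carry four fractional digits. A minimal sketch for turning it into daily trip counts, assuming a hypothetical helper `daily_trip_counts`, truncates every string to whole seconds so that a single explicit format parses all rows; sub-second precision is discarded, which does not matter for daily aggregation.

```python
import pandas as pd


def daily_trip_counts(starttime: pd.Series) -> pd.Series:
    """Count trips per calendar day from raw `starttime` strings.

    '2017-01-21 23:04:08' and '2019-12-11 16:52:18.5060' are both cut to
    their first 19 characters ('YYYY-MM-DD HH:MM:SS') before parsing.
    """
    ts = pd.to_datetime(starttime.astype(str).str.slice(0, 19), format="%Y-%m-%d %H:%M:%S")
    # normalize() sets the time to midnight, so counting gives trips per day
    return ts.dt.normalize().value_counts().sort_index().rename("trips")
```

Applied as `daily_trip_counts(df_complete["starttime"])`, the counts come from the 70% sample, so dividing by 0.7 gives an approximate full daily volume.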


In [7]:
df_complete.to_parquet(f"{project_file}/Dados/citibike_70percent.parquet")

# Weather Data

We also retrieved daily weather data for Central Park from NOAA's National Centers for Environmental Information (NCEI):
https://www.ncei.noaa.gov/access/past-weather/nyc%2C%20New%20York

In [10]:
df_tempo = pd.read_csv("Dados/NOAA/historical_central_park.csv")

In [16]:
df_tempo

| | Date | TAVG (Degrees Fahrenheit) | TMAX (Degrees Fahrenheit) | TMIN (Degrees Fahrenheit) | PRCP (Inches) | SNOW (Inches) | SNWD (Inches) |
|---|---|---|---|---|---|---|---|
| 0 | 1869-01-01 | | 29.0 | 19.0 | 0.75 | 9.0 | |
| 1 | 1869-01-02 | | 27.0 | 21.0 | 0.03 | 0.0 | |
| 2 | 1869-01-03 | | 35.0 | 27.0 | 0.00 | 0.0 | |
| 3 | 1869-01-04 | | 37.0 | 34.0 | 0.18 | 0.0 | |
| 4 | 1869-01-05 | | 43.0 | 37.0 | 0.05 | 0.0 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 56191 | 2022-11-06 | | 75.0 | 66.0 | 0.00 | 0.0 | 0.0 |
| 56192 | 2022-11-07 | | 77.0 | 54.0 | 0.00 | 0.0 | 0.0 |
| 56193 | 2022-11-08 | | 58.0 | 47.0 | 0.00 | 0.0 | 0.0 |
| 56194 | 2022-11-09 | | 53.0 | 40.0 | 0.00 | 0.0 | 0.0 |
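The weather table covers 1869 to 2022, uses imperial units, and its `TAVG` column is empty in every row shown. A minimal sketch of how it could be prepared and joined with daily trip counts follows, assuming hypothetical helpers `prepare_weather` and `join_trips_weather`, hypothetical output columns `tavg_c`, `prcp_mm` and `snow_mm`, and a 2017 to 2019 window matching the trip files. Filling a missing `TAVG` with the midpoint of `TMAX` and `TMIN` is an approximation chosen here, not NOAA's definition of average temperature.

```python
import pandas as pd


def prepare_weather(df_tempo: pd.DataFrame, start="2017-01-01", end="2019-12-31") -> pd.DataFrame:
    """Restrict NOAA daily records to the trip period and add metric columns."""
    w = df_tempo.copy()
    w["Date"] = pd.to_datetime(w["Date"])
    w = w[(w["Date"] >= pd.Timestamp(start)) & (w["Date"] <= pd.Timestamp(end))].copy()

    tmax = w["TMAX (Degrees Fahrenheit)"]
    tmin = w["TMIN (Degrees Fahrenheit)"]
    # Assumption: where TAVG is missing, approximate it by the TMAX/TMIN midpoint.
    tavg_f = w["TAVG (Degrees Fahrenheit)"].fillna((tmax + tmin) / 2)

    w["tavg_c"] = (tavg_f - 32) * 5 / 9          # Fahrenheit -> Celsius
    w["prcp_mm"] = w["PRCP (Inches)"] * 25.4     # inches -> millimetres
    w["snow_mm"] = w["SNOW (Inches)"] * 25.4
    return w[["Date", "tavg_c", "prcp_mm", "snow_mm"]]


def join_trips_weather(daily_trips: pd.Series, weather: pd.DataFrame) -> pd.DataFrame:
    """Left-join daily trip counts (Series indexed by date) with the prepared weather."""
    trips = daily_trips.rename_axis("Date").reset_index(name="trips")
    # validate= raises if either side unexpectedly has more than one row per day
    return trips.merge(weather, on="Date", how="left", validate="one_to_one")
```

The left join keeps every day that has trips; days without a weather record would simply get missing weather values.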
