# CityBikes

Send a request to CityBikes for the city of your choice. 

The city of my choice is Boston, Massachusetts. 
Getting information on a specific city takes two steps, as described in the CityBikes API documentation (http://api.citybik.es/v2/):
1. Use the 'http://api.citybik.es/v2/networks' endpoint to `GET` a list of all the network API endpoints for the different cities.
2. Use the endpoint from (1) to `GET` the details of that city.

In [1]:
# initialization - importing the required library
import pandas as pd 
import numpy as np
import requests
# show the output of every expression in a cell, not just the last one; change 'all' to 'last_expr' to revert to the default
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
# 1. GET the list of network API endpoints 
url = 'http://api.citybik.es/v2/networks'
params = {}
headers = {
        "Accept": "application/json"
    }

response = requests.request("GET", url, params=params, headers=headers)
response.json()

{'networks': [{'company': ['ЗАО «СитиБайк»'],
   'href': '/v2/networks/velobike-moscow',
   'id': 'velobike-moscow',
   'location': {'city': 'Moscow',
    'country': 'RU',
    'latitude': 55.75,
    'longitude': 37.616667},
   'name': 'Velobike'},
  {'company': ['Urban Infrastructure Partner'],
   'href': '/v2/networks/baerum-bysykkel',
   'id': 'baerum-bysykkel',
   'location': {'city': 'Bærum',
    'country': 'NO',
    'latitude': 59.89455,
    'longitude': 10.546343},
   'name': 'Bysykkel'},
  {'company': ['Comunicare S.r.l.'],
   'href': '/v2/networks/bicincitta-siena',
   'id': 'bicincitta-siena',
   'location': {'city': 'Siena',
    'country': 'IT',
    'latitude': 43.3186,
    'longitude': 11.3306},
   'name': 'Bicincittà',
   'source': 'https://www.bicincitta.com/frmLeStazioni.aspx?ID=202'},
  {'company': ['Cyclopolis Systems'],
   'href': '/v2/networks/cyclopolis-maroussi',
   'id': 'cyclopolis-maroussi',
   'location': {'city': 'Maroussi',
    'country': 'GR',
    'latitude':

Searching through the response, I found that the only network covering Boston, Massachusetts is Bluebikes (listed as 'Blue Bikes', ID `blue-bikes`).
**Note**: The response is short and simple, so I searched it with CTRL+F rather than parsing it with the `json` library.

    
```python
  {'company': ['Motivate International, Inc.', 'PBSC Urban Solutions'],
   'ebikes': True,
   'gbfs_href': 'https://gbfs.bluebikes.com/gbfs/gbfs.json',
   'href': '/v2/networks/blue-bikes',
   'id': 'blue-bikes',
   'location': {'city': 'Boston, MA',
    'country': 'US',
    'latitude': 42.3584308,
    'longitude': -71.0597732},
   'name': 'Blue Bikes'}
```
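
The same lookup could also be done in code rather than with CTRL+F. The sketch below is only an illustration: `find_networks` is a hypothetical helper (not part of the CityBikes API or of `requests`), and `sample_networks` is a trimmed copy of two entries from the response above so that the example runs without a network call.

```python
def find_networks(networks, city_substring):
    """Return the networks whose location city contains city_substring (case-insensitive)."""
    needle = city_substring.lower()
    return [
        net for net in networks
        if needle in net.get("location", {}).get("city", "").lower()
    ]


# Two entries copied from the response shown above, with some fields trimmed.
sample_networks = [
    {"id": "velobike-moscow", "name": "Velobike",
     "location": {"city": "Moscow", "country": "RU"}},
    {"id": "blue-bikes", "name": "Blue Bikes", "href": "/v2/networks/blue-bikes",
     "location": {"city": "Boston, MA", "country": "US"}},
]

matches = find_networks(sample_networks, "boston")
print([net["id"] for net in matches])
```

With the live data, the call would presumably be `find_networks(response.json()['networks'], 'Boston')`, assuming every entry carries a `location` dictionary as in the excerpt above.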



## Interesting information about Bluebikes.

Source: https://www.boston.gov/bluebikes

Bluebikes is a public bike-share system serving metro Boston, with more than 4,000 bikes and 400 stations.


In [4]:
# per the instructions, appending the network ID to make a new API endpoint...
url = 'http://api.citybik.es/v2/networks/blue-bikes'
headers = {
        "Accept": "application/json"
    }

params = {"fields": "stations"} # to reduce the results to specific data

response = requests.request("GET", url, params=params, headers=headers)
response.json()

{'network': {'stations': [{'empty_slots': 11,
    'extra': {'ebikes': 0,
     'has_ebikes': True,
     'last_updated': 1682166895,
     'payment': ['key', 'creditcard'],
     'payment-terminal': True,
     'renting': 1,
     'returning': 1,
     'slots': 19,
     'uid': '217'},
    'free_bikes': 6,
    'id': '553ed0300d38108b4f21a6bafa3db70c',
    'latitude': 42.386781,
    'longitude': -71.006098,
    'name': 'Orient Heights T Stop - Bennington St at Saratoga St',
    'timestamp': '2023-04-22T13:02:08.647000Z'},
   {'empty_slots': 9,
    'extra': {'ebikes': 0,
     'has_ebikes': True,
     'last_updated': 1682168185,
     'payment': ['key', 'creditcard'],
     'payment-terminal': True,
     'renting': 1,
     'returning': 1,
     'slots': 33,
     'uid': '212'},
    'free_bikes': 23,
    'id': '93542dcbf21f5411569adb92cd7cc199',
    'latitude': 42.368844082898356,
    'longitude': -71.03977829217911,
    'name': 'Maverick Square - Lewis Mall',
    'timestamp': '2023-04-22T13:02:08.649

Parse through the response to get the details you want for the bike stations in that city (latitude, longitude, number of bikes). 

In [5]:
response_json = response.json()
type(response_json) # dictionary type
response_json.keys() # get the keys

dict

dict_keys(['network'])

In [6]:
response_json['network'].keys()

dict_keys(['stations'])

In [7]:
type(response_json['network']['stations']), len(response_json['network']['stations']) # a list of stations
stations = response_json['network']['stations'] # this is the list of the results I'm interested in

(list, 443)

In [8]:
# Picking one station (index 20, chosen arbitrarily) as a sample to inspect the available fields
station1 = stations[20]
station1.keys()
for key in station1.keys():
    if key=='extra':
        print(f'Key: \t{key}')
        for extra_key in station1['extra'].keys():
            print(f"\t\tKey: \t{extra_key}\n\t\tValue:\t{station1['extra'][extra_key]}\n")
    else:
        print(f'Key: \t{key}\n\nValue:\t{station1[key]}\n\n')

dict_keys(['empty_slots', 'extra', 'free_bikes', 'id', 'latitude', 'longitude', 'name', 'timestamp'])

Key: 	empty_slots

Value:	12


Key: 	extra
		Key: 	ebikes
		Value:	0

		Key: 	has_ebikes
		Value:	True

		Key: 	last_updated
		Value:	1682153802

		Key: 	payment
		Value:	['key', 'creditcard']

		Key: 	payment-terminal
		Value:	True

		Key: 	renting
		Value:	1

		Key: 	returning
		Value:	1

		Key: 	slots
		Value:	19

		Key: 	uid
		Value:	340

Key: 	free_bikes

Value:	7


Key: 	id

Value:	226a8b1a11e307e5ccd9f4eed3798783


Key: 	latitude

Value:	42.274620671812244


Key: 	longitude

Value:	-71.09372552493369


Key: 	name

Value:	Blue Hill Ave at Almont St


Key: 	timestamp

Value:	2023-04-22T13:02:08.248000Z




Put your parsed results into a DataFrame.

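From parsing, I found that the relevant fields are:

- `name`
- `latitude`
- `longitude`
- `slots` (nested under `extra`): this seems to be the station's bike capacity, roughly `empty_slots` + `free_bikes`. The sum matches for the sample station above (12 + 7 = 19) but can fall short, e.g. 11 + 6 = 17 against 19 slots for the first station in the response. This value is stored in the 'Number of Bikes' column.
- `id`: a unique ID, useful for checking for duplicates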

In [9]:
# For loop to build the dictionary that I will convert to a DataFrame
# Empty lists to append during iteration
station_names = []
station_latitudes = []
station_longitudes = []
station_bikes = []
station_ids = []

for station in stations:
    station_names.append(station['name'])
    station_bikes.append(station['extra']['slots'])  # station capacity, stored as 'Number of Bikes'
    station_latitudes.append(station['latitude'])
    station_longitudes.append(station['longitude'])
    station_ids.append(station['id'])

station_dictionary = {'Station Name': station_names,
                     'Latitude': station_latitudes,
                     'Longitude': station_longitudes,
                     'Number of Bikes': station_bikes,
                     'Station ID': station_ids}
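
As an alternative to the loop above, `pandas.json_normalize` can flatten the nested `extra` dictionary in one call. This is only a sketch of that variant, not what the notebook ran: `sample_stations` is a trimmed copy of the first two stations from the response, and `sketch_df` is an illustrative name.

```python
import pandas as pd

# First two stations from the response shown earlier (some 'extra' fields omitted).
sample_stations = [
    {"empty_slots": 11, "free_bikes": 6,
     "extra": {"slots": 19, "uid": "217"},
     "id": "553ed0300d38108b4f21a6bafa3db70c",
     "latitude": 42.386781, "longitude": -71.006098,
     "name": "Orient Heights T Stop - Bennington St at Saratoga St"},
    {"empty_slots": 9, "free_bikes": 23,
     "extra": {"slots": 33, "uid": "212"},
     "id": "93542dcbf21f5411569adb92cd7cc199",
     "latitude": 42.368844082898356, "longitude": -71.03977829217911,
     "name": "Maverick Square - Lewis Mall"},
]

# Nested keys become dotted column names, e.g. 'extra.slots'.
flat = pd.json_normalize(sample_stations)

# Keep only the fields of interest and give them the notebook's column names.
columns = {
    "name": "Station Name",
    "latitude": "Latitude",
    "longitude": "Longitude",
    "extra.slots": "Number of Bikes",
    "id": "Station ID",
}
sketch_df = flat[list(columns)].rename(columns=columns)
print(sketch_df)
```

On the full data, passing `stations` instead of `sample_stations` should give the same five columns, assuming every station includes `extra['slots']`.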

In [17]:
# Make a DataFrame from the results
stations_df = pd.DataFrame(station_dictionary)
stations_df
stations_df.head()
stations_df.tail()

Unnamed: 0,Station Name,Latitude,Longitude,Number of Bikes,Station ID
0,Orient Heights T Stop - Bennington St at Sarat...,42.386781,-71.006098,19,553ed0300d38108b4f21a6bafa3db70c
1,Maverick Square - Lewis Mall,42.368844,-71.039778,33,93542dcbf21f5411569adb92cd7cc199
2,East Boston Neighborhood Health Center - 20 Ma...,42.369536,-71.039431,16,d9c7ef5dbda4ed944d1bf51fe540acb6
3,Bennington St at Byron St,42.383533,-71.016191,15,0568389e659e679fbe29a5ac12cd49c0
4,Boston East - 126 Border St,42.373312,-71.041020,15,47b79abc28a54d0e4689b1096ceb8466
...,...,...,...,...,...
438,Central Ave at River St,42.270947,-71.073379,15,95aa5117b3fdea2ad2372c7aa74f62f3
439,1200 Beacon St,42.344149,-71.114674,15,a5276fd53136b5c1eeb4d7aa5db405a4
440,Newbury St at Hereford St,42.348717,-71.085954,22,832fc4a60379cdc7824acb0c1dbf5f6c
441,Boylston St at Dartmouth St,42.350193,-71.077442,19,cd299318baee6b2600b5e9f047a558ab


Unnamed: 0,Station Name,Latitude,Longitude,Number of Bikes,Station ID
0,Orient Heights T Stop - Bennington St at Sarat...,42.386781,-71.006098,19,553ed0300d38108b4f21a6bafa3db70c
1,Maverick Square - Lewis Mall,42.368844,-71.039778,33,93542dcbf21f5411569adb92cd7cc199
2,East Boston Neighborhood Health Center - 20 Ma...,42.369536,-71.039431,16,d9c7ef5dbda4ed944d1bf51fe540acb6
3,Bennington St at Byron St,42.383533,-71.016191,15,0568389e659e679fbe29a5ac12cd49c0
4,Boston East - 126 Border St,42.373312,-71.04102,15,47b79abc28a54d0e4689b1096ceb8466


Unnamed: 0,Station Name,Latitude,Longitude,Number of Bikes,Station ID
438,Central Ave at River St,42.270947,-71.073379,15,95aa5117b3fdea2ad2372c7aa74f62f3
439,1200 Beacon St,42.344149,-71.114674,15,a5276fd53136b5c1eeb4d7aa5db405a4
440,Newbury St at Hereford St,42.348717,-71.085954,22,832fc4a60379cdc7824acb0c1dbf5f6c
441,Boylston St at Dartmouth St,42.350193,-71.077442,19,cd299318baee6b2600b5e9f047a558ab
442,Copley Square - Dartmouth St at Boylston St,42.349928,-71.077392,31,9930aaf1ad9f4716ff669b09a5c4684f


#### Cleaning the DataFrame

In [18]:
# checking for null data - No null data
stations_df.info() 
# checking for duplicates
len(stations_df) - len(stations_df.drop_duplicates(subset=None, ignore_index=True))
# further check for missing coordinates recorded as 0
stations_df[stations_df['Latitude'] == 0].count()
stations_df[stations_df['Longitude'] == 0].count()
stations_df.describe() # checking for odd values like min = 0

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 443 entries, 0 to 442
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Station Name     443 non-null    object 
 1   Latitude         443 non-null    float64
 2   Longitude        443 non-null    float64
 3   Number of Bikes  443 non-null    int64  
 4   Station ID       443 non-null    object 
dtypes: float64(2), int64(1), object(2)
memory usage: 17.4+ KB


0

Station Name       0
Latitude           0
Longitude          0
Number of Bikes    0
Station ID         0
dtype: int64

Station Name       0
Latitude           0
Longitude          0
Number of Bikes    0
Station ID         0
dtype: int64

Unnamed: 0,Latitude,Longitude,Number of Bikes
count,443.0,443.0,443.0
mean,42.358307,-71.086913,17.345372
std,0.044396,0.052199,4.905016
min,42.2556,-71.247759,0.0
25%,42.336617,-71.116874,15.0
50%,42.357329,-71.087676,17.0
75%,42.379471,-71.063267,19.0
max,42.5299,-70.880691,53.0
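
The checks above can also be collected into a few named values, which makes them easier to assert on later. The following is a minimal sketch on a three-row sample taken from the tables in this notebook; `check_df` and the other variable names are illustrative only. It checks duplicates on 'Station ID' specifically, which is slightly stricter than comparing whole rows.

```python
import pandas as pd

# Three rows taken from the tables in this notebook.
check_df = pd.DataFrame({
    "Station Name": ["Maverick Square - Lewis Mall",
                     "Bennington St at Byron St",
                     "Dartmouth St at Newbury St"],
    "Latitude": [42.368844, 42.383533, 42.350961],
    "Longitude": [-71.039778, -71.016191, -71.077828],
    "Number of Bikes": [33, 15, 0],
    "Station ID": ["93542dcbf21f5411569adb92cd7cc199",
                   "0568389e659e679fbe29a5ac12cd49c0",
                   "732b2c4db39060e57c324556b6661ffd"],
})

# Number of rows whose Station ID repeats an earlier row.
duplicate_ids = int(check_df["Station ID"].duplicated().sum())

# Total number of missing cells across the whole frame.
null_cells = int(check_df.isna().sum().sum())

# How many zeros each numeric column contains (zeros may stand in for missing data).
zero_counts = (check_df[["Latitude", "Longitude", "Number of Bikes"]] == 0).sum()

print(duplicate_ids, null_cells)
print(zero_counts)
```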


In [19]:
# investigating the rows where 'Number of Bikes' is 0 (the minimum reported by describe())
stations_df[stations_df['Number of Bikes'] == 0]  # one result found

Unnamed: 0,Station Name,Latitude,Longitude,Number of Bikes,Station ID
42,Dartmouth St at Newbury St,42.350961,-71.077828,0,732b2c4db39060e57c324556b6661ffd


This is the raw entry for this station in `response.json()`:

```python
{'empty_slots': 0,
    'extra': {'ebikes': 0,
     'has_ebikes': True,
     'last_updated': 86400,
     'payment': ['key', 'creditcard'],
     'payment-terminal': True,
     'renting': 1,
     'returning': 1,
     'slots': 0,
     'uid': '370'},
    'free_bikes': 0,
    'id': '732b2c4db39060e57c324556b6661ffd',
    'latitude': 42.35096144421219,
    'longitude': -71.07782810926437,
    'name': 'Dartmouth St at Newbury St',
    'timestamp': '2023-04-22T12:32:05.589000Z'}
```

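It does not seem realistic for a bike station to have no slots, no empty slots, and zero bikes all at once. Its `last_updated` value of 86400 (one day after the Unix epoch) is also far from the 2023 values reported by other stations, which suggests the record is stale or a placeholder. With no other information, I have decided to replace the zero with the mean of 'Number of Bikes', truncated to an integer (17).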


In [20]:
# filling it with the mean value (truncated to an integer); 42 is the row index found above
stations_df.loc[42, 'Number of Bikes'] = int(stations_df['Number of Bikes'].mean())
# rechecking data
stations_df.describe() 

Unnamed: 0,Latitude,Longitude,Number of Bikes
count,443.0,443.0,443.0
mean,42.358307,-71.086913,17.383747
std,0.044396,0.052199,4.835007
min,42.2556,-71.247759,9.0
25%,42.336617,-71.116874,15.0
50%,42.357329,-71.087676,17.0
75%,42.379471,-71.063267,19.0
max,42.5299,-70.880691,53.0
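
A variant of the fill above selects the rows with a boolean mask instead of a hard-coded index, so it would still work if the row order changed. Note that this sketch differs from the cell above in one respect: it computes the mean over the non-zero rows only, which is an assumption on my part rather than what the notebook did. `impute_df` is a small illustrative sample.

```python
import pandas as pd

# Four rows taken from the tables in this notebook; one has a zero count.
impute_df = pd.DataFrame({
    "Station Name": ["Orient Heights T Stop - Bennington St at Saratoga St",
                     "Maverick Square - Lewis Mall",
                     "Dartmouth St at Newbury St",
                     "Bennington St at Byron St"],
    "Number of Bikes": [19, 33, 0, 15],
})

col = "Number of Bikes"
zero_mask = impute_df[col] == 0

# Mean of the valid (non-zero) rows, truncated to an integer like the cell above.
fill_value = int(impute_df.loc[~zero_mask, col].mean())

# Overwrite only the rows flagged by the mask.
impute_df.loc[zero_mask, col] = fill_value
print(impute_df)
```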


In [14]:
# Saving DataFrame into folder
stations_df.to_csv('../data/stations_df.csv', index=False)
# checking contents
pd.read_csv('../data/stations_df.csv')

Unnamed: 0,Station Name,Latitude,Longitude,Number of Bikes,Station ID
0,Orient Heights T Stop - Bennington St at Sarat...,42.386781,-71.006098,19,553ed0300d38108b4f21a6bafa3db70c
1,Maverick Square - Lewis Mall,42.368844,-71.039778,33,93542dcbf21f5411569adb92cd7cc199
2,East Boston Neighborhood Health Center - 20 Ma...,42.369536,-71.039431,16,d9c7ef5dbda4ed944d1bf51fe540acb6
3,Bennington St at Byron St,42.383533,-71.016191,15,0568389e659e679fbe29a5ac12cd49c0
4,Boston East - 126 Border St,42.373312,-71.041020,15,47b79abc28a54d0e4689b1096ceb8466
...,...,...,...,...,...
438,Central Ave at River St,42.270947,-71.073379,15,95aa5117b3fdea2ad2372c7aa74f62f3
439,1200 Beacon St,42.344149,-71.114674,15,a5276fd53136b5c1eeb4d7aa5db405a4
440,Newbury St at Hereford St,42.348717,-71.085954,22,832fc4a60379cdc7824acb0c1dbf5f6c
441,Boylston St at Dartmouth St,42.350193,-71.077442,19,cd299318baee6b2600b5e9f047a558ab
