# Data Extraction


---


**Mentor:**
  - ***Professor Richard Sowers***, Department of Industrial and Systems Engineering, University of Illinois at Urbana-Champaign (UIUC).

**Group Members:**
  - ***Advika Pattiwar*** (linkedin.com/in/advika-pattiwar)
  - ***Dhruv Borda*** (linkedin.com/thebordadhruv)
  - ***Hrithik Rathi*** (linkedin.com/in/hrithik-rathi)
  - ***Suvrata Gayathri Kappagantula*** (linkedin.com/in/gayathrikappagantula)


---


In this project, we will be working with two datasets: a debugging dataset and a working dataset. It's important to understand their characteristics and how we will handle them.

**Primary Goals:**


1. **Debugging Dataset**
   - The debugging dataset is intentionally kept small. It's designed for testing our code efficiently, and reasonable code should run on it in about 2 minutes.

2. **Working Dataset**
   - The working dataset is the main dataset for the project. It is larger and more representative of the problem we're tackling. However, training on it should not take excessively long: ideally no more than 40 minutes.

3. **Data Conversion to Pandas**
   - To start our project, we'll convert both datasets into Pandas DataFrames for ease of manipulation and analysis. Additionally, we'll pay special attention to datetime columns and convert them into Pandas timestamps. This conversion will enable us to work with time deltas and perform various time-related operations seamlessly.

4. **Data Serialization with Pickle**
   - To speed up loading and storage, we'll use the [`pandas.DataFrame.to_pickle`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html) method. It serializes a DataFrame into a binary file that typically loads faster than the equivalent CSV and preserves the column data types (for example, parsed timestamps).
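
As a minimal sketch of the datetime conversion described in step 3, the snippet below builds a two-row frame whose timestamps are taken from the sample output later in this notebook. Only the `started_at` and `ended_at` column names come from the trip data; the `duration` and `duration_min` names are chosen here for illustration.

```python
import pandas as pd

# Two-row sample; the real trip files have many more columns.
trips = pd.DataFrame({
    "started_at": ["2023-09-09 18:38:34", "2023-09-11 19:28:31"],
    "ended_at": ["2023-09-09 18:57:42", "2023-09-11 19:46:35"],
})

# Values read from CSV arrive as plain strings; convert them to timestamps.
for col in ("started_at", "ended_at"):
    trips[col] = pd.to_datetime(trips[col])

# Subtracting two timestamp columns yields a timedelta column.
trips["duration"] = trips["ended_at"] - trips["started_at"]
trips["duration_min"] = trips["duration"].dt.total_seconds() / 60

print(trips.dtypes)
print(trips["duration_min"].round(2).tolist())
```

Once the columns are timestamps, the `.dt` accessor also exposes fields such as the hour or weekday without any string parsing.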

In [None]:
!pip install boto3


In [None]:
import urllib.request
from zipfile import ZipFile
import re

import os
import requests

import io
from io import BytesIO, StringIO

import boto3
from botocore import UNSIGNED
from botocore.client import Config

import pandas as pd

# Data Extraction

## Method 1: Retrieving Data from External URLs

- This method involves downloading data from external URLs, specifically CSV files contained within ZIP archives. It uses various libraries to fetch, unzip, and process the data, ultimately organizing it into separate DataFrames for different time periods based on the provided URLs.
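
One way to process a zipped CSV is to let `pandas.read_csv` parse each archive member directly, which also handles quoted fields that contain commas. The sketch below is an assumption about how this could be done, not the notebook's own method: the helper name `read_zipped_csvs` is hypothetical, and a small in-memory archive stands in for a downloaded file so that no network access is needed.

```python
import io
import zipfile

import pandas as pd


def read_zipped_csvs(zip_bytes: bytes) -> pd.DataFrame:
    """Concatenate every CSV member of a ZIP archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip members that are not CSV files
            with archive.open(name) as member:
                frames.append(pd.read_csv(member))
    return pd.concat(frames, ignore_index=True)


# Build a small in-memory archive that stands in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.csv",
        'ride_id,start_station_name\nA1,"Broadway, W 58 St"\n',
    )

sample_df = read_zipped_csvs(buffer.getvalue())
print(sample_df)
```

With real data, the bytes passed to the helper would be the body of the HTTP response for one of the archive URLs.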

In [None]:
# data_urls = [
#     "https://s3.amazonaws.com/tripdata/202309-citibike-tripdata.csv.zip",
#     "https://s3.amazonaws.com/tripdata/202308-citibike-tripdata.csv.zip",
#     "https://s3.amazonaws.com/tripdata/202307-citibike-tripdata.csv.zip",
#     "https://s3.amazonaws.com/tripdata/202306-citibike-tripdata.csv.zip",
#     "https://s3.amazonaws.com/tripdata/202305-citibike-tripdata.csv.zip",
#     "https://s3.amazonaws.com/tripdata/202304-citibike-tripdata.csv.zip"
# ]

# data_frames = {}

# for data_url in data_urls:
#     with urllib.request.urlopen(data_url) as url:
#         data = []
#         with ZipFile(BytesIO(url.read())) as my_zip_file:
#             for contained_file in my_zip_file.namelist():
#                 for line in my_zip_file.open(contained_file).readlines():
#                     s = str(line, 'unicode_escape')
#                     s = re.sub(r"\n", "", s)
#                     s = re.sub(r"\"", "", s)
#                     line_s = s.split(",")
#                     data.append(line_s)

#         month_year = re.search(r'(\d{6})', data_url).group(1)

#         df = pd.DataFrame(data)

#         data_frames[month_year] = df

# debugging_df = data_frames['202309'].sample(n=10000, random_state=1).copy()
# working_df = pd.concat([data_frames[key] for key in sorted(data_frames.keys(), reverse=True)], ignore_index=True).sample(n=100000, random_state=1)

## Method 2: Retrieving Data from AWS S3 Bucket

- This method fetches data directly from an AWS S3 bucket. Amazon Simple Storage Service (S3) is a scalable object storage system that is widely used for data storage and retrieval. We use the boto3 Python library to access and download the data files stored in a specified S3 bucket. Once retrieved, the data is read into a Pandas DataFrame for further analysis and processing.
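
When listing a bucket prefix, the response of `list_objects_v2` has no `Contents` entry if nothing matches. The following sketch shows one defensive way to pick out the CSV keys. The helper name `csv_keys` is hypothetical, and hand-written dictionaries stand in for real S3 responses so the example runs without AWS access.

```python
def csv_keys(response: dict) -> list[str]:
    """Return the keys ending in .csv from a list_objects_v2-style response."""
    # 'Contents' is absent when nothing matches the prefix, so default to [].
    return [
        item["Key"]
        for item in response.get("Contents", [])
        if item["Key"].endswith(".csv")
    ]


# Hand-written dictionaries standing in for real S3 responses.
populated = {
    "Contents": [
        {"Key": "Dataset/Debugging Dataset/"},
        {"Key": "Dataset/Debugging Dataset/202309-citibike-tripdata.csv"},
    ]
}
empty = {"KeyCount": 0}

print(csv_keys(populated))
print(csv_keys(empty))
```

A single `list_objects_v2` call returns at most 1,000 keys, so a prefix holding more objects than that would need pagination; the folders used here contain only a handful of files.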

**Note:**

For the datasets we've provided in this project:
- We have **not** allowed public users to directly import all the CSV files for both the "debugging dataset" and the "working dataset".

- We've included a commented-out Python example below showing how one might import these CSVs. However, for better efficiency and user experience, we've chosen an alternative method.

- Instead of raw CSVs, we've **preprocessed the data and stored it in pickle format**. This allows for faster loading and a reduced file size, offering benefits like:
  - **Efficiency**: Loading data from a pickle is generally faster than from a CSV.
  - **Size**: Pickle files can be more space-efficient.
  - **Simplicity**: Users can begin analyses without additional data wrangling.

By following this approach, we provide direct access to the pickle files for both datasets.
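
To illustrate why pickle spares users extra data wrangling, the sketch below round-trips a small frame through CSV text and through pickle bytes and compares the resulting dtypes. The frame is a toy example built for this comparison; the column names mirror the trip data, and everything stays in memory.

```python
import io

import pandas as pd

frame = pd.DataFrame({
    "started_at": pd.to_datetime(["2023-09-09 18:38:34", "2023-09-11 19:28:31"]),
    "start_station_id": ["6498.1", "5235.05"],  # identifiers stored as text
})

# Round trip through CSV text: dtype information is not stored.
csv_text = frame.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Round trip through pickle bytes: dtypes come back unchanged.
pickle_buffer = io.BytesIO()
frame.to_pickle(pickle_buffer)
pickle_buffer.seek(0)
from_pickle = pd.read_pickle(pickle_buffer)

print(from_csv.dtypes)
print(from_pickle.dtypes)
```

After the CSV round trip the timestamps are plain text again and the station identifiers have been parsed as floats, while the pickle round trip returns the original dtypes. Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.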


### Weather Dataset

In [None]:
# AWS S3 bucket details
BUCKET_NAME = 'dhruvborda-project-nyccitibikerentals'
FILE_KEY = 'Dataset/Weather_DailySummaries.csv'

# Initialize S3 client
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

obj = s3.get_object(Bucket=BUCKET_NAME, Key=FILE_KEY)
DailyWeather = pd.read_csv(StringIO(obj['Body'].read().decode('utf-8')))

### Debugging Dataset

In [None]:
# AWS S3 bucket details
BUCKET_NAME = 'dhruvborda-project-nyccitibikerentals'
FOLDER_PATH = 'Dataset/Debugging Dataset/'

# Initialize S3 client
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# List files in the specified S3 bucket directory
objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=FOLDER_PATH)

# 'Contents' is missing from the response when no object matches the prefix
file_list = [content['Key'] for content in objects.get('Contents', []) if content['Key'].endswith('.csv')]

data_frames = {}

for file in file_list:
    # Read the S3 object directly into a pandas DataFrame
    csv_obj = s3.get_object(Bucket=BUCKET_NAME, Key=file)
    csv_body = csv_obj['Body'].read().decode('utf-8')

    df_name = os.path.splitext(os.path.basename(file))[0]
    data_frames[df_name] = pd.read_csv(StringIO(csv_body))

for csv_name in data_frames.keys():
    print("CSV file name:", csv_name)

debugging_df = pd.concat(data_frames.values(), ignore_index=True).sample(n=10000, random_state=1)


CSV file name: 202309-citibike-tripdata


### Working Dataset

In [None]:
# AWS S3 bucket details
BUCKET_NAME = 'dhruvborda-project-nyccitibikerentals'
FOLDER_PATH = 'Dataset/Working Dataset/'

# Initialize S3 client
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# List files in the specified S3 bucket directory
objects = s3.list_objects_v2(Bucket=BUCKET_NAME, Prefix=FOLDER_PATH)

# 'Contents' is missing from the response when no object matches the prefix
file_list = [content['Key'] for content in objects.get('Contents', []) if content['Key'].endswith('.csv')]

data_frames = {}

for file in file_list:
    # Read the S3 object directly into a pandas DataFrame
    csv_obj = s3.get_object(Bucket=BUCKET_NAME, Key=file)
    csv_body = csv_obj['Body'].read().decode('utf-8')

    df_name = os.path.splitext(os.path.basename(file))[0]
    data_frames[df_name] = pd.read_csv(StringIO(csv_body))

for csv_name in data_frames.keys():
    print("CSV file name:", csv_name)

working_df = pd.concat(data_frames.values(), ignore_index=True).sample(n=1000000, random_state=1)


CSV file name: 202304-citibike-tripdata
CSV file name: 202305-citibike-tripdata
CSV file name: 202306-citibike-tripdata
CSV file name: 202307-citibike-tripdata
CSV file name: 202308-citibike-tripdata
CSV file name: 202309-citibike-tripdata


# Data Preprocessing

## Weather Dataset

In [None]:
# info() prints its report and returns None, so print the label separately
print('Data Info before Preprocessing')
DailyWeather.info()

DailyWeather.drop(['NAME', 'LATITUDE', 'LONGITUDE', 'ELEVATION', 'PGTM', 'TAVG'], axis=1, inplace=True)
DailyWeather['DATE'] = pd.to_datetime(DailyWeather['DATE'])
DailyWeather.fillna(0, inplace=True)

print('Data Info after Preprocessing')
DailyWeather.info()

Data Info before Preprocessing
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 183 entries, 0 to 182
Data columns (total 22 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   STATION    183 non-null    object 
 1   NAME       183 non-null    object 
 2   LATITUDE   183 non-null    float64
 3   LONGITUDE  183 non-null    float64
 4   ELEVATION  183 non-null    float64
 5   DATE       183 non-null    object 
 6   AWND       183 non-null    float64
 7   PGTM       0 non-null      float64
 8   PRCP       183 non-null    float64
 9   SNOW       183 non-null    float64
 10  SNWD       183 non-null    float64
 11  TAVG       0 non-null      float64
 12  TMAX       183 non-null    int64  
 13  TMIN       183 non-null    int64  
 14  WDF2       183 non-null    int64  
 15  WDF5       182 non-null    float64
 16  WSF2       183 non-null    float64
 17  WSF5       182 non-null    float64
 18  WT01       79 non-null     float64
 19  WT02       8 non-null      float64
 20  WT03      

## Debugging Dataset

In [None]:
# info() prints its report and returns None, so print the label separately
print('Data Info before Preprocessing')
debugging_df.info()

debugging_df['started_at'] = pd.to_datetime(debugging_df['started_at'])
debugging_df['ended_at'] = pd.to_datetime(debugging_df['ended_at'])

debugging_df['date'] = debugging_df['started_at'].dt.date
debugging_df['date'] = pd.to_datetime(debugging_df['date'])

debugging_df['year'] = debugging_df['started_at'].dt.year
debugging_df['month'] = debugging_df['started_at'].dt.month
debugging_df['week'] = debugging_df['started_at'].dt.isocalendar().week
debugging_df['day'] = debugging_df['started_at'].dt.day
debugging_df['weekday'] = debugging_df['started_at'].dt.weekday
debugging_df['weekday_name'] = debugging_df['started_at'].dt.day_name()
debugging_df['hour'] = debugging_df['started_at'].dt.hour

debugging_df['start_time'] = debugging_df['started_at'].dt.time
debugging_df['end_time'] = debugging_df['ended_at'].dt.time

debugging_df['trip_duration_in_minutes'] = (debugging_df['ended_at'] - debugging_df['started_at']).dt.total_seconds() / 60

debugging_df.dropna(inplace=True)

print('Data Info after Preprocessing')
debugging_df.info()

Data Info before Preprocessing
<class 'pandas.core.frame.DataFrame'>
Index: 10000 entries, 3562429 to 1447365
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   ride_id             10000 non-null  object 
 1   rideable_type       10000 non-null  object 
 2   started_at          10000 non-null  object 
 3   ended_at            10000 non-null  object 
 4   start_station_name  9992 non-null   object 
 5   start_station_id    9992 non-null   object 
 6   end_station_name    9977 non-null   object 
 7   end_station_id      9977 non-null   object 
 8   start_lat           10000 non-null  float64
 9   start_lng           10000 non-null  float64
 10  end_lat             9992 non-null   float64
 11  end_lng             9992 non-null   float64
 12  member_casual       10000 non-null  object 
dtypes: float64(4), object(9)
memory usage: 1.1+ MB
Data Info after Preprocessing
<class 'pandas.core.frame.DataFrame'>
Index: 9972 entries, 356242

### Merging Debugging Dataset with Weather Data
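
The merge attaches one row of daily weather to every trip that started on that day. As a toy illustration under assumed values (the `ride_id` labels and the second weather row are invented; only the column names and the first `TMAX` value follow the data shown in this notebook), the sketch below uses the `validate` and `indicator` options of `merge` to confirm that each date appears at most once in the weather table and to flag trips without a weather match.

```python
import pandas as pd

trips = pd.DataFrame({
    "ride_id": ["r1", "r2", "r3"],
    "started_at": pd.to_datetime(
        ["2023-09-09 18:38:34", "2023-09-09 07:05:00", "2023-09-11 19:28:31"]
    ),
})
# Midnight of the start day, used as the join key.
trips["date"] = trips["started_at"].dt.normalize()

weather = pd.DataFrame({
    "DATE": pd.to_datetime(["2023-09-09", "2023-09-10"]),
    "TMAX": [84, 80],  # the second value is made up for the example
})

merged = trips.merge(
    weather,
    left_on="date",
    right_on="DATE",
    how="left",
    validate="many_to_one",  # raises if a date repeats in the weather table
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
print(merged[["ride_id", "TMAX", "_merge"]])
```

Rows marked `left_only` are trips whose date has no weather record; with `how='left'` they are kept and their weather columns are filled with missing values.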

In [None]:
debugging = pd.merge(debugging_df, DailyWeather, left_on='date', right_on='DATE', how='left')
debugging.drop(['DATE', 'STATION'], axis=1, inplace=True)
debugging.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9972 entries, 0 to 9971
Data columns (total 38 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   ride_id                   9972 non-null   object        
 1   rideable_type             9972 non-null   object        
 2   started_at                9972 non-null   datetime64[ns]
 3   ended_at                  9972 non-null   datetime64[ns]
 4   start_station_name        9972 non-null   object        
 5   start_station_id          9972 non-null   object        
 6   end_station_name          9972 non-null   object        
 7   end_station_id            9972 non-null   object        
 8   start_lat                 9972 non-null   float64       
 9   start_lng                 9972 non-null   float64       
 10  end_lat                   9972 non-null   float64       
 11  end_lng                   9972 non-null   float64       
 12  member_casual       

### Saving and Loading Debugging Dataframe using Pickle File

In [None]:
# Convert the DataFrame to pickle format
pickle_data = BytesIO()
debugging.to_pickle(pickle_data)

# Set the S3 path where you want to store the pickle file
s3_path = 'Dataset/debugging.pkl'

# Upload the pickle data to S3
s3.put_object(Bucket=BUCKET_NAME, Key=s3_path, Body=pickle_data.getvalue())

{'ResponseMetadata': {'RequestId': 'BJST9WAJBDRQ0YP1',
  'HostId': 'owOXusBHgLsncCuYHuRzq2tD4Bg6IsLbyCzaXZ9n9Ed36lqdQE0tJVH/UmV32SjX9NgPylRiljg=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'owOXusBHgLsncCuYHuRzq2tD4Bg6IsLbyCzaXZ9n9Ed36lqdQE0tJVH/UmV32SjX9NgPylRiljg=',
   'x-amz-request-id': 'BJST9WAJBDRQ0YP1',
   'date': 'Mon, 27 Nov 2023 03:27:14 GMT',
   'x-amz-server-side-encryption': 'AES256',
   'etag': '"cf12be811c0a1692f1c8f1829e4b84c5"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"cf12be811c0a1692f1c8f1829e4b84c5"',
 'ServerSideEncryption': 'AES256'}

In [None]:
# Load the data from the URL
url = "https://s3-us-east-2.amazonaws.com/dhruvborda-project-nyccitibikerentals/Dataset/debugging.pkl"
response = requests.get(url)

if response.status_code == 200:
    debugging = pd.read_pickle(io.BytesIO(response.content))
    print("Data loaded successfully.")
else:
    # Raise an error rather than calling exit(), which can stop the notebook kernel
    raise RuntimeError(f"Failed to download debugging.pkl. Status code: {response.status_code}")

debugging.head()

| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | ... | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT02 | WT03 | WT08 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FA36FE47D3A88A26 | classic_bike | 2023-09-09 18:38:34 | 2023-09-09 18:57:42 | E 47 St & 2 Ave | 6498.1 | Laight St & Hudson St | 5539.06 | 40.753406 | -73.97095 | ... | 84 | 70 | 70 | 60.0 | 15.0 | 25.1 | 1.0 | 0.0 | 1.0 | 1.0 |
| 1 | 7DA2BBBFACB65FEE | classic_bike | 2023-09-11 19:28:31 | 2023-09-11 19:46:35 | S 3 St & Bedford Ave | 5235.05 | Lawrence St & Willoughby St | 4596.09 | 40.712564 | -73.96269 | ... | 82 | 69 | 210 | 220.0 | 10.1 | 15.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 2 | A34B90C2EDBDB134 | classic_bike | 2023-09-26 19:01:07 | 2023-09-26 19:20:57 | 5 Ave & E 87 St | 7323.09 | E 114 St & 1 Ave | 7540.02 | 40.782576 | -73.959704 | ... | 60 | 54 | 60 | 30.0 | 14.1 | 19.9 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 97FEC210C1E0BA88 | classic_bike | 2023-09-20 07:18:40 | 2023-09-20 07:20:09 | West Drive & Prospect Park West | 3651.04 | Prospect Park West & 8 St | 3722.04 | 40.661218 | -73.979227 | ... | 74 | 58 | 300 | 250.0 | 10.1 | 15.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 5428792F62754CE8 | classic_bike | 2023-09-19 18:54:43 | 2023-09-19 19:18:35 | Canal St & Rutgers St | 5303.08 | Wyckoff Av & Jefferson St | 5051.01 | 40.714311 | -73.989925 | ... | 72 | 58 | 290 | 270.0 | 13.0 | 19.9 | 0.0 | 0.0 | 0.0 | 0.0 |


## Working Dataset

In [None]:
print('Data Types before Preprocessing',working_df.dtypes)

working_df['started_at'] = pd.to_datetime(working_df['started_at'])
working_df['ended_at'] = pd.to_datetime(working_df['ended_at'])

working_df['date'] = working_df['started_at'].dt.date
working_df['date'] = pd.to_datetime(working_df['date'])

working_df['year'] = working_df['started_at'].dt.year
working_df['month'] = working_df['started_at'].dt.month
working_df['week'] = working_df['started_at'].dt.isocalendar().week
working_df['day'] = working_df['started_at'].dt.day
working_df['weekday'] = working_df['started_at'].dt.weekday
working_df['weekday_name'] = working_df['started_at'].dt.day_name()
working_df['hour'] = working_df['started_at'].dt.hour

working_df['start_time'] = working_df['started_at'].dt.time
working_df['end_time'] = working_df['ended_at'].dt.time

working_df['trip_duration_in_minutes'] = (working_df['ended_at'] - working_df['started_at']).dt.total_seconds() / 60

working_df.dropna(inplace=True)

print('Data Types after Preprocessing',working_df.dtypes)

Data Types before Preprocessing ride_id                object
rideable_type          object
started_at             object
ended_at               object
start_station_name     object
start_station_id       object
end_station_name       object
end_station_id         object
start_lat             float64
start_lng             float64
end_lat               float64
end_lng               float64
member_casual          object
dtype: object
Data Types after Preprocessing ride_id                              object
rideable_type                       object
started_at                  datetime64[ns]
ended_at                    datetime64[ns]
start_station_name                  object
start_station_id                    object
end_station_name                    object
end_station_id                      object
start_lat                          float64
start_lng                          float64
end_lat                            float64
end_lng                            float64
member_casual   

### Merging Working Dataset with Weather Data

In [None]:
working = pd.merge(working_df, DailyWeather, left_on='date', right_on='DATE', how='left')
working.drop(['DATE', 'STATION'], axis=1, inplace=True)
working.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 997248 entries, 0 to 997247
Data columns (total 38 columns):
 #   Column                    Non-Null Count   Dtype         
---  ------                    --------------   -----         
 0   ride_id                   997248 non-null  object        
 1   rideable_type             997248 non-null  object        
 2   started_at                997248 non-null  datetime64[ns]
 3   ended_at                  997248 non-null  datetime64[ns]
 4   start_station_name        997248 non-null  object        
 5   start_station_id          997248 non-null  object        
 6   end_station_name          997248 non-null  object        
 7   end_station_id            997248 non-null  object        
 8   start_lat                 997248 non-null  float64       
 9   start_lng                 997248 non-null  float64       
 10  end_lat                   997248 non-null  float64       
 11  end_lng                   997248 non-null  float64       
 12  me

### Saving and Loading Working Dataframe using Pickle File

In [None]:
# Convert the DataFrame to pickle format
pickle_data = BytesIO()
working.to_pickle(pickle_data)

# Set the S3 path where you want to store the pickle file
s3_path = 'Dataset/working.pkl'

# Upload the pickle data to S3
s3.put_object(Bucket=BUCKET_NAME, Key=s3_path, Body=pickle_data.getvalue())

{'ResponseMetadata': {'RequestId': 'JRVJPWB2A9B1ZRDX',
  'HostId': 'RcKf82ai9i+MXg9dxN6hHTho+9kHsLQh5/VrF9IUmDzo5hbHHqdGaGIckTJ7qzY5NQ2S7k5GwN0=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'RcKf82ai9i+MXg9dxN6hHTho+9kHsLQh5/VrF9IUmDzo5hbHHqdGaGIckTJ7qzY5NQ2S7k5GwN0=',
   'x-amz-request-id': 'JRVJPWB2A9B1ZRDX',
   'date': 'Mon, 27 Nov 2023 04:15:44 GMT',
   'x-amz-server-side-encryption': 'AES256',
   'etag': '"64651154e613d7db00475fa9ba01fea1"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"64651154e613d7db00475fa9ba01fea1"',
 'ServerSideEncryption': 'AES256'}

In [None]:
# Load the data from the URL
url = "https://s3-us-east-2.amazonaws.com/dhruvborda-project-nyccitibikerentals/Dataset/working.pkl"
response = requests.get(url)

if response.status_code == 200:
    working = pd.read_pickle(io.BytesIO(response.content))
    print("Data loaded successfully.")
else:
    # Raise an error rather than calling exit(), which can stop the notebook kernel
    raise RuntimeError(f"Failed to download working.pkl. Status code: {response.status_code}")

working.head()

| | ride_id | rideable_type | started_at | ended_at | start_station_name | start_station_id | end_station_name | end_station_id | start_lat | start_lng | ... | TMAX | TMIN | WDF2 | WDF5 | WSF2 | WSF5 | WT01 | WT02 | WT03 | WT08 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E40E510853C33688 | classic_bike | 2023-07-26 23:25:09 | 2023-07-26 23:30:37 | 6 Ave & Canal St | 5500.07 | Greenwich St & Perry St | 5922.04 | 40.722399 | -74.005724 | ... | 87 | 68 | 200 | 220.0 | 8.9 | 15.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | BF9D4640F2DDA41E | classic_bike | 2023-04-24 13:06:46 | 2023-04-24 13:18:06 | King St & Varick St | 5687.11 | Gramercy Park N & Gramercy Park E | 6013.12 | 40.72789 | -74.005243 | ... | 62 | 44 | 290 | 290.0 | 10.1 | 16.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 588E2D056D1C3CCB | classic_bike | 2023-08-17 17:52:02 | 2023-08-17 18:16:29 | William St & Pine St | 5065.12 | Lexington Ave & E 26 St | 6089.08 | 40.706872 | -74.009108 | ... | 79 | 72 | 120 | 120.0 | 12.1 | 19.9 | 1.0 | 0.0 | 0.0 | 1.0 |
| 3 | 9B205CFE74B13CAD | classic_bike | 2023-07-04 10:41:26 | 2023-07-04 10:55:32 | Lexington Ave & E 26 St | 6089.08 | E 47 St & 2 Ave | 6498.1 | 40.741459 | -73.983293 | ... | 83 | 73 | 230 | 230.0 | 10.1 | 14.1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 4 | 08A16B1DE7CA6DD9 | classic_bike | 2023-08-25 17:42:30 | 2023-08-25 17:51:50 | Nostrand Ave & Myrtle Ave | 4707.04 | Lafayette Ave & Stuyvesant Ave | 4576.11 | 40.69527 | -73.952381 | ... | 78 | 69 | 200 | 100.0 | 8.9 | 28.0 | 1.0 | 0.0 | 0.0 | 1.0 |
