### Importing Required Libraries
In this section, we import the necessary libraries to facilitate the development of the project:

- pandas: A powerful library for data manipulation and analysis, commonly used to handle structured datasets.

In [1]:
import pandas as pd

### Merging and Preparing Datasets

In this section, two cleaned datasets are merged based on their respective date columns. This process allows for combining data from different sources for comprehensive analysis.

#### Steps in the Code:

1. Load Cleaned Datasets:

- The cleaned datasets API_data_Cleaned.csv and us_accidents_cleaned.csv are loaded into pandas DataFrames.

2. Convert Dates:

- Ensure that the crash_date and start_time columns are in datetime format.
- Extract only the date portion from these columns to facilitate accurate filtering and merging.

3. Filter by City:

- Filter both datasets to include only rows where the city is "New York".

4. Merge Datasets:

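- Perform an inner join on the two datasets using the date columns as keys (crash_date and start_time). This ensures that only rows with matching dates are retained. Because the join key is the date alone, each crash row is paired with every accident row that shares its date.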

5. Clean and Reorganize Columns:

- Remove redundant city columns (city_x and city_y) resulting from the merge.
- Add a unified city column with the value "New York".
- Rearrange the DataFrame to place the city column at the beginning.

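6. Convert crash_date to a Monthly Period:

- Convert the crash_date column to a monthly period (YYYY-MM). This overwrites the original date rather than adding a separate column, so day-level detail is not available after this step. The monthly value enables temporal grouping for analysis.

7. Sort Data:

- Order the rows by crash_date (now a monthly period) in ascending order for better readability and analysis.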

In [5]:
# Load the CSV files
API_merge = pd.read_csv('../data/API_data_Cleaned.csv')
db_merge = pd.read_csv('../data/us_accidents_cleaned.csv')

# Convert date columns to datetime format and use only the date
API_merge['crash_date'] = pd.to_datetime(API_merge['crash_date']).dt.date  # Use only the date
db_merge['start_time'] = pd.to_datetime(db_merge['start_time']).dt.date  # Use only the date

# Filter both datasets for rows where the city is 'New York'
api_data_ny = API_merge[API_merge['city'] == 'New York']
us_accidents_ny = db_merge[db_merge['city'] == 'New York']

# Merge the two datasets based on the date (inner join)
merged_df = pd.merge(api_data_ny, us_accidents_ny, left_on='crash_date', right_on='start_time', how='inner')

# Drop duplicate city columns ('city_x' and 'city_y')
merged_df = merged_df.drop(columns=['city_x', 'city_y'])

# 1. Ensure the 'crash_date' column is in datetime format
merged_df['crash_date'] = pd.to_datetime(merged_df['crash_date'], errors='coerce')

# 2. Replace 'crash_date' with its month and year (formatted as "YYYY-MM")
merged_df['crash_date'] = merged_df['crash_date'].dt.to_period('M')

# Add a new column 'city' with the value "New York"
merged_df['city'] = "New York"

# Sort the data by 'crash_date'
merged_df = merged_df.sort_values(by='crash_date', ascending=True)

# Move the 'city' column to the beginning of the DataFrame
cols = ['city'] + [col for col in merged_df.columns if col != 'city']
merged_df = merged_df[cols]
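
The merge behavior described above can be checked on a small scale. The following is a minimal sketch using two hypothetical toy frames (the names `api` and `acc` and all values are illustrative, not taken from the real CSV files). It shows that an inner join on a date key pairs every row on one side with every row sharing that date on the other, and that the shared city column comes back as city_x and city_y.

```python
import pandas as pd

# Hypothetical stand-ins for the two cleaned datasets
api = pd.DataFrame({
    "crash_date": ["2021-01-05 08:30", "2021-01-05 17:10", "2021-01-06 09:00"],
    "city": ["New York"] * 3,
})
acc = pd.DataFrame({
    "start_time": ["2021-01-05 07:45", "2021-01-05 23:59", "2021-01-07 12:00"],
    "city": ["New York"] * 3,
})

# Keep only the calendar date so that rows match on the day
api["crash_date"] = pd.to_datetime(api["crash_date"]).dt.date
acc["start_time"] = pd.to_datetime(acc["start_time"]).dt.date

# Inner join on the date: 2 crashes x 2 accidents on 2021-01-05 gives 4 rows
merged = pd.merge(api, acc, left_on="crash_date", right_on="start_time", how="inner")
print(len(merged))
print(list(merged.columns))
```

If a one-to-one match per date is expected, passing `validate="one_to_one"` to `pd.merge` makes pandas raise an error when a key is duplicated on either side.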


### Extracting Time Information

In this step, the focus is on extracting and formatting the time information from the crash_time column to enhance temporal analysis.

#### Steps in the Code:

1. Convert Column to DateTime Format:

- Ensure that the crash_time column is in datetime format using the pd.to_datetime() function.
- Handle errors by setting invalid parsing as NaT (Not a Time).

2. Extract Hour and Minute:

- Use the dt.strftime('%H:%M') method to format the crash_time column to display only the hour and minute values in a HH:MM format.

In [6]:
# Convert the column to datetime if it's not already in that format
merged_df['crash_time'] = pd.to_datetime(merged_df['crash_time'], errors='coerce')

# Extract only the hour and minute
merged_df['crash_time'] = merged_df['crash_time'].dt.strftime('%H:%M')
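
As an illustration of how invalid values behave in this step, here is a minimal sketch on a hypothetical Series. The sample values and the explicit `format="%H:%M"` argument are assumptions made for the example; the cell above lets pandas infer the format.

```python
import pandas as pd

# Hypothetical raw values: two valid times and one unparseable entry
raw = pd.Series(["16:15", "07:05", "bad"])

# Unparseable entries become NaT instead of raising an error
parsed = pd.to_datetime(raw, format="%H:%M", errors="coerce")

# Format back to HH:MM; NaT turns into a missing value
formatted = parsed.dt.strftime("%H:%M")
print(formatted)
```

Checking `formatted.isna().sum()` after this step shows how many times could not be parsed.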


### Converting number_of_persons_injured to Integer

In this step, the number_of_persons_injured column is cleaned and converted to integer format for accurate numerical analysis.

#### Steps in the Code:

1. Convert Column to Numeric:

- Use the pd.to_numeric() function to convert the number_of_persons_injured column to a numeric type.
- Handle non-numeric values by coercing them into NaN.

2. Handle Missing Values:

- Replace any resulting NaN values with 0 using the .fillna(0) method.

3. Cast to Integer:

- Convert the column to the integer type with .astype(int).

In [7]:
# Convert 'number_of_persons_injured' to integer type
merged_df['number_of_persons_injured'] = pd.to_numeric(merged_df['number_of_persons_injured'], errors='coerce').fillna(0).astype(int)
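
The three steps can be seen on a small hypothetical Series (the values are illustrative). The sketch also shows an alternative, not used in this notebook, that keeps missing entries distinguishable from a true zero by using the nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical raw column with a non-numeric entry and a missing value
raw = pd.Series(["2", "abc", None, "1"])

# Same chain as in the cell above: coerce, fill with 0, cast to int
as_int = pd.to_numeric(raw, errors="coerce").fillna(0).astype(int)
print(as_int.tolist())

# Alternative: nullable integers keep missing values as <NA>
as_nullable = pd.to_numeric(raw, errors="coerce").astype("Int64")
print(as_nullable.isna().sum())
```

Filling with 0 treats unknown injury counts as "no injuries", which lowers totals slightly if many values were missing; the nullable variant avoids that assumption.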


### Refining and Optimizing the Merged Dataset

This section details the cleanup and transformation steps applied to the merged dataset to make it concise, consistent, and relevant for analysis.

#### Steps:

1. Drop Unnecessary Columns:

- Remove the following columns to declutter the dataset: collision_id, contributing_factor_vehicle_2, vehicle_type_code2, latitude, longitude, start_time, end_time, start_lat, start_lng, distance_mi, county, and zipcode.
- Additionally, drop airport_code, amenity, bump, and the other columns listed in columns_to_drop, which are not relevant to this analysis.

2. Combine Related Columns:

- Combine borough and zip_code into the borough column, using the format borough - zip_code.
- Combine city and state into the city column, using the format city, state.

3. Improve Data Focus:

- Together, these steps remove the coordinate columns, the start_time and end_time columns, and redundant geographic information such as zipcode and county, leaving the dataset focused on crash details and weather conditions.

In [8]:
# 3. Drop 'collision_id' column
merged_df.drop(columns=['collision_id'], inplace=True)

# 4. Drop 'contributing_factor_vehicle_2' column
merged_df.drop(columns=['contributing_factor_vehicle_2'], inplace=True)

# 5. Drop 'vehicle_type_code2' column
merged_df.drop(columns=['vehicle_type_code2'], inplace=True)

# 6. Merge 'borough' with 'zip_code'
merged_df['borough'] = merged_df['borough'] + ' - ' + merged_df['zip_code'].astype(str)

# 7. Drop 'latitude' and 'longitude' columns
merged_df.drop(columns=['latitude', 'longitude'], inplace=True)

# 8. Drop 'start_time' and 'end_time' columns
merged_df.drop(columns=['start_time', 'end_time'], inplace=True)

# 9. Drop 'start_lat' and 'start_lng' columns
merged_df.drop(columns=['start_lat', 'start_lng'], inplace=True)

# 10. Drop 'distance_mi' column
merged_df.drop(columns=['distance_mi'], inplace=True)

# 11. Drop 'county' column
merged_df.drop(columns=['county'], inplace=True)

# 12. Merge 'state' with 'city'
merged_df['city'] = merged_df['city'] + ', ' + merged_df['state']

# 13. Drop 'zipcode' column
merged_df.drop(columns=['zipcode'], inplace=True)

# 14. Drop unnecessary columns
columns_to_drop = [
    'airport_code', 'amenity', 'bump', 'crossing', 'give_way', 'junction', 
    'no_exit', 'railway', 'roundabout', 'station', 'stop', 'traffic_calming', 
    'traffic_signal', 'turning_loop'
]

merged_df.drop(columns=columns_to_drop, inplace=True, errors='ignore')

# Display the cleaned DataFrame
print(merged_df.head())

               city crash_date crash_time                    on_street_name  \
47201  New York, NY    2021-01      16:15  FRANKLIN AVENUE                    
47222  New York, NY    2021-01      18:00  LINDEN BOULEVARD                   
47223  New York, NY    2021-01      13:42  COMMERCE STREET                    
47197  New York, NY    2021-01      15:18  86 STREET                          
47198  New York, NY    2021-01      21:15  BEDFORD AVENUE                     

         off_street_name  number_of_persons_injured  number_of_persons_killed  \
47201     PACIFIC STREET                          1                         0   
47222  VAN SICLEN AVENUE                          0                         0   
47223    COLUMBIA STREET                          0                         0   
47197           102 ROAD                          1                         0   
47198     CLARENDON ROAD                          2                         0   

       number_of_pedestrians_injured  
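
Sample values shown later in this notebook (for example Brooklyn - 11238.0) suggest that zip_code is stored as a float, so `.astype(str)` keeps the trailing ".0". The sketch below, on a hypothetical frame, shows one way to build the combined label without it. The column name `borough_zip` is an assumption for illustration, and this is not what the cell above does.

```python
import pandas as pd

# Hypothetical frame with float ZIP codes, one of them missing
df = pd.DataFrame({
    "borough": ["Brooklyn", "Queens", "Bronx"],
    "zip_code": [11238.0, 11416.0, None],
})

# Float -> nullable integer -> string, so 11238.0 becomes "11238"
zip_str = df["zip_code"].astype("Int64").astype("string")

# Missing ZIP codes propagate as <NA> instead of the text "nan"
df["borough_zip"] = df["borough"].astype("string") + " - " + zip_str
print(df["borough_zip"].tolist())
```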

### Removing Additional Unnecessary Columns

In this step, we refine the dataset further by removing columns that are no longer required or add little value to the analysis.

1. Columns Removed:

- zip_code: Redundant since it was already combined with borough in an earlier step.
- state: Merged with the city column, making it unnecessary as a standalone column.
- weather_timestamp: Dropped to streamline the dataset, assuming it’s not essential for the analysis.

In [9]:
merged_df.drop(columns=['zip_code','state','weather_timestamp'], inplace=True)


### Adding and Reordering the id Column

To enhance the dataset's structure, a new column, id, is added to uniquely identify each row. Additionally, the id column is moved to the beginning of the dataset for better organization.

#### Steps Taken:

1. Create id Column:

- A new column called id is created, with values ranging from 1 to the length of the dataset. This acts as a unique identifier for each row.

2. Reorder Columns:

- The id column is moved to the beginning of the DataFrame to improve the readability and organization of the data.

In [10]:
merged_df['id'] = range(1, len(merged_df) + 1)

# 16. Move the 'id' column to the beginning
cols = ['id'] + [col for col in merged_df.columns if col != 'id']
merged_df = merged_df[cols]

In [11]:
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
merged_df.head(4)

Unnamed: 0,id,city,crash_date,crash_time,on_street_name,off_street_name,number_of_persons_injured,number_of_persons_killed,number_of_pedestrians_injured,number_of_pedestrians_killed,number_of_cyclist_injured,number_of_cyclist_killed,number_of_motorist_injured,number_of_motorist_killed,contributing_factor_vehicle_1,vehicle_type_code1,borough,location,severity,street,timezone,temperature_f,wind_chill_f,humidity_percent,pressure_in,visibility_mi,wind_direction,wind_speed_mph,precipitation_in,weather_condition,sunrise_sunset
47201,1,"New York, NY",2021-01,16:15,FRANKLIN AVENUE,PACIFIC STREET,1,0,0,0,0,0,1,0,Accelerator Defective,Station Wagon/Sport Utility Vehicle,Brooklyn - 11238.0,"{'latitude': '40.67834', 'longitude': '-73.955...",2.0,FDR Dr N,US/Eastern,15.0,1.0,43.0,30.02,10.0,WNW,14.0,0.0,Fair,Day
47222,2,"New York, NY",2021-01,18:00,LINDEN BOULEVARD,VAN SICLEN AVENUE,0,0,0,0,0,0,0,0,Unspecified,Sedan,Brooklyn - 11207.0,"{'latitude': '40.660656', 'longitude': '-73.88...",4.0,Harlem River Dr N,US/Eastern,27.0,27.0,53.0,29.81,10.0,CALM,0.0,0.0,Fair,Night
47223,3,"New York, NY",2021-01,13:42,COMMERCE STREET,COLUMBIA STREET,0,0,0,0,0,0,0,0,Traffic Control Disregarded,Box Truck,Brooklyn - 11231.0,"{'latitude': '40.679035', 'longitude': '-74.00...",2.0,Broome St,US/Eastern,32.0,24.0,29.0,30.14,10.0,WNW,9.0,0.0,Fair,Day
47197,4,"New York, NY",2021-01,15:18,86 STREET,102 ROAD,1,0,0,0,0,0,1,0,Driver Inattention/Distraction,Station Wagon/Sport Utility Vehicle,Queens - 11416.0,"{'latitude': '40.680763', 'longitude': '-73.85...",3.0,Riverside Dr,US/Eastern,24.0,15.0,60.0,30.13,4.0,ENE,8.0,0.0,Light Snow,Night


### Final Checks and Saving the Merged Dataset

Before completing the data preparation, it's important to check the number of rows in the merged dataset, ensure that there are no null values, and save the cleaned data to a new file.

#### Steps Taken:

1. Check the Number of Rows:

- The total number of rows in the merged dataset is printed to verify the result of the merge operation.

2. Check for Null Values:

- The dataset is checked for null values to ensure that no missing data could affect further analysis.

3. Save the Merged Dataset:

- The final merged and cleaned dataset is saved to a new CSV file for future use.

In [12]:
# Check the number of rows after the merge
merged_count = merged_df.shape[0]
print(f"Number of rows after the merge: {merged_count}")

# Check for null values in the merged DataFrame
print(f"Null values: \n{merged_df.isnull().sum()}\n")

# Save the merged result to a CSV file
merged_df.to_csv('../data/merged_data.csv', index=False, encoding='utf-8')


Number of rows after the merge: 48001
Null values: 
id                               0
city                             0
crash_date                       0
crash_time                       0
on_street_name                   0
off_street_name                  0
number_of_persons_injured        0
number_of_persons_killed         0
number_of_pedestrians_injured    0
number_of_pedestrians_killed     0
number_of_cyclist_injured        0
number_of_cyclist_killed         0
number_of_motorist_injured       0
number_of_motorist_killed        0
contributing_factor_vehicle_1    0
vehicle_type_code1               0
borough                          0
location                         0
severity                         0
street                           0
timezone                         0
temperature_f                    0
wind_chill_f                     0
humidity_percent                 0
pressure_in                      0
visibility_mi                    0
wind_direction                   0
win

# DATA ANALYSIS

### Hourly Distribution of Crashes
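In this step, we analyze the distribution of crashes based on the time of day. Because crash_time is stored as HH:MM, grouping the dataset by this column counts crashes for each distinct hour-and-minute value (1,429 distinct values in the output), not for each hour. Aggregating to hourly totals requires extracting the hour first.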


In [None]:
# Group the data by 'crash_time' and count the number of occurrences
hour_distribution = merged_df.groupby('crash_time').size().reset_index(name='count')

# Print the result, sorted by the 'count' in descending order
print("\n2. Distribution by hour of the day:")
print(hour_distribution.sort_values(by='count', ascending=False))



2. Distribution by hour of the day:
     crash_time  count
0         00:00    693
829       14:00    462
889       15:00    445
1009      17:00    418
769       13:00    400
...         ...    ...
423       07:13      1
298       05:02      1
363       06:12      1
221       03:42      1
1273      21:24      1

[1429 rows x 2 columns]
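
To obtain counts per hour rather than per minute, the hour can be extracted from the HH:MM string before counting. This is a minimal sketch on a hypothetical Series of times (the values are illustrative); applied to the notebook it would use merged_df['crash_time'] instead.

```python
import pandas as pd

# Hypothetical crash times in the HH:MM format produced earlier
times = pd.Series(["00:00", "14:00", "14:35", "17:05", "14:59"], name="crash_time")

# The first two characters are the hour; convert them to integers
hours = times.str.slice(0, 2).astype(int)

# Count crashes per hour and order the result chronologically
hourly = hours.value_counts().sort_index()
print(hourly)
```

Sorting by index keeps the hours in clock order, which is usually easier to read or plot than sorting by count.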


### Distribution by Vehicle Type Involved

In this step, we analyze the distribution of crashes based on the type of vehicle involved. By counting the occurrences of each vehicle_type_code1, we can determine the number of crashes associated with different vehicle types.


In [None]:
# Count the occurrences of each vehicle type
vehicle_distribution = merged_df['vehicle_type_code1'].value_counts()

# Print the distribution of crashes by vehicle type
print("\nVehicle Type Distribution:")
print(vehicle_distribution)



Vehicle Type Distribution:
vehicle_type_code1
Sedan                                  22789
Station Wagon/Sport Utility Vehicle    16465
Taxi                                    1414
Bus                                     1122
Pick-up Truck                           1022
                                       ...  
TRACTOR                                    1
MTA bus                                    1
Vanette                                    1
REFG                                       1
Van Camper                                 1
Name: count, Length: 151, dtype: int64
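
The 151 distinct labels include mixed-case entries such as TRACTOR and MTA bus, so some categories may differ only by capitalization or stray whitespace. Whether that is the case in the real data is an assumption. The sketch below shows, on hypothetical labels, how normalizing the text before counting merges such variants.

```python
import pandas as pd

# Hypothetical labels with inconsistent case and trailing whitespace
vehicle = pd.Series(["Sedan", "van", "Van", "MINI BUS", "Mini Bus ", "Sedan"])

# Strip whitespace and upper-case so that variants collapse together
normalized = vehicle.str.strip().str.upper()

counts = normalized.value_counts()
print(counts)
```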


### Relationship Between Weather Conditions and Injury Severity

In this step, we examine the relationship between weather conditions and injuries. By grouping the data by weather_condition and summing number_of_persons_injured, we obtain the total number of injuries recorded under each condition. These totals depend heavily on how often each condition occurs, so they show where injuries are concentrated rather than which conditions produce more severe accidents.


In [None]:
# Group by 'weather_condition' and sum the number of injuries
climate_vs_injuries = merged_df.groupby('weather_condition')['number_of_persons_injured'].sum()

# Print the relationship between weather conditions and the number of injuries
print("\nRelationship between Weather and Injuries:")
print(climate_vs_injuries.sort_values(ascending=False))



Relationship between Weather and Injuries:
weather_condition
Fair             18005
Cloudy            6287
Light Rain        1848
Mostly Cloudy     1775
Partly Cloudy     1332
Heavy Rain         298
Rain               261
Fog                175
Light Snow         128
Haze                64
Snow                15
Name: number_of_persons_injured, dtype: int32
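
To separate how often a condition occurs from how harmful accidents are under it, the total can be reported alongside the accident count and the mean injuries per accident. This is a sketch on hypothetical data (the values and the output column names `total`, `accidents`, and `mean_per_accident` are assumptions for illustration).

```python
import pandas as pd

# Hypothetical accidents: three in fair weather, one in snow
df = pd.DataFrame({
    "weather_condition": ["Fair", "Fair", "Fair", "Snow"],
    "number_of_persons_injured": [1, 0, 2, 2],
})

# Named aggregation: total injuries, number of accidents, injuries per accident
summary = df.groupby("weather_condition")["number_of_persons_injured"].agg(
    total="sum",
    accidents="count",
    mean_per_accident="mean",
)
print(summary)
```

In this toy data, Fair has the larger total but Snow has the higher mean per accident, which is the kind of difference a plain sum hides.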


### Comparison Between Daytime and Nighttime Accidents

In this step, we compare the number of accidents that occurred during the day versus those that occurred at night. The sunrise_sunset column is used to differentiate between daytime and nighttime accidents.

In [None]:
# Count accidents occurring during the day and night
accidents_day_night = merged_df['sunrise_sunset'].value_counts()

# Print the comparison between daytime and nighttime accidents
print("\nComparison Between Day and Night Accidents:")
print(accidents_day_night)



Comparison Between Day and Night Accidents:
sunrise_sunset
Day      32483
Night    15518
Name: count, dtype: int64
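
The two counts are easier to compare as shares of the total. The sketch below recomputes the proportions from the counts reported above; on the full DataFrame, `value_counts(normalize=True)` would give the same shares directly.

```python
import pandas as pd

# Counts copied from the output above
day_night = pd.Series({"Day": 32483, "Night": 15518})

# Share of all accidents in each category
share = day_night / day_night.sum()
print(share.round(3))
```

These shares do not account for the number of daylight hours or traffic volume, so they describe when accidents were recorded rather than relative risk.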


### Most Common Contributing Factors in Serious Accidents (with Injuries)

In this step, we analyze the most common contributing factors in accidents that resulted in injuries. We focus on accidents where the number of persons injured is greater than zero and look at the contributing factors listed in the contributing_factor_vehicle_1 column.

In [None]:
# Filter accidents with injuries and count the contributing factors
factors_in_grave_accidents = merged_df[merged_df['number_of_persons_injured'] > 0]['contributing_factor_vehicle_1'].value_counts()

# Print the most common contributing factors in serious accidents
print("\nMost Common Contributing Factors in Serious Accidents:")
print(factors_in_grave_accidents)



Most Common Contributing Factors in Serious Accidents:
contributing_factor_vehicle_1
Driver Inattention/Distraction                           5940
Unspecified                                              3639
Failure to Yield Right-of-Way                            2880
Traffic Control Disregarded                              1757
Following Too Closely                                    1264
Unsafe Speed                                              837
Passing or Lane Usage Improper                            774
Turning Improperly                                        634
Other Vehicular                                           502
Pedestrian/Bicyclist/Other Pedestrian Error/Confusion     434
Driver Inexperience                                       367
Unsafe Lane Changing                                      338
Alcohol Involvement                                       321
View Obstructed/Limited                                   311
Passing Too Closely                        

### Accidents by Geographical Location (Borough)

In this step, we analyze the distribution of accidents by geographical location. Because the borough column was earlier combined with zip_code (format borough - zip_code), the counts below are per borough and ZIP code combination (186 distinct values), not per borough.

In [None]:
# Count the number of accidents per borough
accidents_by_borough = merged_df['borough'].value_counts()

# Print the number of accidents by borough
print("\nAccidents by Borough:")
print(accidents_by_borough)


Accidents by Borough:
borough
Brooklyn - 11207.0     1442
Brooklyn - 11236.0      936
Brooklyn - 11234.0      839
Queens - 11434.0        782
Brooklyn - 11208.0      780
                       ... 
Manhattan - 10168.0       5
Manhattan - 10115.0       4
Manhattan - 10069.0       4
Queens - 11109.0          2
Manhattan - 10169.0       1
Name: count, Length: 186, dtype: int64
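
To count accidents per borough alone, the borough name can be split off from the combined label first. This is a minimal sketch on a few labels in the same format as the output above; the variable names are illustrative.

```python
import pandas as pd

# Labels in the combined "borough - zip_code" format
combined = pd.Series([
    "Brooklyn - 11207.0",
    "Brooklyn - 11236.0",
    "Queens - 11434.0",
    "Manhattan - 10168.0",
])

# Keep the part before the " - " separator
borough_only = combined.str.split(" - ").str[0]

counts = borough_only.value_counts()
print(counts)
```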


### Number of Injuries by Vehicle Type

In this analysis, we explore the relationship between the type of vehicle involved in an accident and the number of injuries. We group the data by vehicle type and sum the number of injuries for each vehicle type. The result is a total per vehicle type, not a statistical correlation, and it does not use the number of vehicles involved.

In [None]:
# Group by vehicle type and sum the number of injuries
vehicles_vs_injuries = merged_df.groupby('vehicle_type_code1')['number_of_persons_injured'].sum()

# Print the correlation between vehicle type and number of injuries
print("\nCorrelation Between Vehicle Type and Number of Injuries:")
print(vehicles_vs_injuries.sort_values(ascending=False))



Correlation Between Vehicle Type and Number of Injuries:
vehicle_type_code1
Sedan                                  14918
Station Wagon/Sport Utility Vehicle    10434
Taxi                                     989
Bike                                     631
Pick-up Truck                            544
                                       ...  
MINI BUS                                   0
Lunch Wagon                                0
LOCOMOTIVE                                 0
Garbage Tr                                 0
van                                        0
Name: number_of_persons_injured, Length: 151, dtype: int32


### Accidents by Wind Speed

In this analysis, we examine how accidents are distributed based on the wind speed recorded at the time of the incident. We categorize accidents by the wind_speed_mph variable to understand if wind speed has an impact on the occurrence of accidents.

In [None]:
# Count accidents by wind speed
accidents_by_wind_speed = merged_df['wind_speed_mph'].value_counts()

# Print the distribution of accidents by wind speed
print("\nAccidents by Wind Speed:")
print(accidents_by_wind_speed)



Accidents by Wind Speed:
wind_speed_mph
0.000000     9210
3.000000     8691
5.000000     7890
6.000000     7278
7.000000     3840
7.681347     2529
8.000000     2501
9.000000     2075
10.000000    1358
13.000000     709
12.000000     687
15.000000     430
18.000000     308
16.000000     307
14.000000     188
Name: count, dtype: int64
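
Counting each exact wind speed separately produces many small categories. One option is to group the speeds into ranges with `pd.cut`. The sketch below uses a few hypothetical speeds; the bin edges and labels are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical wind speeds in mph
speeds = pd.Series([0.0, 3.0, 5.0, 7.681347, 10.0, 18.0])

# Assumed bin edges; each bin includes its upper edge, and the first also includes 0
bins = [0, 5, 10, 15, 20]
labels = ["0-5", ">5-10", ">10-15", ">15-20"]
binned = pd.cut(speeds, bins=bins, labels=labels, include_lowest=True)

# Count per range, keeping the ranges in ascending order
counts = binned.value_counts().sort_index()
print(counts)
```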


### Impact of Precipitation on Accidents

In this analysis, we assess how the number of injuries varies with precipitation. We group the data by the precipitation_in variable and sum the number of persons injured at each level. Because most accidents occur with no precipitation, these totals mainly reflect how often each precipitation level occurs.

In [None]:
# Group by precipitation and sum the number of injuries
precipitation_impact = merged_df.groupby('precipitation_in')['number_of_persons_injured'].sum()

# Print the impact of precipitation on accidents
print("\nImpact of Precipitation on Accidents:")
print(precipitation_impact.sort_values(ascending=False))



Impact of Precipitation on Accidents:
precipitation_in
0.00    27827
0.01      935
0.02      360
0.18      162
0.21      118
0.04      104
0.05      104
0.96      100
0.16       64
0.10       60
0.11       55
0.63       52
0.07       47
0.30       47
0.13       44
0.17       42
0.24       39
0.03       28
Name: number_of_persons_injured, dtype: int32


### Comparison of Injured Persons: Pedestrians, Cyclists, and Motorists

This analysis compares the number of injuries between different types of individuals involved in accidents, including pedestrians, cyclists, and motorists. By summing the injuries in each category, we can assess the severity of accidents for these groups.

In [None]:
# Comparison of injured persons by type
injury_comparison = {
    'Injured Persons': merged_df['number_of_persons_injured'].sum(),
    'Injured Pedestrians': merged_df['number_of_pedestrians_injured'].sum(),
    'Injured Cyclists': merged_df['number_of_cyclist_injured'].sum(),
    'Injured Motorists': merged_df['number_of_motorist_injured'].sum(),
}

# Print the comparison between pedestrians, cyclists, and motorists injured
print("\nComparison of Injured Pedestrians, Cyclists, and Motorists:")
print(injury_comparison)



Comparison of Injured Pedestrians, Cyclists, and Motorists:
{'Injured Persons': 30188, 'Injured Pedestrians': 220, 'Injured Cyclists': 4189, 'Injured Motorists': 23810}


### Factors Affecting Cyclist Safety

This analysis identifies the contributing factors to accidents that result in cyclist injuries. By examining the contributing_factor_vehicle_1 column for accidents where cyclists are injured, we can determine which factors are most commonly associated with these types of incidents.

In [None]:
# Factors affecting cyclist safety
factors_affecting_cyclists = merged_df[merged_df['number_of_cyclist_injured'] > 0]['contributing_factor_vehicle_1'].value_counts()

# Print the contributing factors that affect cyclist safety
print("\nFactors Affecting Cyclist Safety:")
print(factors_affecting_cyclists)



Factors Affecting Cyclist Safety:
contributing_factor_vehicle_1
Driver Inattention/Distraction                           1330
Unspecified                                               601
Failure to Yield Right-of-Way                             586
Pedestrian/Bicyclist/Other Pedestrian Error/Confusion     352
Traffic Control Disregarded                               289
Passing or Lane Usage Improper                            182
Turning Improperly                                        125
Following Too Closely                                     101
View Obstructed/Limited                                    90
Unsafe Speed                                               90
Other Vehicular                                            61
Driver Inexperience                                        53
Passing Too Closely                                        45
Passenger Distraction                                      38
Unsafe Lane Changing                                 

### Correlation Between Temperature and Accidents

In this section, we explore the relationship between temperature and the number of accidents. We group the data by temperature in Fahrenheit (temperature_f) and count the number of accidents (represented by id) for each temperature value.

In [None]:
temperature_vs_accidents = merged_df.groupby('temperature_f')['id'].count()
print("\nCorrelation between temperature and the number of accidents:")
print(temperature_vs_accidents.sort_values(ascending=False))


Correlation between temperature and the number of accidents:
temperature_f
74.000000    2347
80.000000    1604
77.000000    1572
73.000000    1464
66.000000    1301
75.000000    1229
71.000000    1184
54.000000    1178
70.000000    1130
65.000000    1114
83.000000    1063
78.000000    1034
51.000000     983
79.000000     973
81.000000     948
60.000000     944
84.000000     900
55.000000     893
62.000000     874
56.000000     862
86.000000     861
63.000000     856
76.000000     856
69.000000     839
67.000000     796
42.000000     795
49.000000     775
36.000000     765
52.000000     736
44.000000     710
50.000000     678
82.000000     671
72.000000     640
40.000000     639
61.000000     623
57.000000     617
88.000000     608
68.000000     591
43.000000     553
41.000000     539
46.000000     539
85.000000     537
59.000000     526
32.000000     507
89.000000     484
87.000000     469
53.000000     465
27.000000     460
91.000000     428
64.000000     395
38.000000     389
28.000000    

### Number of Accidents by Precipitation (Rain/Snow)
In this section, we analyze the number of accidents based on precipitation levels (rain or snow). We group the data by the amount of precipitation (precipitation_in) and count the number of accidents (represented by id) for each precipitation level.

In [None]:
accidents_by_precipitation = merged_df.groupby('precipitation_in')['id'].count()
print("\nNumber of accidents by precipitation:")
print(accidents_by_precipitation)


Number of accidents by precipitation:
precipitation_in
0.00    44325
0.01     1464
0.02      546
0.03       72
0.04      209
0.05      188
0.07       88
0.10       87
0.11       82
0.13       75
0.16       82
0.17       64
0.18      255
0.21      128
0.24       60
0.30       81
0.63       75
0.96      120
Name: id, dtype: int64
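
Dividing the injury totals from the earlier precipitation analysis by these accident counts gives injuries per accident at each level. The sketch below does this for four precipitation levels, with the figures copied from the two outputs in this notebook; the variable names are illustrative.

```python
import pandas as pd

# Injury totals and accident counts copied from the outputs in this notebook
injuries = pd.Series({0.00: 27827, 0.01: 935, 0.02: 360, 0.96: 100})
accidents = pd.Series({0.00: 44325, 0.01: 1464, 0.02: 546, 0.96: 120})

# Injuries per accident at each precipitation level (aligned on the index)
rate = (injuries / accidents).round(3)
print(rate)
```

Levels with few accidents, such as 0.96, give unstable rates, so differences between them should be read with caution.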


### Most Common Factors in Accidents with No Injuries

In this section, we identify the most common contributing factors in accidents that resulted in no injuries. We filter the data to include only accidents where no persons were injured (number_of_persons_injured == 0), and then we analyze the contributing factors for these accidents.

In [None]:
factors_in_minor_accidents = merged_df[merged_df['number_of_persons_injured'] == 0]['contributing_factor_vehicle_1'].value_counts()
print("\nMost common factors in accidents with no injuries:")
print(factors_in_minor_accidents)



Most common factors in accidents with no injuries:
contributing_factor_vehicle_1
Driver Inattention/Distraction                           6889
Unspecified                                              4704
Failure to Yield Right-of-Way                            1914
Passing or Lane Usage Improper                           1720
Following Too Closely                                    1608
Passing Too Closely                                      1269
Traffic Control Disregarded                              1113
Turning Improperly                                       1072
Backing Unsafely                                          913
Unsafe Speed                                              799
Other Vehicular                                           712
Alcohol Involvement                                       603
Driver Inexperience                                       548
Unsafe Lane Changing                                      532
Reaction to Uninvolved Vehicle                       