# Roadkill Data Preprocessing

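This notebook demonstrates the preprocessing steps for roadkill incident data in Korea
(2020–2022). The raw data consists of separate Excel files, one per year. In this notebook we:

1. Import the required libraries
2. Load the administrative boundaries and select Gangwon Province
3. Build a 0.05° grid over the province
4. Load and merge the yearly roadkill files
5. Map each incident to a grid cell
6. Aggregate roadkill counts per grid cell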

## Imports

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import box, Point
import matplotlib.pyplot as plt
import folium
from folium.plugins import HeatMap
from collections import defaultdict
import os

In [2]:
data_path = '/Users/yeji_kim/Desktop/project/roadkill_analysis/data'
roadkill_path = os.path.join(data_path, 'roadkill')


## 1. Load Administrative Boundaries

We first load the shapefile containing the administrative boundaries of Korea.
From this file, we keep only **Gangwon Province**, which is the region used in all subsequent steps.

In [3]:
# Load the shapefile of provincial boundaries and reproject to WGS 84 (EPSG:4326)
boundary_path = os.path.join(data_path, 'SouthKoreaApi')
boundary_gdf = gpd.read_file(boundary_path, encoding='cp949').to_crs(epsg=4326)


# Filter Gangwon Province
gangwon_boundary = boundary_gdf[boundary_gdf['CTP_KOR_NM'] == '강원특별자치도']




## 2. Create Grid Over Gangwon Province 

We construct a regular grid over the bounding box of Gangwon Province.
Each grid cell is 0.05° × 0.05°, and only cells that intersect the province are retained.

In [66]:
# Get bounding box
min_x, min_y, max_x, max_y = gangwon_boundary.total_bounds
grid_size = 0.05

# Generate grid cells column by column (west to east), each column from south to north
grid_cells, grid_ids = [], []
grid_id = 1

x = min_x

while x < max_x:
    y = min_y
    while y < max_y:
        grid_cells.append(box(x,y, x + grid_size, y + grid_size))
        grid_ids.append(f'grid_{grid_id}')
        grid_id += 1
        y += grid_size
    x += grid_size
    
grid = gpd.GeoDataFrame({'grid_id': grid_ids, 'geometry': grid_cells}, crs = 'EPSG:4326')

# Keep only cells intersecting with Gangwon 
gangwon_grid = gpd.sjoin(grid, gangwon_boundary, how='inner', predicate='intersects').drop(columns='index_right')
gangwon_grid = gangwon_grid[['grid_id', 'geometry']]
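
To get a feel for what a 0.05° cell means on the ground, the sketch below converts the cell size to kilometres. It assumes a spherical-Earth approximation of about 111.32 km per degree of latitude and a reference latitude of 37.8°N, which is roughly the middle of the grid shown in the outputs rather than a value computed from the data. The names `km_per_degree` and `ref_lat` are illustrative and do not appear in the notebook.

```python
import math

grid_size = 0.05          # cell size in degrees, as used in the notebook
km_per_degree = 111.32    # assumption: approximate length of one degree of latitude
ref_lat = 37.8            # assumption: rough mid-latitude of the Gangwon grid

# North-south extent is nearly constant everywhere
height_km = grid_size * km_per_degree

# East-west extent shrinks with the cosine of the latitude
width_km = grid_size * km_per_degree * math.cos(math.radians(ref_lat))

print(f"cell height ~ {height_km:.1f} km, cell width ~ {width_km:.1f} km")
```

Under these assumptions a cell is roughly 5.6 km north to south and 4.4 km east to west, so cells are not square on the ground and become slightly narrower toward the north.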

## 3. Load Roadkill Incident Data

We load the roadkill incident data files for 2020–2022.  
The datasets include longitude and latitude for each incident, which will later be mapped to grid cells.


In [67]:
# Dictionary to store yearly datasets, keyed by year
roadkill = {}
for file in os.listdir(roadkill_path):
    # Skip anything that is not an Excel file (e.g. hidden system files)
    if not file.lower().endswith(('.xls', '.xlsx')):
        continue
    year = int(file[:4])  # file names are expected to start with the four-digit year
    df = pd.read_excel(os.path.join(roadkill_path, file))
    print(df.columns)
    roadkill[year] = df

# Merge all yearly DataFrames
roadkill_data = pd.concat(roadkill.values(), ignore_index=True)

print(roadkill_data.head())

Index(['일련번호', '신고구분', '신고내용', '접수일자', '접수시각', '관할기관', 'GPS X', 'GPS Y'], dtype='object')
Index(['일련번호', '신고구분', '신고내용', '접수일자', '접수시각', '관할기관', 'GPS X', 'GPS Y'], dtype='object')
Index(['일련번호', '신고구분', '신고내용', '접수일자', '접수시각', '관할기관', 'GPS X', 'GPS Y'], dtype='object')
   일련번호 신고구분                                     신고내용        접수일자   접수시각  \
0  4776  로드킬      부산 북구 덕천동 132-67 번지에서 로드킬이 발생하였습니다.  2021-12-31  21:12   
1  4775  로드킬  경기 고양시 일산동구 중산동 1780 번지에서 로드킬이 발생하였습니다.  2021-12-31  19:08   
2  4774  로드킬     서울 강서구 방화동 281-38 번지에서 로드킬이 발생하였습니다.  2021-12-31  18:46   
3  4773  로드킬  경기 이천시 설성면 장천리 232-2 번지에서 로드킬이 발생하였습니다.  2021-12-31  18:29   
4  4772  로드킬        인천 서구 마전동 1096 번지에서 로드킬이 발생하였습니다.  2021-12-31  18:22   

          관할기관       GPS X      GPS Y  
0        부산 북구  129.021598  35.210053  
1  경기 고양시 일산동구  126.790437  37.684049  
2       서울 강서구  126.815437  37.572090  
3       경기 이천시  127.525797  37.139656  
4        인천 서구  126.675922  37.598946  
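
The merged table no longer records which file each row came from. If the source year is needed later, one option is to tag each yearly frame before concatenating. The snippet below is a minimal sketch on toy data; the `year` column and the `yearly` dictionary are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the yearly Excel files (values are made up for illustration)
yearly = {
    2020: pd.DataFrame({"GPS X": [127.9, 128.1], "GPS Y": [37.3, 37.4]}),
    2021: pd.DataFrame({"GPS X": [128.6], "GPS Y": [37.7]}),
}

# Add a 'year' column to each frame, then stack them into one table
combined = pd.concat(
    [df.assign(year=year) for year, df in yearly.items()],
    ignore_index=True,
)
print(combined)
```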


## 4. Map Roadkill Incidents to Gangwon Grid

We transform the dataset into a GeoDataFrame and map each roadkill incident  
to its corresponding grid cell in Gangwon Province using a spatial join.


In [88]:
# Convert to GeoDataFrame
roadkill_gdf = gpd.GeoDataFrame(
    roadkill_data,
    geometry=gpd.points_from_xy(roadkill_data['GPS X'], roadkill_data['GPS Y']),
    crs='EPSG:4326',
)


# Filter roadkill incidents within Gangwon Province
gangwon_roadkill = gpd.sjoin(roadkill_gdf, gangwon_boundary, how='inner', predicate='within').drop(columns='index_right')

# Each row is one incident; the integer column needs no further cast
gangwon_roadkill['roadkill_count'] = 1

# Map roadkill incidents to grid cells using a spatial join;
# how='right' keeps every grid cell, including cells with no incidents
gangwon_roadkill_grid = gpd.sjoin(gangwon_roadkill, gangwon_grid, how='right', predicate='within')

In [89]:
gangwon_roadkill_grid = gangwon_roadkill_grid[['grid_id', 'GPS X', 'GPS Y', 'roadkill_count']]
# Cells without incidents keep NaN in roadkill_count here; the groupby sum in Section 5 turns them into 0
gangwon_roadkill_grid.tail()

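| | grid_id | GPS X | GPS Y | roadkill_count |
|---|---|---|---|---|
| 1413 | grid_1414 | 129.301094 | 37.288755 | 1.0 |
| 1413 | grid_1414 | 129.314704 | 37.286061 | 1.0 |
| 1442 | grid_1443 | NaN | NaN | NaN |
| 1443 | grid_1444 | NaN | NaN | NaN |
| 1444 | grid_1445 | NaN | NaN | NaN |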
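
Because the grid is regular, the cell containing a point can also be computed arithmetically, which is a handy cross-check on the spatial join. The helper `cell_label` below is hypothetical and not part of the notebook. It assumes the same column-by-column numbering as the grid loop in Section 2 and uses toy bounds instead of the real `total_bounds`. It does not test whether the cell intersects the province, and with real coordinates floating-point rounding in the loop can make the number of rows differ from `math.ceil`, so treat it as a sketch rather than a replacement for `gpd.sjoin`.

```python
import math

def cell_label(x, y, min_x, min_y, max_x, max_y, grid_size):
    """Return the 'grid_<n>' label of the cell containing (x, y), or None if outside the grid."""
    n_cols = math.ceil((max_x - min_x) / grid_size)
    n_rows = math.ceil((max_y - min_y) / grid_size)

    # Zero-based column (west to east) and row (south to north) of the point
    col = math.floor((x - min_x) / grid_size)
    row = math.floor((y - min_y) / grid_size)
    if not (0 <= col < n_cols and 0 <= row < n_rows):
        return None

    # IDs start at 1 and increase south to north within a column, then move east
    return f"grid_{col * n_rows + row + 1}"

# Toy grid: 4 columns x 6 rows of 0.5-degree cells
print(cell_label(0.7, 1.2, 0, 0, 2, 3, 0.5))
print(cell_label(5.0, 5.0, 0, 0, 2, 3, 0.5))
```

In the toy grid, the point (0.7, 1.2) falls in the second column and third row, which corresponds to `grid_9`; the second point lies outside the bounds and yields `None`.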


## 5. Aggregate Roadkill Counts by Grid

We count the number of roadkill incidents per grid cell.  
Cells with no incidents are assigned a count of zero,  
resulting in a complete grid-level dataset.


In [77]:
roadkill_count = gangwon_roadkill_grid.groupby('grid_id')['roadkill_count'].sum().reset_index()
roadkill_count

| | grid_id | roadkill_count |
|---|---|---|
| 0 | grid_1000 | 0.0 |
| 1 | grid_1001 | 0.0 |
| 2 | grid_1002 | 0.0 |
| 3 | grid_1003 | 0.0 |
| 4 | grid_1004 | 0.0 |
| ... | ... | ... |
| 779 | grid_995 | 0.0 |
| 780 | grid_996 | 0.0 |
| 781 | grid_997 | 0.0 |
| 782 | grid_998 | 0.0 |
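
The zero counts above come from how pandas sums missing values: by default, `sum()` skips `NaN`, so a group that contains only `NaN` sums to `0.0`. The snippet below is a minimal sketch on toy data (the `toy` frame is made up for illustration) that reproduces this behaviour and then casts the result to integers, which may be preferable for count data.

```python
import numpy as np
import pandas as pd

# Two incidents in grid_1, and an empty cell (NaN) for grid_2
toy = pd.DataFrame({
    "grid_id": ["grid_1", "grid_1", "grid_2"],
    "roadkill_count": [1.0, 1.0, np.nan],
})

# NaN values are skipped, so the all-NaN group sums to 0.0
counts = toy.groupby("grid_id")["roadkill_count"].sum().reset_index()

# Assign the result back; astype returns a new Series and does not modify in place
counts["roadkill_count"] = counts["roadkill_count"].astype("int64")
print(counts)
```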


In [74]:
grid_features = gangwon_grid.merge(roadkill_count, on = 'grid_id', how = 'left')
grid_features

| | grid_id | geometry | roadkill_count |
|---|---|---|---|
| 0 | grid_25 | POLYGON ((127.14504 38.22783, 127.14504 38.277... | 0.0 |
| 1 | grid_26 | POLYGON ((127.14504 38.27783, 127.14504 38.327... | 0.0 |
| 2 | grid_55 | POLYGON ((127.19504 38.12783, 127.19504 38.177... | 0.0 |
| 3 | grid_56 | POLYGON ((127.19504 38.17783, 127.19504 38.227... | 0.0 |
| 4 | grid_57 | POLYGON ((127.19504 38.22783, 127.19504 38.277... | 0.0 |
| ... | ... | ... | ... |
| 779 | grid_1413 | POLYGON ((129.34504 37.22783, 129.34504 37.277... | 0.0 |
| 780 | grid_1414 | POLYGON ((129.34504 37.27783, 129.34504 37.327... | 2.0 |
| 781 | grid_1443 | POLYGON ((129.39504 37.12783, 129.39504 37.177... | 0.0 |
| 782 | grid_1444 | POLYGON ((129.39504 37.17783, 129.39504 37.227... | 0.0 |


In [72]:
gangwon_roadkill

| | 일련번호 | 신고구분 | 신고내용 | 접수일자 | 접수시각 | 관할기관 | GPS X | GPS Y | geometry | CTPRVN_CD | CTP_ENG_NM | CTP_KOR_NM | roadkill_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 4758 | 로드킬 | 강원 평창군 진부면 동산리 12-4 번지에서 로드킬이 발생하였습니다. | 2021-12-31 | 15:37 | 강원 평창군 | 128.600371 | 37.713968 | POINT (128.60037 37.71397) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 20 | 4756 | 로드킬 | 강원 원주시 신림면 황둔리 967-4 번지에서 로드킬이 발생하였습니다. | 2021-12-31 | 13:43 | 강원 원주시 | 128.164982 | 37.247599 | POINT (128.16498 37.2476) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 21 | 4755 | 로드킬 | 강원 원주시 신림면 황둔리 1281-26 번지에서 로드킬이 발생하였습니다. | 2021-12-31 | 13:41 | 강원 원주시 | 128.153956 | 37.248793 | POINT (128.15396 37.24879) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 37 | 4739 | 로드킬 | 강원 속초시 청호동 1224-1 번지에서 로드킬이 발생하였습니다. | 2021-12-30 | 15:22 | 강원 속초시 | 128.595800 | 38.199742 | POINT (128.5958 38.19974) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 84 | 4692 | 로드킬 | 강원 철원군 갈말읍 문혜리 246-1 번지에서 로드킬이 발생하였습니다. | 2021-12-28 | 18:12 | 강원 철원군 | 127.362588 | 38.186414 | POINT (127.36259 38.18641) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9446 | 39 | 로드킬 | 강원 원주시 우산동 307-4 번지에서 로드킬이 발생하였습니다. | 2020-08-02 | 12:56 | 강원 원주시 | 127.937834 | 37.373635 | POINT (127.93783 37.37364) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 9448 | 37 | 로드킬 | 강원 원주시 무실동 1085-7 번지에서 로드킬이 발생하였습니다. | 2020-08-02 | 12:11 | 강원 원주시 | 127.919171 | 37.322474 | POINT (127.91917 37.32247) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 9454 | 27 | 로드킬 | 강원 원주시 호저면 주산리 963-1 번지에서 로드킬이 발생하였습니다. | 2020-08-01 | 19:32 | 강원 원주시 | 127.921223 | 37.413686 | POINT (127.92122 37.41369) | 51 | Gangwon-do | 강원특별자치도 | 1 |
| 9455 | 26 | 로드킬 | 강원 원주시 호저면 주산리 963-1 번지에서 로드킬이 발생하였습니다. | 2020-08-01 | 19:32 | 강원 원주시 | 127.921223 | 37.413686 | POINT (127.92122 37.41369) | 51 | Gangwon-do | 강원특별자치도 | 1 |


In [73]:
gangwon_grid

| | grid_id | geometry |
|---|---|---|
| 24 | grid_25 | POLYGON ((127.14504 38.22783, 127.14504 38.277... |
| 25 | grid_26 | POLYGON ((127.14504 38.27783, 127.14504 38.327... |
| 54 | grid_55 | POLYGON ((127.19504 38.12783, 127.19504 38.177... |
| 55 | grid_56 | POLYGON ((127.19504 38.17783, 127.19504 38.227... |
| 56 | grid_57 | POLYGON ((127.19504 38.22783, 127.19504 38.277... |
| ... | ... | ... |
| 1412 | grid_1413 | POLYGON ((129.34504 37.22783, 129.34504 37.277... |
| 1413 | grid_1414 | POLYGON ((129.34504 37.27783, 129.34504 37.327... |
| 1442 | grid_1443 | POLYGON ((129.39504 37.12783, 129.39504 37.177... |
| 1443 | grid_1444 | POLYGON ((129.39504 37.17783, 129.39504 37.227... |
