# Roadkill Data Analysis and Modeling

## Imports

In [11]:
import pandas as pd
import seaborn as sns
import numpy as np
import os

In [12]:
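# Directory holding the processed feature tables (absolute path on the author's machine)
data_path = '/Users/yeji_kim/Desktop/project/roadkill_analysis/data/processed'

# Load the per-grid feature table and drop the geometry column
data = pd.read_csv(os.path.join(data_path, 'grid_features.csv')).drop(columns='geometry')
data.head()
data.columns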

Index(['grid_id', 'roadkill_count', 'national_highway_rk', 'white_rk',
       'corridor_count', 'min_dist_to_corridor', 'wild_boar_grid',
       'roe_deer_grid', 'is_hongcheon', 'wild_boar_count', 'roe_deer_count',
       'white_road_grade', 'national_road_count', 'highway_road_count'],
      dtype='object')

## Data Columns
`grid_id` : Unique identifier for each grid cell in Gangwon Province.

`national_highway_rk` : Number of roadkill incidents on **national roads, provincial roads, and highways**.  
Highway incidents are very rare, so they are counted together with the national and provincial road incidents.

`white_rk` : Number of roadkill incidents on **ordinary roads** (roads that are not national roads, provincial roads, or highways).

`corridor_count` : Number of ecological corridors within the grid cell.

`min_dist_to_corridor` : Minimum distance (in meters) from any roadkill incident in the grid cell  
to the nearest ecological corridor.
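
A minimal sketch of how such a minimum distance could be computed, assuming incident and corridor locations are already available as x/y coordinates in a metric CRS. The function name `min_dist_to_corridor` and the sample coordinates are hypothetical and are not taken from the project's pipeline.

```python
import numpy as np

def min_dist_to_corridor(incidents_xy, corridors_xy):
    """Smallest Euclidean distance (m) between any incident and any corridor point."""
    incidents = np.asarray(incidents_xy, dtype=float)   # shape (n_incidents, 2)
    corridors = np.asarray(corridors_xy, dtype=float)   # shape (n_corridors, 2)
    if incidents.size == 0 or corridors.size == 0:
        return np.nan  # no incidents or no corridors: the distance is undefined
    # Pairwise coordinate differences via broadcasting: (n_incidents, n_corridors, 2)
    diff = incidents[:, None, :] - corridors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return float(dist.min())

# Hypothetical coordinates in meters
incidents = [(0.0, 0.0), (100.0, 0.0)]
corridors = [(103.0, 4.0), (500.0, 500.0)]
example_distance = min_dist_to_corridor(incidents, corridors)
print(example_distance)
```

Returning `NaN` for cells without incidents is one possible convention; how the actual feature table treats such cells is not stated here.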

`wild_boar_grid` : Area-weighted average wild boar damage ratio in each grid cell.  
Polygons are intersected with the grid and reprojected to a metric CRS,  
and each polygon's ratio is weighted by the area of its overlap with the cell.
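
The area weighting can be illustrated with a small sketch. It assumes the polygon/grid intersection has already been computed and stored as a table with one row per overlap; the table `overlaps`, its column names, and its values are hypothetical. Normalizing by the total overlapping area (rather than the full cell area) is also an assumption.

```python
import pandas as pd

# Hypothetical overlap table: one row per (grid cell, polygon) intersection
overlaps = pd.DataFrame({
    "grid_id": [1, 1, 2],
    "ratio": [0.4, 0.2, 0.1],                 # damage ratio carried by each polygon
    "overlap_area": [750.0, 250.0, 1000.0],   # intersection area in square meters
})

# Weighted mean per cell: sum(ratio * area) / sum(area)
overlaps["weighted"] = overlaps["ratio"] * overlaps["overlap_area"]
sums = overlaps.groupby("grid_id")[["weighted", "overlap_area"]].sum()
wild_boar_grid = sums["weighted"] / sums["overlap_area"]
print(wild_boar_grid)
```

In this toy table, cell 1 is covered 75% by a polygon with ratio 0.4 and 25% by one with ratio 0.2, so its weighted ratio is 0.35.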

`roe_deer_grid` : Area-weighted average roe deer damage ratio in each grid cell.  
Polygons are intersected with the grid and reprojected to a metric CRS,  
and each polygon's ratio is weighted by the area of its overlap with the cell.

`is_hongcheon` : Indicator (0/1) of whether the grid cell lies within Hongcheon County.

`wild_boar_count` : Estimated distribution of wild boar incidents across grid cells.  
Values are scaled from the area-weighted ratios (`wild_boar_grid`), using the 16,700 total incidents  
reported in external studies as a reference.
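
One way such a scaling could work is sketched below, assuming each cell receives a share of the 16,700 total that is proportional to its area-weighted ratio. The exact scaling rule is not given in this notebook, so the proportional allocation, the constant name, and the sample ratios are all assumptions.

```python
import pandas as pd

TOTAL_WILD_BOAR_INCIDENTS = 16_700  # reference total cited in the column description

# Hypothetical area-weighted ratios for four grid cells
grid = pd.DataFrame({"wild_boar_grid": [0.4, 0.4, 0.2, 0.0]})

# Assumed scaling: each cell's share of the total is proportional to its ratio
share = grid["wild_boar_grid"] / grid["wild_boar_grid"].sum()
grid["wild_boar_count"] = share * TOTAL_WILD_BOAR_INCIDENTS
print(grid)
```

Under this rule the estimated counts sum to the reference total, and cells with a ratio of 0 receive a count of 0.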

`roe_deer_count` : Estimated distribution of roe deer incidents across grid cells.  
Values are scaled from the area-weighted ratios (`roe_deer_grid`), using the total incident count  
from external reports as a reference.

`white_road_grade` : Categorical index (0–4) describing the level of road presence and surrounding land use within each grid cell.  
Defined as follows:  

- **0** : No roads (pure forest)  
- **1** : Forest area with 1–2 roads  
- **2** : Small village / rural settlement  
- **3** : Town  
- **4** : City / urban area  
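
Because the grades run from undeveloped to urban, it may be convenient to handle the column as an ordered categorical rather than a plain integer. The sketch below assumes that ordinal reading is intended; the label strings paraphrase the grade definitions, and the sample values are hypothetical.

```python
import pandas as pd

# Labels paraphrase the grade definitions for white_road_grade
grade_labels = {
    0: "no roads (pure forest)",
    1: "forest with 1-2 roads",
    2: "small village / rural settlement",
    3: "town",
    4: "city / urban area",
}

# Hypothetical sample of grades
grades = pd.Series([0, 3, 1, 4, 2, 1], name="white_road_grade")

# An ordered categorical keeps the 0 < 1 < ... < 4 ranking explicit
grade_cat = pd.Categorical(grades, categories=list(grade_labels), ordered=True)
grade_named = grade_cat.rename_categories(grade_labels)

# Count cells graded "town" or above using the ordering
n_town_or_above = int((grade_cat >= 3).sum())
print(grade_named)
print(n_town_or_above)
```

Keeping the integer codes and the readable labels side by side lets a model use the ordering while plots and tables show the descriptive names.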



`national_road_count` : Number of national and provincial roads contained in the grid cell.

`highway_road_count` : Number of highways contained in the grid cell.



In [13]:
data['wild_boar_grid']

0      0.401211
1      0.401211
2      0.401211
3      0.401211
4      0.401211
         ...   
779    0.000000
780    0.000000
781    0.000000
782    0.000000
783    0.000000
Name: wild_boar_grid, Length: 784, dtype: float64
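
The output shows the first five cells sharing the value 0.401211 and the last five at 0. A quick way to see how many distinct values the column takes, and what fraction of cells are zero, is sketched below. The small frame `sample` stands in for `data` so the snippet runs on its own; its values other than 0.401211 and 0 are made up.

```python
import pandas as pd

# Stand-in for `data`; swap in the real frame loaded earlier
sample = pd.DataFrame({"wild_boar_grid": [0.401211] * 5 + [0.25] * 2 + [0.0] * 5})

value_counts = sample["wild_boar_grid"].value_counts()   # cells per distinct ratio
zero_share = (sample["wild_boar_grid"] == 0).mean()      # fraction of cells with ratio 0
n_distinct = sample["wild_boar_grid"].nunique()
print(value_counts)
print(f"{n_distinct} distinct values, {zero_share:.1%} zeros")
```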