# Main Data Science Workflow

This notebook demonstrates the complete pipeline from data loading to model evaluation.

**Contents:**
1. Setup and Configuration
2. Data Loading from S3
3. Data Transformation

**Last Updated:** 2025-10-19
**Author:** Wiebke Hutiri

In [1]:
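# Notebook Configuration

# Reload edited project modules automatically before each cell runs
%load_ext autoreload
%autoreload 2

# Suppress all warnings to keep cell output readable
import warnings
warnings.filterwarnings('ignore')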

## Setup and Configuration

In [3]:
from ds_code_challenge.data import download_from_s3, load_data, spatial_join
from ds_code_challenge.config import Config

Config.setup_directories()

2025-10-19 23:15:56.572 | INFO     | ds_code_challenge.config:Config:26 - PROJ_ROOT path is: /Users/wiebke/PycharmProjects/ds_code_challenge


## Data Loading from S3

In [None]:
s3_dataset_keys = [
    'sr.csv.gz', 'sr_hex.csv.gz', 'sr_hex_truncated.csv',
    'city-hex-polygons-8.geojson', 'city-hex-polygons-8-10.geojson',
    'images/swimming-pool/yes', 'images/swimming-pool/no',
]

for key in s3_dataset_keys:
    download_from_s3(key, 'raw')
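
The loop above relies on the project's `download_from_s3` helper, whose implementation is not shown in this notebook. As a rough sketch of what such a helper might do, the function below fetches a single object and skips files that already exist locally. The name `download_key`, the bucket `example-bucket`, and the `FakeS3Client` stand-in are assumptions made for illustration; with real credentials, a `boto3.client("s3")` object offers the same `download_file(Bucket, Key, Filename)` method.

```python
import tempfile
from pathlib import Path


def download_key(client, bucket, key, dest_dir):
    """Download one S3 object into dest_dir, skipping it if already present."""
    target = Path(dest_dir) / key
    if target.exists():
        return target  # cached copy: avoid a second transfer
    target.parent.mkdir(parents=True, exist_ok=True)
    client.download_file(bucket, key, str(target))
    return target


class FakeS3Client:
    """Hypothetical stand-in for boto3.client("s3") so the sketch runs offline."""

    def __init__(self):
        self.calls = 0

    def download_file(self, Bucket, Key, Filename):
        self.calls += 1
        Path(Filename).write_text(f"contents of s3://{Bucket}/{Key}")


client = FakeS3Client()
with tempfile.TemporaryDirectory() as raw_dir:
    # The second call finds the file on disk and does not hit the client again
    first = download_key(client, "example-bucket", "sr_hex_truncated.csv", raw_dir)
    second = download_key(client, "example-bucket", "sr_hex_truncated.csv", raw_dir)
    downloaded_text = first.read_text()
```

Keys such as `images/swimming-pool/yes` look like prefixes rather than single objects, so a real helper would presumably need to list the objects under them first; that step is omitted here.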

In [4]:
sr_df = load_data('sr.csv.gz')
hex8_df = load_data('city-hex-polygons-8.geojson')
sr_hex_trun_df = load_data('sr_hex_truncated.csv')
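
`load_data` accepts both gzipped CSV and GeoJSON file names, which suggests it picks a reader based on the file extension. The sketch below shows one way that dispatch could look; `load_table` is a hypothetical name, and the actual behaviour of `load_data` (including where it looks for files) is not confirmed by this notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a dataset, choosing the reader from the file extension."""
    path = Path(path)
    name = path.name.lower()
    if name.endswith((".csv", ".csv.gz")):
        # pandas infers gzip compression from the .gz suffix
        return pd.read_csv(path)
    if name.endswith(".geojson"):
        import geopandas as gpd  # imported lazily; only needed for geodata

        return gpd.read_file(path)
    raise ValueError(f"Unsupported file type: {path.name}")


# Round-trip a tiny gzipped CSV to exercise the CSV branch
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.csv.gz"
    pd.DataFrame({"latitude": [-33.9], "longitude": [18.4]}).to_csv(sample_path, index=False)
    sample_df = load_table(sample_path)
```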

## Data Transformation

In [6]:
sr_hex_df = spatial_join(sr_df, hex8_df)
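
The `geometry` and `index_right` columns in the result are consistent with a GeoPandas point-in-polygon join, although the body of `spatial_join` is not shown. A minimal sketch of that technique, under the assumption that the service-request coordinates share the polygons' CRS, might look as follows. The function name `join_points_to_polygons` and the toy data (one square polygon labelled `hex_a`) are illustrative only.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon


def join_points_to_polygons(points_df, polygons_gdf, lon_col="longitude", lat_col="latitude"):
    """Attach polygon attributes to each point that falls within a polygon."""
    points_gdf = gpd.GeoDataFrame(
        points_df.copy(),
        geometry=gpd.points_from_xy(points_df[lon_col], points_df[lat_col]),
        crs=polygons_gdf.crs,  # assumes the coordinates are already in this CRS
    )
    # how="left" keeps every point; unmatched points get missing polygon attributes
    return gpd.sjoin(points_gdf, polygons_gdf, how="left", predicate="within")


# Toy data: one unit square and two points, one inside and one outside
polygons = gpd.GeoDataFrame(
    {"h3_level8_index": ["hex_a"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
    crs="EPSG:4326",
)
points = pd.DataFrame({"longitude": [0.5, 5.0], "latitude": [0.5, 5.0]})
joined = join_points_to_polygons(points, polygons)
```

A left join keeps requests that fall outside every hexagon, which makes unmatched rows easy to count afterwards instead of silently dropping them.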

In [7]:
sr_hex_df.head()

| Unnamed: 0 | notification_number | reference_number | creation_timestamp | completion_timestamp | directorate | department | branch | section | code_group | code | cause_code_group | cause_code | official_suburb | latitude | longitude | geometry | index_right | h3_level8_index | centroid_lat | centroid_lon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 400583534 | 9109492000.0 | 2020-10-07 06:55:18+02:00 | 2020-10-08 15:36:35+02:00 | URBAN MOBILITY | Roads Infrastructure Management | RIM Area Central | District: Blaauwberg | TD Customer complaint groups | Pothole&Defect Road Foot Bic Way/Kerbs | Road (RCL) | Wear and tear | MONTAGUE GARDENS | -33.872839 | 18.522488 | POINT (18.52249 -33.87284) | 1047.0 | 88ad360225fffff | -33.871121 | 18.524125 |
| 1 | 400555043 | 9108995000.0 | 2020-07-09 16:08:13+02:00 | 2020-07-14 14:27:01+02:00 | URBAN MOBILITY | Roads Infrastructure Management | RIM Area East | District : Somerset West | TD Customer complaint groups | Manhole Cover/Gully Grid | Road (RCL) | Vandalism | SOMERSET WEST | -34.078916 | 18.84894 | POINT (18.84894 -34.07892) | 3055.0 | 88ad36d5e1fffff | -34.080426 | 18.851688 |
| 2 | 400589145 | 9109614000.0 | 2020-10-27 10:21:59+02:00 | 2020-10-28 17:48:15+02:00 | URBAN MOBILITY | Roads Infrastructure Management | RIM Area East | District : Somerset West | TD Customer complaint groups | Manhole Cover/Gully Grid | Road (RCL) | Vandalism | STRAND | -34.102242 | 18.821116 | POINT (18.82112 -34.10224) | 2946.0 | 88ad36d437fffff | -34.104934 | 18.820143 |
| 3 | 400538915 | 9108601000.0 | 2020-03-19 06:36:06+02:00 | 2021-03-29 20:34:19+02:00 | URBAN MOBILITY | Roads Infrastructure Management | RIM Area North | District : Bellville | TD Customer complaint groups | Paint Markings Lines&Signs | Road Markings | Wear and tear | RAVENSMEAD | -33.920019 | 18.607209 | POINT (18.60721 -33.92002) | 1247.0 | 88ad361133fffff | -33.920536 | 18.607682 |
| 4 | 400568554 |  | 2020-08-25 09:48:42+02:00 | 2020-08-31 08:41:13+02:00 | URBAN MOBILITY | Roads Infrastructure Management | RIM Area South | District : Athlone | TD Customer complaint groups | Pothole&Defect Road Foot Bic Way/Kerbs | Road (RCL) | Surfacing failure | CLAREMONT | -33.9874 | 18.45376 | POINT (18.45376 -33.9874) | 2530.0 | 88ad361709fffff | -33.983538 | 18.45157 |
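
The `Unnamed: 0` column in the output above is the name pandas gives to a first column with no header, which usually means a CSV was saved with its index and read back without `index_col`. Whether that is what happened to `sr.csv.gz` is an assumption; the sketch below only reproduces the general pandas behaviour on a two-row toy frame.

```python
import io

import pandas as pd

original = pd.DataFrame({"notification_number": [400583534, 400555043]})

# Writing with the default index=True stores the row labels as an unnamed first column
csv_text = original.to_csv()

# Reading naively surfaces that column as "Unnamed: 0"
naive = pd.read_csv(io.StringIO(csv_text))

# Declaring it as the index restores the original layout
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

If the column carries no information beyond the row position, reading with `index_col=0` or dropping it after loading keeps it out of downstream joins.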
