# Wildfire


Dataset: **Wildfire**  `FPA_FOD_20170508.sqlite`

Source: https://www.kaggle.com/datasets/rtatman/188-million-us-wildfires/data

Reference: Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4

This starter notebook loads a small subset of the data and draws a few visualizations to get you started. You can download the processed file used in this notebook here: [wildfire](https://ucdavis.box.com/s/q0z2k765u0u49bvkioso3skt85apwa3s).

In [1]:
import pandas as pd

## 1 Draw a small subset for EDA

The full Kaggle file is about **796 MB**. To keep things simple, we start with a pregenerated 10,000-row subset.
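
For reference, a subset like this could be drawn from the full file along the following lines. This is a minimal sketch using only the standard library, and not necessarily how the provided sample was made. The function name `draw_subset`, the default table name `Fires`, and the toy database in the demo are assumptions for illustration; check the actual table name in `sqlite_master` before running it on the real file.

```python
import os
import random
import sqlite3
import tempfile


def draw_subset(src_path, dst_path, table="Fires", n=10_000, seed=0):
    """Copy n randomly chosen rows of `table` into a new SQLite file."""
    con = sqlite3.connect(src_path)
    try:
        # Sample row ids in Python so the draw is reproducible via `seed`.
        # (Assumes an ordinary rowid table, not a WITHOUT ROWID table.)
        rowids = [r[0] for r in con.execute(f'SELECT rowid FROM "{table}"')]
        chosen = random.Random(seed).sample(rowids, min(n, len(rowids)))

        con.execute("ATTACH DATABASE ? AS dst", (dst_path,))
        # Create an empty table with the same column names, then copy rows.
        con.execute(
            f'CREATE TABLE dst."{table}" AS SELECT * FROM main."{table}" WHERE 0'
        )
        con.executemany(
            f'INSERT INTO dst."{table}" SELECT * FROM main."{table}" WHERE rowid = ?',
            [(r,) for r in chosen],
        )
        con.commit()
        con.execute("DETACH DATABASE dst")
    finally:
        con.close()
    return len(chosen)


# Demo on a throwaway 100-row database (not the real wildfire file).
tmp_dir = tempfile.mkdtemp()
src_db = os.path.join(tmp_dir, "toy_full.sqlite")
dst_db = os.path.join(tmp_dir, "toy_sample.sqlite")

con = sqlite3.connect(src_db)
con.execute("CREATE TABLE Fires (FOD_ID INTEGER, STATE TEXT)")
con.executemany("INSERT INTO Fires VALUES (?, ?)", [(i, "CA") for i in range(100)])
con.commit()
con.close()

n_copied = draw_subset(src_db, dst_db, n=10, seed=42)

con = sqlite3.connect(dst_db)
sample_ids = [r[0] for r in con.execute("SELECT FOD_ID FROM Fires")]
con.close()
print(n_copied, len(set(sample_ids)))
```

On the real data the call would look something like `draw_subset("FPA_FOD_20170508.sqlite", "wildfires_sample_10k.sqlite")`, with both paths adjusted to wherever the files live.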

In [2]:
import sqlite3

# Connect to the SQLite database
conn = sqlite3.connect('../data/wildfires_sample_10k.sqlite')

# Use the first table listed (this assumes the file holds a single data table)
table_name = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table';", conn
).iloc[0, 0]

# Load the whole table into a DataFrame (the table name is quoted to be safe)
wildfires10k = pd.read_sql_query(f'SELECT * FROM "{table_name}"', conn)

conn.close()

print(wildfires10k.shape)
wildfires10k.head()

(10000, 39)


| | OBJECTID | FOD_ID | FPA_ID | SOURCE_SYSTEM_TYPE | SOURCE_SYSTEM | NWCG_REPORTING_AGENCY | NWCG_REPORTING_UNIT_ID | NWCG_REPORTING_UNIT_NAME | SOURCE_REPORTING_UNIT | SOURCE_REPORTING_UNIT_NAME | ... | FIRE_SIZE_CLASS | LATITUDE | LONGITUDE | OWNER_CODE | OWNER_DESCR | STATE | COUNTY | FIPS_CODE | FIPS_NAME | Shape |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14 | 14 | FS-1418872 | FED | FS-FIRESTAT | FS | USCAENF | Eldorado National Forest | 503 | Eldorado National Forest | ... | B | 38.433333 | -120.51 | 14.0 | MISSING/NOT SPECIFIED | CA | 5.0 | 5.0 | Amador | `b'\x00\x01\xad\x10\x00\x00p=\n\xd7\xa3 ^\xc0\x...` |
| 1 | 208 | 211 | FS-1419349 | FED | FS-FIRESTAT | FS | USCASQF | Sequoia National Forest | 513 | Sequoia National Forest | ... | A | 35.646944 | -118.463611 | 5.0 | USFS | CA | | | | `b'\x00\x01\xad\x10\x00\x00\xc8\xe0\xee\xcd\xab...` |
| 2 | 227 | 230 | FS-1419380 | FED | FS-FIRESTAT | FS | USNVHTF | Humboldt-Toiyabe National Forest | 417 | Humboldt-Toiyabe National Forest | ... | A | 38.054444 | -119.251944 | 13.0 | STATE OR PRIVATE | CA | 51.0 | 51.0 | Mono | `` b'\x00\x01\xad\x10\x00\x00`\x8d\x92\xdb\x1f\xd... `` |
| 3 | 338 | 344 | FS-1419610 | FED | FS-FIRESTAT | FS | USAZCNF | Coronado National Forest | 305 | Coronado National Forest | ... | B | 32.800833 | -110.008333 | 5.0 | USFS | AZ | 9.0 | 9.0 | Graham | `b'\x00\x01\xad\x10\x00\x00D\xf4\x84\x88\x88\x8...` |
| 4 | 362 | 368 | FS-1419655 | FED | FS-FIRESTAT | FS | USAZCOF | Coconino National Forest | 304 | Coconino National Forest | ... | A | 35.547222 | -111.208333 | 5.0 | USFS | AZ | 5.0 | 5.0 | Coconino | `b'\x00\x01\xad\x10\x00\x00\x10\xc1QUU\xcd[\xc0...` |


## 2 Data preprocessing

We keep 35 of the 39 columns (i.e., variables) for this demonstration, grouped by theme in the list below.

In [3]:
# Full keep-list (duplicates removed, one per literal column name)
cols = [
    # 1) discovery timing
    "FIRE_YEAR", "DISCOVERY_DATE", "DISCOVERY_DOY", "DISCOVERY_TIME",
    # 2) containment timing
    "CONT_DATE", "CONT_DOY", "CONT_TIME",
    # 3) ignition cause
    "STAT_CAUSE_CODE", "STAT_CAUSE_DESCR",
    # 4) fire size
    "FIRE_SIZE", "FIRE_SIZE_CLASS",
    # 5) point location
    "LATITUDE", "LONGITUDE",
    # 6) administrative place
    "STATE", "COUNTY", "FIPS_CODE", "FIPS_NAME",
    # 7) ownership / jurisdiction
    "OWNER_CODE", "OWNER_DESCR",
    # 8) reporting agency & unit
    "NWCG_REPORTING_AGENCY", "NWCG_REPORTING_UNIT_ID", "NWCG_REPORTING_UNIT_NAME",
    "SOURCE_REPORTING_UNIT", "SOURCE_REPORTING_UNIT_NAME",
    # 9) source-system metadata
    "SOURCE_SYSTEM_TYPE", "SOURCE_SYSTEM",
    #10) incident identity / names / codes
    "FIRE_NAME", "COMPLEX_NAME", "MTBS_FIRE_NAME", "ICS_209_NAME",
    "ICS_209_INCIDENT_NUMBER", "MTBS_ID", "LOCAL_FIRE_REPORT_ID",
    "LOCAL_INCIDENT_ID", "FIRE_CODE",
]


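df = wildfires10k[cols].copy()          # keep only the columns listed above

# Days from discovery to containment, as a day-of-year difference.
# Caveat: this goes negative when a fire is contained in a later calendar
# year than the one in which it was discovered.
df["DAY_TO_CONT"] = df["CONT_DOY"] - df["DISCOVERY_DOY"]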
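
A day-of-year difference breaks for fires that burn across New Year. The sketch below shows one possible alternative on two made-up rows: subtracting the date columns directly. It assumes that `DISCOVERY_DATE` and `CONT_DATE` are stored as Julian day numbers, as they are in the original FPA FOD file; it is worth confirming this on the sample (for example with `df["DISCOVERY_DATE"].head()`) before relying on it. The `toy` frame and the new column names are hypothetical.

```python
import pandas as pd

# Toy rows (not taken from the sample); dates are Julian day numbers.
toy = pd.DataFrame({
    "DISCOVERY_DATE": [2453403.5, 2453734.5],   # 2005-02-02, 2005-12-30
    "CONT_DATE":      [2453403.5, 2453739.5],   # 2005-02-02, 2006-01-04
    "DISCOVERY_DOY":  [33, 364],
    "CONT_DOY":       [33, 4],
})

# Day-of-year difference: wrong sign and size across the year boundary.
toy["DAY_TO_CONT_DOY"] = toy["CONT_DOY"] - toy["DISCOVERY_DOY"]

# Julian-date difference: plain subtraction already gives elapsed days.
toy["DAY_TO_CONT_DATE"] = toy["CONT_DATE"] - toy["DISCOVERY_DATE"]

# Optional: convert to calendar dates for readability.
toy["DISCOVERY"] = pd.to_datetime(toy["DISCOVERY_DATE"], unit="D", origin="julian")
toy["CONT"] = pd.to_datetime(toy["CONT_DATE"], unit="D", origin="julian")

print(toy[["DISCOVERY", "CONT", "DAY_TO_CONT_DOY", "DAY_TO_CONT_DATE"]])
```

For the second toy row the day-of-year version gives -360 days, while the date version gives 5 days.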

## 3 Exploratory data analysis

### 3.1 Mapping the wildfires

In [4]:
import plotly.graph_objects as go

# Count fires per state
state_counts = (
    df.groupby("STATE", as_index=False)
      .size()                         # -> column "size"
      .rename(columns={"size": "fires"})
)

fig = go.Figure()

fig.add_trace(go.Choropleth(
    locations=state_counts["STATE"],     # two-letter state codes
    z=state_counts["fires"],             # color scale values
    locationmode="USA-states",           # use built-in USA state map
    colorscale="Reds",                   # optional: choose a colorscale
    colorbar_title="Fires"               # color bar label
))

fig.update_layout(
    title_text="Number of Wildfires by State 1992-2015 (subset)",
    geo_scope='usa'
)

fig.show()

### 3.2 Time to containment

In [5]:
import plotly.express as px

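# ── 1.  Filter to valid rows (and optional sanity bounds) ──────────────────────
d = df.dropna(subset=["DAY_TO_CONT", "FIRE_YEAR"]).copy()
# Keep 0.1–730 days. DAY_TO_CONT is a whole number of days, so the lower bound
# drops same-day (0) and negative values, which cannot be drawn on a log axis.
d = d[d["DAY_TO_CONT"].between(0.1, 730)]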

# ── 2.  Scatter plot (log-y) with transparency ────────────────────────────────
fig = px.scatter(
    d,
    x="FIRE_YEAR",
    y="DAY_TO_CONT",
    opacity=0.2,         # dense years appear darker
    log_y=True,
    trendline="lowess",  # smooth curve, colored red below (requires statsmodels)
    trendline_color_override="red",
    title="Days to Contain vs. Fire Year (log scale)"
)

fig.update_traces(marker=dict(size=4))             # small dots

fig.update_layout(
    xaxis_title="Fire Year",
    yaxis_title="Days to Contain (log scale)",
    height=500
)

fig.show()
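
Before reading too much into the scatter plot, it helps to know how many rows the `between(0.1, 730)` filter keeps and why the others are dropped. The following is a small sketch on made-up values, not on the sample itself; the `toy` frame and the category labels are hypothetical.

```python
import pandas as pd

# Made-up containment times in days (None marks a missing value).
toy = pd.DataFrame({"DAY_TO_CONT": [0, 0, 1, 3, -360, None, 12, 0]})
s = toy["DAY_TO_CONT"]

# Each row falls into exactly one of these categories.
breakdown = pd.Series({
    "missing": int(s.isna().sum()),
    "negative": int((s < 0).sum()),
    "zero (same day)": int((s == 0).sum()),
    "kept (0.1 to 730)": int(s.between(0.1, 730).sum()),
    "above 730": int((s > 730).sum()),
})
print(breakdown)
```

Running the same breakdown on `df["DAY_TO_CONT"]` shows what share of the sample actually appears in the plot.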


### 3.3 Trend of missing data

In [6]:
# 1. Compute % missing per year
year_gap = (
    df
    .groupby("FIRE_YEAR")["DAY_TO_CONT"]
    .apply(lambda s: s.isna().mean()*100)        # % instead of proportion
    .reset_index(name="pct_missing")
    .sort_values("FIRE_YEAR")
)

# 2. Bar chart

fig = go.Figure()

fig.add_trace(go.Bar(                           # change to go.Scatter with mode = 'lines+markers' if you prefer a line
    x=year_gap["FIRE_YEAR"],
    y=year_gap["pct_missing"],
    name="% Missing",
    marker=dict(color="steelblue")
))

fig.update_layout(
    title="% of Fires Lacking Containment-Time Data by Year",
    xaxis_title="Fire Year",
    yaxis_title="% DAY_TO_CONT missing",
    yaxis=dict(
        ticksuffix=" %",
        range=[0, 100]
    ),
    height=500,
    margin=dict(l=60, r=40, t=60, b=60)
)

fig.show()
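
To make the percentage calculation concrete, here is the same idea on a tiny made-up table, written without a `lambda`: flag the missing values, group the flags by year, and take the mean. The `toy` frame is hypothetical, and this is one equivalent way to write the calculation rather than a required change.

```python
import pandas as pd

# Made-up data: 2 of 4 values missing in 1992, 1 of 4 missing in 1993.
toy = pd.DataFrame({
    "FIRE_YEAR":   [1992, 1992, 1992, 1992, 1993, 1993, 1993, 1993],
    "DAY_TO_CONT": [0.0, None, None, 2.0, 1.0, 3.0, None, 0.0],
})

pct_missing = (
    toy["DAY_TO_CONT"].isna()          # True where the value is missing
       .groupby(toy["FIRE_YEAR"])      # group the flags by year
       .mean()                         # share of True values per year
       .mul(100)                       # proportion -> percent
       .rename("pct_missing")
       .reset_index()
)
print(pct_missing)
```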

## 4 Next step?
