# Electric Vehicle (EV) Charging Infrastructure in the US
## Introduction 

The adoption of electric vehicles across the United States is accelerating, transforming the nation’s transportation landscape. As more drivers consider ditching fossil fuels for electricity, the availability of charging stations becomes pivotal—unlike gas stations, chargers aren’t yet ubiquitous. Charging infrastructure is the foundation of this transition; without a reliable, accessible network, widespread EV use simply stalls.

This notebook analyzes the EV charging infrastructure across U.S. states, combining charger counts, charger types, and geography with indicators of EV ownership. The goal is to surface where access is strong, where gaps persist, and how infrastructure aligns with adoption.

- Coverage: Which states have the highest and lowest concentrations of charging stations?
- Density: How does charging infrastructure density vary by state size (e.g., chargers per 100 square miles)?
- Accessibility: How does the EVs-per-charger ratio differ by state, and what does it suggest about ease of access?
- Technology mix: How are advanced chargers (DC fast chargers) distributed relative to Level 1 and Level 2?
- Alignment with adoption: How does infrastructure (counts, density, DCFC share) align with EV ownership levels?

After completing this analysis, readers will gain a clear picture of the current state of EV charging deployment across the United States. They will see which regions are best suited for owning an electric vehicle, and understand whether it’s feasible to travel long distances across the country without the risk of being stranded with a depleted battery.

---
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/yoav-es/ev_charging_infra_us/MASTER)


## Python libraries
This analysis uses the following Python libraries:
- pandas – for data loading, cleaning, and formatting
- numpy – for numerical operations and calculations
- matplotlib & seaborn – for creating visualizations and exploring trends



In [3]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Enable inline plotting
%matplotlib inline

## Data Loading
To start our analysis, we’ll load the dataset into a Pandas DataFrame. This will give us a flexible and powerful structure for examining trends, performing calculations, and preparing visualizations.

In [12]:
def load_data(path):
    """Load the dataset from a CSV file."""
    df = pd.read_csv(path)
    return df

# Load the data
column = load_data('ev_chargers_data.csv')
column.head()
print(f"number of rows {column.shape[0]} ")
print(column.dtypes)
print(column.State.nunique())


number of rows 54 
State                           object
Total Evs                       object
Total Chargers                  object
Ratio: EVs to Charger Ports    float64
Level 1 Chargers                object
Level 2 Chargers                object
DCFC Chargers                   object
Square Mileage of State        float64
Unnamed: 8                     float64
Unnamed: 9                     float64
dtype: object
54


## Data Review

The EV charging dataset contains **54 rows**: one for each of the 50 U.S. states and the District of Columbia, plus three summary rows. Each row carries several attributes describing EV charging infrastructure and usage, which enable analysis of the availability, distribution, and accessibility of charging stations across the country.

The three summary rows sit at the bottom of the CSV file and hold the *Median*, *Average*, and *Total* of each column. They are excluded from state‑level analysis but may be useful for high‑level comparisons.

**Key variables:**
- `State` – Name of the U.S. state or the District of Columbia.
- `Total Evs` – Total number of registered electric vehicles in that state.
- `Total Chargers` – Total number of public charging ports in that state.
- `Ratio: EVs to Charger Ports` – Ratio of EVs to available public charging ports (lower ratios indicate better charger availability).
- `Level 1 Chargers` – Number of public Level 1 charging ports (standard household‑type outlets; slowest charging speed).
- `Level 2 Chargers` – Number of public Level 2 charging ports (240V; faster than Level 1, common in workplaces and public parking).
- `DCFC Chargers` – Number of public DC fast charging ports (high‑power chargers capable of rapidly charging EV batteries).
- `Square Mileage of State` – Total land area of the state in square miles.

---

## Data Cleaning

A review of the dataset identified several necessary preprocessing steps:

1. **Remove empty columns** – The two untitled, entirely empty columns (`Unnamed: 8` and `Unnamed: 9`) will be dropped.

2. **Filter rows** – The dataset includes 54 entries; after removing *Total*, *Average*, and *Median* rows, 51 remain.

    The extra entry beyond the 50 states is the **District of Columbia**, a federal district that will be retained for analysis but clearly labeled as such.  

3. **Standardize column names** – Convert all headers to lowercase and rename `Ratio: EVs to Charger Ports` to a clearer label, `ev_charger_ratio`.

4. **Correct data types** – Convert the count columns (`total evs`, `total chargers`, `level 1 chargers`, `level 2 chargers`, `dcfc chargers`) from `object` to a numeric dtype for accurate calculations.

5. **Remove duplicates** – Ensure each record is unique.

6. **Handle missing values** – Address nulls through imputation or removal, depending on context. The only missing value is the ratio in the *Total* row; it appears to be missing at random (MAR), and the ratio is recalculated from the EV and charger totals during cleaning.

7. **Clean text fields** – Normalize formatting (consistent casing, trim whitespace, fix encoding issues). 

8. **Standardize numerical formats** – Ensure consistent decimal separators and units, remove commas from numbers. 

9. **Validate data integrity** – Confirm that each column’s data type matches its intended use.  

These steps will ensure the dataset is clean, consistent, and ready for accurate analysis.

In [None]:
def clean_data(df):
    """Clean and format the DataFrame."""
    print(df.isna().sum())

    # Drop empty columns and duplicate rows; .copy() gives an independent frame,
    # which avoids SettingWithCopyWarning on the assignments below
    df = df.dropna(axis=1, how='all').drop_duplicates().copy()

    # Standardize column names
    df.columns = df.columns.str.lower().str.strip()

    # Rename ratio column
    df = df.rename(columns={"ratio: evs to charger ports": "ev_charger_ratio"})

    # Trim whitespace in the text column
    df["state"] = df["state"].str.strip()

    # Remove thousands separators and convert the count columns to numeric dtypes
    num_cols = ["total evs", "total chargers", "level 1 chargers", "level 2 chargers", "dcfc chargers"]
    df[num_cols] = df[num_cols].replace(',', '', regex=True).apply(pd.to_numeric, errors='coerce')

    # Recalculate ratio (this also fills the missing ratio in the Total row)
    df["ev_charger_ratio"] = df["total evs"] / df["total chargers"]

    # Tag summary rows so they can be filtered out of state-level analysis
    df["is_summary"] = df["state"].isin(['Total', 'Average', 'Median'])

    # Validate dtypes: every column except 'state' must be numeric or boolean
    non_numeric = df.drop(columns="state").select_dtypes(exclude=["number", "bool"]).columns
    assert non_numeric.empty, f"Non-numeric dtype found in: {list(non_numeric)}"
    assert not df.isna().any().any(), "DataFrame contains missing values"

    return df


# Clean the data
column = clean_data(column)
print('Data cleaned successfully!')


State                           0
Total Evs                       0
Total Chargers                  0
Ratio: EVs to Charger Ports     1
Level 1 Chargers                0
Level 2 Chargers                0
DCFC Chargers                   0
Square Mileage of State         0
Unnamed: 8                     54
Unnamed: 9                     54
dtype: int64
Data cleaned successfully!



## Exploratory Data Analysis (EDA)

With the dataset cleaned and its structure reviewed, the next step is to explore its contents to uncover patterns and insights.
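
A natural first exploratory step is to separate the state‑level rows from the summary rows and look at descriptive statistics for the states only. The snippet below is a minimal sketch of that idea, not the notebook's actual analysis: it assumes the lowercased column names and the boolean `is_summary` flag produced during cleaning, the helper name `split_summary` is hypothetical, and the numbers are made‑up placeholders rather than values from `ev_chargers_data.csv`.

```python
import pandas as pd


def split_summary(df):
    """Return (state_rows, summary_rows) based on the boolean is_summary flag."""
    states = df.loc[~df["is_summary"]].reset_index(drop=True)
    summary = df.loc[df["is_summary"]].reset_index(drop=True)
    return states, summary


# Illustrative placeholder values only (not taken from the real dataset)
toy = pd.DataFrame({
    "state": ["A", "B", "C", "Total"],
    "total evs": [1000, 4000, 250, 5250],
    "total chargers": [100, 200, 50, 350],
    "is_summary": [False, False, False, True],
})

states, summary = split_summary(toy)

# Descriptive statistics for the state-level rows only
stats = states[["total evs", "total chargers"]].describe()
print(stats)
```

Keeping the summary rows in a separate frame, rather than dropping them, leaves them available for high‑level comparisons later on.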


In [None]:
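def explore_data(df, col):
    """Summarize one numeric column of df and plot its distribution."""
    values = df[col].to_numpy()

    mean = np.mean(values)
    median = np.median(values)
    max_val = np.max(values)   # avoid shadowing the built-in max()
    min_val = np.min(values)   # avoid shadowing the built-in min()
    zeros = np.count_nonzero(values == 0)  # number of zero entries (assumed intent)
    std_dev = np.std(values)

    # Histogram of the selected column
    sns.histplot(data=df, x=col)
    return [mean, median, max_val, min_val, zeros, std_dev]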



## Data Analysis

This stage focuses on investigating patterns, relationships, and trends in the prepared dataset to answer the defined questions or objectives.

Common steps may include:
- **Identifying correlations** between numerical variables.
- **Comparing groups** or categories to detect differences or trends.
- **Exploring relationships** through scatter plots, cross‑tabulations, or statistical tests.
- **Feature engineering** to create new variables that may improve insights.
- **Segmentation or clustering** to group similar observations.
- **Predictive modeling** to forecast outcomes, when relevant.

Analysis techniques should be chosen based on:
- The nature of the data (categorical, numerical, time‑series, text, etc.).
- The goals defined in the Introduction section.
- Any constraints or assumptions identified earlier.

Findings from this stage should directly inform conclusions or recommendations in the final section of the notebook.
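
For this dataset, the feature‑engineering and correlation steps could look roughly like the sketch below. It derives the three metrics named in the Introduction (chargers per 100 square miles, DCFC share, and EVs per charger) and then computes a Pearson correlation between EV ownership and charger counts. This is an illustration under assumptions, not the notebook's results: the column names follow the lowercased headers from the cleaning step, `add_metrics` is a hypothetical helper, and the numbers are made‑up placeholders.

```python
import pandas as pd


def add_metrics(df):
    """Return a copy of df with density, DCFC share and EVs-per-charger columns."""
    out = df.copy()
    out["chargers_per_100_sq_mi"] = (
        out["total chargers"] / out["square mileage of state"] * 100
    )
    out["dcfc_share"] = out["dcfc chargers"] / out["total chargers"]
    out["evs_per_charger"] = out["total evs"] / out["total chargers"]
    return out


# Illustrative placeholder values only (not taken from the real dataset)
toy = pd.DataFrame({
    "state": ["A", "B", "C"],
    "total evs": [1000, 4000, 250],
    "total chargers": [100, 200, 50],
    "dcfc chargers": [20, 50, 5],
    "square mileage of state": [10000.0, 50000.0, 2500.0],
})

metrics = add_metrics(toy)
print(metrics[["state", "chargers_per_100_sq_mi", "dcfc_share", "evs_per_charger"]])

# Pearson correlation between EV ownership and charger counts
corr = metrics["total evs"].corr(metrics["total chargers"])
print(f"Correlation (total evs vs. total chargers): {corr:.2f}")
```

Working on a copy keeps the cleaned frame unchanged, so the derived columns can be recomputed or dropped without rerunning the cleaning step.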

In [None]:
# def analyze_data(df):
#     """Conduct a preliminary analysis on the data."""
#     if 'scientific_name' in df.columns and 'observations' in df.columns:
#         obs_counts = df.groupby('scientific_name')['observations'].sum().reset_index()
#         sorted_obs = obs_counts.sort_values(by='observations', ascending=False)
#         print("Summarized Observations:")
#         print(sorted_obs)
#     else:
#         print("Columns 'scientific_name' and/or 'observations' not found in the data.")

# analyze_data(df)

## Conclusions

This section summarizes the key findings from the analysis, highlighting patterns, relationships, or insights that directly address the project’s initial objectives.  

Typical elements include:
- **Restating the goals** and how the analysis addressed them.
- **Highlighting main discoveries** supported by the data.
- **Noting limitations** of the analysis, such as data quality, sample size, or scope constraints.
- **Suggesting next steps** for deeper investigation or practical application.
- **Potential implications** for decision‑making, policy, or further research.

Conclusions should focus on actionable takeaways and avoid repeating every detail of the analysis — instead, emphasize the most significant results and their relevance.