# Project 1

## Project Information

This project uses monthly peak load data from PJM, a regional transmission organization in the eastern United States. The dataset contains the peak electricity demand, in megawatts (MW), for each month over several years. The analysis computes the mean, median, and mode of the peak load values, first with pandas and then with only the Python standard library, and visualizes the monthly values as a text bar chart.

## 1. Using Pandas

### 1.1 Read the CSV File:

In [27]:
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('peak_load.csv')
print(df.head())


     month  peak_load_MW
0  2019-01      138963.0
1  2019-02      132396.0
2  2019-03      120826.0
3  2019-04      102538.0
4  2019-05      114821.0


### 1.2 Calculate Mean, Median and Mode:

In [28]:
mean_value = df['peak_load_MW'].mean()
median_value = df['peak_load_MW'].median()
mode_value = df['peak_load_MW'].mode()[0]  # mode() returns all tied modes in ascending order; keep the first

print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")

Mean: 122339.89024390244
Median: 120917.0
Mode: 86270.0
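
Because `mode()` returns every tied value in ascending order, `mode()[0]` is the smallest of the most frequent values. Peak load readings rarely repeat exactly, so the reported mode may simply be the smallest value in the column. A minimal sketch of a tie check, written with `collections.Counter` so it runs without the CSV file; `sample` holds only the five rows printed by `df.head()`, and the names `counts` and `tied_modes` are illustrative:

```python
from collections import Counter

# First five values from the df.head() output above; the full column is longer.
sample = [138963.0, 132396.0, 120826.0, 102538.0, 114821.0]

counts = Counter(sample)
max_count = max(counts.values())
# Every value that reaches the highest count is a mode.
tied_modes = sorted(v for v, c in counts.items() if c == max_count)

print(f"Highest count: {max_count}")
print(f"Number of tied modes: {len(tied_modes)}")
print(f"Smallest tied mode: {tied_modes[0]}")
```

If the highest count is 1 on the full column, every value is tied and the mode carries little information about the data.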


## 2. Using only the Python standard library

### 2.1 Read the CSV File:

In [29]:
import csv

# Read just the header to confirm column names
with open('peak_load.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    print(header)


['month', 'peak_load_MW']
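
With the column names confirmed, the numeric column can be read by name. A minimal sketch, assuming some cells could be blank: `read_column` is a hypothetical helper, and the inline `csv_text` stands in for `peak_load.csv`, with one blank cell added purely to illustrate skipping missing values.

```python
import csv
import io

# Inline stand-in for peak_load.csv; the blank cell for 2019-03 is a made-up case.
csv_text = """month,peak_load_MW
2019-01,138963.0
2019-02,132396.0
2019-03,
2019-04,102538.0
"""

def read_column(handle, column):
    """Return the numeric values of `column`, skipping blank cells."""
    values = []
    for row in csv.DictReader(handle):
        cell = (row[column] or "").strip()
        if cell:  # float('') would raise ValueError
            values.append(float(cell))
    return values

values = read_column(io.StringIO(csv_text), "peak_load_MW")
print(values)
```

Passing an open file handle instead of `io.StringIO` would apply the same logic to a file on disk.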


### 2.2 Calculate Mean, Median and Mode:

In [30]:
# Read the actual numeric values

values = []
with open('peak_load.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        values.append(float(row['peak_load_MW']))

# Calculate mean
mean_value = sum(values) / len(values)
print(f"Mean: {mean_value}")

# Calculate median:
sorted_values = sorted(values)
n = len(sorted_values)

if n % 2 == 1:  # odd length
    median_value = sorted_values[n // 2]
else:           # even length
    median_value = (sorted_values[n // 2 - 1] + sorted_values[n // 2]) / 2

print(f"Median: {median_value}")

# Calculate mode:
freq = {}

for v in values:
    freq[v] = freq.get(v, 0) + 1

max_count = max(freq.values())
modes = [k for k, v in freq.items() if v == max_count]

# Select the first mode in file order (pandas sorts tied modes, so its mode()[0] can differ)
mode_value = modes[0]
print(f"Mode: {mode_value}")

Mean: 122339.89024390244
Median: 120917.0
Mode: 138963.0
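
The mode here (138963.0) differs from the pandas result (86270.0) even though the mean and median agree. The loop above keeps the first tied value in file order, while pandas sorts tied modes, so `mode()[0]` picks the smallest. The standard library's `statistics` module can cross-check all three statistics and show both tie-breaking conventions. The sketch below uses only the five rows shown by `df.head()`, not the full dataset, and the names `first_seen_mode` and `smallest_mode` are illustrative:

```python
import statistics

# First five peak_load_MW values from the head() output; illustrative only.
values = [138963.0, 132396.0, 120826.0, 102538.0, 114821.0]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# multimode() returns every tied mode in order of first appearance (Python 3.8+).
modes = statistics.multimode(values)
first_seen_mode = modes[0]   # same choice as the manual loop
smallest_mode = min(modes)   # same choice as pandas' mode()[0]

print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"First-seen mode: {first_seen_mode}")
print(f"Smallest mode: {smallest_mode}")
```

Choosing `min(modes)` in the manual calculation would make the two sections report the same mode.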


## 3. Data Visualization:

In [31]:
# Convert month column to datetime
df['month'] = pd.to_datetime(df['month'])

# Use peak_load_MW values for visualization
values = df['peak_load_MW'].tolist()
months = df['month'].dt.strftime('%Y-%m').tolist()  # format for printing

max_val = max(values)
scale = 50  # max blocks

print("PJM Monthly Peak Load Bar Chart (MW):\n")
print("Month       Peak Load (MW)\n")
for month, value in zip(months, values):
    bar = '█' * int((value / max_val) * scale)
    print(f"{month} {bar}")


PJM Monthly Peak Load Bar Chart (MW):

Month       Peak Load (MW)

2019-01 ███████████████████████████████████████████████
2019-02 █████████████████████████████████████████████
2019-03 █████████████████████████████████████████
2019-04 ███████████████████████████████████
2019-05 ███████████████████████████████████████
2019-06 ███████████████████████████████████████████████
2019-07 █████████████████████████████████████████████████
2019-08 █████████████████████████████████████████████████
2019-09 ████████████████████████████████████████████████
2019-10 ███████████████████████████████████████████
2019-11 ███████████████████████████████████████
2019-12 █████████████████████████████████████████
2020-01 █████████████████████████████████████████
2020-02 ███████████████████████████████████████
2020-03 ██████████████████████████████████
2020-04 █████████████████████████████
2020-05 █████████████████████████████████████
2020-06 █████████████████████████████████████████████
2020-07 ███████████████