Project summary

1. Data source:
 
- URL: https://data.cityofnewyork.us/Health/COVID-19-Daily-Counts-of-Cases-Hospitalizations-an/rc75-m7u3/about_data 
- Indicator: COVID-19 Daily Counts of Cases, Hospitalizations, and Deaths 
- Organization: NYC Open Data
- Data period: 2/29/2020 - 10/13/2025 
- Research question: What are the mean, median, and mode of daily COVID-19 deaths?

2. Using Pandas:
・Read in the data
・Compute the mean, median, and mode

In [1]:
import pandas as pd 

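# Read the file
df = pd.read_csv("covid_nyc.csv")

# Convert the death counts to numbers; drop values that cannot be parsed
s = pd.to_numeric(df["DEATH_COUNT"], errors="coerce").dropna()

# Calculate the mean, median, and mode
mean_val = s.mean()
median_val = s.median()
# mode() returns every most-frequent value in ascending order; keep the smallest
modes = s.mode()
mode_val = modes.iloc[0] if not modes.empty else None

# Output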
print("Mean:", mean_val) 
print("Median:", median_val) 
print("Mode :", mode_val)

Mean: 22.95423563777994
Median: 5.0
Mode : 1


3. Without pandas:
・Compute the mean, median, and mode

In [3]:
import csv

values = []

with open("covid_nyc.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)

    for row in reader:
        raw = row["DEATH_COUNT"]
        try:
            v = float(raw)
            values.append(v)
        except (ValueError, TypeError):
            # Skip blank or non-numeric entries
            continue

print("Total values:", len(values))


mean_val = sum(values) / len(values)
print("Mean:", mean_val)

values_sorted = sorted(values)
n = len(values_sorted)

if n % 2 == 1:
    median_val = values_sorted[n // 2]
else:
    median_val = (values_sorted[n//2 - 1] + values_sorted[n//2]) / 2

print("Median:", median_val)

counts = {}

for v in values:
    if v in counts:
        counts[v] += 1
    else:
        counts[v] = 1

# Pick the value with the highest count (on a tie, the first value counted wins)
mode_val = max(counts, key=counts.get)

print("Mode:", mode_val)


Total values: 2054
Mean: 22.95423563777994
Median: 5.0
Mode: 1.0
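
As a cross-check on the hand-written calculations above, Python's standard `statistics` module provides the same three measures. The sketch below is a minimal example and assumes a small made-up list (`sample` and `summarize` are hypothetical names, and the numbers are not taken from the dataset); with the real data, the `values` list from the cell above could be passed in instead. `statistics.multimode` returns every most-frequent value, so ties become visible, whereas `max(counts, key=counts.get)` keeps only one of them.

```python
import statistics


def summarize(values):
    """Return the mean, median, and all modes of a non-empty list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode lists every value that shares the highest count
        "modes": statistics.multimode(values),
    }


# Hypothetical sample for illustration only (not from the dataset).
sample = [1.0, 1.0, 2.0, 2.0, 5.0, 13.0]
summary = summarize(sample)
print(summary)
```

In this sample both 1.0 and 2.0 appear twice, so `modes` contains two values. If the real data had such a tie, the pandas cell would report the smallest tied value and the manual cell the first one read from the file.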


4. Visualization:

In [5]:
import csv
from datetime import datetime

data = []

with open("covid_nyc.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        date_str = row["date_of_interest"]
        death_str = row["DEATH_COUNT"]

        try:
            date = datetime.strptime(date_str, "%m/%d/%Y")
            death = float(death_str)
            data.append((date.year, death))
        except (ValueError, TypeError):
            # Skip rows with an unparseable date or death count
            continue


year_totals = {}
year_counts = {}

for year, death in data:
    year_totals[year] = year_totals.get(year, 0) + death
    year_counts[year] = year_counts.get(year, 0) + 1

year_avg = {year: year_totals[year] / year_counts[year] for year in year_totals}


print("\nCOVID-19 Deaths per year (average per day)\n")

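scale = 3  # each "*" represents 3 deaths per day

for year in sorted(year_avg):
    avg = year_avg[year]
    # int() truncates, so a year averaging fewer than `scale` deaths per day prints no bar
    bar = "*" * int(avg / scale)
    print(f"{year}: {bar}")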



COVID-19 Deaths per year (average per day)

2020: ***************************
2021: *********
2022: *******
2023: *
2024: 
2025: 
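
The rows for 2024 and 2025 are empty because `int(avg / scale)` truncates any average below 3 to zero stars. One possible variation, sketched below, rounds up so that every positive average gets at least one star and prints the number next to the bar. `render_bar` and `example_avg` are hypothetical names, and the averages are made up for illustration; they are not results from the dataset. Rounding up also changes the bar length for other years compared with the truncating version above.

```python
import math


def render_bar(value, scale=3.0, symbol="*"):
    """Return a text bar; any positive value gets at least one symbol."""
    if value <= 0:
        return ""
    # ceil rounds up, so small positive averages are still visible
    return symbol * max(1, math.ceil(value / scale))


# Hypothetical yearly averages for illustration only (not from the dataset).
example_avg = {2020: 9.0, 2021: 4.5, 2022: 0.4, 2023: 0.0}

lines = [
    f"{year}: {render_bar(avg)} ({avg:.1f})"
    for year, avg in sorted(example_avg.items())
]
print("\n".join(lines))
```

Printing the value beside the bar keeps years with very low averages distinguishable from years with none.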
