In [7]:
# Clone only if the repo is not already present, so re-running this cell does not fail
!test -d StrangerStats || git clone https://github.com/xJenDragon/StrangerStats.git


# Stranger Stats: The Hawkins Mystery 🧪👾

# Important Installs

In [8]:
!python /content/StrangerStats/helper/install.py
!pip install pandas



# 1. Loading the Data 📂
We’ll start by loading the datasets using our helper functions.

In [9]:
# Add helper folder to sys.path
import sys
sys.path.append("/content/StrangerStats/helper")

# Import the helpers, then reload them so edits to the helper files are picked up
import importlib
import pandas as pd
import data_loader as dl
import analysis as an
import viz as vz
for module in (dl, an, vz):
    importlib.reload(module)

# Path to your Excel file
DATA_FILE = "/content/StrangerStats/data/stranger_stats.xlsx"

# Load data
monster_df = dl.load_monster_sightings(DATA_FILE)
char_df = dl.load_character_stats(DATA_FILE)
events_df = dl.load_upside_down_events(DATA_FILE)

# 2. Helpers for Data 🙋
A few helpers that convert times to hours and map severity labels to numbers.

In [10]:
# Extract the hour from each Time value so the datasets can be merged on Hour
char_df["Hour"] = char_df["Time"].apply(lambda t: t.hour if pd.notnull(t) else None)
# Drop rows without a time; .copy() ensures later assignments target a real frame, not a view
char_df = char_df.dropna(subset=["Hour"]).copy()
char_df["Hour"] = char_df["Hour"].astype(int)

events_df["Hour"] = events_df["Time"].apply(lambda t: t.hour if pd.notnull(t) else None)
events_df = events_df.dropna(subset=["Hour"]).copy()
events_df["Hour"] = events_df["Hour"].astype(int)

# Map Severity to numeric
severity_map = {"Low": 1, "Medium": 2, "High": 3}
events_df["Severity_Num"] = events_df["Severity"].map(severity_map)

# 3. The Hawkins Mystery ☎️

### Challenge 1: Which character is most likely to encounter a monster?

**Instructions:**  
- Merge the Monster Sightings and Character Stats datasets by `Day` and `Hour`.  
- Count how many times each character appears when a monster is present.  
- Sort to find the character most likely to encounter a monster.

**Hints:**  
- Use `pd.merge()` with `on=["Day","Hour"]`.  
- Use `groupby()` and `count()` to aggregate.
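
To see how the merge and count fit together, here is a minimal sketch on made-up toy data. The rows and the variable names `monsters`, `characters`, `encounters`, and `encounter_counts` are hypothetical; only the column names `Day`, `Hour`, `Monster`, and `Character` follow the challenge text. It is an illustration of the pattern, not the solution for the real dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the notebook.
monsters = pd.DataFrame({
    "Day": [1, 1, 2, 2],
    "Hour": [20, 21, 20, 22],
    "Monster": ["Demogorgon", "Demodog", "Mind Flayer", "Demogorgon"],
})
characters = pd.DataFrame({
    "Day": [1, 1, 1, 2, 2],
    "Hour": [20, 20, 21, 20, 23],
    "Character": ["Eleven", "Mike", "Eleven", "Dustin", "Mike"],
})

# The default inner merge keeps only (Day, Hour) slots present in both tables,
# i.e. moments when a character and a monster coincide.
encounters = pd.merge(monsters, characters, on=["Day", "Hour"])

# Count encounters per character and put the most exposed character first.
encounter_counts = (
    encounters.groupby("Character")["Monster"]
    .count()
    .sort_values(ascending=False)
)
print(encounter_counts)
```

Because the merge is an inner join, characters who never share a slot with a monster drop out of the result entirely rather than showing up with a count of zero.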


### Challenge 2: Which day/hour is the most dangerous overall?

**Instructions:**  
- Combine monster counts and Upside Down event severity.  
- Create a "danger score" (monster count × severity).  
- Identify the day/hour with the highest danger score.

**Hints:**  
- Use `groupby(["Day","Hour"]).agg(...)`.  
- Sort the results with `sort_values()`.
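
One possible way to build the danger score is sketched below on toy data. The rows are made up, and the names `Monster_Count`, `Mean_Severity`, and `Danger_Score` are hypothetical. Averaging severity per slot is also an assumption; taking the maximum or the sum would be equally defensible readings of "monster count × severity".

```python
import pandas as pd

# Hypothetical toy data; Severity_Num follows the Low=1, Medium=2, High=3 mapping.
monsters = pd.DataFrame({
    "Day": [1, 1, 1, 2],
    "Hour": [20, 20, 21, 20],
    "Monster": ["Demogorgon", "Demodog", "Demogorgon", "Mind Flayer"],
})
events = pd.DataFrame({
    "Day": [1, 1, 2],
    "Hour": [20, 21, 20],
    "Severity_Num": [2, 3, 3],
})

# Number of monster sightings per (Day, Hour) slot.
monster_counts = (
    monsters.groupby(["Day", "Hour"])
    .agg(Monster_Count=("Monster", "count"))
    .reset_index()
)

# Average event severity per (Day, Hour) slot (an assumed aggregation choice).
severity = (
    events.groupby(["Day", "Hour"])
    .agg(Mean_Severity=("Severity_Num", "mean"))
    .reset_index()
)

# Combine both views and rank the slots by danger score.
danger = monster_counts.merge(severity, on=["Day", "Hour"])
danger["Danger_Score"] = danger["Monster_Count"] * danger["Mean_Severity"]
danger = danger.sort_values("Danger_Score", ascending=False)
print(danger)
```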

### Challenge 3: Does monster count correlate with Upside Down severity?

**Instructions:**  
- Measure correlation between monster counts and severity.
- Which monsters tend to cause more severe events?

**Hints:**  
- Use `corr()` to compute Pearson correlation.  
- Filter datasets if needed.
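
As a rough illustration of both questions, the sketch below computes a Pearson correlation and a per-monster mean severity on made-up numbers. The frames `slots` and `events` and the column `Monster_Count` are hypothetical, and it assumes you have already produced a table that holds monster counts and numeric severity side by side.

```python
import pandas as pd

# Hypothetical per-slot summary: monster count and numeric severity side by side.
slots = pd.DataFrame({
    "Monster_Count": [1, 2, 3, 4, 5],
    "Severity_Num": [1, 1, 2, 3, 3],
})

# Series.corr() defaults to the Pearson correlation coefficient.
r = slots["Monster_Count"].corr(slots["Severity_Num"])
print(f"Pearson r = {r:.3f}")

# Hypothetical event table that records which monster was involved.
events = pd.DataFrame({
    "Monster": ["Demogorgon", "Demogorgon", "Demodog", "Demodog"],
    "Severity_Num": [3, 2, 1, 2],
})

# Mean severity per monster, most severe first.
mean_severity = (
    events.groupby("Monster")["Severity_Num"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_severity)
```

Keep in mind that `Severity_Num` is an ordinal scale with only three levels, so a Pearson coefficient is a rough indicator here rather than a precise measure.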

### Challenge 4: Which monster is most likely to appear next?

**Instructions:**  
- Calculate probability of each monster appearing next.
- Use historical monster sightings.

**Hints:**  
- Use `value_counts(normalize=True)` to get probabilities.
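
A minimal sketch of the frequency-based estimate, using a made-up list of sightings (the frame `sightings` and its rows are hypothetical). It treats each past sighting as equally informative, which is a simplifying assumption.

```python
import pandas as pd

# Hypothetical sighting history; only the "Monster" column name mirrors the notebook.
sightings = pd.DataFrame({
    "Monster": ["Demogorgon", "Demodog", "Demogorgon", "Mind Flayer", "Demogorgon"],
})

# normalize=True turns raw counts into relative frequencies that sum to 1.
next_monster_probs = sightings["Monster"].value_counts(normalize=True)
print(next_monster_probs)

# The most frequent monster is the most likely next sighting under this estimate.
most_likely = next_monster_probs.idxmax()
print(most_likely)
```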

### Challenge 5: Probability of High severity for next monster event.

**Instructions:**  
- Calculate `P(High severity | Monster)`.
- Optional: filter by day or location.

**Hints:**  
- Use `groupby("Monster")["Severity"].apply(lambda x: (x=="High").mean())`.
- Round probabilities for readability.
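
The conditional probability can be sketched on toy data as follows. The frame `events` and its rows are made up, and it assumes a single table that carries both a `Monster` and a `Severity` column, which on the real data may require a merge first.

```python
import pandas as pd

# Hypothetical events with the monster involved and a severity label.
events = pd.DataFrame({
    "Monster": ["Demogorgon", "Demogorgon", "Demogorgon", "Demogorgon",
                "Demodog", "Demodog"],
    "Severity": ["High", "High", "High", "Low", "Low", "High"],
})

# (s == "High") is a boolean Series; its mean is the share of High events,
# which estimates P(High severity | Monster) within each group.
p_high = (
    events.groupby("Monster")["Severity"]
    .apply(lambda s: (s == "High").mean())
    .round(2)
)
print(p_high)
```

To condition on day or location as well, filter `events` before grouping or add the extra column to the `groupby` keys.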

### Challenge 6: Character activity vs monster peaks

**Instructions:**  
- Merge datasets by hour.
- Create a heatmap showing which characters are active when monsters appear.

**Hints:**  
- Use `groupby(["Hour","Character"])["Monster"].count().unstack(fill_value=0)`.
- Use `sns.heatmap()` for visualization.
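
The sketch below shows one way to shape the data for a heatmap, using made-up rows. The frames, the variable `activity`, and the helper `plot_activity_heatmap` are hypothetical names for illustration. The plotting imports sit inside the helper so the table can be built even where seaborn is not installed.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the notebook.
monsters = pd.DataFrame({
    "Hour": [20, 20, 21],
    "Monster": ["Demogorgon", "Demodog", "Mind Flayer"],
})
characters = pd.DataFrame({
    "Hour": [20, 21, 21, 22],
    "Character": ["Eleven", "Eleven", "Mike", "Dustin"],
})

# Pair every character with every monster seen in the same hour.
merged = pd.merge(characters, monsters, on="Hour")

# Rows are hours, columns are characters, cells are co-occurrence counts.
activity = (
    merged.groupby(["Hour", "Character"])["Monster"]
    .count()
    .unstack(fill_value=0)
)
print(activity)


def plot_activity_heatmap(table):
    """Draw the Hour x Character count table as an annotated heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(table, annot=True, fmt=".0f", cmap="Reds")
    ax.set_title("Character activity during monster sightings")
    plt.show()
    return ax
```

Calling `plot_activity_heatmap(activity)` in a notebook cell renders the chart. Characters who never overlap with a monster, such as the toy `Dustin` row, do not appear as a column because the inner merge drops them.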