### Element Frequency Analysis

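This section counts the number of distinct episodes in which each cleaned element appears and expresses that count as a percentage of all episodes. Production tags (guest painters and frame shapes) are dropped first, and raw labels are merged into cleaned categories, so the result is a baseline for how common each element is in Bob Ross's paintings.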

In [2]:
import pandas as pd

df = pd.read_csv("elements-by-episode.csv", sep="\t", encoding="utf-16")

total_episodes = df["Episode"].nunique()

drop_tags = {
    "Guest", "Steve Ross", "Diane Andre",
    "Framed", "Oval Frame", "Circle Frame", "Apple Frame", "Florida Frame",
    "Double Oval Frame", "Half Circle Frame", "Half Oval Frame",
    "Rectangle 3D Frame", "Rectangular Frame", "Tomb Frame", "Split Frame",
    "Seashell Frame", "Triple Frame", "Window Frame", "Wood Framed",
    "Nature Details"
}

element_clean_map = {
    # trees
    "Tree": "Deciduous trees",
    "Trees": "Deciduous trees",
    "Deciduous": "Deciduous trees",
    "Conifer": "Evergreen trees",
    "Palm Trees": "Palm trees",
    "Cactus": "Cactus",

    # sky / weather
    "Clouds": "Clouds",
    "Cumulus": "Clouds",
    "Cirrus": "Clouds",
    "FOG": "Fog / mist",
    "SUN": "Sun",
    "Moon": "Moon",
    "Aurora Borealis": "Aurora",

    # land
    "Mountain": "Mountains",
    "Mountains": "Mountains",
    "Snowy Mountain": "Mountains (snowy)",
    "Hills": "Hills",
    "Cliff": "Cliffs / rocks",
    "Rocks": "Cliffs / rocks",
    "Grass": "Grass",
    "Bushes": "Bushes",
    "Flowers": "Flowers",
    "Path": "Path / trail",
    "Snow": "Snow on ground",
    "Winter": "Winter scene",

    # water
    "Lake": "Lake",
    "River": "River / stream",
    "Ocean": "Ocean",
    "Beach": "Beach",
    "Waves": "Ocean waves",
    "Waterfall": "Waterfall",

    # structures
    "Cabin": "Cabin",
    "Barn": "Barn",
    "Farm": "Farm / farmhouse",
    "Fence": "Fence",
    "Bridge": "Bridge",
    "Building": "Building",
    "Dock": "Dock",
    "Lighthouse": "Lighthouse",
    "Mill": "Mill",
    "Windmill": "Windmill",
    "Boat": "Boat",

    # rare details / people
    "Portrait": "Portrait",
    "Person": "Person",
    "Fire": "Campfire"
}

df_clean = df[~df["Element"].isin(drop_tags)].copy()
df_clean["Element_Clean"] = df_clean["Element"].map(element_clean_map).fillna(df_clean["Element"])

freq = (
    df_clean[df_clean["Included"] == 1]
      .groupby("Element_Clean")["Episode"]
      .nunique()
      .sort_values(ascending=False)
      .to_frame(name="episode_count")
)

freq["pct_of_episodes"] = (freq["episode_count"] / total_episodes * 100).round(1)


freq


Element_Clean,episode_count,pct_of_episodes
Deciduous trees,361,89.6
Evergreen trees,212,52.6
Clouds,182,45.2
Mountains,161,40.0
Lake,143,35.5
Grass,142,35.2
River / stream,126,31.3
Bushes,120,29.8
Mountains (snowy),109,27.0
Structure,85,21.1
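
The frequency table counts distinct episodes rather than rows. A minimal sketch of why that matters, using hypothetical toy data (the `toy` frame and its values are illustrative, not taken from the dataset): when two raw tags such as "Tree" and "Trees" collapse to the same cleaned label, one episode can contribute several rows for that label.

```python
import pandas as pd

# Hypothetical toy data: episode E1 carries two raw tags that both map
# to "Deciduous trees" after cleaning, so it appears twice for that label.
toy = pd.DataFrame({
    "Episode": ["E1", "E1", "E2", "E2", "E3"],
    "Element_Clean": ["Deciduous trees", "Deciduous trees",
                      "Deciduous trees", "Lake", "Lake"],
    "Included": [1, 1, 1, 1, 1],
})

included = toy[toy["Included"] == 1]

# Counting rows double-counts E1 for "Deciduous trees".
row_counts = included.groupby("Element_Clean")["Episode"].count()

# Counting distinct episodes gives one vote per episode.
episode_counts = included.groupby("Element_Clean")["Episode"].nunique()

# Same percentage formula as the notebook cell, applied to the toy data.
toy_pct = (episode_counts / toy["Episode"].nunique() * 100).round(1)

print(row_counts)
print(episode_counts)
print(toy_pct)
```

With `count()` the merged label would be credited with three appearances across three episodes; `nunique()` keeps the percentage bounded by 100.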


### Conditional Probability Calculations

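This section defines three helper functions over a binary episode-by-element matrix. `p(elem)` returns the share of episodes that contain an element, `p_joint(a, b)` returns the share that contain both elements, and `given(elem)` returns the conditional probability P(other | elem) = P(other and elem) / P(elem) for every other element, sorted in descending order. These conditional probabilities show which elements tend to appear together.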

In [3]:
episode_matrix = (
    df_clean.pivot_table(index="Episode",
                         columns="Element_Clean",
                         values="Included",
                         aggfunc="max")
    .fillna(0)
)

# Probability functions
def p(elem):
    return episode_matrix[elem].mean()

def p_joint(a, b):
    return (episode_matrix[a] * episode_matrix[b]).mean()

def given(elem):
    out = {}
    base = p(elem)
    for other in episode_matrix.columns:
        if other == elem:
            continue
        joint = p_joint(other, elem)
        out[other] = joint / base if base > 0 else 0
    return pd.Series(out).sort_values(ascending=False)
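
The helpers above can be checked by hand on a small matrix. The sketch below is self-contained and uses a hypothetical four-episode matrix (`toy_matrix` and `toy_given` are illustrative names, not part of the notebook); it also shows that conditional probability is not symmetric. The zero-probability guard from `given` is omitted here because every toy column is non-empty.

```python
import pandas as pd

# Hypothetical 4-episode binary matrix (1 = element present in the episode).
toy_matrix = pd.DataFrame(
    {"Cabin": [1, 1, 0, 0], "Lake": [1, 0, 1, 0], "Clouds": [1, 1, 1, 0]},
    index=["E1", "E2", "E3", "E4"],
)

def toy_given(matrix, elem):
    """Estimate P(other | elem) for every other column of a 0/1 matrix."""
    base = matrix[elem].mean()                       # P(elem)
    others = matrix.drop(columns=elem)
    joint = others.mul(matrix[elem], axis=0).mean()  # P(other and elem)
    return (joint / base).sort_values(ascending=False)

result = toy_given(toy_matrix, "Cabin")
print(result)

# Asymmetry: every Cabin episode has Clouds, but not every
# Clouds episode has a Cabin.
print(toy_given(toy_matrix, "Clouds")["Cabin"])
```

Cabin appears in E1 and E2, and Clouds appears in both of those episodes, so P(Clouds | Cabin) is 1.0. Clouds appears in three episodes, two of which have a Cabin, so P(Cabin | Clouds) is 2/3.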


In [4]:
print("Given Cabin appears, what else appears?")
print(given("Cabin").head(10))


Given Cabin appears, what else appears?
Deciduous trees    1.000000
Structure          0.869565
Evergreen trees    0.724638
Clouds             0.434783
Winter scene       0.420290
Snow on ground     0.405797
Lake               0.362319
Mountains          0.333333
Grass              0.304348
Bushes             0.231884
dtype: float64


In [5]:
print("Given Lake appears, what else appears?")
print(given("Lake").head(10))


Given Lake appears, what else appears?
Deciduous trees      0.993007
Evergreen trees      0.643357
Mountains            0.573427
Clouds               0.447552
Mountains (snowy)    0.405594
Bushes               0.384615
Grass                0.307692
Structure            0.202797
Cabin                0.174825
Snow on ground       0.174825
dtype: float64


In [6]:
print("Given Mountains appear, what else appears?")
print(given("Mountains").head(10))


Given Mountains appear, what else appears?
Deciduous trees      0.975155
Evergreen trees      0.807453
Mountains (snowy)    0.677019
Clouds               0.546584
Lake                 0.509317
Bushes               0.397516
Grass                0.372671
River / stream       0.304348
Snow on ground       0.180124
Winter scene         0.180124
dtype: float64


In [7]:
print("Given Clouds appear, what else appears?")
print(given("Clouds").head(10))


Given Clouds appear, what else appears?
Deciduous trees      0.824176
Evergreen trees      0.554945
Mountains            0.483516
Lake                 0.351648
Mountains (snowy)    0.335165
Grass                0.324176
Bushes               0.302198
River / stream       0.280220
Cliffs / rocks       0.247253
Structure            0.192308
dtype: float64


### Co-occurrence Heatmap Data Preparation

This section prepares the data for visualizing conditional probabilities as a heatmap. For each ordered pair (A, B) drawn from ten selected elements, it computes P(A | B) and stores the result in a long-format DataFrame with one row per pair, which is then exported for plotting.

In [8]:
episode_matrix = (
    df_clean.pivot_table(index="Episode",
                         columns="Element_Clean",
                         values="Included",
                         aggfunc="max")
    .fillna(0)
)

episode_matrix.head()


Episode,Aurora,Barn,Beach,Boat,Bridge,Building,Bushes,Cabin,Cactus,Campfire,...,Path / trail,Person,Portrait,River / stream,Snow on ground,Structure,Sun,Waterfall,Windmill,Winter scene
S01E01,0,0,0,0,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
S01E02,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,1,0,0,0,0,1
S01E03,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,1,1,0,0,1
S01E04,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
S01E05,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0


In [9]:
# choose the elements to include (same set as the Tableau heatmap)
heatmap_elements = [
    "Bushes",
    "Structure",
    "Grass",
    "Deciduous trees",
    "Evergreen trees",
    "Winter scene",
    "Lake",
    "Snow on ground",
    "Mountains",
    "Clouds"
]

rows = []
for a in heatmap_elements:
    for b in heatmap_elements:
        if a == b:
            continue

        xa = episode_matrix[a]
        xb = episode_matrix[b]

        p_b = xb.mean()
        if p_b == 0:
            continue

        p_ab = (xa * xb).mean()
        p_a_given_b = p_ab / p_b

        rows.append([a, b, round(p_a_given_b, 2)])

co_cond = pd.DataFrame(rows, columns=["Element_A", "Element_B", "P_A_given_B"])
co_cond


,Element_A,Element_B,P_A_given_B
0,Bushes,Structure,0.19
1,Bushes,Grass,0.30
2,Bushes,Deciduous trees,0.33
3,Bushes,Evergreen trees,0.35
4,Bushes,Winter scene,0.19
...,...,...,...
85,Clouds,Evergreen trees,0.48
86,Clouds,Winter scene,0.41
87,Clouds,Lake,0.45
88,Clouds,Snow on ground,0.43
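
The nested loop can also be expressed as a single matrix product. This is a sketch of an alternative formulation, not the notebook's method; `toy_matrix` is hypothetical data, and the approach assumes a strict 0/1 matrix in which every selected element appears at least once (a zero count would produce a division by zero).

```python
import pandas as pd

# Hypothetical 4-episode binary matrix (1 = element present in the episode).
toy_matrix = pd.DataFrame(
    {"Cabin": [1, 1, 0, 0], "Lake": [1, 0, 1, 0], "Clouds": [1, 1, 1, 0]},
    index=["E1", "E2", "E3", "E4"],
)

# joint_counts.loc[a, b] = number of episodes containing both a and b.
joint_counts = toy_matrix.T @ toy_matrix

# element_counts[b] = number of episodes containing b.
element_counts = toy_matrix.sum(axis=0)

# Divide each column b by count(b), so cond.loc[a, b] estimates P(a | b).
cond = joint_counts.div(element_counts, axis="columns")

# Reshape to the same long layout as co_cond and drop the a == b diagonal.
long = (
    cond.rename_axis(index="Element_A", columns="Element_B")
        .stack()
        .rename("P_A_given_B")
        .reset_index()
)
long = long[long["Element_A"] != long["Element_B"]].reset_index(drop=True)

print(cond.round(2))
print(long)
```

The wide `cond` frame is already in heatmap shape (rows are A, columns are B), while `long` matches the one-row-per-pair layout used for the export.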


In [10]:
co_cond.to_csv("bob_ross_cooccurrence_heatmap.csv", index=False)
print("Saved bob_ross_cooccurrence_heatmap.csv")


Saved bob_ross_cooccurrence_heatmap.csv


In [12]:
avg_threshold = 30  # percent: pct_of_episodes is on a 0-100 scale, so 30 means ≥30% of episodes

avg_painting_elements = (
    freq[freq["pct_of_episodes"] >= avg_threshold]
        .sort_values("pct_of_episodes", ascending=False)
        .index
        .tolist()
)

print("Deterministic average painting elements:\n")
for elem in avg_painting_elements:
    print(f"- {elem} ({freq.loc[elem, 'pct_of_episodes']}%)")

Deterministic average painting elements:

- Deciduous trees (89.6%)
- Evergreen trees (52.6%)
- Clouds (45.2%)
- Mountains (40.0%)
- Lake (35.5%)
- Grass (35.2%)
- River / stream (31.3%)


### Identifying Average Painting Elements

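The cell above identifies elements that appear in at least 30% of episodes. These 'average' elements are the common components of Bob Ross's paintings and give a baseline for understanding his style. The cells below build five themed paintings instead: each starts from a base element and adds the four elements most likely to co-occur with it, as ranked by `given`.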

In [27]:
base_elements = ['Lake', 'Mountains', 'Cabin', 'Ocean', 'Winter scene']

all_paintings = []

for base_elem in base_elements:
    # Get the 4 most co-occurring elements
    co_occurring_elements = given(base_elem).head(4).index.tolist()

    # Combine the base element with its co-occurring elements to form a painting
    painting = [base_elem] + co_occurring_elements

    # `given` already excludes the base element, so this is only a safeguard
    painting = list(dict.fromkeys(painting))  # drop duplicates, preserving order

    all_paintings.append(painting)

print("Generated themed paintings:")
for i, painting in enumerate(all_paintings):
    print(f"Painting {i+1}: {painting}")

Generated themed paintings:
Painting 1: ['Lake', 'Deciduous trees', 'Evergreen trees', 'Mountains', 'Clouds']
Painting 2: ['Mountains', 'Deciduous trees', 'Evergreen trees', 'Mountains (snowy)', 'Clouds']
Painting 3: ['Cabin', 'Deciduous trees', 'Structure', 'Evergreen trees', 'Clouds']
Painting 4: ['Ocean', 'Ocean waves', 'Clouds', 'Beach', 'Cliffs / rocks']
Painting 5: ['Winter scene', 'Deciduous trees', 'Snow on ground', 'Evergreen trees', 'Structure']


In [28]:
import pandas as pd

# Build one row per (painting, element). Collecting the rows in a list and
# constructing the DataFrame once avoids calling pd.concat inside the loop.
painting_rows = [
    {'Painting_ID': painting_id, 'Element': element, 'Size': 1}
    for painting_id, painting_elements in enumerate(all_paintings, start=1)
    for element in painting_elements
]

df_paintings = pd.DataFrame(painting_rows, columns=['Painting_ID', 'Element', 'Size'])

df_paintings.to_csv('bob_ross_5_themed_paintings.csv', index=False)
print("Saved bob_ross_5_themed_paintings.csv")

Saved bob_ross_5_themed_paintings.csv
