# Sample Spaces and Events - Foundation of Probability

## Introduction

Welcome to your probability learning journey! This notebook introduces the fundamental building blocks of probability theory:
- **Sample spaces**: All possible outcomes
- **Events**: Collections of outcomes we're interested in
- **Probability**: How likely events are to occur

### What You'll Learn
1. How to identify sample spaces in agricultural contexts
2. How to define events and calculate their probabilities
3. How to visualize probability with Venn diagrams
4. How probability helps in farming decisions

### The Agricultural Context
Every day, farmers make decisions under uncertainty:
- Should I plant today? (Risk of late frost?)
- Do I need to spray? (Probability of pest outbreak?)
- Which crop variety? (Likelihood of disease resistance?)

Probability gives us a mathematical framework to reason about these uncertainties!

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
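try:
    from matplotlib_venn import venn2, venn3
except ImportError:
    # Optional dependency: only the Venn diagram cell needs it
    print("matplotlib-venn not found; install with: pip install matplotlib-venn")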
import pandas as pd

# Set style for better-looking plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)

print("✓ Libraries imported successfully!")
print("Ready to explore probability!")

## 1. Sample Spaces: All Possible Outcomes

### What is a Sample Space?

A **sample space** (denoted $S$ or $\Omega$) is the set of **all possible outcomes** of a random experiment.
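
As a small illustration of the definition, a sample space for a compound experiment can be enumerated directly. The sketch below assumes a hypothetical experiment in which two seeds are planted and each one either germinates or does not; the names `seed_outcomes` and `two_seed_sample_space` are illustrative and not used elsewhere in this notebook.

```python
from itertools import product

# Hypothetical experiment: plant two seeds and record each result.
seed_outcomes = ('Germinated', 'Not Germinated')

# The sample space is every ordered pair of single-seed outcomes.
two_seed_sample_space = set(product(seed_outcomes, repeat=2))

for outcome in sorted(two_seed_sample_space):
    print(outcome)
print(f"Total possible outcomes: {len(two_seed_sample_space)}")
```

Each pair is one outcome, so two seeds with two results each give 2 × 2 = 4 outcomes.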

### Agricultural Examples

Let's start with simple examples from farming:

In [None]:
# Example 1: Weather tomorrow
weather_sample_space = {'Sunny', 'Cloudy', 'Rainy', 'Stormy'}
print("Weather Sample Space:")
print(weather_sample_space)
print(f"\nTotal possible outcomes: {len(weather_sample_space)}")

# Example 2: Crop germination
germination_sample_space = {'Germinated', 'Not Germinated'}
print("\n" + "="*50)
print("\nGermination Sample Space:")
print(germination_sample_space)
print(f"Total possible outcomes: {len(germination_sample_space)}")

# Example 3: Pest severity level
pest_severity_sample_space = {'None', 'Low', 'Medium', 'High', 'Severe'}
print("\n" + "="*50)
print("\nPest Severity Sample Space:")
print(pest_severity_sample_space)
print(f"Total possible outcomes: {len(pest_severity_sample_space)}")

### Discrete vs Continuous Sample Spaces

**Discrete Sample Space**: Countable outcomes (can list them)
- Crop types: {Wheat, Corn, Soy}
- Number of pests: {0, 1, 2, 3, ...}

**Continuous Sample Space**: Uncountable outcomes (measurements)
- Rainfall amount: [0, ∞) mm
- Soil pH: [0, 14]
- Temperature: (-∞, ∞) °C
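
The practical difference can be seen in a short simulation. This is a sketch under stated assumptions: the pest counts are drawn uniformly from 0 to 5 and the pH readings uniformly from [5.5, 7.5], both chosen only for illustration and not taken from real field data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Discrete: a pest count is a whole number, so the outcomes can be listed.
# Illustrative assumption: counts are uniform on {0, ..., 5}.
pest_counts = [rng.randint(0, 5) for _ in range(1000)]
distinct_counts = sorted(set(pest_counts))

# Continuous: a soil pH reading can be any real number in an interval.
# Illustrative assumption: readings are uniform on [5.5, 7.5].
ph_readings = [rng.uniform(5.5, 7.5) for _ in range(1000)]

# Exact values repeat often in the discrete case...
print("Distinct pest counts:", distinct_counts)
# ...but almost never in the continuous one, so we ask about intervals instead.
print("Distinct pH readings:", len(set(ph_readings)))
share_in_range = sum(6.0 <= ph <= 7.0 for ph in ph_readings) / len(ph_readings)
print(f"Share of readings with pH in [6.0, 7.0]: {share_in_range:.1%}")
```

For continuous sample spaces, questions are usually phrased over ranges ("pH between 6 and 7") rather than single values.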

In [None]:
# Visualize sample spaces
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Weather sample space
weather = sorted(weather_sample_space)  # sets are unordered; sort for a stable bar order
axes[0].bar(range(len(weather)), [1]*len(weather), color='skyblue', edgecolor='black', linewidth=2)
axes[0].set_xticks(range(len(weather)))
axes[0].set_xticklabels(weather, rotation=45)
axes[0].set_ylabel('Represents 1 Outcome', fontsize=11)
axes[0].set_title('Weather Sample Space\n(4 possible outcomes)', fontsize=13, fontweight='bold')
axes[0].set_ylim([0, 1.5])

# Germination sample space
germination = sorted(germination_sample_space)  # sets are unordered; sort for a stable bar order
axes[1].bar(range(len(germination)), [1]*len(germination), color='lightgreen', edgecolor='black', linewidth=2)
axes[1].set_xticks(range(len(germination)))
axes[1].set_xticklabels(germination, rotation=45)
axes[1].set_ylabel('Represents 1 Outcome', fontsize=11)
axes[1].set_title('Germination Sample Space\n(2 possible outcomes)', fontsize=13, fontweight='bold')
axes[1].set_ylim([0, 1.5])

# Pest severity sample space
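severity_order = ['None', 'Low', 'Medium', 'High', 'Severe']  # sets are unordered; plot in severity order
pest = [level for level in severity_order if level in pest_severity_sample_space]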
axes[2].bar(range(len(pest)), [1]*len(pest), color='salmon', edgecolor='black', linewidth=2)
axes[2].set_xticks(range(len(pest)))
axes[2].set_xticklabels(pest, rotation=45)
axes[2].set_ylabel('Represents 1 Outcome', fontsize=11)
axes[2].set_title('Pest Severity Sample Space\n(5 possible outcomes)', fontsize=13, fontweight='bold')
axes[2].set_ylim([0, 1.5])

plt.tight_layout()
plt.show()

print("\n💡 Key Insight: A sample space contains ALL possible outcomes.")
print("   Every experiment must result in exactly one of these outcomes.")

## 2. Events: Outcomes We Care About

### What is an Event?

An **event** is a **subset** of the sample space - a collection of outcomes we're interested in.

**Notation**: Events are usually denoted by capital letters: $A$, $B$, $C$, etc.

### Examples

In [None]:
# Using weather sample space from earlier
print("Sample Space S:", weather_sample_space)
print("\nDefining Events:")
print("="*50)

# Event A: Good weather for planting
event_A_good_weather = {'Sunny', 'Cloudy'}
print("\nEvent A (Good planting weather):", event_A_good_weather)

# Event B: Precipitation occurs
event_B_precipitation = {'Rainy', 'Stormy'}
print("Event B (Precipitation):", event_B_precipitation)

# Event C: Extreme conditions
event_C_extreme = {'Stormy'}
print("Event C (Extreme weather):", event_C_extreme)

print("\n" + "="*50)
print("\n💡 Notice: Events A and B together cover the entire sample space!")
print(f"   A ∪ B = {event_A_good_weather.union(event_B_precipitation)}")

### Special Types of Events

1. **Simple Event**: Contains exactly one outcome
   - Example: {Sunny}

2. **Compound Event**: Contains multiple outcomes
   - Example: {Sunny, Cloudy}

3. **Certain Event**: The entire sample space $S$ (always happens)
   - Example: "Weather will be one of the four types"

4. **Impossible Event**: Empty set $\emptyset$ (never happens)
   - Example: "Weather is both Sunny AND Rainy at the same time"
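
The four types above can be told apart with ordinary set comparisons. The helper `classify_event` below is a hypothetical function written for this illustration, not part of any library.

```python
def classify_event(event, sample_space):
    """Label an event by how many outcomes of the sample space it contains."""
    if not event <= sample_space:
        raise ValueError("An event must be a subset of the sample space")
    if len(event) == 0:
        return "impossible"
    if event == sample_space:
        return "certain"
    return "simple" if len(event) == 1 else "compound"


sample_space = {'Sunny', 'Cloudy', 'Rainy', 'Stormy'}
for event in [{'Sunny'}, {'Sunny', 'Cloudy'}, set(sample_space), set()]:
    print(sorted(event), "->", classify_event(event, sample_space))
```

The `<=` operator on Python sets tests the subset relation, which is exactly the condition an event must satisfy.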

In [None]:
# Agricultural example with pest counts
# Sample space: number of aphids on a plant
pest_count_sample_space = set(range(0, 51))  # 0 to 50 aphids possible

# Define events
event_no_pests = {0}
event_low_infestation = set(range(1, 11))  # 1-10 aphids
event_high_infestation = set(range(26, 51))  # 26-50 aphids
event_treatment_needed = set(range(21, 51))  # >20 aphids need treatment

print("Pest Count Sample Space: 0 to 50 aphids")
print(f"\nEvent 'No Pests': {event_no_pests}")
print(f"Event 'Low Infestation' (1-10): {len(event_low_infestation)} outcomes")
print(f"Event 'High Infestation' (26-50): {len(event_high_infestation)} outcomes")
print(f"Event 'Treatment Needed' (>20): {len(event_treatment_needed)} outcomes")

# Visualize
fig, ax = plt.subplots(figsize=(14, 4))

counts = list(range(51))
colors = ['green' if c == 0 else 'lightgreen' if c <= 10 else 'yellow' if c <= 20 
          else 'orange' if c <= 25 else 'red' for c in counts]

ax.bar(counts, [1]*len(counts), color=colors, edgecolor='black', linewidth=0.5)
ax.set_xlabel('Number of Aphids', fontsize=12)
ax.set_ylabel('Outcome', fontsize=12)
ax.set_title('Sample Space: Aphid Count (0-50)\nColor coded by severity', 
             fontsize=14, fontweight='bold')
ax.set_ylim([0, 1.5])

# Add legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='green', edgecolor='black', label='No Pests (0)'),
    Patch(facecolor='lightgreen', edgecolor='black', label='Low (1-10)'),
    Patch(facecolor='yellow', edgecolor='black', label='Moderate (11-20)'),
    Patch(facecolor='orange', edgecolor='black', label='Treatment Needed, below High (21-25)'),
    Patch(facecolor='red', edgecolor='black', label='High (26-50)')
]
ax.legend(handles=legend_elements, loc='upper right')

plt.tight_layout()
plt.show()

## 3. Probability: Measuring Likelihood

### What is Probability?

**Probability** measures how likely an event is to occur.

**Notation**: $P(A)$ = Probability of event $A$

**Range**: $0 \leq P(A) \leq 1$
- $P(A) = 0$: Event $A$ never happens (impossible)
- $P(A) = 1$: Event $A$ always happens (certain)
- $P(A) = 0.5$: Event $A$ happens about half the time in the long run
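
One way to read these numbers is as long-run relative frequencies. The sketch below simulates a hypothetical event whose true probability is assumed to be 0.5 and shows the observed frequency settling near that value as the number of trials grows; `relative_frequency` is an illustrative helper, not a library function.

```python
import random

rng = random.Random(42)  # fixed seed for reproducibility


def relative_frequency(num_trials, p=0.5):
    """Fraction of simulated trials in which an event of probability p occurs."""
    hits = sum(rng.random() < p for _ in range(num_trials))
    return hits / num_trials


for n in (10, 100, 1_000, 20_000):
    print(f"{n:>6} trials: relative frequency = {relative_frequency(n):.3f}")
```

Small samples can stray noticeably from 0.5; larger ones tend to stay close to it.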

### Simple Probability Formula

For **equally likely outcomes**:

$$P(A) = \frac{\text{Number of outcomes in A}}{\text{Total number of outcomes in S}}$$
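
A minimal sketch of this formula uses `fractions.Fraction` to keep the result exact. The function name `classical_probability` and the field-plot scenario are assumptions made for illustration.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = |A| / |S|, valid only when every outcome in S is equally likely."""
    if not event <= sample_space:
        raise ValueError("Event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


# Hypothetical example: one of 8 numbered field plots is chosen at random
# for inspection, and plots 1-3 lie next to the irrigation canal.
plots = set(range(1, 9))
canal_plots = {1, 2, 3}

p_canal = classical_probability(canal_plots, plots)
print(f"P(canal plot) = {p_canal} = {float(p_canal):.3f}")
```

The formula does not apply when outcomes are not equally likely; in that case probabilities have to come from data or a model, as with the historical weather counts.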

In [None]:
# Example: Rolling a die to decide which field to inspect
die_sample_space = {1, 2, 3, 4, 5, 6}

# Event: Rolling an even number
event_even = {2, 4, 6}

# Calculate probability
prob_even = len(event_even) / len(die_sample_space)

print("Sample Space (Die Roll):", die_sample_space)
print("Event 'Even Number':", event_even)
print(f"\nP(Even) = {len(event_even)}/{len(die_sample_space)} = {prob_even}")
print(f"P(Even) = {prob_even:.2%}")

print("\n" + "="*50)

# Agricultural example: Weather data
# Historical data: Last 100 days in growing season
days_sunny = 45
days_cloudy = 30
days_rainy = 20
days_stormy = 5
total_days = 100

# Calculate probabilities
prob_sunny = days_sunny / total_days
prob_cloudy = days_cloudy / total_days
prob_rainy = days_rainy / total_days
prob_stormy = days_stormy / total_days

print("\nHistorical Weather Probabilities (Last 100 Days):")
print(f"P(Sunny) = {prob_sunny:.2%}")
print(f"P(Cloudy) = {prob_cloudy:.2%}")
print(f"P(Rainy) = {prob_rainy:.2%}")
print(f"P(Stormy) = {prob_stormy:.2%}")
# Round for display: a floating-point sum can differ from 1 in the last digit
print(f"\nTotal probability: {prob_sunny + prob_cloudy + prob_rainy + prob_stormy:.2f}")

# Probability of good planting weather
prob_good_weather = prob_sunny + prob_cloudy
print(f"\nP(Good planting weather) = P(Sunny or Cloudy) = {prob_good_weather:.2%}")

In [None]:
# Visualize probabilities
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Bar chart
weather_types = ['Sunny', 'Cloudy', 'Rainy', 'Stormy']
probabilities = [prob_sunny, prob_cloudy, prob_rainy, prob_stormy]
colors_weather = ['gold', 'lightgray', 'skyblue', 'darkblue']

ax1.bar(weather_types, probabilities, color=colors_weather, edgecolor='black', linewidth=2)
ax1.set_ylabel('Probability', fontsize=12)
ax1.set_title('Weather Probabilities\n(Based on 100-day history)', fontsize=14, fontweight='bold')
ax1.set_ylim([0, 0.5])
ax1.axhline(y=0.25, color='red', linestyle='--', alpha=0.5, label='Equal probability (0.25)')
ax1.legend()

# Add value labels on bars
for i, prob in enumerate(probabilities):  # avoid rebinding the earlier `weather` list
    ax1.text(i, prob + 0.01, f'{prob:.0%}', ha='center', fontsize=11, fontweight='bold')

# Pie chart
ax2.pie(probabilities, labels=weather_types, colors=colors_weather, autopct='%1.0f%%',
        startangle=90, textprops={'fontsize': 11, 'fontweight': 'bold'})
ax2.set_title('Weather Distribution', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("\n💡 Key Insight: Probabilities of all outcomes in a sample space sum to 1.0")
print("   This is called the 'normalization' property of probability.")

## 4. Venn Diagrams: Visualizing Events

**Venn diagrams** help us visualize relationships between events.

### Set Operations

- **Union** ($A \cup B$): Event $A$ OR $B$ (or both) occurs
- **Intersection** ($A \cap B$): Both event $A$ AND $B$ occur
- **Complement** ($A^c$ or $\overline{A}$): Event $A$ does NOT occur
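
These operations map directly onto Python's built-in `set` type. The sketch below uses a hypothetical survey of 10 numbered fields; the data are invented for illustration.

```python
# Hypothetical survey of 10 numbered fields (illustrative data).
all_fields = set(range(1, 11))   # sample space S
disease = {1, 2, 3, 4}           # event D
water_stress = {3, 4, 5, 6, 7}   # event W

union = disease | water_stress         # D ∪ W: disease OR water stress (or both)
intersection = disease & water_stress  # D ∩ W: both problems
complement = all_fields - disease      # D^c: no disease

print("D ∪ W =", sorted(union))
print("D ∩ W =", sorted(intersection))
print("D^c   =", sorted(complement))

# Counting check: |D ∪ W| = |D| + |W| - |D ∩ W|
assert len(union) == len(disease) + len(water_stress) - len(intersection)
```

The counting check subtracts the intersection because fields with both problems would otherwise be counted twice.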

In [None]:
# Agricultural example: Crop conditions
# Sample space: 100 fields surveyed

# Event D: Disease present
# Event W: Water stress present

# Hypothetical data:
fields_with_disease = 30  # 30 fields have disease
fields_with_water_stress = 40  # 40 fields have water stress
fields_with_both = 15  # 15 fields have BOTH issues

print("Field Survey Results (100 fields):")
print(f"Fields with disease: {fields_with_disease}")
print(f"Fields with water stress: {fields_with_water_stress}")
print(f"Fields with both problems: {fields_with_both}")

# Calculate probabilities
total_fields = 100
P_D = fields_with_disease / total_fields
P_W = fields_with_water_stress / total_fields
P_D_and_W = fields_with_both / total_fields

# Union: Fields with disease OR water stress (or both)
fields_with_either = fields_with_disease + fields_with_water_stress - fields_with_both
P_D_or_W = fields_with_either / total_fields

print(f"\nP(Disease) = {P_D:.2%}")
print(f"P(Water Stress) = {P_W:.2%}")
print(f"P(Both) = {P_D_and_W:.2%}")
print(f"P(Disease OR Water Stress) = {P_D_or_W:.2%}")

# Fields with no problems
fields_healthy = total_fields - fields_with_either
P_healthy = fields_healthy / total_fields
print(f"P(Healthy - no problems) = {P_healthy:.2%}")

In [None]:
# Visualize with Venn diagram
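try:
    # Import here so a missing package is caught by the except clause below
    from matplotlib_venn import venn2

    plt.figure(figsize=(10, 8))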
    
    # Create Venn diagram
    venn = venn2(subsets=(fields_with_disease - fields_with_both,  # Only disease
                          fields_with_water_stress - fields_with_both,  # Only water stress
                          fields_with_both),  # Both
                 set_labels=('Disease (D)', 'Water Stress (W)'),
                 set_colors=('lightcoral', 'lightblue'),
                 alpha=0.6)
    
    # Customize labels
    if venn.get_label_by_id('10'):
        venn.get_label_by_id('10').set_text(f'{fields_with_disease - fields_with_both}\nOnly Disease')
    if venn.get_label_by_id('01'):
        venn.get_label_by_id('01').set_text(f'{fields_with_water_stress - fields_with_both}\nOnly Water\nStress')
    if venn.get_label_by_id('11'):
        venn.get_label_by_id('11').set_text(f'{fields_with_both}\nBoth')
    
    plt.title('Field Survey: Disease and Water Stress\n(100 fields total)', 
              fontsize=14, fontweight='bold')
    plt.text(0.5, -0.7, f'{fields_healthy} fields have neither problem (Healthy)', 
             ha='center', fontsize=12, fontweight='bold', 
             bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5),
             transform=plt.gca().transAxes)
    
    plt.tight_layout()
    plt.show()
    
except ImportError:
    print("\nNote: Install matplotlib-venn to see Venn diagrams")
    print("Run: pip install matplotlib-venn")

print("\n💡 Key Insight: Venn diagrams help us visualize overlapping events.")
print("   The overlap shows outcomes that belong to BOTH events.")

## 5. Real Agricultural Decision Example

### Scenario: Pest Management Decision

You're managing an orchard and need to decide whether to spray pesticides.

**Sample Space**: {No Pests, Minor Infestation, Major Infestation}

**Historical Data** (100 monitored days over the last 10 seasons):
- No Pests: 30 days
- Minor Infestation: 50 days
- Major Infestation: 20 days

In [None]:
# Historical data
data = {
    'Condition': ['No Pests', 'Minor Infestation', 'Major Infestation'],
    'Days': [30, 50, 20],
    'Spray Needed': ['No', 'Optional', 'Yes']
}
df = pd.DataFrame(data)

# Calculate probabilities
total_days = df['Days'].sum()
df['Probability'] = df['Days'] / total_days
df['Percentage'] = df['Probability'].apply(lambda x: f'{x:.1%}')

print("Pest Occurrence Probabilities:")
print("="*60)
print(df.to_string(index=False))

# Decision analysis
print("\n" + "="*60)
print("\nDecision Analysis:")
print(f"P(Spray Needed) = P(Major Infestation) = {df.iloc[2]['Probability']:.1%}")
print(f"P(No Spray Needed) = P(No Pests) = {df.iloc[0]['Probability']:.1%}")
print(f"P(Uncertain) = P(Minor Infestation) = {df.iloc[1]['Probability']:.1%}")

print("\n💡 Insight: With a 50% chance of minor infestation, the decision depends on:")
print("   - Cost of spraying vs. cost of crop damage")
print("   - Environmental considerations")
print("   - Risk tolerance")

In [None]:
# Visualize the decision scenario
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Historical frequency
ax1.bar(df['Condition'], df['Days'], color=['green', 'yellow', 'red'], 
        edgecolor='black', linewidth=2)
ax1.set_ylabel('Number of Days', fontsize=12)
ax1.set_title('Historical Pest Data\n(100 days total)', fontsize=14, fontweight='bold')
ax1.tick_params(axis='x', rotation=45)

for i, row in df.iterrows():
    ax1.text(i, row['Days'] + 1, str(row['Days']), ha='center', 
            fontsize=11, fontweight='bold')

# Probability distribution
ax2.bar(df['Condition'], df['Probability'], color=['green', 'yellow', 'red'],
        edgecolor='black', linewidth=2)
ax2.set_ylabel('Probability', fontsize=12)
ax2.set_title('Probability Distribution', fontsize=14, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)
ax2.set_ylim([0, 0.6])

for i, row in df.iterrows():
    ax2.text(i, row['Probability'] + 0.02, row['Percentage'], ha='center',
            fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

## 6. Key Takeaways

### What You Learned

1. **Sample Space**: The set of all possible outcomes
   - Must be exhaustive (cover all possibilities)
   - Outcomes must be mutually exclusive

2. **Events**: Subsets of the sample space we're interested in
   - Simple events: single outcome
   - Compound events: multiple outcomes

3. **Probability**: Measures likelihood (0 to 1)
   - For equally likely outcomes: P(A) = (favorable outcomes) / (total outcomes)
   - The probabilities of all outcomes in the sample space sum to 1

4. **Venn Diagrams**: Visualize event relationships
   - Union: A OR B
   - Intersection: A AND B
   - Complement: NOT A

### Why This Matters for ML

- **Classification**: Predicting which event (class) will occur
- **Probability Outputs**: ML models output probabilities of events
- **Decision Making**: Using probabilities to choose optimal actions
- **Risk Assessment**: Quantifying uncertainty in predictions

### Next Steps

In the next notebook, we'll learn:
- The three axioms of probability
- Addition and multiplication rules
- How to calculate probabilities of compound events

Continue to: `02_probability_axioms.ipynb`

## Exercises (Optional)

Try these on your own:

1. **Define a Sample Space**: You're monitoring crop growth stages. Define the sample space and calculate probabilities if: Seedling (20 days), Vegetative (40 days), Flowering (25 days), Maturity (15 days).

2. **Event Probability**: From the crop growth data above, what's P(Crop needs water)? Assume water is needed in Seedling and Vegetative stages.

3. **Venn Diagram**: Draw a Venn diagram for: Drought-resistant crops (50%), Disease-resistant crops (60%), Both properties (30%).

**Work in the cell below:**

In [None]:
# Your solutions here


---

**Congratulations!** You've completed your first probability notebook!

You now understand sample spaces, events, and basic probability - the foundation for all of probability theory and machine learning.

**Next**: `02_probability_axioms.ipynb`