# Day 3: The Protein Family Mystery 🧬
*Lists as Protein Collections: Organizing Related Molecules*

---

## Today's Biological Mystery

**"Why do related proteins have similar but not identical functions?"**

You're studying a family of enzyme proteins that all break down different types of sugars. Some work on glucose, others on fructose, and some can handle multiple sugars. Your mission: organize and analyze this protein family to understand how its members relate to one another.

Today you'll learn that **lists in Python are like protein families** - collections of related items that you can organize, sort, and analyze together.

---

## 🔬 The Biological Context

**Your protein family data:**
- **Protein names:** ["Glucosidase", "Fructosidase", "Sucrase", "Lactase", "Maltase"]
- **Activity levels:** [85, 92, 78, 65, 88] (units per minute)
- **Substrate specificity:** ["Glucose", "Fructose", "Sucrose", "Lactose", "Maltose"]
- **Tissue locations:** ["Liver", "Intestine", "Intestine", "Intestine", "Intestine"]

**Your biological questions:**
1. Which protein has the highest activity?
2. How many intestinal enzymes are there?
3. What's the average activity across the family?
4. Are there any patterns in substrate specificity?

**Your coding challenge:** Use Python lists to organize and analyze biological collections.

## 💡 The Biological Analogy

Think of **Python lists like protein families**:

| Protein Biology | Python Programming |
|---|---|
| **Protein family** (related enzymes) | **List** [item1, item2, item3] |
| **First protein** in family | **First item** list[0] |
| **Family size** (number of proteins) | **List length** len(list) |
| **Add new protein** to family | **Append item** list.append() |
| **Sort proteins** by activity | **Sort list** list.sort() |
| **Check if protein exists** | **Check membership** item in list |
| **Remove inactive protein** | **Remove item** list.remove() |

Just like protein families group related molecules with similar functions, lists group related data that you can analyze together!
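
The operations in the table can be tried on a small list before you start the lab exercises. This is a minimal sketch: the variable name `family` and its three starting members are illustrative choices, not part of the lab data set.

```python
# A small illustrative family; `family` is a name chosen only for this sketch
family = ["Sucrase", "Lactase", "Glucosidase"]

first_protein = family[0]          # first item -> "Sucrase"
family_size = len(family)          # list length -> 3

family.append("Maltase")           # add a new protein at the end
has_lactase = "Lactase" in family  # membership check -> True

family.remove("Lactase")           # remove the first matching item
family.sort()                      # sort in place (alphabetical for strings)

print(first_protein, family_size, has_lactase)
print(family)  # ['Glucosidase', 'Maltase', 'Sucrase']
```

Two details worth remembering: list.sort() reorders the list itself and returns None, and list.remove() raises a ValueError if the item is not in the list.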

## 🧪 Lab Exercise 1: Create Your Protein Database

**Your task:** Store the protein family data in organized lists.

**Think like a biochemist:** You'd organize proteins by their properties to study structure-function relationships.

In [None]:
# Create lists for each protein property
protein_names = ["Glucosidase", "Fructosidase", "Sucrase", "Lactase", "Maltase"]
activity_levels = [85, 92, 78, 65, 88]  # units per minute
substrates = ["Glucose", "Fructose", "Sucrose", "Lactose", "Maltose"]
tissue_locations = ["Liver", "Intestine", "Intestine", "Intestine", "Intestine"]

# Display the protein family
print("🧬 Glycosidase Protein Family Database")
print("=" * 50)
print(f"Family size: {len(protein_names)} proteins\n")

print("Protein Details:")
for i in range(len(protein_names)):
    print(f"{i+1}. {protein_names[i]}")
    print(f"   Activity: {activity_levels[i]} units/min")
    print(f"   Substrate: {substrates[i]}")
    print(f"   Location: {tissue_locations[i]}")
    print()
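
The index-based loop above works, but Python can also walk several lists in step with the built-in zip() and number the results with enumerate(). The sketch below repeats the same four lists so it runs on its own; `summary_lines` is a name introduced here for illustration.

```python
protein_names = ["Glucosidase", "Fructosidase", "Sucrase", "Lactase", "Maltase"]
activity_levels = [85, 92, 78, 65, 88]
substrates = ["Glucose", "Fructose", "Sucrose", "Lactose", "Maltose"]
tissue_locations = ["Liver", "Intestine", "Intestine", "Intestine", "Intestine"]

# zip() pairs up the items at the same position in each list;
# enumerate(..., start=1) supplies the 1-based protein number
summary_lines = []
for number, (name, activity, substrate, tissue) in enumerate(
    zip(protein_names, activity_levels, substrates, tissue_locations), start=1
):
    line = f"{number}. {name}: {activity} units/min, {substrate}, {tissue}"
    summary_lines.append(line)
    print(line)
```

Note that zip() stops at the shortest list, so it silently skips entries if one list is missing a value.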

## 🧪 Lab Exercise 2: Find the Most Active Protein

**Biological goal:** Identify which enzyme has the highest catalytic activity.

**Your task:** Use the built-in max() and min() functions together with the .index() list method to find the extreme activity levels and the corresponding proteins.

In [None]:
# Find the highest activity level
max_activity = max(activity_levels)
print(f"Highest activity level: {max_activity} units/min")

# Find which protein has this activity
max_activity_index = activity_levels.index(max_activity)
most_active_protein = protein_names[max_activity_index]
most_active_substrate = substrates[max_activity_index]
most_active_location = tissue_locations[max_activity_index]

print(f"\n🏆 Most Active Protein:")
print(f"Name: {most_active_protein}")
print(f"Activity: {max_activity} units/min")
print(f"Substrate: {most_active_substrate}")
print(f"Location: {most_active_location}")

# Find the least active protein for comparison
min_activity = min(activity_levels)
min_activity_index = activity_levels.index(min_activity)
least_active_protein = protein_names[min_activity_index]

print(f"\n📉 Least Active Protein:")
print(f"Name: {least_active_protein}")
print(f"Activity: {min_activity} units/min")

# Calculate the activity difference
activity_range = max_activity - min_activity
print(f"\nActivity range in family: {activity_range} units/min")
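
One caveat about the approach above: .index() returns only the position of the first match. In this family no two enzymes share an activity value, so that is fine, but with tied values you would miss some proteins. The sketch below uses hypothetical enzyme names and values (not the lab data) to show the difference.

```python
# Hypothetical data with a tie for the top activity (not the lab's real values)
names = ["EnzymeA", "EnzymeB", "EnzymeC"]
activities = [92, 78, 92]

top_activity = max(activities)

# index() stops at the first match, so it reports only EnzymeA
first_match = names[activities.index(top_activity)]

# Checking every position finds all proteins that share the top value
all_top = [name for name, activity in zip(names, activities) if activity == top_activity]

print(first_match)  # EnzymeA
print(all_top)      # ['EnzymeA', 'EnzymeC']
```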

## 🧪 Lab Exercise 3: Analyze Tissue Distribution

**Biological context:** Where a protein is found often reflects its function: digestive enzymes are concentrated in the intestine, while many metabolic enzymes are found in the liver.

**Your task:** Count how many proteins are found in each tissue type.

In [None]:
# Count proteins by tissue location
print("🏥 Tissue Distribution Analysis:")

# Method 1: Count specific tissues
intestine_count = tissue_locations.count("Intestine")
liver_count = tissue_locations.count("Liver")

print(f"Intestine: {intestine_count} proteins")
print(f"Liver: {liver_count} proteins")

# Method 2: Find all unique tissue types
unique_tissues = []
for tissue in tissue_locations:
    if tissue not in unique_tissues:
        unique_tissues.append(tissue)

print(f"\nUnique tissue types: {unique_tissues}")

# Create detailed tissue analysis
print(f"\n📊 Detailed Tissue Analysis:")
for tissue in unique_tissues:
    count = tissue_locations.count(tissue)
    percentage = (count / len(tissue_locations)) * 100
    print(f"{tissue}: {count} proteins ({percentage:.1f}% of family)")
    
    # List which proteins are in this tissue
    proteins_in_tissue = []
    for i in range(len(tissue_locations)):
        if tissue_locations[i] == tissue:
            proteins_in_tissue.append(protein_names[i])
    
    print(f"  Proteins: {', '.join(proteins_in_tissue)}")
    print()
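
The manual counting above is good practice with loops. As an alternative, the standard library's collections.Counter can tally every tissue in one step. This is a sketch of that option, assuming the same tissue list as in the exercise; `tissue_counts` is a name introduced here.

```python
from collections import Counter

tissue_locations = ["Liver", "Intestine", "Intestine", "Intestine", "Intestine"]

# Counter builds a mapping of tissue -> number of occurrences
tissue_counts = Counter(tissue_locations)

# most_common() lists the tissues from most to least frequent
for tissue, count in tissue_counts.most_common():
    percentage = count / len(tissue_locations) * 100
    print(f"{tissue}: {count} proteins ({percentage:.1f}% of family)")
```

Counter scans the list once, whereas calling .count() for each unique tissue scans it once per tissue. For five proteins the difference is negligible, but it matters for large data sets.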

## 🧪 Lab Exercise 4: Calculate Family Statistics

**Your task:** Compute statistical measures to understand the protein family's characteristics.

**Think like a bioinformatician:** Statistical analysis reveals patterns in protein families.

In [None]:
# Calculate basic statistics
total_activity = sum(activity_levels)
average_activity = total_activity / len(activity_levels)
family_size = len(protein_names)

print("📈 Protein Family Statistics:")
print(f"Family size: {family_size} proteins")
print(f"Total activity: {total_activity} units/min")
print(f"Average activity: {average_activity:.1f} units/min")
print(f"Highest activity: {max(activity_levels)} units/min")
print(f"Lowest activity: {min(activity_levels)} units/min")

# Split proteins into above average and at-or-below average
high_activity_proteins = []
low_activity_proteins = []

for i in range(len(activity_levels)):
    if activity_levels[i] > average_activity:
        high_activity_proteins.append(protein_names[i])
    else:
        low_activity_proteins.append(protein_names[i])

print(f"\n⬆️ Above average activity ({len(high_activity_proteins)} proteins):")
for protein in high_activity_proteins:
    index = protein_names.index(protein)
    activity = activity_levels[index]
    print(f"  {protein}: {activity} units/min")

print(f"\n⬇️ At or below average activity ({len(low_activity_proteins)} proteins):")
for protein in low_activity_proteins:
    index = protein_names.index(protein)
    activity = activity_levels[index]
    print(f"  {protein}: {activity} units/min")
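
Beyond sum() and len(), Python's standard statistics module offers ready-made summary measures. The sketch below is one possible extension of the exercise, using the original five activity values; the variable names are chosen for this example.

```python
import statistics

activity_levels = [85, 92, 78, 65, 88]

mean_activity = statistics.mean(activity_levels)      # same as sum / len -> 81.6
median_activity = statistics.median(activity_levels)  # middle value when sorted -> 85
activity_stdev = statistics.stdev(activity_levels)    # sample standard deviation

print(f"Mean:   {mean_activity:.1f} units/min")
print(f"Median: {median_activity} units/min")
print(f"Stdev:  {activity_stdev:.1f} units/min")
```

The median sits above the mean here because one low value (65) pulls the mean down, which is a hint that the family has a single low-activity outlier.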

## 🧪 Lab Exercise 5: Protein Family Expansion

**Biological scenario:** Your lab discovers a new enzyme in this family!

**New protein data:**
- Name: "Trehalase"
- Activity: 95 units/min
- Substrate: "Trehalose"
- Location: "Muscle"

**Your task:** Add this protein to your database and update your analysis.

In [None]:
# Add the new protein to each list
new_protein_name = "Trehalase"
new_activity = 95
new_substrate = "Trehalose"
new_location = "Muscle"

# Expand the protein family
protein_names.append(new_protein_name)
activity_levels.append(new_activity)
substrates.append(new_substrate)
tissue_locations.append(new_location)

print("🔬 New Protein Added to Family!")
print(f"Added: {new_protein_name}")
print(f"Family size now: {len(protein_names)} proteins\n")

# Recalculate statistics with new protein
new_average = sum(activity_levels) / len(activity_levels)
new_max = max(activity_levels)
new_max_index = activity_levels.index(new_max)
new_champion = protein_names[new_max_index]

print("📊 Updated Family Statistics:")
print(f"New average activity: {new_average:.1f} units/min")
print(f"New highest activity: {new_max} units/min ({new_champion})")

# Check if the new protein is now the most active
if new_champion == new_protein_name:
    print(f"🏆 {new_protein_name} is now the most active enzyme!")
else:
    print(f"📈 {new_protein_name} has high activity but {new_champion} is still the champion")

# Update tissue distribution
updated_unique_tissues = []
for tissue in tissue_locations:
    if tissue not in updated_unique_tissues:
        updated_unique_tissues.append(tissue)

print(f"\n🏥 Updated tissue distribution:")
for tissue in updated_unique_tissues:
    count = tissue_locations.count(tissue)
    print(f"{tissue}: {count} proteins")
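
Because each protein's data is spread across four separate lists, forgetting a single .append() leaves the lists out of step, and every later lookup by index then pairs the wrong values. A quick length check catches this. The sketch below uses a hypothetical two-list mini-database to show the idea.

```python
# Hypothetical mini-database where one append is forgotten on purpose
names = ["Lactase", "Maltase"]
activities = [65, 88]

names.append("Trehalase")  # the matching activity has not been added yet

# The lists are in sync only if every length equals the first one
lengths = [len(names), len(activities)]
in_sync_before = lengths.count(lengths[0]) == len(lengths)
print(f"In sync before fix: {in_sync_before}")  # False

activities.append(95)  # add the missing value

lengths = [len(names), len(activities)]
in_sync_after = lengths.count(lengths[0]) == len(lengths)
print(f"In sync after fix: {in_sync_after}")  # True
```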

## 🧪 Lab Exercise 6: Sort and Rank Proteins

**Your task:** Create a sorted ranking of proteins by activity level.

**Biological insight:** Ranking by measured activity helps shortlist the most active enzymes for potential therapeutic use.

In [None]:
# Create a combined list for sorting: one tuple per protein (name, activity, substrate, tissue)
protein_activity_pairs = []
for i in range(len(protein_names)):
    pair = (protein_names[i], activity_levels[i], substrates[i], tissue_locations[i])
    protein_activity_pairs.append(pair)

# Sort by activity level (highest to lowest)
# We'll sort by the second element (index 1) which is activity
sorted_proteins = sorted(protein_activity_pairs, key=lambda x: x[1], reverse=True)

print("🏆 Protein Family Activity Ranking:")
print("=" * 60)
print(f"{'Rank':<6} {'Protein':<12} {'Activity':<10} {'Substrate':<10} {'Tissue'}")
print("-" * 60)

for rank, (name, activity, substrate, tissue) in enumerate(sorted_proteins, 1):
    print(f"{rank:<6} {name:<12} {activity:<10} {substrate:<10} {tissue}")

# Identify top performers
top_3_proteins = sorted_proteins[:3]
print("\n🥇 Top 3 Most Active Proteins:")
medals = ["🥇", "🥈", "🥉"]  # defined once, outside the loop
for medal, (name, activity, substrate, tissue) in zip(medals, top_3_proteins):
    print(f"{medal} {name}: {activity} units/min ({substrate})")

# Calculate performance tiers
high_performers = [protein for protein in sorted_proteins if protein[1] >= 90]
medium_performers = [protein for protein in sorted_proteins if 80 <= protein[1] < 90]
low_performers = [protein for protein in sorted_proteins if protein[1] < 80]

print(f"\n📊 Performance Tiers:")
print(f"High performers (≥90 units/min): {len(high_performers)} proteins")
print(f"Medium performers (80-89 units/min): {len(medium_performers)} proteins")
print(f"Low performers (<80 units/min): {len(low_performers)} proteins")
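
The exercise uses sorted() rather than the .sort() method from the analogy table, and the difference matters here. sorted() returns a new list and leaves the original untouched, while .sort() reorders the list in place. The sketch below illustrates this with the six activity values; `activities` and the other names are chosen for this example.

```python
activities = [85, 92, 78, 65, 88, 95]

# sorted() builds a new list; the original order is preserved
ranked = sorted(activities, reverse=True)
print(ranked)      # [95, 92, 88, 85, 78, 65]
print(activities)  # [85, 92, 78, 65, 88, 95]

# .sort() reorders the list itself and returns None
in_place = list(activities)  # work on a copy so `activities` stays intact
result = in_place.sort()
print(in_place)    # [65, 78, 85, 88, 92, 95]
print(result)      # None
```

Sorting activity_levels in place would break its alignment with protein_names, since position 0 would no longer describe the same enzyme in both lists. Bundling each protein's values into a tuple before sorting, as the exercise does, keeps them together.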

## 🤔 Biological Reflection

**Answer these questions by modifying the text below:**

1. **What patterns do you notice in the protein family's activity levels?**
   *Your analysis here...*

2. **Why might intestinal enzymes have different activities than liver enzymes?**
   *Your biological reasoning here...*

3. **How do Python lists help organize biological data compared to individual variables?**
   *Your coding insight here...*

4. **What would happen if you discovered an enzyme with 150 units/min activity?**
   *Your prediction here...*

## 🎯 Today's Key Insights

### Biological Concepts:
- Protein families and structure-function relationships
- Enzyme activity and catalytic efficiency
- Tissue-specific protein expression patterns
- Comparative enzyme analysis

### Programming Concepts:
- **Lists** organize related data like protein families
- **List indexing** accesses specific proteins by position
- **List methods** (.append(), .count(), .index()) manipulate collections
- **Loops** process entire protein families systematically
- **Built-in functions** (max(), min(), sum()) summarize biological data
- **Sorting** reveals patterns and rankings in datasets

### The Connection:
Just as biochemists group related proteins into families to study evolutionary relationships and functional patterns, programmers use lists to organize related data for systematic analysis!

---

## 📋 Before You Finish

1. **Save this notebook** with your completed solutions
2. **Ask Claude Code to review your work**: "Claude, please review my Day3_Protein_Lists.ipynb notebook"
3. **Connect concepts**: How do variables (Day 1), strings (Day 2), and lists (Day 3) work together?
4. **Preview tomorrow**: Day 4 explores enzyme functions as Python functions

**Tomorrow's mystery:** "How do enzymes accelerate specific reactions while ignoring others?"

*Outstanding work organizing life's molecular machinery! 🧬📊*