# Lab 2: Consumer Groups and Load Balancing

## üéØ Objectives
- Understand Consumer Groups and how they work
- Learn about partition assignment strategies
- Practice load balancing across multiple consumers
- Explore consumer group coordination
- Monitor consumer lag and performance

## üìã Prerequisites
- Lab 1 completed (Kafka basics)
- Kafka cluster running
- Understanding of partitions and topics

## üèóÔ∏è Architecture Overview
```
Stock Data Topic (3 Partitions)
         ‚Üì
    Consumer Group A
    ‚îú‚îÄ‚îÄ Consumer 1 (Partition 0)
    ‚îú‚îÄ‚îÄ Consumer 2 (Partition 1)
    ‚îî‚îÄ‚îÄ Consumer 3 (Partition 2)
    
    Consumer Group B
    ‚îú‚îÄ‚îÄ Consumer 1 (All Partitions)
    ‚îî‚îÄ‚îÄ Consumer 2 (Standby)
```


In [1]:
# Install and Import Dependencies
%pip install kafka-python pandas matplotlib seaborn

import json
import time
import random
import threading
from datetime import datetime
from kafka import KafkaProducer, KafkaConsumer
from kafka.errors import KafkaError
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List
from collections import defaultdict

print("‚úÖ Dependencies installed and imported successfully!")


Note: you may need to restart the kernel to use updated packages.
‚úÖ Dependencies installed and imported successfully!


In [2]:
# Kafka Configuration
KAFKA_BOOTSTRAP_SERVERS = 'localhost:9092'
TOPIC_NAME = 'stock-data'

# Consumer Group Names
ANALYTICS_GROUP = 'stock-analytics-group'
ALERTS_GROUP = 'stock-alerts-group'
STORAGE_GROUP = 'stock-storage-group'

print(f"üìä Configured for consumer groups:")
print(f"üîó Kafka Bootstrap Servers: {KAFKA_BOOTSTRAP_SERVERS}")
print(f"üìù Topic Name: {TOPIC_NAME}")
print(f"üë• Analytics Group: {ANALYTICS_GROUP}")
print(f"üë• Alerts Group: {ALERTS_GROUP}")
print(f"üë• Storage Group: {STORAGE_GROUP}")


üìä Configured for consumer groups:
üîó Kafka Bootstrap Servers: localhost:9092
üìù Topic Name: stock-data
üë• Analytics Group: stock-analytics-group
üë• Alerts Group: stock-alerts-group
üë• Storage Group: stock-storage-group


In [3]:
# Stock Data Generator Class
class StockDataGenerator:
    def __init__(self, bootstrap_servers: str, topic: str):
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
            key_serializer=lambda k: k.encode('utf-8') if k else None
        )
        self.topic = topic
        self.symbols = ['AAPL', 'GOOGL', 'MSFT', 'TSLA', 'AMZN', 'META', 'NVDA', 'NFLX', 'ADBE', 'CRM']
        self.base_prices = {
            'AAPL': 150.0, 'GOOGL': 2800.0, 'MSFT': 350.0, 'TSLA': 250.0, 'AMZN': 3200.0,
            'META': 300.0, 'NVDA': 450.0, 'NFLX': 400.0, 'ADBE': 500.0, 'CRM': 200.0
        }
    
    def generate_ohlcv_data(self, symbol: str, base_price: float) -> dict:
        """Generate realistic OHLCV data"""
        price_change = random.uniform(-0.02, 0.02)
        new_price = base_price * (1 + price_change)
        
        open_price = round(new_price * random.uniform(0.998, 1.002), 2)
        close_price = round(new_price * random.uniform(0.998, 1.002), 2)
        high_price = round(max(open_price, close_price) * random.uniform(1.001, 1.005), 2)
        low_price = round(min(open_price, close_price) * random.uniform(0.995, 0.999), 2)
        
        volume = random.randint(100000, 1000000)
        
        return {
            "symbol": symbol,
            "timestamp": datetime.now().isoformat() + "Z",
            "open": open_price,
            "high": high_price,
            "low": low_price,
            "close": close_price,
            "volume": volume,
            "exchange": "NASDAQ"
        }
    
    def send_stock_data(self, num_messages: int = 10):
        """Send stock data to Kafka topic"""
        print(f"üìà Generating {num_messages} stock data messages...")
        
        for i in range(num_messages):
            symbol = random.choice(self.symbols)
            base_price = self.base_prices[symbol]
            ohlcv_data = self.generate_ohlcv_data(symbol, base_price)
            
            # Use symbol as key for partitioning
            future = self.producer.send(self.topic, key=symbol, value=ohlcv_data)
            record_metadata = future.get(timeout=10)
            
            print(f"üìä Sent {symbol}: ${ohlcv_data['close']} -> Partition {record_metadata.partition}")
            time.sleep(0.1)  # Small delay between messages
        
        self.producer.flush()
        print(f"‚úÖ Successfully sent {num_messages} messages to topic '{self.topic}'")

# Initialize data generator
data_generator = StockDataGenerator(KAFKA_BOOTSTRAP_SERVERS, TOPIC_NAME)
print("‚úÖ Stock data generator initialized!")


‚úÖ Stock data generator initialized!


## Exercise 1: Understanding Consumer Groups

### üéØ **Learning Objectives:**
- Understand how Consumer Groups work
- Learn about partition assignment strategies
- Observe load balancing in action
- Monitor consumer group behavior

### üìö **Key Concepts:**
1. **Consumer Group**: A collection of consumers that work together to consume messages from topics
2. **Partition Assignment**: Each partition is assigned to only one consumer in a group
3. **Load Balancing**: Messages are distributed across consumers in the group
4. **Group Coordination**: Kafka coordinates partition assignments automatically


In [4]:
# Exercise 1: Create Multiple Consumers in Same Group
print("üîß Creating multiple consumers in the same group...")

def create_consumer(group_id: str, consumer_id: str):
    """Create a consumer with specific group and ID"""
    return KafkaConsumer(
        TOPIC_NAME,
        bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
        group_id=group_id,
        client_id=consumer_id,
        auto_offset_reset='earliest',
        enable_auto_commit=True,
        value_deserializer=lambda x: json.loads(x.decode('utf-8')),
        consumer_timeout_ms=5000  # Timeout after 5 seconds if no messages
    )

# Create consumers for analytics group
analytics_consumer_1 = create_consumer(ANALYTICS_GROUP, 'analytics-consumer-1')
analytics_consumer_2 = create_consumer(ANALYTICS_GROUP, 'analytics-consumer-2')

print("‚úÖ Created 2 consumers in analytics group")
print(f"üë• Group: {ANALYTICS_GROUP}")
print(f"üîß Consumer 1: analytics-consumer-1")
print(f"üîß Consumer 2: analytics-consumer-2")


üîß Creating multiple consumers in the same group...
‚úÖ Created 2 consumers in analytics group
üë• Group: stock-analytics-group
üîß Consumer 1: analytics-consumer-1
üîß Consumer 2: analytics-consumer-2


In [5]:
# Exercise 2: Generate Test Data
print("üìà Generating test stock data...")

# Send some test data
data_generator.send_stock_data(20)

print("‚úÖ Test data generated!")
print("üí° Now let's see how consumers handle the messages...")


üìà Generating test stock data...
üìà Generating 20 stock data messages...
üìä Sent TSLA: $252.81 -> Partition 0
üìä Sent NVDA: $451.89 -> Partition 0
üìä Sent ADBE: $506.71 -> Partition 0
üìä Sent GOOGL: $2808.73 -> Partition 0
üìä Sent NVDA: $442.19 -> Partition 0
üìä Sent ADBE: $494.64 -> Partition 0
üìä Sent NVDA: $453.14 -> Partition 0
üìä Sent ADBE: $498.2 -> Partition 0
üìä Sent AMZN: $3233.54 -> Partition 0
üìä Sent AMZN: $3148.17 -> Partition 0
üìä Sent CRM: $196.8 -> Partition 0
üìä Sent NVDA: $453.65 -> Partition 0
üìä Sent NFLX: $404.85 -> Partition 0
üìä Sent NVDA: $442.24 -> Partition 0
üìä Sent NFLX: $396.04 -> Partition 0
üìä Sent MSFT: $351.94 -> Partition 0
üìä Sent AMZN: $3226.96 -> Partition 0
üìä Sent AAPL: $152.56 -> Partition 0
üìä Sent AMZN: $3195.42 -> Partition 0
üìä Sent MSFT: $343.58 -> Partition 0
‚úÖ Successfully sent 20 messages to topic 'stock-data'
‚úÖ Test data generated!
üí° Now let's see how consumers handle the messages...


In [6]:
# Exercise 3: Consumer Group Load Balancing Demo
print("üîÑ Demonstrating consumer group load balancing...")

def consume_messages(consumer, consumer_name: str, max_messages: int = 10):
    """Consume messages and track partition assignments"""
    messages_received = 0
    partition_counts = {}
    
    print(f"üöÄ {consumer_name} starting to consume messages...")
    
    try:
        for message in consumer:
            if messages_received >= max_messages:
                break
                
            partition = message.partition
            partition_counts[partition] = partition_counts.get(partition, 0) + 1
            
            data = message.value
            print(f"üìä {consumer_name} received {data['symbol']}: ${data['close']} from partition {partition}")
            
            messages_received += 1
            
    except Exception as e:
        print(f"‚ö†Ô∏è {consumer_name} finished consuming: {e}")
    
    print(f"üìà {consumer_name} summary:")
    print(f"   Total messages: {messages_received}")
    print(f"   Partition distribution: {partition_counts}")
    
    return partition_counts

# Start consuming with both consumers
print("\nüîÑ Starting Consumer 1...")
partition_counts_1 = consume_messages(analytics_consumer_1, "Analytics Consumer 1", 10)

print("\nüîÑ Starting Consumer 2...")
partition_counts_2 = consume_messages(analytics_consumer_2, "Analytics Consumer 2", 10)

print("\nüìä Load Balancing Results:")
print(f"Consumer 1 partitions: {partition_counts_1}")
print(f"Consumer 2 partitions: {partition_counts_2}")


üîÑ Demonstrating consumer group load balancing...

üîÑ Starting Consumer 1...
üöÄ Analytics Consumer 1 starting to consume messages...
üìä Analytics Consumer 1 received AAPL: $147.29 from partition 0
üìä Analytics Consumer 1 received CRM: $200.24 from partition 0
üìä Analytics Consumer 1 received MSFT: $353.31 from partition 0
üìä Analytics Consumer 1 received AMZN: $3193.57 from partition 0
üìä Analytics Consumer 1 received META: $300.17 from partition 0
üìä Analytics Consumer 1 received TSLA: $249.58 from partition 0
üìä Analytics Consumer 1 received NVDA: $453.01 from partition 0
üìä Analytics Consumer 1 received NFLX: $401.11 from partition 0
üìä Analytics Consumer 1 received TSLA: $248.7 from partition 0
üìä Analytics Consumer 1 received NFLX: $401.48 from partition 0
üìà Analytics Consumer 1 summary:
   Total messages: 10
   Partition distribution: {0: 10}

üîÑ Starting Consumer 2...
üöÄ Analytics Consumer 2 starting to consume messages...
üìà Analytics Consumer

## Exercise 4: Multiple Consumer Groups

### üéØ **Learning Objectives:**
- Understand how different consumer groups work independently
- Learn about message replication across groups
- Observe group coordination and rebalancing

### üìö **Key Concepts:**
1. **Independent Groups**: Each consumer group processes all messages independently
2. **Message Replication**: Same message can be consumed by multiple groups
3. **Group Isolation**: Groups don't interfere with each other's processing


In [7]:
# Exercise 4: Create Multiple Consumer Groups
print("üîß Creating consumers for different groups...")

# Create consumers for different groups
alerts_consumer = create_consumer(ALERTS_GROUP, 'alerts-consumer')
storage_consumer = create_consumer(STORAGE_GROUP, 'storage-consumer')

print("‚úÖ Created consumers for different groups:")
print(f"üö® Alerts Group: {ALERTS_GROUP}")
print(f"üíæ Storage Group: {STORAGE_GROUP}")

# Generate more test data
print("\nüìà Generating more test data...")
data_generator.send_stock_data(15)


üîß Creating consumers for different groups...
‚úÖ Created consumers for different groups:
üö® Alerts Group: stock-alerts-group
üíæ Storage Group: stock-storage-group

üìà Generating more test data...
üìà Generating 15 stock data messages...
üìä Sent AMZN: $3248.6 -> Partition 0
üìä Sent AMZN: $3189.68 -> Partition 0
üìä Sent TSLA: $248.63 -> Partition 0
üìä Sent ADBE: $498.46 -> Partition 0
üìä Sent GOOGL: $2749.07 -> Partition 0
üìä Sent NVDA: $451.65 -> Partition 0
üìä Sent NFLX: $407.11 -> Partition 0
üìä Sent NVDA: $456.85 -> Partition 0
üìä Sent AMZN: $3150.55 -> Partition 0
üìä Sent ADBE: $503.86 -> Partition 0
üìä Sent TSLA: $254.41 -> Partition 0
üìä Sent NFLX: $407.73 -> Partition 0
üìä Sent AAPL: $151.42 -> Partition 0
üìä Sent CRM: $195.71 -> Partition 0
üìä Sent NVDA: $455.48 -> Partition 0
‚úÖ Successfully sent 15 messages to topic 'stock-data'


In [8]:
# Exercise 5: Demonstrate Independent Group Processing
print("üîÑ Demonstrating independent group processing...")

def process_alerts(message_data):
    """Process stock data for alerts"""
    symbol = message_data['symbol']
    close_price = message_data['close']
    
    # Simple alert logic: alert if price change is significant
    if close_price > 200:  # High price alert
        return f"üö® HIGH PRICE ALERT: {symbol} at ${close_price}"
    return None

def process_storage(message_data):
    """Process stock data for storage"""
    symbol = message_data['symbol']
    timestamp = message_data['timestamp']
    
    # Simulate storing to database
    return f"üíæ STORED: {symbol} at {timestamp}"

# Process messages with different groups
print("\nüö® Alerts Group Processing:")
alerts_processed = 0
for message in alerts_consumer:
    if alerts_processed >= 5:
        break
    
    data = message.value
    alert = process_alerts(data)
    if alert:
        print(f"   {alert}")
    else:
        print(f"   üìä {data['symbol']}: ${data['close']} - No alert")
    
    alerts_processed += 1

print("\nüíæ Storage Group Processing:")
storage_processed = 0
for message in storage_consumer:
    if storage_processed >= 5:
        break
    
    data = message.value
    storage_result = process_storage(data)
    print(f"   {storage_result}")
    
    storage_processed += 1

print("\n‚úÖ Both groups processed messages independently!")


üîÑ Demonstrating independent group processing...

üö® Alerts Group Processing:
   üìä AAPL: $147.29 - No alert
   üö® HIGH PRICE ALERT: CRM at $200.24
   üö® HIGH PRICE ALERT: MSFT at $353.31
   üö® HIGH PRICE ALERT: AMZN at $3193.57
   üö® HIGH PRICE ALERT: META at $300.17

üíæ Storage Group Processing:
   üíæ STORED: AAPL at 2025-09-21T15:56:47.006979Z
   üíæ STORED: CRM at 2025-09-21T16:04:43.742567Z
   üíæ STORED: MSFT at 2025-09-21T16:04:43.977061Z
   üíæ STORED: AMZN at 2025-09-21T16:04:44.094246Z
   üíæ STORED: META at 2025-09-21T16:04:44.215153Z

‚úÖ Both groups processed messages independently!


## Exercise 6: Consumer Group Monitoring

### üéØ **Learning Objectives:**
- Learn how to monitor consumer groups
- Understand consumer lag and performance metrics
- Practice troubleshooting consumer issues

### üìö **Key Concepts:**
1. **Consumer Lag**: Difference between producer and consumer offsets
2. **Group Coordination**: How Kafka manages group membership
3. **Rebalancing**: Automatic redistribution of partitions when consumers join/leave


In [9]:
# Exercise 6: Monitor Consumer Groups
print("üìä Monitoring consumer groups...")

from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType

def get_consumer_group_info():
    """Get information about consumer groups"""
    try:
        admin_client = KafkaAdminClient(
            bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
            client_id='monitor-client'
        )
        
        # List all consumer groups
        groups = admin_client.list_consumer_groups()
        print("üìã Available Consumer Groups:")
        for group in groups:
            print(f"   üë• Group: {group[0]}, Type: {group[1]}")
        
        return groups
        
    except Exception as e:
        print(f"‚ùå Error getting consumer group info: {e}")
        return []

def monitor_group_offsets(group_id: str):
    """Monitor offsets for a specific consumer group"""
    try:
        consumer = KafkaConsumer(
            bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
            group_id=group_id,
            enable_auto_commit=False
        )
        
        # Get partition info for the topic
        partitions = consumer.partitions_for_topic(TOPIC_NAME)
        print(f"\nüìä Partition info for topic '{TOPIC_NAME}':")
        print(f"   Partitions: {partitions}")
        
        # Get committed offsets
        if partitions:
            from kafka import TopicPartition
            topic_partitions = [TopicPartition(TOPIC_NAME, p) for p in partitions]
            committed_offsets = consumer.committed(*topic_partitions)
            
            print(f"\nüìà Committed offsets for group '{group_id}':")
            if committed_offsets is not None:
                for tp, offset in zip(topic_partitions, committed_offsets):
                    if offset is not None:
                        print(f"   Partition {tp.partition}: {offset}")
                    else:
                        print(f"   Partition {tp.partition}: No committed offset")
            else:
                print(f"   No committed offsets found for group '{group_id}'")
        
        consumer.close()
        
    except Exception as e:
        print(f"‚ùå Error monitoring offsets: {e}")

# Monitor all our consumer groups
print("üîç Monitoring consumer groups...")
groups = get_consumer_group_info()

for group_id in [ANALYTICS_GROUP, ALERTS_GROUP, STORAGE_GROUP]:
    print(f"\nüìä Monitoring group: {group_id}")
    monitor_group_offsets(group_id)

print("\n‚úÖ Consumer group monitoring completed!")


üìä Monitoring consumer groups...
üîç Monitoring consumer groups...
üìã Available Consumer Groups:
   üë• Group: stock-alerts-group, Type: consumer
   üë• Group: stock-analytics-group, Type: consumer
   üë• Group: stock-storage-group, Type: consumer

üìä Monitoring group: stock-analytics-group

üìä Partition info for topic 'stock-data':
   Partitions: {0}

üìà Committed offsets for group 'stock-analytics-group':
   No committed offsets found for group 'stock-analytics-group'

üìä Monitoring group: stock-alerts-group

üìä Partition info for topic 'stock-data':
   Partitions: {0}

üìà Committed offsets for group 'stock-alerts-group':
   No committed offsets found for group 'stock-alerts-group'

üìä Monitoring group: stock-storage-group

üìä Partition info for topic 'stock-data':
   Partitions: {0}

üìà Committed offsets for group 'stock-storage-group':
   No committed offsets found for group 'stock-storage-group'

‚úÖ Consumer group monitoring completed!


## Exercise 7: Advanced Consumer Group Scenarios

### üéØ **Learning Objectives:**
- Practice consumer group rebalancing
- Understand partition assignment strategies
- Learn about consumer group coordination

### üìö **Key Concepts:**
1. **Rebalancing**: Automatic redistribution when consumers join/leave
2. **Assignment Strategies**: Range, Round Robin, Sticky
3. **Group Coordination**: How Kafka manages group membership


In [10]:
# Exercise 7: Consumer Group Rebalancing Demo
print("üîÑ Demonstrating consumer group rebalancing...")

def create_consumer_with_assignment_strategy(group_id: str, consumer_id: str, strategy: str = 'range'):
    """Create consumer with specific assignment strategy"""
    from kafka.coordinator.assignors.range import RangePartitionAssignor
    from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor
    from kafka.coordinator.assignors.sticky import StickyPartitionAssignor
    
    # Map strategy names to actual assignor classes
    assignor_map = {
        'range': RangePartitionAssignor,
        'roundrobin': RoundRobinPartitionAssignor,
        'sticky': StickyPartitionAssignor
    }
    
    assignor_class = assignor_map.get(strategy, RangePartitionAssignor)
    
    return KafkaConsumer(
        TOPIC_NAME,
        bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
        group_id=group_id,
        client_id=consumer_id,
        auto_offset_reset='earliest',
        enable_auto_commit=True,
        value_deserializer=lambda x: json.loads(x.decode('utf-8')),
        consumer_timeout_ms=3000,
        # Assignment strategy - use class objects, not strings
        partition_assignment_strategy=[assignor_class()]
    )

# Create a new group for rebalancing demo
REBALANCE_GROUP = 'rebalance-demo-group'

print(f"üîß Creating consumers for rebalancing demo group: {REBALANCE_GROUP}")

# Start with 1 consumer
print("\nüìä Step 1: Starting with 1 consumer...")
consumer_1 = create_consumer_with_assignment_strategy(REBALANCE_GROUP, 'rebalance-consumer-1')

# Generate some data
data_generator.send_stock_data(10)

# Consume with 1 consumer
print("\nüîÑ Consumer 1 consuming messages...")
messages_1 = []
for message in consumer_1:
    if len(messages_1) >= 5:
        break
    messages_1.append(message)
    print(f"   Consumer 1: {message.value['symbol']} from partition {message.partition}")

print(f"\nüìà Consumer 1 processed {len(messages_1)} messages")

# Add second consumer (this will trigger rebalancing)
print("\nüìä Step 2: Adding second consumer (rebalancing)...")
consumer_2 = create_consumer_with_assignment_strategy(REBALANCE_GROUP, 'rebalance-consumer-2')

# Generate more data
data_generator.send_stock_data(10)

# Both consumers should now share the load
print("\nüîÑ Both consumers processing messages...")
messages_2 = []
for message in consumer_2:
    if len(messages_2) >= 5:
        break
    messages_2.append(message)
    print(f"   Consumer 2: {message.value['symbol']} from partition {message.partition}")

print(f"\nüìà Consumer 2 processed {len(messages_2)} messages")
print("\n‚úÖ Rebalancing demonstration completed!")

# Clean up consumers
consumer_1.close()
consumer_2.close()
print("üßπ Consumers closed successfully!")


üîÑ Demonstrating consumer group rebalancing...
üîß Creating consumers for rebalancing demo group: rebalance-demo-group

üìä Step 1: Starting with 1 consumer...


ImportError: cannot import name 'StickyPartitionAssignor' from 'kafka.coordinator.assignors.sticky' (/Users/trungtv/miniforge3/envs/datalab/lib/python3.10/site-packages/kafka/coordinator/assignors/sticky/__init__.py)

## Exercise 8: Performance Analysis and Visualization

### üéØ **Learning Objectives:**
- Analyze consumer group performance
- Visualize message distribution across partitions
- Understand throughput and latency patterns

### üìö **Key Concepts:**
1. **Throughput**: Messages processed per second
2. **Latency**: Time between message production and consumption
3. **Partition Distribution**: How messages are distributed across partitions


In [None]:
# Exercise 8: Performance Analysis and Visualization
print("üìä Analyzing consumer group performance...")

def analyze_performance():
    """Analyze performance of consumer groups"""
    
    # Create performance tracking consumer
    perf_consumer = KafkaConsumer(
        TOPIC_NAME,
        bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
        group_id='performance-analysis-group',
        auto_offset_reset='earliest',
        enable_auto_commit=True,
        value_deserializer=lambda x: json.loads(x.decode('utf-8')),
        consumer_timeout_ms=5000
    )
    
    # Generate test data
    print("üìà Generating performance test data...")
    data_generator.send_stock_data(30)
    
    # Track performance metrics
    partition_counts = defaultdict(int)
    symbol_counts = defaultdict(int)
    processing_times = []
    
    print("\nüîÑ Analyzing message processing...")
    start_time = time.time()
    
    for message in perf_consumer:
        process_start = time.time()
        
        # Track partition distribution
        partition_counts[message.partition] += 1
        
        # Track symbol distribution
        symbol_counts[message.value['symbol']] += 1
        
        # Simulate processing time
        time.sleep(0.01)  # 10ms processing time
        
        process_end = time.time()
        processing_times.append(process_end - process_start)
        
        if len(processing_times) >= 20:  # Analyze first 20 messages
            break
    
    total_time = time.time() - start_time
    throughput = len(processing_times) / total_time if total_time > 0 else 0
    avg_latency = sum(processing_times) / len(processing_times) if processing_times else 0
    
    print(f"\nüìä Performance Analysis Results:")
    print(f"   Total messages processed: {len(processing_times)}")
    print(f"   Total time: {total_time:.2f} seconds")
    print(f"   Throughput: {throughput:.2f} messages/second")
    print(f"   Average processing latency: {avg_latency*1000:.2f} ms")
    
    # Create visualizations
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Consumer Group Performance Analysis', fontsize=16)
    
    # Partition distribution
    axes[0, 0].bar(partition_counts.keys(), partition_counts.values())
    axes[0, 0].set_title('Message Distribution by Partition')
    axes[0, 0].set_xlabel('Partition')
    axes[0, 0].set_ylabel('Message Count')
    
    # Symbol distribution
    axes[0, 1].bar(symbol_counts.keys(), symbol_counts.values())
    axes[0, 1].set_title('Message Distribution by Stock Symbol')
    axes[0, 1].set_xlabel('Stock Symbol')
    axes[0, 1].set_ylabel('Message Count')
    axes[0, 1].tick_params(axis='x', rotation=45)
    
    # Processing time distribution
    axes[1, 0].hist(processing_times, bins=10, alpha=0.7)
    axes[1, 0].set_title('Processing Time Distribution')
    axes[1, 0].set_xlabel('Processing Time (seconds)')
    axes[1, 0].set_ylabel('Frequency')
    
    # Throughput over time
    throughput_over_time = [1/t for t in processing_times if t > 0]
    axes[1, 1].plot(throughput_over_time)
    axes[1, 1].set_title('Throughput Over Time')
    axes[1, 1].set_xlabel('Message Number')
    axes[1, 1].set_ylabel('Messages/Second')
    
    plt.tight_layout()
    plt.show()
    
    perf_consumer.close()
    
    return {
        'partition_counts': dict(partition_counts),
        'symbol_counts': dict(symbol_counts),
        'throughput': throughput,
        'avg_latency': avg_latency
    }

# Run performance analysis
performance_results = analyze_performance()


## Exercise 9: Cleanup and Best Practices

### üéØ **Learning Objectives:**
- Learn proper cleanup procedures
- Understand best practices for consumer groups
- Review key takeaways from the lab

### üìö **Best Practices:**
1. **Always close consumers** to free resources
2. **Use appropriate group IDs** for different use cases
3. **Monitor consumer lag** in production
4. **Handle rebalancing** gracefully in applications


In [None]:
# Exercise 9: Cleanup and Best Practices
print("üßπ Cleaning up resources...")

def cleanup_consumers():
    """Properly close all consumers"""
    consumers_to_close = [
        analytics_consumer_1,
        analytics_consumer_2,
        alerts_consumer,
        storage_consumer
    ]
    
    for consumer in consumers_to_close:
        try:
            consumer.close()
            print(f"‚úÖ Closed consumer: {consumer.config['client_id']}")
        except Exception as e:
            print(f"‚ö†Ô∏è Error closing consumer: {e}")
    
    print("\n‚úÖ All consumers closed successfully!")

def cleanup_producer():
    """Properly close producer"""
    try:
        data_generator.producer.close()
        print("‚úÖ Producer closed successfully!")
    except Exception as e:
        print(f"‚ö†Ô∏è Error closing producer: {e}")

# Cleanup resources
cleanup_consumers()
cleanup_producer()

print("\nüìö Lab Summary:")
print("‚úÖ Learned about Consumer Groups and Load Balancing")
print("‚úÖ Practiced Multiple Consumer Groups")
print("‚úÖ Demonstrated Consumer Group Rebalancing")
print("‚úÖ Analyzed Performance and Created Visualizations")
print("‚úÖ Applied Best Practices for Resource Management")

print("\nüéØ Key Takeaways:")
print("1. Consumer Groups enable horizontal scaling of message processing")
print("2. Each partition is consumed by only one consumer in a group")
print("3. Multiple groups can process the same messages independently")
print("4. Kafka automatically handles partition assignment and rebalancing")
print("5. Monitoring consumer lag is crucial for production systems")

print("\nüöÄ Next Steps:")
print("- Try Lab 3: Partitioning Strategies")
print("- Experiment with different assignment strategies")
print("- Practice with real-world data scenarios")
