# Step 4: Kafka Broker/Topic Configuration

**Objective:**  
Set up the Kafka broker with required topic(s) and partitions.

**Instructions:**
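- Create the topic `f1-speed-stream` with N partitions, where N is the `partitions` value in the config (multiple partitions are needed for the scaling demo)
- Validate that the topic exists, that its partition count matches the config, and that the message count can be inspected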
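
Why the partition count matters for the scaling demo: within one consumer group, each partition is read by at most one consumer, so N partitions cap useful parallelism at N consumers. The sketch below only illustrates that cap with a simple round-robin split. `assign_round_robin` and the consumer names are hypothetical, and Kafka's real assignors (range, round-robin, sticky) may distribute partitions differently.

```python
def assign_round_robin(num_partitions, consumers):
    """Spread partition ids across consumers; surplus consumers stay idle."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# 4 partitions shared by 2 consumers: each consumer reads two partitions
two = assign_round_robin(4, ["c1", "c2"])
print(two)  # {'c1': [0, 2], 'c2': [1, 3]}

# 4 partitions and 5 consumers: the fifth consumer has nothing to read
five = assign_round_robin(4, ["c1", "c2", "c3", "c4", "c5"])
print(five["c5"])  # []
```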


In [None]:
# Import required libraries
import sys
import os
import yaml

# Add project root to Python path
# Jupyter usually starts the kernel in the notebook's own directory (notebooks/),
# so step up one level in that case; otherwise assume we are at the project root
current_dir = os.getcwd()
if os.path.basename(current_dir) == 'notebooks':
    # We're in notebooks/ directory, go up one level
    project_root = os.path.dirname(current_dir)
else:
    # We're already at project root
    project_root = current_dir

# Add to path if not already there
if project_root not in sys.path:
    sys.path.insert(0, project_root)

try:
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError
except ImportError:
    print("kafka-python not installed. Install with: pip install kafka-python")
    raise

print("✅ Imports successful")


✅ Imports successful


In [None]:
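# Load configuration
# Resolve against project_root so the cell works from notebooks/ or the root
config_path = os.path.join(project_root, "config", "config.yaml")
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

kafka_config = config['kafka']

print("Kafka Configuration:")
print(f"  Bootstrap servers: {kafka_config['bootstrap_servers']}")
print(f"  Topic name: {kafka_config['topic_name']}")
print(f"  Partitions: {kafka_config['partitions']}")
print(f"  Replication factor: {kafka_config['replication_factor']}")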


Kafka Configuration:
  Bootstrap servers: localhost:9092
  Topic name: f1-speed-stream
  Partitions: 4
  Replication factor: 1
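
Later cells read four keys from the `kafka` section, so it can help to check them before connecting. The following is a minimal sketch, not part of the project: `validate_kafka_config` and `REQUIRED_KAFKA_KEYS` are hypothetical names, and the sample values simply mirror the configuration printed above.

```python
# Keys read by later cells, with the types this sketch assumes for them
REQUIRED_KAFKA_KEYS = {
    "bootstrap_servers": (str, list),
    "topic_name": (str,),
    "partitions": (int,),
    "replication_factor": (int,),
}


def validate_kafka_config(kafka_config):
    """Return a list of problems; an empty list means the section looks usable."""
    problems = []
    for key, allowed_types in REQUIRED_KAFKA_KEYS.items():
        if key not in kafka_config:
            problems.append(f"missing key: {key}")
        elif not isinstance(kafka_config[key], allowed_types):
            problems.append(f"unexpected type for {key}")
    # Partition count and replication factor must be positive
    for key in ("partitions", "replication_factor"):
        value = kafka_config.get(key)
        if isinstance(value, int) and value < 1:
            problems.append(f"{key} must be >= 1")
    return problems


sample = {
    "bootstrap_servers": "localhost:9092",
    "topic_name": "f1-speed-stream",
    "partitions": 4,
    "replication_factor": 1,
}
print(validate_kafka_config(sample))  # []
```

In the notebook this would be called as `validate_kafka_config(kafka_config)` right after loading the YAML file.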


In [None]:
# Connect to Kafka admin client
print("Connecting to Kafka...")
admin_client = KafkaAdminClient(
    bootstrap_servers=kafka_config['bootstrap_servers'],
    client_id='f1_topic_setup'
)
print("✅ Connected to Kafka")

Connecting to Kafka...
✅ Connected to Kafka


In [None]:
# Create topic
topic_name = kafka_config['topic_name']
partitions = kafka_config['partitions']
replication_factor = kafka_config['replication_factor']

topic = NewTopic(
    name=topic_name,
    num_partitions=partitions,
    replication_factor=replication_factor
)

print(f"Creating topic '{topic_name}' with {partitions} partitions...")
try:
    # kafka-python blocks until the broker responds and raises on failure.
    # It returns a response object, not a dict of per-topic futures
    # (that pattern belongs to confluent-kafka's AdminClient).
    admin_client.create_topics([topic], validate_only=False)
    print(f"✅ Topic '{topic_name}' created")
except TopicAlreadyExistsError:
    # Topic already exists; this is acceptable for setup
    print(f"ℹ️  Topic '{topic_name}' already exists (this is fine)")
except Exception as e:
    print(f"❌ Error creating topic '{topic_name}': {e}")


Creating topic 'f1-speed-stream' with 4 partitions...
ℹ️  Topic 'f1-speed-stream' already exists (this is fine)


In [None]:
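# Validate: List topics and check partitioning
from kafka import KafkaConsumer

print("Validating topic creation...")
partition_count = None
try:
    # list_topics() returns a list of names in current kafka-python releases;
    # fall back to a .topics mapping in case a metadata object comes back
    metadata = admin_client.list_topics()
    if isinstance(metadata, list):
        topics = metadata
    else:
        topics = list(getattr(metadata, 'topics', {}))
    print(f"Available topics: {topics}")

    if topic_name in topics:
        print(f"✅ Topic '{topic_name}' exists")
        consumer = KafkaConsumer(bootstrap_servers=kafka_config['bootstrap_servers'])
        try:
            # Returns a set of partition ids, or None if metadata is unavailable
            topic_partitions = consumer.partitions_for_topic(topic_name)
        finally:
            consumer.close()
        if topic_partitions:
            partition_count = len(topic_partitions)
    else:
        print(f"❌ Topic '{topic_name}' not found")
except Exception as e:
    # Report the failure instead of silently swallowing it
    print(f"❌ Validation failed: {e}")
finally:
    # Close admin client
    admin_client.close()

# An existing topic keeps its old partition count, so compare explicitly
if partition_count == partitions:
    print(f"✅ Partition count: {partition_count} (expected: {partitions})")
    print("✅ Partitioning correct")
    print("\n✅ Step 4 Complete: Kafka topic setup successful!")
elif partition_count is not None:
    print(f"❌ Partition count: {partition_count} (expected: {partitions})")
else:
    print("❌ Could not determine the partition count")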



Validating topic creation...
Available topics: ['f1-speed-stream']
✅ Topic 'f1-speed-stream' exists
✅ Partition count: 4 (expected: 4)
✅ Partitioning correct

✅ Step 4 Complete: Kafka topic setup successful!
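
The instructions also ask for a way to inspect the message count, which the validation output does not show. One possible approach, sketched here under assumptions, is to subtract each partition's beginning offset from its end offset. `messages_per_partition` and `topic_message_counts` are hypothetical helper names; `beginning_offsets`, `end_offsets`, and `partitions_for_topic` are `KafkaConsumer` methods in kafka-python. The difference is only an estimate for compacted or transactional topics, where offsets are not contiguous.

```python
def messages_per_partition(beginning_offsets, end_offsets):
    """Return {partition: end - beginning} for every partition in end_offsets."""
    return {
        tp: end_offsets[tp] - beginning_offsets.get(tp, 0)
        for tp in end_offsets
    }


def topic_message_counts(bootstrap_servers, topic_name):
    """Query the broker for per-partition counts (needs a running broker)."""
    # Imported lazily so the pure helper above works without kafka-python
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    try:
        partition_ids = consumer.partitions_for_topic(topic_name) or set()
        tps = [TopicPartition(topic_name, p) for p in sorted(partition_ids)]
        if not tps:
            return {}
        counts = messages_per_partition(
            consumer.beginning_offsets(tps), consumer.end_offsets(tps)
        )
    finally:
        consumer.close()
    # Key the result by plain partition id for easier printing
    return {tp.partition: n for tp, n in counts.items()}


# Offline illustration with made-up offsets (partition id -> offset)
example = messages_per_partition({0: 0, 1: 5}, {0: 10, 1: 12})
print(example)  # {0: 10, 1: 7}
```

Against a live broker, `sum(topic_message_counts(kafka_config['bootstrap_servers'], topic_name).values())` would give the approximate total for the topic.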
