# Education Data Collection and Processing

This notebook demonstrates how to collect, process, and store education data, using Eurostat as the data source.

## Contents
1. Setup and Configuration
2. Data Collection from Eurostat
3. Data Processing and Cleaning
4. Database Storage
5. Initial Data Analysis

## 1. Setup and Configuration

First, we'll import the necessary libraries and set up our environment.

In [None]:
import sys
import os
import pandas as pd
import numpy as np
import eurostat
import logging
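from dotenv import load_dotenv
from IPython.display import display  # explicit import so display() also works outside a notebook kernel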

# Configure logging
logging.basicConfig(level=logging.INFO,
                   format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Load environment variables
load_dotenv()

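# Add project root to Python path
# (assumes the notebook is run from a directory one level below the project root)
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:  # avoid duplicate entries when the cell is re-run
    sys.path.append(project_root)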

# Import project modules
from src.data_collection.eurostat_collector import EurostatCollector
from src.data_processing.data_processor import DataProcessor
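
The `os.getcwd()`-based path above only works when the notebook is launched from a folder directly under the project root. As a minimal sketch of a more forgiving alternative, the helper below walks up the directory tree until it finds a folder containing a marker directory. The function name `find_project_root` and the default marker `src` are assumptions made for illustration; they are not part of the project.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "src") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    Hypothetical helper: the marker directory name is an assumption.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(
        f"No directory containing '{marker}' found at or above {current}"
    )
```

With this in place, `sys.path.append(str(find_project_root()))` could stand in for the hard-coded `'..'`, at the cost of a small extra helper.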

## 2. Data Collection from Eurostat

We'll collect three education datasets from Eurostat through its API: education finance, teaching staff, and student enrollment.

In [None]:
# Initialize data collector
collector = EurostatCollector()

# Define indicators to collect
indicators = {
    'educ_uoe_fina01': 'Education finance data',
    'educ_uoe_perp01': 'Teaching staff data',
    'educ_uoe_enrt01': 'Student enrollment data'
}

# Collect data for each indicator
collected_data = {}
for code, description in indicators.items():
    # Pass values as arguments so the logging module handles the formatting
    logger.info("Collecting %s (Code: %s)", description, code)
    data = collector.get_education_data(code)
    if data is not None:
        collected_data[code] = data
        print(f"\nSample of {description}:")
        display(data.head())
    else:
        logger.warning("Failed to collect data for %s", code)
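
The loop above relies on `get_education_data` returning `None` when a download fails. The project's `EurostatCollector` is not shown in this notebook, so the following is only a sketch of how such a method might behave, not its actual implementation. `SketchCollector` and `_default_fetch` are hypothetical names; the default fetcher assumes the `eurostat` package's `get_data_df` function, and the fetcher is injectable so the logic can be tried without network access.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def _default_fetch(code: str) -> Any:
    # Imported lazily so the class can be used with a custom fetcher
    # even when the eurostat package is not installed.
    import eurostat
    return eurostat.get_data_df(code)


class SketchCollector:
    """Hypothetical stand-in for EurostatCollector (an assumption, not the project's class)."""

    def __init__(self, fetch: Callable[[str], Any] = _default_fetch):
        self._fetch = fetch

    def get_education_data(self, code: str) -> Optional[Any]:
        """Return the dataset for `code`, or None if the request fails or is empty."""
        try:
            data = self._fetch(code)
        except Exception:
            # Log the traceback and signal failure with None,
            # which is the contract the collection loop checks for.
            logger.exception("Request for %s failed", code)
            return None
        if data is None or len(data) == 0:
            return None
        return data
```

Returning `None` instead of raising keeps the loop going when a single dataset is unavailable, so one failed indicator does not discard the others.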