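# Education Investment and Economic Growth Analysis

This notebook shows how to use our analysis and visualization modules to study the relationship between education investment and economic growth.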

In [5]:
# Import the required libraries
import sys
import os
import pandas as pd
import numpy as np
from dotenv import load_dotenv

# Add the project root directory to the Python path
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

# Import the project's custom modules
from src.data_collection.eurostat_collector import EurostatCollector
from src.data_processing.data_processor import DataProcessor
from src.analysis.education_analyzer import EducationAnalyzer
from src.visualization.data_visualizer import DataVisualizer

# Load environment variables from the .env file
load_dotenv()

True

In [None]:
!pip install pandas numpy matplotlib seaborn plotly psycopg2-binary pymongo python-dotenv eurostat statsmodels



## 1. Data Collection

First, we collect education investment data and economic indicator data from Eurostat.

In [6]:
# Initialize the data collector
collector = EurostatCollector()

# Collect education investment data
education_data = collector.collect_education_data()

# Collect economic indicator data
economic_data = collector.collect_economic_data()

print("Education data shape:", education_data.shape)
print("Economic data shape:", economic_data.shape)

AttributeError: 'EurostatCollector' object has no attribute 'collect_education_data'
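
The `AttributeError` above says that `EurostatCollector` has no method named `collect_education_data`. The class's real interface is not shown in this notebook, so one way to find the right call is to list the public callables on the object. A minimal sketch, where `public_methods` and the stand-in class `DemoCollector` are hypothetical names used only for illustration:

```python
def public_methods(obj):
    """Return the sorted names of an object's public callable attributes."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


class DemoCollector:
    """Hypothetical stand-in; the real EurostatCollector may look different."""

    def fetch_dataset(self, code):
        return code

    def _private_helper(self):
        return None


print(public_methods(DemoCollector()))
```

In the notebook itself, `public_methods(collector)` would list the names that are actually available on the collector instance.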

## 2. Data Processing

Clean and preprocess the collected data.

In [7]:
# Initialize the data processor
processor = DataProcessor()

# Clean the education data
education_data_cleaned = processor.clean_data(education_data)

# Clean the economic data
economic_data_cleaned = processor.clean_data(economic_data)

# Show basic summary statistics
print("\nEducation Data Summary:")
print(education_data_cleaned.describe())
print("\nEconomic Data Summary:")
print(economic_data_cleaned.describe())

NameError: name 'education_data' is not defined
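
This `NameError` follows from the failed collection cell: `education_data` was never assigned. What `DataProcessor.clean_data` does internally is not shown here, so the following is only a sketch of a typical cleaning pass under assumed rules (drop exact duplicates, coerce the value column to numeric, drop rows whose value is missing). The function name `clean_frame` and the toy rows, including the `":"` placeholder for a missing value, are made up for illustration:

```python
import pandas as pd


def clean_frame(df, value_col):
    """Hypothetical cleaning step: dedupe, coerce to numeric, drop missing values."""
    out = df.drop_duplicates().copy()
    # Non-numeric entries become NaN instead of raising an error
    out[value_col] = pd.to_numeric(out[value_col], errors="coerce")
    return out.dropna(subset=[value_col]).reset_index(drop=True)


# Toy input: one duplicated row and one non-numeric placeholder
raw = pd.DataFrame({
    "country_code": ["DE", "DE", "FR", "FR"],
    "year": [2019, 2019, 2019, 2020],
    "education_investment": ["4.3", "4.3", ":", "5.2"],
})
cleaned = clean_frame(raw, "education_investment")
print(cleaned)
```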

## 3. Data Analysis

Use our analysis module to study the relationship between education investment and economic growth.

In [8]:
# Initialize the analyzer
analyzer = EducationAnalyzer()

# Analyze education investment trends
trend_results = analyzer.analyze_trends(
    education_data_cleaned,
    'education_investment',
    'year'
)

print("\nTrend Analysis Results:")
print("Average YoY Growth:", trend_results['yoy_growth_stats']['mean'])
print("CAGR by Country:")
print(trend_results['cagr_by_country'])

NameError: name 'education_data_cleaned' is not defined
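
The cell above reports year-over-year (YoY) growth and a compound annual growth rate (CAGR) per country. How `analyze_trends` computes them is not shown, so here is a minimal pandas sketch of the standard definitions, with CAGR taken as `(last / first) ** (1 / years) - 1`. The column names match the ones used in this notebook; the numbers are toy values:

```python
import pandas as pd

# Toy data: DE grows 10% per year, FR grows 5% per year
df = pd.DataFrame({
    "country_code": ["DE", "DE", "DE", "FR", "FR", "FR"],
    "year": [2018, 2019, 2020, 2018, 2019, 2020],
    "education_investment": [100.0, 110.0, 121.0, 200.0, 210.0, 220.5],
})
df = df.sort_values(["country_code", "year"])

# Year-over-year growth, computed within each country
df["yoy_growth"] = df.groupby("country_code")["education_investment"].pct_change()


def cagr(group):
    """Compound annual growth rate between a country's first and last year."""
    years = group["year"].iloc[-1] - group["year"].iloc[0]
    ratio = group["education_investment"].iloc[-1] / group["education_investment"].iloc[0]
    return ratio ** (1 / years) - 1


cagr_by_country = {code: cagr(g) for code, g in df.groupby("country_code")}
print("Average YoY growth:", df["yoy_growth"].mean())
print("CAGR by country:", cagr_by_country)
```

The first year of each country has no predecessor, so its `yoy_growth` is `NaN` and is skipped by `mean()`.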

In [9]:
# Analyze the impact of education investment on the economy
impact_results = analyzer.analyze_education_impact(
    education_data_cleaned,
    economic_data_cleaned
)

print("\nEducation Impact Analysis:")
print("\nSignificant Correlations:")
for pair, details in impact_results['correlation_analysis'].items():
    print(f"{pair}: {details['correlation']:.3f} ({details['strength']})")

print("\nRegression Analysis:")
print(f"R² Score: {impact_results['regression_analysis']['r2_score']:.3f}")
print("Coefficients:")
for var, coef in impact_results['regression_analysis']['coefficients'].items():
    print(f"{var}: {coef:.3f}")

NameError: name 'education_data_cleaned' is not defined
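
The printout above expects a correlation per variable pair, an R² score, and regression coefficients. The internals of `analyze_education_impact` are not shown, so this is only a sketch of how those three quantities can be obtained with NumPy for a single predictor, using ordinary least squares with an intercept. The arrays are toy values, not Eurostat data:

```python
import numpy as np

# Toy values standing in for education investment (x) and GDP growth (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]

# Ordinary least squares: y ≈ beta[0] + beta[1] * x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination from residual and total sums of squares
y_hat = X @ beta
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"correlation: {r:.3f}")
print(f"R² score: {r2:.3f}")
print(f"intercept: {beta[0]:.3f}, slope: {beta[1]:.3f}")
```

With one predictor and an intercept, R² equals the squared Pearson correlation, which is a convenient sanity check on both numbers.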

## 4. Data Visualization

Use our visualization module to create charts that present the analysis results.

In [10]:
# Initialize the visualizer
visualizer = DataVisualizer()

# Create a time series plot
time_series_fig = visualizer.create_time_series_plot(
    education_data_cleaned,
    'year',
    'education_investment',
    'country_code',
    'Education Investment Over Time'
)
time_series_fig.show()

NameError: name 'education_data_cleaned' is not defined

In [11]:
# Create a correlation heatmap
merged_data = pd.merge(
    education_data_cleaned,
    economic_data_cleaned,
    on=['country_code', 'year']
)

corr_matrix = merged_data[[
    'education_investment',
    'gdp_growth',
    'employment_rate'
]].corr()

heatmap_fig = visualizer.create_correlation_heatmap(
    corr_matrix,
    'Correlation between Education and Economic Indicators'
)
heatmap_fig.show()

NameError: name 'education_data_cleaned' is not defined
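
`pd.merge` defaults to an inner join, so any `(country_code, year)` pair present in only one table is dropped without warning before the correlation is computed. A sketch of how to check for that with the `indicator` and `validate` parameters, using toy frames whose column names follow this notebook and whose values are made up:

```python
import pandas as pd

# Toy frames: each has one (country_code, year) key the other lacks
edu = pd.DataFrame({
    "country_code": ["DE", "DE", "FR"],
    "year": [2019, 2020, 2019],
    "education_investment": [1.0, 2.0, 3.0],
})
eco = pd.DataFrame({
    "country_code": ["DE", "FR", "FR"],
    "year": [2019, 2019, 2020],
    "gdp_growth": [0.5, 1.5, 2.5],
    "employment_rate": [70.0, 65.0, 66.0],
})

# Outer join with an indicator column shows which keys fail to match;
# validate raises MergeError if a key is duplicated on either side
check = pd.merge(
    edu, eco,
    on=["country_code", "year"],
    how="outer",
    indicator=True,
    validate="one_to_one",
)
print(check["_merge"].value_counts())

# Keep only the rows present in both tables, as an inner join would
merged = check[check["_merge"] == "both"].drop(columns="_merge")
corr = merged[["education_investment", "gdp_growth", "employment_rate"]].corr()
```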

In [12]:
# Create a scatter plot
scatter_fig = visualizer.create_scatter_plot(
    merged_data,
    'education_investment',
    'gdp_growth',
    'country_code',
    'Education Investment vs GDP Growth',
    add_trendline=True
)
scatter_fig.show()

NameError: name 'merged_data' is not defined

In [13]:
# Create a combined dashboard
dashboard = visualizer.create_education_dashboard(
    education_data_cleaned,
    economic_data_cleaned
)
dashboard.show()

NameError: name 'education_data_cleaned' is not defined

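## 5. Conclusions

Based on the analysis above, we can draw conclusions in the following areas:

1. **Education investment trends**:
   - Observe the year-to-year changes in each country's education investment
   - Identify the countries where investment is growing fastest

2. **Relationship between education and the economy**:
   - Analyze the correlation between education investment and GDP growth
   - Study the effect of education investment on the employment rate

3. **Policy recommendations**:
   - Propose recommendations based on the results of the data analysis
   - Determine the most effective education investment strategy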