# Estimating the Spatial Distribution of PM2.5 Concentrations in China  
## Background  

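As the world's largest developing country, China has seen its air quality deteriorate as industrialization and urbanization advance. According to the *2017 Report on the State of the Ecology and Environment in China* (《2017 中国生态环境状况公报》) published by the Ministry of Ecology and Environment, 239 of the country's 338 cities at or above the prefecture level failed to meet the ambient air quality standard, a share of more than 70%. Poor air quality affects daily travel and public health, constrains sustainable economic development, and has become a major concern for both the public and government agencies.  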

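PM2.5 refers to particulate matter with an aerodynamic diameter of no more than 2.5 μm and is one of the main indicators used to assess air quality. Understanding how PM2.5 concentrations are distributed in space, and characterizing the spatial processes and environmental behavior of air pollution, is of great practical value for air pollution monitoring, early warning, and integrated control, and for protecting human health and sustainable social development. By the end of 2017, the China National Environmental Monitoring Centre had built more than 400 ground-level air quality monitoring stations and was publishing hourly air quality data, including PM2.5, providing accurate and reliable real-time measurements. However, because the stations are unevenly distributed and cover only part of the country, it is difficult to carry out effective spatiotemporal analysis and in-depth mining of the monitoring data alone. Unlike ground monitoring, satellite remote sensing provides atmospheric datasets with broad spatial coverage, such as aerosol optical depth (AOD). Many studies have shown that AOD is strongly correlated with PM2.5 concentration. Modeling the spatial regression relationship between PM2.5 concentration and related factors such as remotely sensed AOD therefore offers an effective way to obtain the PM2.5 distribution over the whole study area.  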

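Building on the geographically weighted idea of geographically weighted regression (GWR), Wu Sensen combined ordinary linear regression (OLR) with a neural network to propose the Geographically Neural Network Weighted Regression (GNNWR) model. By drawing on the learning capacity of neural networks, the model can handle both the spatial heterogeneity and the complex nonlinearity of regression relationships, and it achieves better fitting accuracy and predictive performance than models such as OLR and GWR. This case study builds a GNNWR-based spatial estimation model for PM2.5 that fits the spatial heterogeneity and nonlinearity of the PM2.5 regression relationship, and uses it to derive an accurate and plausible spatial distribution of PM2.5 concentrations across China.  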

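## Data  

Many studies have shown that incorporating factors such as meteorological conditions and surface elevation further improves the accuracy of spatial PM2.5 estimation. In addition to AOD as an auxiliary factor, this case study uses temperature (TEMP), precipitation (TP), wind speed (WS), and wind direction (WD) as meteorological factors, together with surface elevation (DEM), as the model's independent variables. The temporal scale is the 2017 annual mean. Details are as follows:  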

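(1) PM2.5 monitoring station data. Hourly PM2.5 concentrations from January 1 to December 31, 2017 come from the China National Environmental Monitoring Centre. Concentrations are measured with a tapered element oscillating microbalance (TEOM) or by the β-attenuation method, with calibration and quality control following the national standard GB3095-2012. The PM2.5 data are averaged to the annual scale.  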

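(2) Aerosol data. The AOD data come from the LAADS website and include two 3 km resolution products from Terra and Aqua retrieved with the Dark Target algorithm (MOD04_3K and MYD04_3K), as well as 10 km resolution products retrieved with the Deep Blue algorithm (MOD04_L2 and MYD04_L2). In this case study, the 3 km AOD products are the main data source for PM2.5 estimation. Where 3 km data are missing, resampled 10 km data are substituted wherever possible. To keep the AOD data reliable, areas where AOD is missing on more than 20% of the days in the year are excluded, that is, marked as no data.  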

(3) DEM data. The DEM data come from NOAA's ETOPO1 global relief model, with a resolution of 1 arc-minute.  

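(4) Temperature, precipitation, wind speed, and wind direction data. These come from ERA5, the global climate reanalysis produced by ECMWF, which provides hourly gridded data at 0.5° resolution.  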
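
The AOD preparation described in item (2) can be sketched as follows. This is a minimal illustration, not the processing chain actually used for this case study: the function name `merge_aod`, the array layout (days × rows × cols on a common grid, with NaN for missing retrievals), and the toy values are all assumptions made for the example.

```python
import numpy as np

def merge_aod(aod_3km, aod_10km_resampled, max_missing_ratio=0.2):
    """Fill gaps in 3 km AOD with resampled 10 km AOD, then mask sparse pixels.

    Both inputs are hypothetical arrays of shape (days, rows, cols) on the same
    grid, with NaN marking missing retrievals. Returns the annual mean per
    pixel, or NaN where more than `max_missing_ratio` of the days are missing.
    """
    # Prefer the 3 km value; fall back to the 10 km value where 3 km is missing
    merged = np.where(np.isnan(aod_3km), aod_10km_resampled, aod_3km)
    # Share of days still missing after the fallback, per pixel
    missing_ratio = np.isnan(merged).mean(axis=0)
    keep = missing_ratio <= max_missing_ratio
    annual_mean = np.full(missing_ratio.shape, np.nan)
    annual_mean[keep] = np.nanmean(merged[:, keep], axis=0)
    return annual_mean

# Toy data (made up for illustration): 5 days on a 1 x 2 grid
nan = np.nan
aod_3km = np.array([[1.0, nan], [nan, nan], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]).reshape(5, 1, 2)
aod_10km = np.array([[2.0, nan], [2.0, nan], [2.0, 5.0], [2.0, 5.0], [2.0, 5.0]]).reshape(5, 1, 2)
annual = merge_aod(aod_3km, aod_10km)
print(annual)
```

In the toy data the first pixel has one gap that the 10 km product fills, so it is kept; the second pixel is missing on 2 of 5 days in both products, exceeds the 20% threshold, and is set to no data.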

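## Model  

Following a geographically weighted idea similar to GWR, the GNNWR model treats spatial variation in the regression relationship as fluctuations, caused by spatial nonstationarity, of the "OLR regression relationship" at different locations. For the PM2.5 spatial estimation experiment in this case study, the GNNWR model is therefore defined as follows:  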
 
![Image Name](https://mydde.deep-time.org/s3/static-files/upload/upload/1694059648746_1.png)  

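where (u_i, v_i) are the spatial coordinates of the i-th sample point, and β = (β0, β1, …, β6) are the regression coefficients of the OLR model, which reflect the average level of the PM2.5 regression relationship over the whole region. The OLR coefficient estimates are given in matrix form by:  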

![Image Name](https://mydde.deep-time.org/s3/static-files/upload/upload/1694059665465_2.png)  

where:  

![Image Name](https://mydde.deep-time.org/s3/static-files/upload/upload/1694059673642_3.png)  


<br/><br/>  


![Image Name](https://mydde.deep-time.org/s3/static-files/upload/upload/1694003342595_1.png)  
Definition of the GNNWR-based PM2.5 spatial estimation model
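
To make the role of the OLR coefficients and the spatial weights concrete, here is a small NumPy sketch. It assumes the estimate at point i takes the form ŷ_i = Σ_k w_k(u_i, v_i) · β_k · x_ik with x_i0 = 1, which matches the description above but is not copied from the equation images. The function names, the synthetic data, and the all-ones weight matrix (a stand-in for the neural network's output) are assumptions for illustration only.

```python
import numpy as np

def fit_olr(X, y):
    """Least-squares OLR coefficients, intercept first (beta_0 .. beta_p)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def gnnwr_predict(X, beta, weights):
    """y_hat_i = sum_k w_k(u_i, v_i) * beta_k * x_ik, with x_i0 = 1.

    `weights` has one row per sample and one column per coefficient.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.sum(weights * beta * X1, axis=1)

# Synthetic data: 50 samples, six covariates as in this case study
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 + X @ np.arange(1.0, 7.0) + rng.normal(scale=0.1, size=50)

beta = fit_olr(X, y)                  # 7 coefficients: beta_0 .. beta_6
uniform_weights = np.ones((50, 7))    # placeholder for the network's spatial weights
y_hat = gnnwr_predict(X, beta, uniform_weights)
```

With every weight equal to 1 the prediction reduces to plain OLR, which illustrates the point made above: the spatial weights express how far the local relationship at each location departs from the region-wide average.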

## Outline of steps  
1. Load the data  
2. Initialize the datasets  
3. Initialize the GNNWR model  
4. Train the model  
5. Run the estimation

# Step 1: Import the required libraries

In [1]:
from gnnwr import models,datasets
import pandas as pd
import numpy as np
import folium
import torch.nn as nn
from sklearn.metrics import r2_score as r2
import matplotlib.pyplot as plt

# Step 2: Load the data

In [9]:
data = pd.read_csv(u'../data/pm25_data.csv')
data.head(5)

Unnamed: 0,监测点编码,监测点名称,城市,经度,纬度,date,PM2_5,row_index,col_index,proj_x,...,t2m,sp,tp,blh,e,r,u10,v10,aod_sat,ndvi
0,1001A,万寿西宫,北京,116.366,39.8673,20170601,54.733894,2201,6867,1650847.552,...,284.561066,100809.2734,0.001006,134.995636,-7e-06,46.315975,0.425366,0.170262,0.870967,2401
1,1002A,定陵,北京,116.17,40.2865,20170601,48.080737,2134,6835,1625003.973,...,282.907684,97125.08594,0.001044,157.77597,-6e-06,53.605503,0.211734,-0.676848,0.71208,5255
2,1003A,东四,北京,116.434,39.9522,20170601,54.898592,2188,6877,1653776.71,...,284.492249,100830.9688,0.001002,129.971298,-7e-06,45.537464,0.266666,0.069172,0.875811,2609
3,1004A,天坛,北京,116.434,39.8745,20170601,52.266382,2200,6877,1655828.045,...,284.6362,100936.8047,0.00101,138.793961,-7e-06,45.387913,0.299403,0.22795,0.869679,2420
4,1005A,农展馆,北京,116.473,39.9716,20170601,53.189076,2185,6884,1656224.681,...,284.506561,100880.1797,0.001019,130.520599,-7e-06,44.790119,0.169121,0.079546,0.873232,3296
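
The table shows the ERA5 wind components `u10` and `v10`, while the model inputs used later are `w10` (wind speed) and `d10` (wind direction). The notebook does not show how these were derived. The sketch below uses the standard meteorological conversion and is an assumption about the preprocessing, not a record of it; `wind_speed_direction` is a hypothetical helper.

```python
import math

def wind_speed_direction(u10, v10):
    """Return (speed, direction) from eastward (u10) and northward (v10) components.

    Direction follows the meteorological convention: the bearing the wind
    blows FROM, in degrees clockwise from north, in the range [0, 360).
    """
    speed = math.hypot(u10, v10)
    direction = (180.0 + math.degrees(math.atan2(u10, v10))) % 360.0
    return speed, direction

# Components from the first row of the table above (station 1001A)
speed, direction = wind_speed_direction(0.425366, 0.170262)
print(f"speed = {speed:.3f} m/s, direction = {direction:.1f} deg")
```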


### Visualize the data

In [7]:
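lon_center, lat_center = data['经度'].mean(), data['纬度'].mean()
# The built-in "Stamen Terrain" tiles may no longer be available in recent folium releases, so OpenStreetMap is used here
station_map = folium.Map(location=[lat_center, lon_center], zoom_start=4, tiles="OpenStreetMap")
# Add one marker per monitoring station, with the station name and PM2.5 value in the popup
for _, row in data.iterrows():
    folium.Marker(location=[row['纬度'], row['经度']], popup=f"{row['监测点名称']}\n PM2.5: {row['PM2_5']}").add_to(station_map)
station_map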

# Step 3: Split the dataset

In [11]:
train_dataset, val_dataset, test_dataset = datasets.init_dataset(data=data,
                                                        test_ratio=0.15,
                                                        valid_ratio=0.15,
                                                        x_column=['dem', 'w10','d10','t2m','aod_sat','tp'],
                                                        y_column=['PM2_5'],
                                                        spatial_column=['经度','纬度'],
                                                        sample_seed=42,
                                                        batch_size=64)

x_min:[-5.0000000e+00  4.1591436e-02  3.9565850e-02  2.6959613e+02
  5.6254357e-02  3.8816700e-05];  x_max:[4.52000000e+03 3.20341086e+00 3.59605225e+02 2.97242950e+02
 1.06999075e+00 4.07377200e-03]
y_min:[3.85633803];  y_max:[133.8005618]
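
The `x_min` / `x_max` printout suggests that the covariates are rescaled with min-max normalization when the dataset is initialized. The sketch below applies that formula to three values from the first row of the input table, using the printed ranges. Treat it as an illustration of the scaling formula under that assumption, not as the library's internal code; `min_max_scale` is a hypothetical helper.

```python
# Ranges copied from the x_min / x_max printout above; the order follows x_column
x_column = ['dem', 'w10', 'd10', 't2m', 'aod_sat', 'tp']
x_min = [-5.0, 4.1591436e-02, 3.9565850e-02, 2.6959613e+02, 5.6254357e-02, 3.8816700e-05]
x_max = [4.52e+03, 3.20341086, 3.59605225e+02, 2.97242950e+02, 1.06999075, 4.07377200e-03]

def min_max_scale(value, lower, upper):
    """Map value linearly so that lower -> 0 and upper -> 1."""
    return (value - lower) / (upper - lower)

ranges = {name: (lo, hi) for name, lo, hi in zip(x_column, x_min, x_max)}

# t2m, aod_sat and tp from the first row of the input table (station 1001A)
sample = {'t2m': 284.561066, 'aod_sat': 0.870967, 'tp': 0.001006}
scaled = {name: min_max_scale(value, *ranges[name]) for name, value in sample.items()}
print(scaled)
```

If the scaling does work this way, new data must be scaled with the ranges of the training data rather than its own, which would explain why the training dataset is passed in when the estimation dataset is built in Step 10.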


### Visualize the split datasets

In [12]:
map_1 = folium.Map(location=[lat_center, lon_center], zoom_start=4, tiles="OpenStreetMap")
# Training stations in red, validation stations in green, test stations in blue
for dataset, color in [(train_dataset, 'red'), (val_dataset, 'green'), (test_dataset, 'blue')]:
    for _, row in dataset.dataframe.iterrows():
        folium.Marker(location=[row['纬度'], row['经度']], icon=folium.Icon(color=color),
                      popup=f"{row['监测点名称']}\n PM2.5: {row['PM2_5']}").add_to(map_1)

map_1

# Step 4: Initialize the GNNWR model

In [15]:
gnnwr = models.GNNWR(train_dataset = train_dataset,
                     valid_dataset = val_dataset, 
                     test_dataset = test_dataset,
                     dense_layers = [512, 256, 128],
                     start_lr = 0.2,
                     optimizer = "Adam",
                     activate_func = nn.PReLU(init=0.1),
                     model_name = "GNNWR_PM25",
                     model_save_path = "./demo_result/gnnwr_models",
                     log_path = "./demo_result/gnnwr_logs",
                     write_path = "./demo_result/gnnwr_tensorboard/"
                     )
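
For intuition about `dense_layers` and `activate_func`, the following NumPy sketch runs a forward pass through a fully connected network with hidden widths 512, 256, and 128 and a PReLU activation with slope 0.1. It assumes, following the published description of GNNWR, that the network maps the distances from one location to the training points onto one weight per regression coefficient. The layer initialization, the fixed slope (in `nn.PReLU` the slope is learnable and 0.1 is only its initial value), the input size of 100, and the absence of any normalization or dropout layers are simplifications; this is not the library's implementation.

```python
import numpy as np

def prelu(x, slope=0.1):
    """PReLU with a fixed slope for negative inputs."""
    return np.where(x >= 0, x, slope * x)

def init_layers(n_in, hidden, n_out, seed=0):
    """Random (weight, bias) pairs for a fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, n_out]
    return [(rng.normal(scale=np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(distances, layers):
    """Map distance vectors to spatial weights, one per regression coefficient."""
    h = distances
    for W, b in layers[:-1]:
        h = prelu(h @ W + b)   # hidden layers use the PReLU activation
    W, b = layers[-1]
    return h @ W + b           # linear output layer

# Hypothetical sizes: 100 training points, 6 covariates plus an intercept
layers = init_layers(n_in=100, hidden=[512, 256, 128], n_out=7)
distances = np.random.default_rng(1).random((4, 100))   # 4 locations
spatial_weights = forward(distances, layers)
print(spatial_weights.shape)
```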

# Step 5: Train the model

In [None]:
gnnwr.run(max_epoch = 20000,early_stop = 5000,print_frequency = 1000) # sets the maximum number of epochs, the early-stopping criterion, and how often training progress is printed
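
The interaction between `max_epoch` and `early_stop` can be sketched in plain Python. The example assumes that `early_stop` counts consecutive epochs without improvement on the validation set; the exact quantity the library monitors is not shown in this notebook, so the loop below is an illustration of the general technique. `train_with_early_stopping` and the loss values are made up.

```python
def train_with_early_stopping(val_losses, max_epoch, early_stop):
    """Scan per-epoch validation losses and stop when they stop improving.

    Returns (best_epoch, best_loss, last_epoch_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epoch]):
        last_epoch = epoch
        if loss < best_loss:
            # New best model: remember it and reset the patience window
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= early_stop:
            # No improvement for `early_stop` epochs in a row
            break
    return best_epoch, best_loss, last_epoch

# Made-up validation losses for illustration
losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 3.3, 2.0]
result = train_with_early_stopping(losses, max_epoch=20000, early_stop=3)
print(result)
```

In this toy run training halts three epochs after the best loss, so the later, lower value is never reached. A large patience such as the 5000 used above makes that kind of premature stop less likely, at the cost of longer training.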

# Step 6: Inspect and save the training results

In [None]:
gnnwr.result()

In [None]:
gnnwr.reg_result('./demo_result/GNNWR_PM25_Result.csv')

### View the weight distribution heat map

In [None]:
# Load the saved regression results and rename the predicted column to tell it apart from the observed PM2_5
result_data = pd.read_csv('./demo_result/GNNWR_PM25_Result.csv')
result_data['id'] = result_data['id'].astype(np.int64)
result_data.rename(columns={"PM2_5":"Pred_PM2_5"},inplace=True)
# Recombine the three splits and join them to the results on the sample id
data = pd.concat([train_dataset.dataframe,val_dataset.dataframe,test_dataset.dataframe])
data.set_index('id',inplace=True)
result_data.set_index('id',inplace=True)
result_data = result_data.join(data)
result_data.head(5)
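
The cell above only joins the results to the station attributes; it does not draw a heat map. One way to prepare heat-map input from the joined table is sketched below. The helper `to_heat_points`, the column name `coef_aod_sat`, and the three weight values are hypothetical, since the result file's columns are not shown in this notebook; only the coordinates come from the tables above.

```python
def to_heat_points(rows, value_key, lat_key="纬度", lon_key="经度"):
    """Build [lat, lon, intensity] triples with intensity rescaled to [0, 1]."""
    values = [float(row[value_key]) for row in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # avoid dividing by zero when all values are equal
    return [[float(row[lat_key]), float(row[lon_key]), (value - lo) / span]
            for row, value in zip(rows, values)]

# Coordinates from the station table; the coef_aod_sat values are made up
rows = [
    {"纬度": 39.8673, "经度": 116.366, "coef_aod_sat": 0.8},
    {"纬度": 40.2865, "经度": 116.17, "coef_aod_sat": 1.0},
    {"纬度": 39.9522, "经度": 116.434, "coef_aod_sat": 1.2},
]
points = to_heat_points(rows, "coef_aod_sat")
print(points)
```

With the real table, `rows` could be obtained from `result_data.to_dict("records")`, and the resulting list passed to `folium.plugins.HeatMap(points)` and added to a folium map.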

# Step 7: Save the datasets

In [None]:
train_dataset.save('./demo_result/gnnwr_datasets/train_dataset')
val_dataset.save('./demo_result/gnnwr_datasets/val_dataset')
test_dataset.save('./demo_result/gnnwr_datasets/test_dataset')

# Step 8: Load saved datasets and model

In [2]:
train_dataset_loaded = datasets.load_dataset('./demo_result/gnnwr_datasets/train_dataset/')
val_dataset_loaded = datasets.load_dataset('./demo_result/gnnwr_datasets/val_dataset/')
test_dataset_loaded = datasets.load_dataset('./demo_result/gnnwr_datasets/test_dataset/')

In [3]:
gnnwr_loaded = models.GNNWR(train_dataset = train_dataset_loaded,
                     valid_dataset = val_dataset_loaded, 
                     test_dataset = test_dataset_loaded,
                     dense_layers = [512, 256, 128],
                     start_lr = 0.2,
                     optimizer = "Adam",
                     activate_func = nn.PReLU(init=0.1),
                     model_name = "GNNWR_PM25",
                     model_save_path = "./demo_result/gnnwr_models",
                     log_path = "./demo_result/gnnwr_logs",
                     write_path = "./demo_result/gnnwr_tensorboard/"
                     )

In [4]:
gnnwr_loaded.load_model('./demo_result/gnnwr_models/GNNWR_PM25.pkl')

In [None]:
gnnwr_loaded.result()

# Step 9: Load the data to estimate

In [8]:
pred_data = pd.read_csv('../data/pm25_predict_data.csv')
pred_data.head(5)

Unnamed: 0,监测点编码,监测点名称,城市,经度,纬度,date,PM2.5,row_index,col_index,proj_x,...,t2m,sp,tp,blh,e,r,u10,v10,aod_sat,ndvi
0,1001A,万寿西宫,北京,116.366,39.8673,20170930,56.357143,2201.0,6867.0,1650848.0,...,294.224304,100287.671875,5.1e-05,64.583054,-7e-06,52.682091,0.384257,0.784808,0.762762,3443
1,1002A,定陵,北京,116.17,40.2865,20170930,47.148148,2134.0,6835.0,1625004.0,...,292.293274,96752.507812,0.000304,40.62114,-7e-06,62.529091,-0.156175,-0.537717,0.574785,7810
2,1003A,东四,北京,116.434,39.9522,20170930,53.857143,2188.0,6877.0,1653777.0,...,294.010468,100307.703125,5.8e-05,60.242908,-7e-06,52.12664,0.093867,0.617515,0.796827,3328
3,1004A,天坛,北京,116.434,39.8745,20170930,46.333333,2200.0,6877.0,1655828.0,...,294.296631,100410.367188,4.7e-05,69.535637,-8e-06,51.301529,0.197439,0.893495,0.758839,4535
4,1005A,农展馆,北京,116.473,39.9716,20170930,52.203704,2185.0,6884.0,1656225.0,...,293.959381,100355.054688,5.9e-05,62.281456,-7e-06,51.071964,-0.060543,0.634863,0.760148,3901


# Step 10: Build the estimation dataset

In [13]:
pred_dataset = datasets.init_predict_dataset(data = pred_data,train_dataset = train_dataset,x_column=['dem', 'w10','d10','t2m','aod_sat','tp'],spatial_column=['经度','纬度'])

# Step 11: Run the estimation

In [14]:
res = gnnwr_loaded.predict(pred_dataset)
res.head(5)

Unnamed: 0,监测点编码,监测点名称,城市,经度,纬度,date,PM2.5,row_index,col_index,proj_x,...,sp,tp,blh,e,r,u10,v10,aod_sat,ndvi,pred_result
0,1001A,万寿西宫,北京,116.366,39.8673,20170930,56.357143,2201.0,6867.0,1650848.0,...,100287.671875,5.1e-05,64.583054,-7e-06,52.682091,0.384257,0.784808,0.762762,3443,49.208725
1,1002A,定陵,北京,116.17,40.2865,20170930,47.148148,2134.0,6835.0,1625004.0,...,96752.507812,0.000304,40.62114,-7e-06,62.529091,-0.156175,-0.537717,0.574785,7810,39.442886
2,1003A,东四,北京,116.434,39.9522,20170930,53.857143,2188.0,6877.0,1653777.0,...,100307.703125,5.8e-05,60.242908,-7e-06,52.12664,0.093867,0.617515,0.796827,3328,49.644527
3,1004A,天坛,北京,116.434,39.8745,20170930,46.333333,2200.0,6877.0,1655828.0,...,100410.367188,4.7e-05,69.535637,-8e-06,51.301529,0.197439,0.893495,0.758839,4535,48.927525
4,1005A,农展馆,北京,116.473,39.9716,20170930,52.203704,2185.0,6884.0,1656225.0,...,100355.054688,5.9e-05,62.281456,-7e-06,51.071964,-0.060543,0.634863,0.760148,3901,47.239021
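
As a quick sanity check, the observed `PM2.5` and the `pred_result` values in the five rows shown above can be compared with a few error measures. This is a minimal sketch in plain Python; the helper functions are written for this example. Five stations in one city are not a representative evaluation, so R² is not computed here; on the full table, the `r2` function imported in Step 1 could be used as well.

```python
import math

# Observed PM2.5 and pred_result copied from the five rows shown above
observed = [56.357143, 47.148148, 53.857143, 46.333333, 52.203704]
predicted = [49.208725, 39.442886, 49.644527, 48.927525, 47.239021]

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def mean_bias(obs, pred):
    """Mean of (predicted - observed); negative means underestimation."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"Bias = {mean_bias(observed, predicted):.3f}")
```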
