# **VaxMap Thailand: Analyzing Vaccination Coverage and Hospital Locations**

Welcome to the VaxMap Thailand project! My goal is to provide insights into vaccination coverage among children aged 1 year across Thailand, using geospatial data to map the distribution of hospitals. This project combines public health data with geospatial analytics to identify areas of high and low vaccine coverage and to visualize the accessibility of healthcare facilities across the country.

An [official dashboard for this data](https://hdcservice.moph.go.th/hdc/reports/report.php?cat_id=4df360514655f79f13901ef1181ca1c7&id=28dd2c7955ce926456240b2ff0100bde) has already been published by HDC Service.

# Objectives
- Assess vaccination coverage for various vaccines among children aged 1 year, ensuring they meet the recommended guidelines.
- Map hospital locations relative to population centers, identifying areas with potential healthcare accessibility issues.
- Visualize data through an interactive dashboard, making the information accessible and understandable for public health officials and the general public alike.


# Data Sources
Data were obtained from the [Ministry of Public Health's Open Data](https://opendata.moph.go.th/) portal, using the following services:

- Vaccination Coverage: "ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 1 ปี (fully immunized)", i.e. coverage of children aged 1 year who received every scheduled vaccine. [Vaccination Coverage](https://opendata.moph.go.th/th/services/summary-table/4df360514655f79f13901ef1181ca1c7/s_epi_complete/28dd2c7955ce926456240b2ff0100bde).

- Hospital Coordinates: Geographic positions of health facilities, essential for mapping and analysis. [Hospital GIS](https://opendata.moph.go.th/th/services/hospital-gis).

- Map Service: Utilized for geospatial visualization. [Map Service](https://opendata.moph.go.th/th/services/map).

In [32]:
import requests
import json
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import time
from datetime import datetime


In [4]:
# Set pandas to display all columns
pd.set_option('display.max_columns', None)

# ER diagram

<img src="DBMLdiagram5.png" width="800"/>

<details>
    <summary>Click to toggle visibility of DBML code</summary>
    Table gdf1 {
  id integer [pk]
  geometry varchar
  data integer
  zone varchar
  name varchar
  type integer
}

Table gdf2 {
  id integer [pk]
  geometry varchar
  data integer
  zone varchar
  name varchar
  type integer
}

Table gdf3 {
  id integer [pk]
  geometry varchar
  data integer
  zone varchar
  name varchar
  type integer
}

Table hospital {
  hospcode integer [pk]
  name varchar
  prigov varchar
  type varchar
  org varchar
  region integer
  provcode integer
  prov varchar
  distcode integer
  dist varchar
  subdistcode integer
  subdist varchar
}

Table all_province_data {
  id varchar [pk]
  hospcode integer
  areacode varchar
  date_com varchar
  b_year integer
  target integer
  result integer
  // Assume other relevant fields are included
}

// Relationships
Ref: hospital.region > gdf1.id // A hospital is linked to a health region (gdf1)
Ref: hospital.provcode > gdf2.id // A hospital is located within a province (gdf2)
Ref: hospital.distcode > gdf3.id // A hospital is located within a district (gdf3)
Ref: all_province_data.hospcode > hospital.hospcode // all_province_data entries are related to a hospital
</details>


# Data Description

Below is a description of the dataset used in the VaxMap Thailand project, detailing the structure and meaning of each column:

| Column Name | Column Type | Nullable | Comment |
|-------------|-------------|----------|---------|
| id          | varchar(32) | NO       | ลำดับรายงาน (Report ID) |
| hospcode    | varchar(5)  | NO       | รหัสหน่วยบริการ (Hospital Code) |
| areacode(villcode)    | varchar(8)  | NO       | รหัสพื้นที่ตามกระทรวงมหาดไทย (Area Code according to the Ministry of Interior) |
| date_com    | varchar(14) | YES      | วันที่ประมวลผล (Date Processed) |
| b_year      | varchar(4)  | NO       | ข้อมูลตามปีงบประมาณ (Budget/Fiscal Year of Data) |
| target      | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ที่อาศัยอยู่จริงในพื้นที่รับผิดชอบทั้งหมด ในงวดที่รายงาน (Total 1-year-old children living in the area for the reported period) |
| result      | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ในงวดที่รายงานที่ได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ (Number of 1-year-old children fully vaccinated as per guidelines for the reported period) |
| target10    | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ที่อาศัยอยู่จริงในพื้นที่รับผิดชอบทั้งหมด ในงวดที่รายงาน เดือน ตุลาคม (Total 1-year-old children living in the area for October, the first month of the fiscal year) |
| result10    | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ในงวดที่รายงานที่ได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ เดือน ตุลาคม (Number of 1-year-old children fully vaccinated as per guidelines for October) |
| ...         | ...         | ...      | ... (months 11, 12, and 01 to 08 follow the same pattern) |
| target09    | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ที่อาศัยอยู่จริงในพื้นที่รับผิดชอบทั้งหมด ในงวดที่รายงาน เดือน กันยายน (Total 1-year-old children living in the area for September, the last month of the fiscal year) |
| result09    | int(11)     | YES      | จำนวนเด็กอายุครบ 1 ปี ในงวดที่รายงานที่ได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ เดือน กันยายน (Number of 1-year-old children fully vaccinated as per guidelines for September) |

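The monthly columns run in fiscal-year order (`10`, `11`, `12`, `01`, ..., `09`), so per-month coverage for a single row can be computed by pairing each `targetMM` with its `resultMM`. This is a minimal sketch, assuming a row is available as a plain dict with integer counts; `monthly_coverage` is a hypothetical helper, not part of the MOPH API.

```python
# Fiscal-year month order used by the targetMM/resultMM columns (October first)
FISCAL_MONTHS = ["10", "11", "12", "01", "02", "03", "04", "05", "06", "07", "08", "09"]


def monthly_coverage(row: dict) -> dict:
    """Return {month: coverage %} for one report row.

    Months with a zero or missing target are reported as None
    instead of raising ZeroDivisionError.
    """
    coverage = {}
    for month in FISCAL_MONTHS:
        target = row.get(f"target{month}") or 0
        result = row.get(f"result{month}") or 0
        coverage[month] = round(100 * result / target, 1) if target else None
    return coverage


# Hypothetical row: only October and November are filled in
example = {"target10": 4, "result10": 3, "target11": 1, "result11": 1}
print(monthly_coverage(example)["10"])  # 75.0
```

Returning `None` for empty months keeps "no eligible children" distinct from "0% coverage".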


# [Hospital GIS](https://opendata.moph.go.th/th/services/hospital-gis) Web Service Documentation

## Parameter Description

| Attribute  | Attribute Type | Attribute Definition                                   |
|------------|----------------|--------------------------------------------------------|
| `hoscode`  | String         | Health service facility code as announced by [THCC](http://thcc.or.th) |

## Response Description

The response includes detailed information about health service facilities, structured as follows:

| Attribute      | Attribute Type | Attribute Definition                                       |
|----------------|----------------|------------------------------------------------------------|
| `hoscode`      | String         | Health service facility code as announced by THCC          |
| `hosname`      | String         | Name of the health service facility                        |
| `hostype`      | String         | Type of health service facility                            |
| `bed`          | String         | Number of beds                                             |
| `dep`          | String         | Department affiliation                                     |
| `Level_service`| String         | Level of service                                           |
| `address`      | String         | Address                                                    |
| `moo`          | String         | Village number                                             |
| `subdistcode`  | String         | Sub-district code                                          |
| `distcode`     | String         | District code                                              |
| `provcode`     | String         | Province code                                              |
| `postcode`     | String         | Postal code                                                |
| `Geometry`     | String         | Geographical position                                      |


### Example Request

To request data for a specific health service facility identified by `hoscode`, use the following endpoint structure:

```url
https://opendata-service.moph.go.th/gis/v1/getgis/hoscode/00933
```

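Below is a small helper for the endpoint above. It is a sketch that assumes facility codes should be zero-padded to five digits (as in `00933`) and that the service returns JSON. `build_gis_url` and `fetch_facility` are hypothetical names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

GIS_BASE_URL = "https://opendata-service.moph.go.th/gis/v1/getgis/hoscode/"


def build_gis_url(hoscode) -> str:
    """Build the lookup URL, zero-padding codes to 5 digits (e.g. 933 -> '00933')."""
    code = str(hoscode).strip().zfill(5)
    return GIS_BASE_URL + quote(code)


def fetch_facility(hoscode, timeout: float = 30.0):
    """Fetch and decode the JSON response for one facility (needs network access)."""
    with urlopen(build_gis_url(hoscode), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_gis_url(933))
# https://opendata-service.moph.go.th/gis/v1/getgis/hoscode/00933
```

Padding inside the URL builder means an integer code that lost its leading zeros (for example after a CSV round trip) still produces a valid request.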

## Obtaining `hoscode` values

### `health_office.csv`
Provided by the [Strategy and Planning Division](https://spd.moph.go.th/) and downloaded from this [link](https://hcode.moph.go.th/dashboard/) (accessed on 10 February 2024).

The relevant columns are selected and stored in `selected_office.csv`.

In [27]:
health_office_df = pd.read_csv("material/health_office.csv")


In [28]:
# # Strip unwanted characters and ensure correct format
# health_office_df['รหัส 5 หลัก'] = health_office_df['รหัส 5 หลัก'].str.replace('="', '').str.replace('"', '')
# health_office_df['รหัส 5 หลัก'] = health_office_df['รหัส 5 หลัก'].apply(lambda x: f"{int(x):05d}")

# health_office_df['รหัส 9 หลัก'] = health_office_df['รหัส 9 หลัก'].str.replace('="', '').str.replace('"', '')
# health_office_df['รหัส 9 หลัก'] = health_office_df['รหัส 9 หลัก'].apply(lambda x: f"{int(x):09d}")

In [29]:
# # Extract numeric values from the "เขตบริการ" (service region) column and format them with leading zeros
# # NaN values are filled with '0' and then formatted to '00'
# health_office_df["เขตบริการ"] = health_office_df["เขตบริการ"].str.extract(r'(\d+)').fillna('0').apply(lambda x: '{:02}'.format(int(x[0])), axis=1)

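The cleaning cells above strip the `="..."` wrapper from each code, restore leading zeros, and pull the digits out of the `เขตบริการ` (service region) column. Below is the same logic applied to plain strings, as a sketch. The sample inputs are hypothetical, and `clean_code` and `clean_region` are illustrative names.

```python
import re


def clean_code(raw: str, width: int) -> str:
    """Strip an Excel-style ="..." wrapper and zero-pad to `width` digits."""
    digits = raw.replace('="', "").replace('"', "").strip()
    return f"{int(digits):0{width}d}"


def clean_region(raw) -> str:
    """Take the first run of digits as a 2-digit region; missing values become '00'."""
    match = re.search(r"\d+", raw) if isinstance(raw, str) else None
    value = int(match.group()) if match else 0
    return f"{value:02d}"


print(clean_code('="933"', 5))        # 00933
print(clean_region("เขตสุขภาพที่ 7"))  # 07
```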
In [26]:
# # Export the DataFrame to an Excel file
# health_office_df.to_excel('material/health_office.xlsx', index=False)

In [33]:
# # Rename selected columns
# health_office_df.rename(columns={
#     'รหัส 9 หลัก': 'code9',
#     'รหัส 5 หลัก': 'code5',
#     'ชื่อ': 'name',
#     'ประเภทองค์กร': 'prigov',
#     'ประเภทหน่วยบริการสุขภาพ': 'type',
#     'สังกัด': 'org',
#     'เขตบริการ': 'region',
#     'รหัสจังหวัด': 'provcode',
#     'จังหวัด': 'prov',
#     'รหัสอำเภอ': 'distcode',
#     'อำเภอ/เขต': 'dist',
#     'รหัสตำบล': 'subdistcode',
#     'ตำบล/แขวง': 'subdist',
#     'รหัสไปรษณีย์': 'postcode'
# }, inplace=True)

# # Handle NaN values for 'provcode', 'distcode', and 'subdistcode' before converting
# columns_to_fix = ['provcode', 'distcode', 'subdistcode']
# for col in columns_to_fix:
#     health_office_df[col] = health_office_df[col].fillna(0).astype(int).astype(str).replace('0', np.nan)

# # Select relevant columns for the final DataFrame
# selected_office_df = health_office_df[['code9', 'code5', 'name', 'prigov', 'type', 'org', 'region', 'provcode', 'prov', 'distcode', 'dist', 'subdistcode', 'subdist']]

# selected_office_df.head()

|   | code9 | code5 | name | prigov | type | org | region | provcode | prov | distcode | dist | subdistcode | subdist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2713000 | 27130 | คลินิกเชียงใหม่ | เอกชน | คลินิกเอกชน | เอกชน | 7 | 45 | ร้อยเอ็ด | 4508 | โพธิ์ชัย | 450802 | เชียงใหม่ |
| 1 | 2803800 | 28038 | คลินิกแพทย์วุฑฒา | เอกชน | คลินิกเอกชน | เอกชน | 5 | 72 | สุพรรณบุรี | 7201 | เมืองสุพรรณบุรี | 720117 | สวนแตง |
| 2 | 2819900 | 28199 | คลินิกพินิจการพยาบาลและการผดุงครรภ์ | เอกชน | คลินิกเอกชน | เอกชน | 5 | 72 | สุพรรณบุรี | 7208 | สามชุก | 720804 | หนองผักนาก |
| 3 | 2818800 | 28188 | คลินิกหมอพรณรงค์ | เอกชน | คลินิกเอกชน | เอกชน | 5 | 72 | สุพรรณบุรี | 7208 | สามชุก | 720803 | สามชุก |
| 4 | 2814000 | 28140 | คลินิกแพทย์ละเอียด | เอกชน | คลินิกเอกชน | เอกชน | 5 | 72 | สุพรรณบุรี | 7203 | ด่านช้าง | 720301 | หนองมะค่าโมง |


In [34]:
# selected_office_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33339 entries, 0 to 33338
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   code9        33339 non-null  object
 1   code5        33339 non-null  object
 2   name         33339 non-null  object
 3   prigov       33339 non-null  object
 4   type         33339 non-null  object
 5   org          33339 non-null  object
 6   region       33339 non-null  object
 7   provcode     33337 non-null  object
 8   prov         33337 non-null  object
 9   distcode     33337 non-null  object
 10  dist         33337 non-null  object
 11  subdistcode  33337 non-null  object
 12  subdist      33337 non-null  object
dtypes: object(13)
memory usage: 3.3+ MB


In [35]:
# # Export the DataFrame to a CSV file
# selected_office_df.to_csv('material/selected_office.csv', index=False)

### Collect every unique `hospcode` from the report data (used as `hoscode` by the Hospital GIS service) and store them in `hospcode_list.csv`

In [11]:
# import requests
# import json

# # Base URL for the RESTful Web Service
# url = "https://opendata.moph.go.th/api/report_data"

# # The headers to indicate that the payload is in JSON format
# headers = {
#     "Content-Type": "application/json"
# }

# # Initialize a variable to store data for all provinces
# all_province_data = []

# # Loop through province codes 11 to 99
# for province_code in range(11, 100):
#     # The data you want to send in the POST request
#     data = {
#         "tableName": "s_epi_complete",
#         "year": "2567",  # Specify other years as needed
#         "province": str(province_code),
#         "type": "json"
#     }

#     # Making the POST request
#     response = requests.post(url, headers=headers, data=json.dumps(data))

#     # Checking if the request was successful
#     if response.status_code in [200, 201]:
#         # The request was successful; process and store the response data
#         print(f"Data retrieved successfully for province code {province_code}!")
#         all_province_data.append(response.json())
#     else:
#         # There was an error with the request
#         print(f"Failed to retrieve data for province code {province_code}. Status code: {response.status_code}")

#     # Consider adding a short delay if necessary to avoid overwhelming the server or hitting rate limits
#     time.sleep(0.1)

In [12]:
# all_province_data

In [13]:
# # Initialize a set to store unique hospcode values
# unique_hospcode_set = set()

# # Iterate through each response (assuming each response is a list of dictionaries)
# for response in all_province_data:
#     for hospital_data in response:
#         # Extract the hospcode and add it to the set
#         unique_hospcode_set.add(hospital_data['hospcode'])

# # Convert the set to a sorted list for a stable, reproducible order
# unique_hospcode_list = sorted(unique_hospcode_set)

# print(unique_hospcode_list)

In [14]:
# # Convert the set of unique hospcode values to a DataFrame for easy export
# unique_hospcode_df = pd.DataFrame(unique_hospcode_list, columns=['hospcode'])

# # Export the DataFrame to a CSV file
# unique_hospcode_df.to_csv('hospcode_list.csv', index=False)

In [15]:
# # Convert the set of unique hospcode values to a DataFrame for easy export
# unique_hospcode_df = pd.DataFrame(unique_hospcode_list, columns=['hospcode'])

# # Export the DataFrame to an Excel file
# unique_hospcode_df.to_excel('hospcode_list.xlsx', index=False)

### Create `hospital_df` by left-joining `hospcode_list` with `selected_office`

In [36]:
# Load the hospcode list and selected office data from CSV files
hospcode_list_df = pd.read_csv('material/hospcode_list.csv', dtype={'hospcode': str})
selected_office_df = pd.read_csv('material/selected_office.csv', dtype={'code9': str,
                                                                        'code5': str,
                                                                        'region': str,
                                                                        'provcode': str,
                                                                        'distcode': str,
                                                                        'subdistcode': str})

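# Perform a left join on 'hospcode' from hospcode_list_df and 'code5' from selected_office_df
hospital_df = pd.merge(hospcode_list_df, selected_office_df, left_on='hospcode', right_on='code5', how='left')
# Keep the code of hospitals with no match in selected_office_df before dropping the duplicate key column
hospital_df['code5'] = hospital_df['code5'].fillna(hospital_df['hospcode'])
hospital_df.drop(columns=['hospcode'], inplace=True)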

# Display the first few rows of the joined DataFrame
hospital_df.head()

|   | code9 | code5 | name | prigov | type | org | region | provcode | prov | distcode | dist | subdistcode | subdist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 195800 | 1958 | โรงพยาบาลส่งเสริมสุขภาพตำบลบ้านนา | รัฐบาล | ศูนย์บริการสาธารณสุข อปท. | องค์กรปกครองส่วนท้องถิ่น | 6 | 21 | ระยอง | 2103 | แกลง | 210308 | บ้านนา |
| 1 | 562100 | 5621 | โรงพยาบาลส่งเสริมสุขภาพตำบลบ้านโคกสว่าง ตำบลโค... | รัฐบาล | โรงพยาบาลส่งเสริมสุขภาพตำบล | กระทรวงสาธารณสุข | 8 | 48 | นครพนม | 4802 | ปลาปาก | 480204 | โคกสว่าง |
| 2 | 1045000 | 10450 | โรงพยายาบส่งเสริมสุขภาพตำบลบ้านโป่งกลางน้ำ | รัฐบาล | ศูนย์บริการสาธารณสุข อปท. | องค์กรปกครองส่วนท้องถิ่น | 1 | 57 | เชียงราย | 5710 | แม่สรวย | 571006 | วาวี |
| 3 | 174800 | 1748 | โรงพยาบาลส่งเสริมสุขภาพตำบลบ้านหลวง | รัฐบาล | โรงพยาบาลส่งเสริมสุขภาพตำบล | กระทรวงสาธารณสุข | 4 | 19 | สระบุรี | 1907 | ดอนพุด | 190703 | บ้านหลวง |
| 4 | 161200 | 1612 | โรงพยาบาลส่งเสริมสุขภาพตำบลบ้านธัญญอุดม | รัฐบาล | โรงพยาบาลส่งเสริมสุขภาพตำบล | กระทรวงสาธารณสุข | 3 | 18 | ชัยนาท | 1801 | เมืองชัยนาท | 180106 | หาดท่าเสา |

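Because this is a left join, any `hospcode` with no match in `selected_office.csv` has NaN in every office column. One way to audit this is pandas' `indicator=True` option. The sketch below uses tiny hypothetical frames in place of the real CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for hospcode_list.csv and selected_office.csv
codes = pd.DataFrame({"hospcode": ["00933", "01958", "99999"]})
offices = pd.DataFrame({"code5": ["00933", "01958"], "name": ["A", "B"]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
joined = pd.merge(codes, offices, left_on="hospcode", right_on="code5",
                  how="left", indicator=True)

unmatched = joined.loc[joined["_merge"] == "left_only", "hospcode"].tolist()
print(unmatched)  # ['99999']
```

Listing the unmatched codes makes it easy to check whether they are facilities missing from the registry or simply formatting mismatches, such as lost leading zeros.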

In [37]:
# # Export the DataFrame to a CSV file
# hospital_df.to_csv('hospital.csv', index=False)

# [Map Service](https://opendata.moph.go.th/th/services/map) Data Response Structure

The Map Service provides GeoJSON data for various administrative divisions in Thailand, including health districts, provinces, and districts. Below is the description of the response attributes from the GeoJSON service endpoint.

| Attribute                     | Attribute Type | Attribute Definition |
|-------------------------------|----------------|----------------------|
| `features`                    | Array          | A collection of features (the data used to create maps). |
| `features.geometry`           | JSON Object    | The geometry that forms the map shape, including its coordinates. |
| `features.geometry.type`      | String         | The type of shape (e.g., `Polygon` or `MultiPolygon`). |
| `features.geometry.coordinates` | Array        | Nested arrays of longitude/latitude pairs that form the outline of each area. |
| `features.type`               | String         | The type of feature. |
| `features.properties`         | JSON Object    | Additional data related to the feature. |
| `features.properties.name`    | String         | The name of the area. |
| `features.properties.id`      | String         | The ID code of the area (e.g., the province code). |
| `features.properties.type`    | String         | The administrative level: `1` health region, `2` province, `3` district. |
| `type`                        | String         | The type of the top-level GeoJSON object (a feature collection). |

## Data Endpoints

- **Health Districts (เขตสุขภาพ)**: `https://opendata-service.moph.go.th/gis/v1/geojson/1`
- **Provinces (จังหวัด)**: `https://opendata-service.moph.go.th/gis/v1/geojson/2`
- **Districts (อำเภอ)**: `https://opendata-service.moph.go.th/gis/v1/geojson/3`

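The three endpoints differ only in the trailing level number, so the cells below could share one fetch routine. Below is a sketch of such a helper, assuming the URL pattern above. `geojson_url`, `patch_missing_geometry`, and `fetch_level` are hypothetical names, and `requests`/`geopandas` are imported only when `fetch_level` is called. The patch step mirrors the Level 3 workaround, where some features lack a `geometry` key.

```python
GEOJSON_BASE_URL = "https://opendata-service.moph.go.th/gis/v1/geojson/"
LEVELS = {1: "health region", 2: "province", 3: "district"}


def geojson_url(level: int) -> str:
    """Return the endpoint URL for administrative level 1, 2 or 3."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}, got {level!r}")
    return f"{GEOJSON_BASE_URL}{level}"


def patch_missing_geometry(features: list) -> list:
    """Give features without a 'geometry' key a null geometry; return their indices."""
    missing = []
    for i, feature in enumerate(features):
        if "geometry" not in feature:
            feature["geometry"] = None
            missing.append(i)
    return missing


def fetch_level(level: int, timeout: float = 60.0):
    """Download one level and return it as a GeoDataFrame (needs requests and geopandas)."""
    import geopandas as gpd
    import requests

    response = requests.get(geojson_url(level), timeout=timeout)
    response.raise_for_status()
    features = response.json()["features"]
    patch_missing_geometry(features)
    return gpd.GeoDataFrame.from_features(features)
```

With this helper, `gdf3 = fetch_level(3)` would replace the Level 3 cell. `raise_for_status()` makes a failed download stop with an HTTP error instead of a confusing `KeyError` on `'features'`.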
## Fetch the GeoJSON Data

### Level 1: เขตสุขภาพ Region

In [50]:
import requests
import geopandas as gpd

# URL for the GeoJSON data
url = 'https://opendata-service.moph.go.th/gis/v1/geojson/1'

# Fetch the GeoJSON data
response = requests.get(url)
geojson = response.json()

# Load GeoJSON into a GeoDataFrame
gdf1 = gpd.GeoDataFrame.from_features(geojson['features'])

In [43]:
gdf1

|   | geometry | data | zone | name | id | type |
|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((98.69195 17.77530, 98.68068 17... | 80 |  | เขตสุขภาพที่ 01 | 1 | 1 |
| 1 | MULTIPOLYGON (((101.26396 17.08764, 101.27981 ... | 80 |  | เขตสุขภาพที่ 02 | 2 | 1 |
| 2 | MULTIPOLYGON (((100.34397 15.10966, 100.29725 ... | 80 |  | เขตสุขภาพที่ 03 | 3 | 1 |
| 3 | MULTIPOLYGON (((101.28452 14.49621, 101.28632 ... | 80 |  | เขตสุขภาพที่ 04 | 4 | 1 |
| 4 | MULTIPOLYGON (((98.59070 15.64435, 98.62094 15... | 80 |  | เขตสุขภาพที่ 05 | 5 | 1 |
| 5 | MULTIPOLYGON (((102.25305 12.28667, 102.24917 ... | 80 |  | เขตสุขภาพที่ 06 | 6 | 1 |
| 6 | MULTIPOLYGON (((103.58606 17.09754, 103.62259 ... | 80 |  | เขตสุขภาพที่ 07 | 7 | 1 |
| 7 | MULTIPOLYGON (((102.09970 18.21428, 102.11396 ... | 80 |  | เขตสุขภาพที่ 08 | 8 | 1 |
| 8 | MULTIPOLYGON (((101.57202 16.72447, 101.58926 ... | 80 |  | เขตสุขภาพที่ 09 | 9 | 1 |
| 9 | MULTIPOLYGON (((104.98066 16.27747, 105.01872 ... | 80 |  | เขตสุขภาพที่ 10 | 10 | 1 |


In [78]:
# # Export GeoDataFrame to a GeoJSON file
# gdf1.to_file("gdf1.geojson", driver='GeoJSON')

### Level 2: จังหวัด Province

In [45]:
import requests
import geopandas as gpd

# URL for the GeoJSON data
url = 'https://opendata-service.moph.go.th/gis/v1/geojson/2'

# Fetch the GeoJSON data
response = requests.get(url)
geojson = response.json()

# Load GeoJSON into a GeoDataFrame
gdf2 = gpd.GeoDataFrame.from_features(geojson['features'])

In [46]:
gdf2

|   | geometry | data | zone | name | id | type |
|---|---|---|---|---|---|---|
| 0 | POLYGON ((100.55902 13.91443, 100.57404 13.954... | 80 |  | กรุงเทพมหานคร | 10 | 2 |
| 1 | POLYGON ((100.59733 13.53899, 100.55342 13.504... | 80 |  | สมุทรปราการ | 11 | 2 |
| 2 | POLYGON ((100.55035 13.87748, 100.54376 13.849... | 80 |  | นนทบุรี | 12 | 2 |
| 3 | POLYGON ((100.46971 13.96625, 100.35569 14.000... | 80 |  | ปทุมธานี | 13 | 2 |
| 4 | POLYGON ((100.45491 14.11890, 100.34429 14.115... | 80 |  | พระนครศรีอยุธยา | 14 | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| 72 | MULTIPOLYGON (((99.49066 7.35269, 99.48354 7.2... | 80 |  | ตรัง | 92 | 2 |
| 73 | MULTIPOLYGON (((100.39970 7.26496, 100.38390 7... | 80 |  | พัทลุง | 93 | 2 |
| 74 | MULTIPOLYGON (((101.32271 6.92791, 101.30151 6... | 80 |  | ปัตตานี | 94 | 2 |
| 75 | POLYGON ((101.38779 6.33559, 101.40332 6.23924... | 80 |  | ยะลา | 95 | 2 |


In [79]:
# # Export GeoDataFrame to a GeoJSON file
# gdf2.to_file("gdf2.geojson", driver='GeoJSON')

### Level 3: อำเภอ District

In [55]:
import requests
import geopandas as gpd

# URL for the GeoJSON data
url = 'https://opendata-service.moph.go.th/gis/v1/geojson/3'

# Fetch the GeoJSON data
response = requests.get(url)
geojson = response.json()

# Some features lack a 'geometry' key, which makes from_features raise a KeyError.
# Report them first (for debugging), then patch them with a null geometry.
for i, feature in enumerate(geojson['features']):
    if 'geometry' not in feature:
        print(f"Feature at index {i} is missing 'geometry': {feature}")
        feature['geometry'] = None

# Load the patched GeoJSON into a GeoDataFrame
gdf3 = gpd.GeoDataFrame.from_features(geojson['features'])

In [56]:
gdf3

|   | geometry | data | zone | name | id | type |
|---|---|---|---|---|---|---|
| 0 | MULTIPOLYGON (((100.49910 13.74493, 100.49016 ... | 80 |  | พระนคร | 1001 | 3 |
| 1 | MULTIPOLYGON (((100.51851 13.80202, 100.53607 ... | 80 |  | ดุสิต | 1002 | 3 |
| 2 | MULTIPOLYGON (((100.91400 13.93371, 100.90740 ... | 80 |  | หนองจอก | 1003 | 3 |
| 3 | MULTIPOLYGON (((100.52193 13.72298, 100.51068 ... | 80 |  | บางรัก | 1004 | 3 |
| 4 | MULTIPOLYGON (((100.61126 13.88885, 100.65247 ... | 80 |  | บางเขน | 1005 | 3 |
| ... | ... | ... | ... | ... | ... | ... |
| 987 | MULTIPOLYGON (((102.04683 6.14443, 102.06540 6... | 80 |  | สุไหงโก-ลก | 9610 | 3 |
| 988 | MULTIPOLYGON (((101.92505 6.22168, 102.02249 6... | 80 |  | สุไหงปาดี | 9611 | 3 |
| 989 | MULTIPOLYGON (((101.70986 6.18786, 101.71491 6... | 80 |  | จะแนะ | 9612 | 3 |
| 990 | MULTIPOLYGON (((101.85486 6.29823, 101.89596 6... | 80 |  | เจาะไอร้อง | 9613 | 3 |


In [80]:
# # Export GeoDataFrame to a GeoJSON file
# gdf3.to_file("gdf3.geojson", driver='GeoJSON')

# `search_{}` lookup tables

In [39]:
import pandas as pd

# Load the lookup tables, reading code columns as str so that leading zeros are preserved
search_hospital_df = pd.read_csv('search_{}/search_hospital/search_hospital.csv', dtype={'Login Code': str})                # 9-digit hospital code
search_tambon_df = pd.read_csv('search_{}/search_tambon/search_tambon.csv', dtype={'Login Code': str})                      # 6-digit sub-district code
search_village_df = pd.read_csv('search_{}/search_village/search_village.csv', dtype={'Login Code': str})                   # 8-digit village code
search_school_df = pd.read_csv('search_{}/search_school/search_school.csv', dtype={'Login Code': str, 'รหัส รพ.สต': str})  # 10-digit school code

In [40]:
# Renaming columns to match the DBML format
search_hospital_df.rename(columns={'จังหวัด': 'prov', 'อำเภอ': 'dist', 'ตำบล': 'subdist', 'ชื่อหน่วยงาน/รพสต.': 'hospname', 'Login Code': 'code9'}, inplace=True)
search_tambon_df.rename(columns={'จังหวัด': 'prov', 'อำเภอ': 'dist', 'ตำบล': 'subdist', 'Login Code': 'subdistcode'}, inplace=True)
search_village_df.rename(columns={'จังหวัด': 'prov', 'อำเภอ': 'dist', 'ตำบล': 'subdist', 'ชื่อหมู่บ้าน': 'vill', 'Login Code': 'villcode', 'รหัส รพ.สต': 'code9'}, inplace=True)
search_school_df.rename(columns={'จังหวัด': 'prov', 'อำเภอ': 'dist', 'ตำบล': 'subdist', 'ชื่อโรงเรียน': 'sch', 'Login Code': 'schcode', 'รหัส รพ.สต': 'code9'}, inplace=True)

# Vaccine coverage API

In [27]:
# Base URL for the RESTful Web Service
url = "https://opendata.moph.go.th/api/report_data"

# Headers indicating that the payload is in JSON format
headers = {"Content-Type": "application/json"}

# Province codes 11-96, skipping codes that return no data (null):
# 28, 29, 59, 68, 69, 78, 79, 87, 88, 89.
# Code 10 (Bangkok, Health Region 13) is also excluded; it already falls outside range(11, 97).
excluded_provinces = ['10', '28', '29', '59', '68', '69', '78', '79', '87', '88', '89']
province_codes = [f"{i:02d}" for i in range(11, 97) if f"{i:02d}" not in excluded_provinces]

# Collect one DataFrame per province and concatenate once at the end
frames = []

for province_code in province_codes:
    # Data payload for the POST request
    data = {
        "tableName": "s_epi_complete",
        "year": "2567",  # Fiscal year (B.E.); specify other years (2557-2567) as needed
        "province": province_code,
        "type": "json"
    }

    # Make the POST request
    response = requests.post(url, headers=headers, data=json.dumps(data))

    # Check if the request was successful
    if response.status_code in [200, 201]:
        # Convert the JSON response to a DataFrame
        frames.append(pd.json_normalize(response.json()))
    else:
        print(f"Failed to retrieve data for province code {province_code}. Status code: {response.status_code}")

# Combine all provinces into a single DataFrame
s_epi_complete_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

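The `year` field in the loop above is fixed at `2567`, although its comment notes that 2557-2567 can be requested. The sketch below enumerates every (year, province) payload in advance. It assumes the same payload keys and excluded province codes. `build_payloads` is a hypothetical helper, and each payload would still be sent with `requests.post` as above.

```python
import json

EXCLUDED_PROVINCES = {"10", "28", "29", "59", "68", "69", "78", "79", "87", "88", "89"}


def province_codes() -> list:
    """Two-digit province codes 11-96 minus the excluded ones."""
    return [f"{i:02d}" for i in range(11, 97) if f"{i:02d}" not in EXCLUDED_PROVINCES]


def build_payloads(first_year: int = 2557, last_year: int = 2567) -> list:
    """JSON request bodies for every fiscal year (B.E.) and province."""
    return [
        json.dumps({"tableName": "s_epi_complete", "year": str(year),
                    "province": code, "type": "json"})
        for year in range(first_year, last_year + 1)
        for code in province_codes()
    ]


payloads = build_payloads()
print(len(province_codes()), len(payloads))  # 76 836
```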
In [28]:
s_epi_complete_data.head(10)

|   | id | hospcode | areacode | date_com | b_year | target | result | target10 | result10 | target11 | ... | target05 | result05 | target06 | result06 | target07 | result07 | target08 | result08 | target09 | result09 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f033ab37c30201f73f142449d037028d | 933 | 11010204 | 202402171639 | 2567 | 20 | 17 | 2 | 2 | 1 | ... | 4 | 4 | 2 | 2 | 4 | 4 | 2 | 2 | 0 | 0 |
| 1 | 35f4a8d465e6e1edc05f3d8ab658c551 | 933 | 11010204 | 202402171639 | 2567 | 25 | 19 | 4 | 4 | 1 | ... | 1 | 1 | 2 | 2 | 0 | 0 | 1 | 0 | 2 | 0 |
| 2 | 35f4a8d465e6e1edc05f3d8ab658c551 | 933 | 11010203 | 202402171639 | 2567 | 20 | 12 | 3 | 3 | 1 | ... | 0 | 0 | 1 | 1 | 1 | 1 | 6 | 0 | 0 | 0 |
| 3 | d1fe173d08e959397adf34b1d77e88d7 | 933 | 11010203 | 202402171639 | 2567 | 26 | 20 | 2 | 2 | 2 | ... | 2 | 2 | 1 | 1 | 1 | 1 | 5 | 2 | 3 | 0 |
| 4 | d1fe173d08e959397adf34b1d77e88d7 | 933 | 11010204 | 202402171639 | 2567 | 33 | 27 | 6 | 6 | 3 | ... | 2 | 2 | 1 | 1 | 2 | 1 | 3 | 2 | 4 | 0 |
| 5 | f033ab37c30201f73f142449d037028d | 933 | 11010203 | 202402171639 | 2567 | 8 | 4 | 1 | 1 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | f033ab37c30201f73f142449d037028d | 934 | 11010307 | 202402171639 | 2567 | 23 | 11 | 2 | 0 | 4 | ... | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 3 | 0 |
| 7 | f033ab37c30201f73f142449d037028d | 934 | 11010302 | 202402171639 | 2567 | 25 | 4 | 1 | 0 | 4 | ... | 2 | 0 | 3 | 1 | 1 | 0 | 3 | 0 | 2 | 1 |
| 8 | 35f4a8d465e6e1edc05f3d8ab658c551 | 934 | 11010302 | 202402171639 | 2567 | 25 | 15 | 1 | 1 | 1 | ... | 1 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 5 | 0 |
| 9 | 35f4a8d465e6e1edc05f3d8ab658c551 | 934 | 11010307 | 202402171639 | 2567 | 17 | 9 | 0 | 0 | 1 | ... | 3 | 1 | 1 | 0 | 2 | 0 | 1 | 0 | 2 | 0 |


The `id` column is the Report ID. According to [รหัสอ้างอิงรายงาน.xlsx](https://dmd-ict.moph.go.th/main/download) (the report reference code list), each ID corresponds to the following report:

| id                               | Report name                                                              | English translation | Short name |
|----------------------------------|--------------------------------------------------------------------------|---------------------|------------|
| 28dd2c7955ce926456240b2ff0100bde | ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 1 ปี (fully immunized) | Coverage of children aged 1 year who received every scheduled vaccine | s_epi_complete_1yr |
| 35f4a8d465e6e1edc05f3d8ab658c551 | ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 2 ปี (fully immunized) | Coverage of children aged 2 years who received every scheduled vaccine | s_epi_complete_2yr |
| d1fe173d08e959397adf34b1d77e88d7 | ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 3 ปี (fully immunized) | Coverage of children aged 3 years who received every scheduled vaccine | s_epi_complete_3yr |
| f033ab37c30201f73f142449d037028d | ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 5 ปี (fully immunized) | Coverage of children aged 5 years who received every scheduled vaccine | s_epi_complete_5yr |
| 30f72fc853a2cc02ef953dc97f36f596 | ความครอบคลุมการได้รับวัคซีนแต่ละชนิดครบตามเกณฑ์ในเด็กอายุครบ 7 ปี (fully immunized) | Coverage of children aged 7 years who received every scheduled vaccine | s_epi_complete_7yr |

Unfortunately, the Open Data API does not provide data for `s_epi_complete_7yr`.

In [29]:
# Map each Report ID to a short report name
id_to_name = {
    "28dd2c7955ce926456240b2ff0100bde": "s_epi_complete_1yr",
    "35f4a8d465e6e1edc05f3d8ab658c551": "s_epi_complete_2yr",
    "d1fe173d08e959397adf34b1d77e88d7": "s_epi_complete_3yr",
    "f033ab37c30201f73f142449d037028d": "s_epi_complete_5yr",
    "30f72fc853a2cc02ef953dc97f36f596": "s_epi_complete_7yr"
}

# Map the 'id' column to the short report names using the id_to_name dictionary
s_epi_complete_data['report_name'] = s_epi_complete_data['id'].map(id_to_name)

s_epi_complete_data.head(10)

|   | id | hospcode | areacode | date_com | b_year | target | result | target10 | result10 | target11 | ... | result05 | target06 | result06 | target07 | result07 | target08 | result08 | target09 | result09 | report_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f033ab37c30201f73f142449d037028d | 933 | 11010204 | 202402171639 | 2567 | 20 | 17 | 2 | 2 | 1 | ... | 4 | 2 | 2 | 4 | 4 | 2 | 2 | 0 | 0 | s_epi_complete_5yr |
| 1 | 35f4a8d465e6e1edc05f3d8ab658c551 | 933 | 11010204 | 202402171639 | 2567 | 25 | 19 | 4 | 4 | 1 | ... | 1 | 2 | 2 | 0 | 0 | 1 | 0 | 2 | 0 | s_epi_complete_2yr |
| 2 | 35f4a8d465e6e1edc05f3d8ab658c551 | 933 | 11010203 | 202402171639 | 2567 | 20 | 12 | 3 | 3 | 1 | ... | 0 | 1 | 1 | 1 | 1 | 6 | 0 | 0 | 0 | s_epi_complete_2yr |
| 3 | d1fe173d08e959397adf34b1d77e88d7 | 933 | 11010203 | 202402171639 | 2567 | 26 | 20 | 2 | 2 | 2 | ... | 2 | 1 | 1 | 1 | 1 | 5 | 2 | 3 | 0 | s_epi_complete_3yr |
| 4 | d1fe173d08e959397adf34b1d77e88d7 | 933 | 11010204 | 202402171639 | 2567 | 33 | 27 | 6 | 6 | 3 | ... | 2 | 1 | 1 | 2 | 1 | 3 | 2 | 4 | 0 | s_epi_complete_3yr |
| 5 | f033ab37c30201f73f142449d037028d | 933 | 11010203 | 202402171639 | 2567 | 8 | 4 | 1 | 1 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | s_epi_complete_5yr |
| 6 | f033ab37c30201f73f142449d037028d | 934 | 11010307 | 202402171639 | 2567 | 23 | 11 | 2 | 0 | 4 | ... | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 3 | 0 | s_epi_complete_5yr |
| 7 | f033ab37c30201f73f142449d037028d | 934 | 11010302 | 202402171639 | 2567 | 25 | 4 | 1 | 0 | 4 | ... | 0 | 3 | 1 | 1 | 0 | 3 | 0 | 2 | 1 | s_epi_complete_5yr |
| 8 | 35f4a8d465e6e1edc05f3d8ab658c551 | 934 | 11010302 | 202402171639 | 2567 | 25 | 15 | 1 | 1 | 1 | ... | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 5 | 0 | s_epi_complete_2yr |
| 9 | 35f4a8d465e6e1edc05f3d8ab658c551 | 934 | 11010307 | 202402171639 | 2567 | 17 | 9 | 0 | 0 | 1 | ... | 1 | 1 | 0 | 2 | 0 | 1 | 0 | 2 | 0 | s_epi_complete_2yr |


In [30]:
# Define the list of report names to filter
report_names = ["s_epi_complete_1yr", "s_epi_complete_2yr", "s_epi_complete_3yr", "s_epi_complete_5yr"]

total_filtered_rows = 0

# Loop through each report name, filter the data, and sum the number of rows
for report_name in report_names:
    filtered_data = s_epi_complete_data[s_epi_complete_data['report_name'] == report_name].shape[0]
    total_filtered_rows += filtered_data
    print(f"Number of rows for {report_name}: {filtered_data}")

# Print total rows in s_epi_complete_data
total_rows = s_epi_complete_data.shape[0]

print(f"Total rows filtered: {total_filtered_rows}")
print(f"Total rows in s_epi_complete_data: {total_rows}")

# Verify if the sum matches
if total_filtered_rows == total_rows:
    print("All rows are accounted for, no data is missing.")
else:
    print("There is a discrepancy in the row counts, some data might be missing.")


Number of rows for s_epi_complete_1yr: 67755
Number of rows for s_epi_complete_2yr: 69098
Number of rows for s_epi_complete_3yr: 70955
Number of rows for s_epi_complete_5yr: 73556
Total rows filtered: 281364
Total rows in s_epi_complete_data: 281364
All rows are accounted for, no data is missing.

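The loop above detects a row-count discrepancy but does not show which IDs cause it. Any `id` missing from `id_to_name` maps to NaN, so `value_counts(dropna=False)` lists those rows alongside the named reports. Below is a sketch on a hypothetical three-row frame.

```python
import pandas as pd

id_to_name = {
    "28dd2c7955ce926456240b2ff0100bde": "s_epi_complete_1yr",
    "35f4a8d465e6e1edc05f3d8ab658c551": "s_epi_complete_2yr",
}

# Hypothetical sample: the last ID is not in the dictionary
df = pd.DataFrame({"id": ["28dd2c7955ce926456240b2ff0100bde",
                          "35f4a8d465e6e1edc05f3d8ab658c551",
                          "ffffffffffffffffffffffffffffffff"]})
df["report_name"] = df["id"].map(id_to_name)

# NaN appears as its own bucket, so unmapped rows are counted too
counts = df["report_name"].value_counts(dropna=False)
unmapped_ids = df.loc[df["report_name"].isna(), "id"].unique().tolist()
print(unmapped_ids)  # ['ffffffffffffffffffffffffffffffff']
```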

In [31]:
# # Export the DataFrame to a CSV file
# s_epi_complete_data.to_csv('s_epi_complete_data.csv', index=False)

In [33]:
# Rename areacode to villcode and derive the hierarchical codes from its prefixes
areacode = s_epi_complete_data['areacode'].astype(str)
s_epi_complete_data['villcode'] = areacode             # 8-digit village code
s_epi_complete_data['region'] = areacode.str[:1]       # first digit of the province code (not the MOPH health region)
s_epi_complete_data['provcode'] = areacode.str[:2]     # 2-digit province code
s_epi_complete_data['distcode'] = areacode.str[:4]     # 4-digit district code
s_epi_complete_data['subdistcode'] = areacode.str[:6]  # 6-digit sub-district code
s_epi_complete_data.drop('areacode', axis=1, inplace=True)

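The prefixes above follow the nesting of the area code: 2 digits for the province, 4 for the district, 6 for the sub-district, and all 8 for the village. Below is the same split applied to a single code, as a minimal sketch with validation (`split_areacode` is a hypothetical helper). Note that `region` (first digit) is left out because it does not match the health regions in `gdf1`.

```python
def split_areacode(areacode: str) -> dict:
    """Split an 8-digit village code into its nested administrative codes."""
    code = str(areacode).strip()
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {areacode!r}")
    return {
        "provcode": code[:2],
        "distcode": code[:4],
        "subdistcode": code[:6],
        "villcode": code,
    }


print(split_areacode("96100115")["subdistcode"])  # 961001
```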
In [41]:
s_epi_complete_data

|   | id | hospcode | date_com | b_year | target | result | target10 | result10 | target11 | result11 | ... | target08 | result08 | target09 | result09 | report_name | villcode | region | provcode | distcode | subdistcode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f033ab37c30201f73f142449d037028d | 00933 | 202402171639 | 2567 | 20 | 17 | 2 | 2 | 1 | 0 | ... | 2 | 2 | 0 | 0 | s_epi_complete_5yr | 11010204 | 1 | 11 | 1101 | 110102 |
| 1 | 35f4a8d465e6e1edc05f3d8ab658c551 | 00933 | 202402171639 | 2567 | 25 | 19 | 4 | 4 | 1 | 1 | ... | 1 | 0 | 2 | 0 | s_epi_complete_2yr | 11010204 | 1 | 11 | 1101 | 110102 |
| 2 | 35f4a8d465e6e1edc05f3d8ab658c551 | 00933 | 202402171639 | 2567 | 20 | 12 | 3 | 3 | 1 | 1 | ... | 6 | 0 | 0 | 0 | s_epi_complete_2yr | 11010203 | 1 | 11 | 1101 | 110102 |
| 3 | d1fe173d08e959397adf34b1d77e88d7 | 00933 | 202402171639 | 2567 | 26 | 20 | 2 | 2 | 2 | 2 | ... | 5 | 2 | 3 | 0 | s_epi_complete_3yr | 11010203 | 1 | 11 | 1101 | 110102 |
| 4 | d1fe173d08e959397adf34b1d77e88d7 | 00933 | 202402171639 | 2567 | 33 | 27 | 6 | 6 | 3 | 3 | ... | 3 | 2 | 4 | 0 | s_epi_complete_3yr | 11010204 | 1 | 11 | 1101 | 110102 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 281359 | 28dd2c7955ce926456240b2ff0100bde | 77729 | 202402131034 | 2567 | 6 | 4 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | s_epi_complete_1yr | 96100115 | 9 | 96 | 9610 | 961001 |
| 281360 | 35f4a8d465e6e1edc05f3d8ab658c551 | 77729 | 202402131034 | 2567 | 6 | 2 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 0 | 0 | s_epi_complete_2yr | 96100119 | 9 | 96 | 9610 | 961001 |
| 281361 | d1fe173d08e959397adf34b1d77e88d7 | 77729 | 202402131034 | 2567 | 6 | 2 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | s_epi_complete_3yr | 96100118 | 9 | 96 | 9610 | 961001 |
| 281362 | 28dd2c7955ce926456240b2ff0100bde | 77729 | 202402131034 | 2567 | 2 | 2 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | s_epi_complete_1yr | 96100103 | 9 | 96 | 9610 | 961001 |

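With the codes extracted, coverage for the project's target group (children aged 1 year) can be summarized per province. The sketch sums `target` and `result` before dividing, so small villages are not over-weighted. It uses hypothetical rows and assumes `target` and `result` are numeric; if the API returns them as strings, `pd.to_numeric` would be needed first.

```python
import pandas as pd

# Hypothetical rows shaped like s_epi_complete_data
df = pd.DataFrame({
    "report_name": ["s_epi_complete_1yr", "s_epi_complete_1yr",
                    "s_epi_complete_1yr", "s_epi_complete_2yr"],
    "provcode": ["96", "96", "11", "96"],
    "target": [6, 2, 10, 6],
    "result": [4, 2, 9, 2],
})

# Keep only the 1-year-old report, then aggregate counts per province
one_year = df[df["report_name"] == "s_epi_complete_1yr"]
by_province = one_year.groupby("provcode")[["target", "result"]].sum()
by_province["coverage_pct"] = (100 * by_province["result"] / by_province["target"]).round(1)
print(by_province)
```

The same grouping on `distcode` or `subdistcode` gives finer-grained coverage that can be joined to `gdf3` for mapping.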

In [35]:
# Load search_tambon.csv
search_tambon_df = pd.read_csv('search_{}/search_tambon/search_tambon.csv', dtype={'Login Code': 'str'})
# Rename 'Login Code' to 'subdistcode' for easier reference
search_tambon_df.rename(columns={'Login Code': 'subdistcode'}, inplace=True)

In [36]:
search_tambon_df

|   | จังหวัด | อำเภอ | ตำบล | subdistcode |
|---|---|---|---|---|
| 0 | กรุงเทพมหานคร | พระนคร | เขตพระนคร | 100100 |
| 1 | กรุงเทพมหานคร | พระนคร | พระบรมมหาราชวัง | 100101 |
| 2 | กรุงเทพมหานคร | พระนคร | วังบูรพาภิรมย์ | 100102 |
| 3 | กรุงเทพมหานคร | พระนคร | วัดราชบพิธ | 100103 |
| 4 | กรุงเทพมหานคร | พระนคร | สำราญราษฎร์ | 100104 |
| ... | ... | ... | ... | ... |
| 7506 | นราธิวาส | จะแนะ | ผดุงมาตร | 961203 |
| 7507 | นราธิวาส | จะแนะ | ช้างเผือก | 961204 |
| 7508 | นราธิวาส | เจาะไอร้อง | จวบ | 961301 |
| 7509 | นราธิวาส | เจาะไอร้อง | บูกิต | 961302 |

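Both tables carry a 6-digit `subdistcode` stored as a string, so a left merge can attach sub-district, district, and province names to the coverage rows. The sketch below uses hypothetical frames, with lookup rows copied from the output above. `validate="many_to_one"` makes pandas raise an error if a `subdistcode` appears more than once in the lookup table.

```python
import pandas as pd

# Hypothetical stand-ins for s_epi_complete_data and search_tambon_df
coverage = pd.DataFrame({"subdistcode": ["100101", "100101", "110102"],
                         "target": [6, 2, 20], "result": [4, 2, 17]})
tambon = pd.DataFrame({"จังหวัด": ["กรุงเทพมหานคร", "กรุงเทพมหานคร"],
                       "อำเภอ": ["พระนคร", "พระนคร"],
                       "ตำบล": ["เขตพระนคร", "พระบรมมหาราชวัง"],
                       "subdistcode": ["100100", "100101"]})

named = coverage.merge(tambon, on="subdistcode", how="left", validate="many_to_one")
print(named["ตำบล"].isna().sum())  # 1 row has no matching sub-district
```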

In [40]:
s_epi_complete_data_all.shape

(3255615, 31)

In [41]:
#s_epi_complete_data_all.to_csv('s_epi_complete_data_all.csv', index=False)