![Banner logo](https://github.com/zackyndra23/Data_Science/blob/main/Banner1.jpg?raw=true)

# Insightful 3D Data Visualization with PyDeck (Indonesia Waste Piles in 2022): A Solution to the Limitations of Folium and Choropleth Flatmaps
*by Zaky Indra Satria Putra, student at* &nbsp;
<a href="https://purwadhika.com" target="_blank">
    <img src="https://github.com/zackyndra23/Data_Science/blob/main/Logo%20Purwadhika.png?raw=true" width="20%">
</a>

<br>

# Context

<br>

<div style="text-align:center;">
    <a href="https://medium.com/@lorenzoperozzi/visual-exploratory-analysis-with-pydeck-19423f679aa4" target="_blank">
        <center><img src="https://github.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/blob/main/(01)%20Visual%20Exploratory%20analysis%20with%20pydeck.png?raw=true" width="60%">
        <figcaption>Created by Zaky Indra Satria Putra on his article (click to explore) </figcaption> </center>
    </a>
</div>

<br>

The increasing complexity of datasets in **data science** calls for sophisticated visualization tools. In response to this demand, various tools and libraries have been developed **to help data analysts present data in a more intuitive and informative way**. In this project, we explore **3D visualization with PyDeck**, applied to **illustrating the amount of waste piled up across Indonesia during 2022**.

# Problem Determination

<br>

<div style="text-align:center;">
    <a href="https://www.kompas.id/baca/english/2023/06/27/en-mayoritas-sampah-di-indonesia-adalah-sampah-makanan" target="_blank">
        <center><img src="https://github.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/blob/main/(02)%20Piles%20of%20Waste%20in%20Batargebang.jpg?raw=true" width="60%">
        <figcaption>Taken by Fakhri Fadlurrohman on his article (click to read) </figcaption> </center>
    </a>
</div>

<br>

Three-dimensional data visualization can provide a richer and deeper perspective. However, when choosing a tool, it is often difficult to decide whether a 3D approach is more effective than a traditional 2D approach. **What are the advantages and obstacles of each in understanding the spatial distribution of waste piles in Indonesia in 2022?**

# Methodology

## Data Preparation & Data Wrangling

#### Importing the libraries

In [1]:
import pandas as pd
import pydeck as pdk
from pydeck.types import String
import matplotlib.pyplot as plt
import requests
import folium
import plotly.graph_objects as go

import warnings
warnings.filterwarnings('ignore')

The data come from several sources: [the number of Micro, Small and Medium Enterprises (UMKM)](https://se2016.bps.go.id/umkumb/index.php/site/tabel?tid=22&wid=0), [the number of workers](https://se2016.bps.go.id/umkumb/index.php/site/tabel?tid=28&wid=0), and [the number of waste piles](https://sipsn.menlhk.go.id/sipsn/public/data/timbulan) for each regency and province in Indonesia.

#### Importing all .csv files that will be used for data analysis and visualization, then integrating them into one complete dataset

In [2]:
# Importing the .csv file containing the daily waste piles (tons) of each regency
indo_waste = "https://raw.githubusercontent.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/main/Indo_Waste.csv"
df = pd.read_csv(indo_waste, sep=';')

# Importing the .csv file containing the number of UMKs, the number of workers, and the total daily waste piles (tons) per province
indo_waste2 = "https://raw.githubusercontent.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/main/Indo_Waste2.csv"
df2 = pd.read_csv(indo_waste2, sep=';')

# Importing .csv file containing coordinates of each province
province = "https://raw.githubusercontent.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/main/province.csv"
df3 = pd.read_csv(province)

# Importing .csv file containing coordinates of each regency
regency = "https://raw.githubusercontent.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/main/regencies.csv"
df4 = pd.read_csv(regency)
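
The first two files are read with `sep=';'` because they are semicolon-delimited. A minimal sketch of what that argument changes, using a hypothetical two-row sample in the same layout (the second row's values are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical sample in the same semicolon-delimited layout as Indo_Waste.csv
sample = (
    "province;regencies;wasteperday\n"
    "ACEH;KABUPATEN ACEH SELATAN;96.49\n"
    "ACEH;KABUPATEN ACEH TENGGARA;54.0\n"
)

wrong = pd.read_csv(io.StringIO(sample))            # default sep=',' keeps each row as one column
right = pd.read_csv(io.StringIO(sample), sep=";")   # splits the row into three columns

print(wrong.shape)
print(right.shape)
```

With the default separator every row stays in a single column, so checking `shape` right after loading is a quick way to catch a wrong delimiter.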


#### Viewing the columns of each imported dataframe to understand the available information

In [3]:
df.head(1)

Unnamed: 0,province,regencies,wasteperday
0,ACEH,KABUPATEN ACEH SELATAN,96.49


In [4]:
df2.head(1)

Unnamed: 0,province,umk_count,workers_count,wasteperday_count
0,ACEH,494,869,1640.1


In [5]:
df3.head(1)

Unnamed: 0,id,name,alt_name,latitude,longitude
0,11,ACEH,ACEH,4.36855,97.0253


In [6]:
df4.head(1)

Unnamed: 0,id,province_id,name,alt_name,latitude,longitude
0,1101,11,KABUPATEN SIMEULUE,KABUPATEN SIMEULUE,2.61667,96.08333


#### Cleaning up the data in df3 so that it can be integrated with the main df and the province_id information can be unified

In [7]:
df3[df3['name'] == 'DKI JAKARTA']

Unnamed: 0,id,name,alt_name,latitude,longitude
10,31,DKI JAKARTA,DKI JAKARTA,6.1745,106.8227


In [8]:
# Preparing the data for df3 dataframe containing province_id
# Dictionary containing old and new column name pairs
mapping_df3 = {'id': 'province_id', 'name': 'province', 'longitude': 'longitude_prov', 'latitude': 'latitude_prov'}

df3.rename(columns=mapping_df3, inplace=True)     # Renaming the column for integrating
df3 = df3.drop(columns=['alt_name'])              # Deleting the column 'alt_name'

# Changing 'DI YOGYAKARTA' to 'D.I. YOGYAKARTA' to match the main df
df3['province'] = df3['province'].replace({'DI YOGYAKARTA': 'D.I. YOGYAKARTA'})

# Correcting the DKI JAKARTA coordinates (row 10); the source latitude is missing its minus sign
df3.loc[10, 'longitude_prov'] = 106.8451
df3.loc[10, 'latitude_prov'] = -6.2115

# Adding the provinces 'PAPUA SELATAN' (South Papua), 'PAPUA TENGAH' (Central Papua), and 'PAPUA BARAT DAYA' (Southwest Papua)
new_datas = [
    {'province_id': 90, 'province': 'PAPUA SELATAN', 'latitude_prov': -6.5008, 'longitude_prov': 139.3835},
    {'province_id': 92, 'province': 'PAPUA TENGAH', 'latitude_prov': -4.5435, 'longitude_prov': 136.5655},
    {'province_id': 93, 'province': 'PAPUA BARAT DAYA', 'latitude_prov': -1.1000, 'longitude_prov': 131.5166}
]

# Appending the new rows (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df3 = pd.concat([df3, pd.DataFrame(new_datas)], ignore_index=True)

df = pd.merge(df, df3, how='left', on='province')  # Retrieving province_id and the province coordinates from df3
df = df[['province_id','province','regencies','wasteperday','longitude_prov','latitude_prov']]   # Rearranging the column order
df.head(1)

Unnamed: 0,province_id,province,regencies,wasteperday,longitude_prov,latitude_prov
0,11,ACEH,KABUPATEN ACEH SELATAN,96.49,97.0253,4.36855
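
A left merge silently leaves `NaN` where a province name has no counterpart, which is why the names are harmonized before merging. One way to check for such gaps is the `indicator` argument of `merge`. The sketch below uses hypothetical miniature tables rather than the real files:

```python
import pandas as pd

# Hypothetical miniature versions of the waste table and the province table
waste = pd.DataFrame({
    "province": ["ACEH", "D.I. YOGYAKARTA", "PAPUA SELATAN"],
    "wasteperday": [96.49, 120.0, 30.0],
})
provinces = pd.DataFrame({
    "province": ["ACEH", "DI YOGYAKARTA"],
    "province_id": [11, 34],
})

# indicator=True adds a '_merge' column telling where each row was found
checked = waste.merge(provinces, how="left", on="province", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "province"].tolist()
print(unmatched)
```

Rows flagged `left_only` are the ones that still need a rename or a new row in the lookup table.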


In [9]:
df[df['province'] == 'DKI JAKARTA'].head(2)

Unnamed: 0,province_id,province,regencies,wasteperday,longitude_prov,latitude_prov
91,31,DKI JAKARTA,KABUPATEN ADM. KEP. SERIBU,17.89,106.8451,-6.2115
92,31,DKI JAKARTA,KOTA ADM. JAKARTA PUSAT,850.05,106.8451,-6.2115


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   province_id     310 non-null    int64  
 1   province        310 non-null    object 
 2   regencies       310 non-null    object 
 3   wasteperday     310 non-null    float64
 4   longitude_prov  310 non-null    float64
 5   latitude_prov   310 non-null    float64
dtypes: float64(3), int64(1), object(2)
memory usage: 17.0+ KB


#### Preparing the df4 dataframe in order to be integrated with the main df. Then, information related to regency_id, longitude and latitude of regency can be unified.

In [11]:
# Preparing df4 dataframe for gaining regency_id, longitude, and latitude information for each regency.
# Rename the columns 'id','name','latitude','longitude' to 'regency_id','regencies','latitude_reg','longitude_reg'
mapping_df4 = {'id': 'regency_id', 'name': 'regencies', 'latitude': 'latitude_reg', 'longitude': 'longitude_reg'}

df4.rename(columns=mapping_df4, inplace=True)     # Renaming the column for integrating
df4 = df4.drop(columns=['alt_name'])              # Deleting the column 'alt_name'

# Changing the data in df4 so that it can be integrated properly
reg_change = {'KABUPATEN TOBA SAMOSIR': 'KABUPATEN TOBA', 'KOTA PADANG SIDEMPUAN': 'KOTA PADANGSIDIMPUAN',
              'KOTA SAWAH LUNTO': 'KOTA SAWAHLUNTO', 'KABUPATEN MUSI BANYU ASIN': 'KABUPATEN MUSI BANYUASIN',
              'KABUPATEN BANYU ASIN': 'KABUPATEN BANYUASIN', 'KABUPATEN KEPULAUAN SERIBU': 'KABUPATEN ADM. KEP. SERIBU',
              'KOTA JAKARTA PUSAT': 'KOTA ADM. JAKARTA PUSAT', 'KOTA JAKARTA UTARA': 'KOTA ADM. JAKARTA UTARA',
              'KOTA JAKARTA BARAT': 'KOTA ADM. JAKARTA BARAT', 'KOTA JAKARTA SELATAN': 'KOTA ADM. JAKARTA SELATAN',
              'KOTA JAKARTA TIMUR': 'KOTA ADM. JAKARTA TIMUR', 'KABUPATEN GUNUNG KIDUL': 'KABUPATEN GUNUNGKIDUL',
              'KABUPATEN KARANG ASEM': 'KABUPATEN KARANGASEM', 'KOTA PALANGKA RAYA': 'KOTA PALANGKARAYA',
              'KABUPATEN KOTA BARU': 'KABUPATEN KOTABARU', 'KOTA BANJAR BARU': 'KOTA BANJARBARU',
              'KABUPATEN MAHAKAM HULU': 'KABUPATEN MAHAKAM ULU', 'KABUPATEN SIAU TAGULANDANG BIARO': 'KABUPATEN KEP. SIAU TAGULANDANG BIARO',
              'KABUPATEN PANGKAJENE DAN KEPULAUAN': 'KABUPATEN PANGKAJENE KEPULAUAN', 'KOTA PARE-PARE': 'KOTA PAREPARE',
              'KOTA BAUBAU': 'KOTA BAU BAU'}
df4['regencies'] = df4['regencies'].replace(reg_change)

# Retrieving regency_id, latitude, and longitude data for each regency from df4
df = pd.merge(df, df4[['regency_id','regencies','latitude_reg','longitude_reg']], how='left', left_on='regencies', right_on='regencies')
df = df[['province_id','regency_id','province','regencies','wasteperday','longitude_prov','latitude_prov','longitude_reg','latitude_reg']]    # Rearranging the column order
df.head(1)

Unnamed: 0,province_id,regency_id,province,regencies,wasteperday,longitude_prov,latitude_prov,longitude_reg,latitude_reg
0,11,1103,ACEH,KABUPATEN ACEH SELATAN,96.49,97.0253,4.36855,97.41667,3.16667
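
Most entries in `reg_change` differ only by a space or a hyphen. A rename dictionary of this kind can be drafted with the standard library's `difflib` and then reviewed by hand. This is a sketch under that assumption, not the procedure used to build the dictionary above, and `suggest_renames` is a hypothetical helper:

```python
from difflib import get_close_matches

# Spellings used in the waste table (targets) and in the coordinates table (candidates)
waste_names = ["KABUPATEN GUNUNGKIDUL", "KOTA PAREPARE", "KOTA BAU BAU"]
coord_names = ["KABUPATEN GUNUNG KIDUL", "KOTA PARE-PARE", "KOTA BAUBAU", "KOTA MEDAN"]

def suggest_renames(targets, candidates, cutoff=0.8):
    """Map each candidate spelling to the closest unmatched target spelling."""
    suggestions = {}
    for target in targets:
        if target in candidates:
            continue  # already identical, nothing to rename
        close = get_close_matches(target, candidates, n=1, cutoff=cutoff)
        if close:
            suggestions[close[0]] = target
    return suggestions

reg_change_draft = suggest_renames(waste_names, coord_names)
print(reg_change_draft)
```

Fuzzy matches can be wrong for regencies with genuinely similar names, so the draft should be checked before it is passed to `replace`.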


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   province_id     310 non-null    int64  
 1   regency_id      310 non-null    int64  
 2   province        310 non-null    object 
 3   regencies       310 non-null    object 
 4   wasteperday     310 non-null    float64
 5   longitude_prov  310 non-null    float64
 6   latitude_prov   310 non-null    float64
 7   longitude_reg   310 non-null    float64
 8   latitude_reg    310 non-null    float64
dtypes: float64(5), int64(2), object(2)
memory usage: 24.2+ KB


#### Preparing the df2 dataframe so that it can be integrated with the main df. Then, the number of Micro, Small and Medium Enterprises (UMKM) and the number of workers in landfill management, wastewater, food waste, waste recycling, and remediation can be unified.

In [13]:
# Changing the data in the df2 in order to be integrated properly
prov_change = {'KEP. BANGKA BELITUNG': 'KEPULAUAN BANGKA BELITUNG', 'DI YOGYAKARTA': 'D.I. YOGYAKARTA'}
df2['province'] = df2['province'].replace(prov_change)

# Creating the new province rows for 'PAPUA SELATAN' (South Papua), 'PAPUA TENGAH' (Central Papua), and 'PAPUA BARAT DAYA' (Southwest Papua) as a list of dictionaries
new_datas = [
    {'province': 'PAPUA SELATAN', 'umk_count': 186, 'workers_count': 461, 'wasteperday_count': 439.72},
    {'province': 'PAPUA TENGAH', 'umk_count': 186, 'workers_count': 461, 'wasteperday_count': 439.72},
    {'province': 'PAPUA BARAT DAYA', 'umk_count': 186, 'workers_count': 461, 'wasteperday_count': 439.72}
]

# Appending the new rows (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df2 = pd.concat([df2, pd.DataFrame(new_datas)], ignore_index=True)

# Retrieving the number of UMKs and workers from df2
df = pd.merge(df, df2[['umk_count','workers_count','wasteperday_count','province']], how='left', left_on='province', right_on='province')

# Creating new columns 'coordinates_prov' and 'coordinates_reg' containing a list of (longitude, latitude)
df['coordinates_prov'] = list(zip(df['longitude_prov'], df['latitude_prov']))
df['coordinates_reg'] = list(zip(df['longitude_reg'], df['latitude_reg']))

# Converting the 'coordinates' columns into a list containing two coordinates
df['coordinates_prov'] = df['coordinates_prov'].apply(lambda coord: list(coord))
df['coordinates_reg'] = df['coordinates_reg'].apply(lambda coord: list(coord))

# Creating new columns for each regency's share of the provincial total and of the national total
df['local_perc'] = ((df['wasteperday'] / df['wasteperday_count']) * 100).round(2)
df['nat_perc'] = ((df['wasteperday'] / df['wasteperday'].sum()) * 100).round(2)

# Rearranging the order of the columns
df = df[['province_id','regency_id','province','regencies','umk_count','workers_count','wasteperday','wasteperday_count',
         'local_perc','nat_perc','longitude_prov', 'latitude_prov','longitude_reg','latitude_reg','coordinates_prov','coordinates_reg']]

df.head(1)

Unnamed: 0,province_id,regency_id,province,regencies,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,coordinates_prov,coordinates_reg
0,11,1103,ACEH,KABUPATEN ACEH SELATAN,494,869,96.49,1640.1,5.88,0.1,97.0253,4.36855,97.41667,3.16667,"[97.0253, 4.36855]","[97.41667, 3.16667]"
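
The two percentage columns use different denominators: `local_perc` divides by the provincial total taken from `df2`, while `nat_perc` divides by the sum over the regencies present in `df`. For the row above, 96.49 / 1640.1 × 100 ≈ 5.88. A small sketch with made-up numbers shows the consequence:

```python
import pandas as pd

# Made-up figures: province A reports a total of 50, but only 40 is covered by listed regencies
toy = pd.DataFrame({
    "province": ["A", "A", "B"],
    "wasteperday": [30.0, 10.0, 60.0],
    "wasteperday_count": [50.0, 50.0, 60.0],
})

toy["local_perc"] = (toy["wasteperday"] / toy["wasteperday_count"] * 100).round(2)
toy["nat_perc"] = (toy["wasteperday"] / toy["wasteperday"].sum() * 100).round(2)

# Share of each provincial total that the listed regencies account for
covered = toy.groupby("province")["local_perc"].sum()
print(toy)
print(covered)
```

Because the provincial totals come from a separate table, the local shares within a province need not add up to 100, whereas the national shares always do (up to rounding).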


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   province_id        310 non-null    int64  
 1   regency_id         310 non-null    int64  
 2   province           310 non-null    object 
 3   regencies          310 non-null    object 
 4   umk_count          310 non-null    int64  
 5   workers_count      310 non-null    int64  
 6   wasteperday        310 non-null    float64
 7   wasteperday_count  310 non-null    float64
 8   local_perc         310 non-null    float64
 9   nat_perc           310 non-null    float64
 10  longitude_prov     310 non-null    float64
 11  latitude_prov      310 non-null    float64
 12  longitude_reg      310 non-null    float64
 13  latitude_reg       310 non-null    float64
 14  coordinates_prov   310 non-null    object 
 15  coordinates_reg    310 non-null    object 
dtypes: float64(8), int64(4), object(4)

In [15]:
# Adding ranking columns for 'wasteperday','umk_count' and 'workers_count' columns with 'dense' method
df['rank_wasteperday'] = df['wasteperday'].rank(method='dense', ascending=False)
df['rank_umk'] = df['umk_count'].rank(method='dense', ascending=False)
df['rank_workers'] = df['workers_count'].rank(method='dense', ascending=False)

# Changing the rank data type to integer
df['rank_wasteperday'] = df['rank_wasteperday'].astype(int)
df['rank_umk'] = df['rank_umk'].astype(int)
df['rank_workers'] = df['rank_workers'].astype(int)

# Sort from regencies with the highest waste pile production rate
df.sort_values(by='rank_wasteperday', ascending=True).head(10)[['province_id','province','regencies','wasteperday','rank_wasteperday','umk_count','rank_umk','workers_count','rank_workers']]

Unnamed: 0,province_id,province,regencies,wasteperday,rank_wasteperday,umk_count,rank_umk,workers_count,rank_workers
96,31,DKI JAKARTA,KOTA ADM. JAKARTA TIMUR,2313.02,1,7864,4,14925,4
178,36,BANTEN,KABUPATEN TANGERANG,2305.47,2,3645,6,8179,5
104,32,JAWA BARAT,KABUPATEN BEKASI,2250.35,3,21648,1,45162,1
94,31,DKI JAKARTA,KOTA ADM. JAKARTA BARAT,2023.42,4,7864,4,14925,4
95,31,DKI JAKARTA,KOTA ADM. JAKARTA SELATAN,1954.25,5,7864,4,14925,4
108,32,JAWA BARAT,KOTA BEKASI,1830.63,6,21648,1,45162,1
175,35,JAWA TIMUR,KOTA SURABAYA,1783.68,7,17080,2,35477,2
26,12,SUMATERA UTARA,KOTA MEDAN,1722.6,8,4448,5,7675,6
107,32,JAWA BARAT,KOTA BANDUNG,1594.18,9,21648,1,45162,1
180,36,BANTEN,KOTA TANGERANG,1381.53,10,3645,6,8179,5
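
In the table above, `rank_umk` and `rank_workers` repeat within a province because those counts are provincial values copied onto every regency row. The `dense` method gives tied values the same rank and leaves no gaps afterwards. A short comparison with the `min` method, using four UMK counts taken from the table:

```python
import pandas as pd

# Two regencies of Jawa Barat share the same provincial count
values = pd.Series([21648, 21648, 17080, 7864], name="umk_count")

ranks = pd.DataFrame({
    "dense": values.rank(method="dense", ascending=False).astype(int),
    "min": values.rank(method="min", ascending=False).astype(int),
})
print(ranks)
```

With `dense` the ranks run 1, 1, 2, 3, while `min` yields 1, 1, 3, 4.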


In [16]:
df.sort_values(by='rank_wasteperday', ascending=True).head(5)

Unnamed: 0,province_id,regency_id,province,regencies,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,coordinates_prov,coordinates_reg,rank_wasteperday,rank_umk,rank_workers
96,31,3172,DKI JAKARTA,KOTA ADM. JAKARTA TIMUR,7864,14925,2313.02,8527.07,27.13,2.34,106.8451,-6.2115,106.884,-6.2521,"[106.8451, -6.2115]","[106.884, -6.2521]",1,4,4
178,36,3603,BANTEN,KABUPATEN TANGERANG,3645,8179,2305.47,7199.63,32.02,2.33,106.13756,-6.44538,106.46667,-6.2,"[106.13756, -6.44538]","[106.46667, -6.2]",2,6,5
104,32,3216,JAWA BARAT,KABUPATEN BEKASI,21648,45162,2250.35,13410.01,16.78,2.27,107.64047,-6.88917,107.10833,-6.24667,"[107.64047, -6.88917]","[107.10833, -6.24667]",3,1,1
94,31,3174,DKI JAKARTA,KOTA ADM. JAKARTA BARAT,7864,14925,2023.42,8527.07,23.73,2.05,106.8451,-6.2115,106.7673,-6.1676,"[106.8451, -6.2115]","[106.7673, -6.1676]",4,4,4
95,31,3171,DKI JAKARTA,KOTA ADM. JAKARTA SELATAN,7864,14925,1954.25,8527.07,22.92,1.98,106.8451,-6.2115,106.8135,-6.266,"[106.8451, -6.2115]","[106.8135, -6.266]",5,4,4


In [17]:
dfZ = df.sort_values(by='rank_workers', ascending=True)
dfZ[dfZ['rank_workers'] <= 7].tail(50)

Unnamed: 0,province_id,regency_id,province,regencies,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,coordinates_prov,coordinates_reg,rank_wasteperday,rank_umk,rank_workers
137,33,3329,JAWA TENGAH,KABUPATEN BREBES,14176,26630,1005.31,15500.91,6.49,1.02,110.00441,-7.30324,108.9,-7.05,"[110.00441, -7.30324]","[108.9, -7.05]",21,3,3
135,33,3327,JAWA TENGAH,KABUPATEN PEMALANG,14176,26630,593.68,15500.91,3.83,0.6,110.00441,-7.30324,109.4,-7.03333,"[110.00441, -7.30324]","[109.4, -7.03333]",40,3,3
134,33,3326,JAWA TENGAH,KABUPATEN PEKALONGAN,14176,26630,390.6,15500.91,2.52,0.39,110.00441,-7.30324,109.624,-7.0319,"[110.00441, -7.30324]","[109.624, -7.0319]",78,3,3
133,33,3325,JAWA TENGAH,KABUPATEN BATANG,14176,26630,567.28,15500.91,3.66,0.57,110.00441,-7.30324,109.88333,-7.03333,"[110.00441, -7.30324]","[109.88333, -7.03333]",47,3,3
132,33,3324,JAWA TENGAH,KABUPATEN KENDAL,14176,26630,410.01,15500.91,2.65,0.41,110.00441,-7.30324,110.1685,-7.0256,"[110.00441, -7.30324]","[110.1685, -7.0256]",71,3,3
131,33,3323,JAWA TENGAH,KABUPATEN TEMANGGUNG,14176,26630,400.14,15500.91,2.58,0.4,110.00441,-7.30324,110.11667,-7.25,"[110.00441, -7.30324]","[110.11667, -7.25]",75,3,3
130,33,3322,JAWA TENGAH,KABUPATEN SEMARANG,14176,26630,529.92,15500.91,3.42,0.54,110.00441,-7.30324,110.44139,-7.20667,"[110.00441, -7.30324]","[110.44139, -7.20667]",53,3,3
129,33,3321,JAWA TENGAH,KABUPATEN DEMAK,14176,26630,722.37,15500.91,4.66,0.73,110.00441,-7.30324,110.6122,-6.8993,"[110.00441, -7.30324]","[110.6122, -6.8993]",35,3,3
128,33,3320,JAWA TENGAH,KABUPATEN JEPARA,14176,26630,412.37,15500.91,2.66,0.42,110.00441,-7.30324,110.76667,-6.58333,"[110.00441, -7.30324]","[110.76667, -6.58333]",70,3,3
127,33,3319,JAWA TENGAH,KABUPATEN KUDUS,14176,26630,448.16,15500.91,2.89,0.45,110.00441,-7.30324,110.86667,-6.8,"[110.00441, -7.30324]","[110.86667, -6.8]",64,3,3


#### Creating a new dataframe for visualization based on province

In [18]:
# Grouping by province and taking the most frequent value (mode) of each province-level column
# (the columns are selected with a list, since tuple-style selection is no longer supported in pandas 2.0)
df_prov = df.groupby('province_id')[['province','umk_count','workers_count','wasteperday_count','longitude_prov', 'latitude_prov','coordinates_prov']].apply(lambda x: x.mode().iloc[0])
df_prov = df_prov.reset_index()

# Creating a new column for the percentage of province waste piles to the total in Indonesia.
df_prov['prov_perc'] = ((df_prov['wasteperday_count'] / df_prov['wasteperday_count'].sum()) * 100).round(2)

df_prov.head(1)

Unnamed: 0,province_id,province,umk_count,workers_count,wasteperday_count,longitude_prov,latitude_prov,coordinates_prov,prov_perc
0,11,ACEH,494,869,1640.1,97.0253,4.36855,"[97.0253, 4.36855]",1.64
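
Since the province-level columns are identical on every regency row of a province, keeping one row per `province_id` gives the same table as taking the mode. The sketch below shows that alternative on a small sample built from values in this notebook; it is an illustration, not the method used above:

```python
import pandas as pd

toy = pd.DataFrame({
    "province_id": [11, 11, 31],
    "province": ["ACEH", "ACEH", "DKI JAKARTA"],
    "umk_count": [494, 494, 7864],
    "coordinates_prov": [[97.0253, 4.36855], [97.0253, 4.36855], [106.8451, -6.2115]],
})

# Confirm that the value really is constant within each province
is_constant = bool(toy.groupby("province_id")["umk_count"].nunique().eq(1).all())

# Keep the first row of each province
prov = toy.drop_duplicates(subset="province_id").reset_index(drop=True)
print(is_constant)
print(prov)
```

Checking `nunique` first guards against the case where a provincial value was entered inconsistently across regencies.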


In [19]:
df_prov.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   province_id        37 non-null     int64  
 1   province           37 non-null     object 
 2   umk_count          37 non-null     int64  
 3   workers_count      37 non-null     int64  
 4   wasteperday_count  37 non-null     float64
 5   longitude_prov     37 non-null     float64
 6   latitude_prov      37 non-null     float64
 7   coordinates_prov   37 non-null     object 
 8   prov_perc          37 non-null     float64
dtypes: float64(4), int64(3), object(2)
memory usage: 2.7+ KB


#### Creating a Folium choropleth map to visualize the daily waste production of each province

In [20]:
# Importing the geojson file (province geospatial data)
provinces_geo = requests.get(
    "https://raw.githubusercontent.com/zackyndra23/Insightful-3D-Data-Visualization-Exploratory-Analysis-With-Pydeck-in-Phyton/main/indoprovinces.geojson"
).json()

In [21]:
# Creating a map using Folium with center coordinates (-2.2331, 117.2841) and an initial zoom of 5
indo_waste = folium.Map(location=[-2.2331, 117.2841], zoom_start=5)

# Setting a Choropleth layer to the map using geospatial data (.geojson) and provincial data (df_prov)
folium.Choropleth(
    geo_data=provinces_geo,
    name="choropleth",
    data=df_prov,
    columns=["province_id", "wasteperday_count"],   # The key column and the value column used to color the map
    key_on="feature.properties.province_id",        # Linking the "province_id" column in the DataFrame to the geospatial data
    fill_color="YlGn",                              # Yellow-green color scale
    fill_opacity=0.7,                               # Opacity of the fill color
    line_opacity=0.2,                               # Opacity of the boundary lines
    legend_name="Waste per day (tons)",
).add_to(indo_waste)

# An option to add a dark map layer (cartodbdark_matter)
# folium.TileLayer('cartodbdark_matter').add_to(indo_waste)
# Adding a layer control (LayerControl) that allows users to turn off or on the map layer
folium.LayerControl().add_to(indo_waste)

# Showing the map that has been created
indo_waste

#### Creating a new dataframe for an interactive choropleth map using plotly.graph_objects (go)

In [22]:
# Listing the province names stored in the geojson file (properties.state) and their string lengths
df_gjson = pd.DataFrame(
    {"province": pd.json_normalize(provinces_geo["features"])["properties.state"]}
).assign(province_strcount=lambda d: d["province"].str.len())
df_gjson.head(1)

Unnamed: 0,province,province_strcount
0,ACEH,4


In [23]:
list(df_gjson['province'].sort_values(ascending=True))

['ACEH',
 'BALI',
 'BANTEN',
 'BENGKULU',
 'DAERAH ISTIMEWA YOGYAKARTA',
 'DKI JAKARTA',
 'GORONTALO',
 'JAMBI',
 'JAWA BARAT',
 'JAWA TENGAH',
 'JAWA TIMUR',
 'KALIMANTAN BARAT',
 'KALIMANTAN SELATAN',
 'KALIMANTAN TENGAH',
 'KALIMANTAN TIMUR',
 'KALIMANTAN UTARA',
 'KEPULAUAN BANGKA BELITUNG',
 'KEPULAUAN RIAU',
 'LAMPUNG',
 'MALUKU',
 'MALUKU UTARA',
 'NUSA TENGGARA BARAT',
 'NUSA TENGGARA TIMUR',
 'PAPUA',
 'PAPUA BARAT',
 'RIAU',
 'SULAWESI BARAT',
 'SULAWESI SELATAN',
 'SULAWESI TENGAH',
 'SULAWESI TENGGARA',
 'SULAWESI UTARA',
 'SUMATERA BARAT',
 'SUMATERA SELATAN',
 'SUMATERA UTARA']

In [24]:
df_cho = df_prov.copy()      # Duplicating the province dataframe (df_prov)

# Dropping the provinces that do not exist in the geojson file
prov_drop = ['PAPUA BARAT DAYA','PAPUA SELATAN','PAPUA TENGAH']
df_cho = df_cho[~df_cho['province'].isin(prov_drop)]

# Renaming the province to match the geojson file
prov_change = {'D.I. YOGYAKARTA': 'DAERAH ISTIMEWA YOGYAKARTA'}
df_cho['province'] = df_cho['province'].replace(prov_change)

df_cho.head(1)

Unnamed: 0,province_id,province,umk_count,workers_count,wasteperday_count,longitude_prov,latitude_prov,coordinates_prov,prov_perc
0,11,ACEH,494,869,1640.1,97.0253,4.36855,"[97.0253, 4.36855]",1.64
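
The provinces to drop and rename can be found by comparing the two sets of names directly. A minimal sketch with a hypothetical three-feature geojson-like dictionary in place of the real file:

```python
# Hypothetical stand-in for the geojson, using the same 'properties.state' key
geo = {
    "features": [
        {"properties": {"state": "ACEH"}},
        {"properties": {"state": "DAERAH ISTIMEWA YOGYAKARTA"}},
        {"properties": {"state": "PAPUA"}},
    ]
}
frame_names = {"ACEH", "D.I. YOGYAKARTA", "PAPUA", "PAPUA SELATAN"}

geo_names = {feature["properties"]["state"] for feature in geo["features"]}

missing_in_geojson = sorted(frame_names - geo_names)   # rename or drop these
missing_in_frame = sorted(geo_names - frame_names)     # polygons that would get no value
print(missing_in_geojson)
print(missing_in_frame)
```

Names appearing in both lists under different spellings are rename candidates; names appearing only in the first list have no polygon and must be dropped.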


In [25]:
df_cho.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 34 entries, 0 to 36
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   province_id        34 non-null     int64  
 1   province           34 non-null     object 
 2   umk_count          34 non-null     int64  
 3   workers_count      34 non-null     int64  
 4   wasteperday_count  34 non-null     float64
 5   longitude_prov     34 non-null     float64
 6   latitude_prov      34 non-null     float64
 7   coordinates_prov   34 non-null     object 
 8   prov_perc          34 non-null     float64
dtypes: float64(4), int64(3), object(2)
memory usage: 2.7+ KB


In [26]:
fig8 = go.Figure(
    data=go.Choropleth(
        geojson=provinces_geo,
        locations=df_cho["province"],  # Province names, matched against featureidkey
        featureidkey="properties.state",
        z=df_cho["wasteperday_count"],  # Data to be color-coded
        colorscale="greens",
        colorbar_title="Waste per Day (Tons)",
    )
)
fig8.update_geos(fitbounds="locations", visible=False)

# Changing the background to black
fig8.update_layout(
    geo=dict(
        bgcolor="black",
        lakecolor="black",
        landcolor="black",
        showland=True,
        showlakes=True,
        showocean=True,
        showrivers=True,
        showcountries=True,
    )
)

fig8

The first map above is the 2D choropleth produced with the Folium library, while the second is produced with the Plotly library. The main difference between the two is **the level of interactivity**. In the Plotly map, hovering the cursor over a province shows its name and its amount of waste per day in tons. The Folium map, by contrast, can only be read through its color scale, so **a province's value can easily be misread**. The color gradation of the Plotly map is also **easier to distinguish** than that of the Folium map.

#### Once the dataframe has no missing values and its integer and float columns have the correct data types, 3D data visualization with PyDeck can begin.

#### Visualizing 3D data using PyDeck [(ScatterplotLayer)](https://deckgl.readthedocs.io/en/latest/gallery/scatterplot_layer.html)

In [27]:
# Define a layer to display on a map (3D Visualization based on provinces)
layer = pdk.Layer(
    "ScatterplotLayer",
    df_prov,
    pickable=True,
    opacity=0.8,
    stroked=True,
    filled=True,
    radius_scale=5,
    radius_min_pixels=1,
    radius_max_pixels=100,
    line_width_min_pixels=1,
    get_position="coordinates_prov",
    get_radius="umk_count",
    get_fill_color=[255, 140, 0],
    get_line_color=[0, 0, 0],
)

# Set the viewport location
view_state = pdk.ViewState(
    longitude=117.2841,
    latitude=-2.2331,
    zoom=4,        # Initial zoom, kept within [min_zoom, max_zoom]
    min_zoom=4,
    max_zoom=15,
    pitch=20,
    bearing=-10)

tooltip = {
    "html": "<span style='color: blue;'>{province}</span> Province <br> There are <span style='color: red;'>{umk_count}</span> UMK's and <span style='color: red;'>{workers_count}</span> workers, producing <span style='color: red;'>{wasteperday_count}</span> (<span style='color: orange;'>{prov_perc}</span>% national) tons of waste per day",
    "style": {"background": "grey", "color": "white", "font-family": '"Helvetica Neue", Arial', "z-index": "10000"},
}

# Render
r = pdk.Deck(layers=[layer], initial_view_state=view_state, tooltip=tooltip)
r.to_html("scatterplot_layer.html")

ScatterplotLayer helps in **understanding the spatial distribution patterns** of a collection of data points. This layer provides a **level of interactivity** that allows users to explore the data further by hovering the cursor or performing other interactions on the map.
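
The tooltip text is a template whose `{column}` placeholders are filled with the values of the hovered row. PyDeck performs this substitution in the rendered page; the sketch below only mimics the idea in plain Python with `str.format`, using a simplified template and the ACEH row of `df_prov`:

```python
# Simplified version of the tooltip template, without the HTML styling
template = (
    "{province} Province: {umk_count} UMKs, {workers_count} workers, "
    "{wasteperday_count} tons/day ({prov_perc}% national)"
)

# Values of the ACEH row shown earlier in df_prov
row = {
    "province": "ACEH",
    "umk_count": 494,
    "workers_count": 869,
    "wasteperday_count": 1640.1,
    "prov_perc": 1.64,
}

text = template.format(**row)
print(text)
```

Every placeholder must match a column name in the dataframe passed to the layer, otherwise there is nothing to substitute.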

Based on figure (3.D), we can see the provinces with **the highest number of UMKs**: **Jawa Barat** (21648 UMKs), **Jawa Timur** (17080), **Jawa Tengah** (14176), **DKI Jakarta** (7864), and **Sumatera Utara** (4448). On Kalimantan Island, **Kalimantan Selatan** is the province with the highest number of UMKs **related to the management of industrial, water, household, and food waste and remediation** (1490). On Sumatra Island, the highest number is in **Sumatera Utara** (4448), and on Sulawesi Island it is in **Sulawesi Selatan** (1823).
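
The size of each circle follows from the layer settings: `get_radius` is read in meters, multiplied by `radius_scale`, and the on-screen size is then clamped between `radius_min_pixels` and `radius_max_pixels`. A rough sketch of that arithmetic, where the meters-per-pixel figure is a hypothetical value chosen for illustration because it depends on the zoom level:

```python
def radius_meters(value, radius_scale=5):
    """Radius in meters: the data value times radius_scale."""
    return value * radius_scale

def radius_pixels(meters, meters_per_pixel, min_px=1, max_px=100):
    """On-screen radius, clamped to the [min_px, max_px] range."""
    return max(min_px, min(max_px, meters / meters_per_pixel))

jawa_barat = radius_meters(21648)      # largest UMK count in the data
smallest = radius_meters(33)           # smallest UMK count in the data

# Hypothetical map scales: 2000 m per pixel (zoomed out) and 500 m per pixel (zoomed in)
print(jawa_barat, radius_pixels(jawa_barat, 2000), radius_pixels(jawa_barat, 500))
print(smallest, radius_pixels(smallest, 2000))
```

The clamping explains why the largest provinces stop growing when zooming in and why the smallest ones never disappear entirely.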

#### Visualizing 3D data using PyDeck [(GridLayer)](https://deckgl.readthedocs.io/en/latest/gallery/grid_layer.html)

In [28]:
df.head(1)

Unnamed: 0,province_id,regency_id,province,regencies,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,coordinates_prov,coordinates_reg,rank_wasteperday,rank_umk,rank_workers
0,11,1103,ACEH,KABUPATEN ACEH SELATAN,494,869,96.49,1640.1,5.88,0.1,97.0253,4.36855,97.41667,3.16667,"[97.0253, 4.36855]","[97.41667, 3.16667]",220,22,22


In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 310 entries, 0 to 309
Data columns (total 19 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   province_id        310 non-null    int64  
 1   regency_id         310 non-null    int64  
 2   province           310 non-null    object 
 3   regencies          310 non-null    object 
 4   umk_count          310 non-null    int64  
 5   workers_count      310 non-null    int64  
 6   wasteperday        310 non-null    float64
 7   wasteperday_count  310 non-null    float64
 8   local_perc         310 non-null    float64
 9   nat_perc           310 non-null    float64
 10  longitude_prov     310 non-null    float64
 11  latitude_prov      310 non-null    float64
 12  longitude_reg      310 non-null    float64
 13  latitude_reg       310 non-null    float64
 14  coordinates_prov   310 non-null    object 
 15  coordinates_reg    310 non-null    object 
 16  rank_wasteperday   310 non

In [30]:
df.describe()

Unnamed: 0,province_id,regency_id,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,rank_wasteperday,rank_umk,rank_workers
count,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0
mean,42.383871,4267.332258,5210.225806,10407.574194,319.168806,5210.733742,11.240968,0.322645,111.877215,-3.043904,111.791181,-3.200731,155.364516,13.416129,12.803226
std,24.495645,2453.64331,6930.776692,14111.994088,399.685079,5539.670338,13.550531,0.403959,9.504734,3.792117,9.589806,3.944016,89.430739,9.592898,9.046267
min,11.0,1101.0,33.0,119.0,13.2,56.87,0.21,0.01,97.0253,-8.56568,95.31086,-10.21667,1.0,1.0,1.0
25%,18.0,1826.75,494.0,913.0,85.7325,1331.35,3.375,0.09,104.58037,-6.96851,104.646785,-7.03333,78.25,4.0,4.0
50%,35.0,3511.0,1260.0,2416.0,163.15,2413.83,6.32,0.165,111.09689,-3.12668,111.03125,-3.2281,155.5,12.0,12.0
75%,64.0,6403.75,7864.0,14925.0,388.7125,8527.07,12.9825,0.39,117.63696,-0.13224,117.233333,-0.016703,232.75,22.0,21.0
max,94.0,9471.0,21648.0,45162.0,2313.02,15500.91,84.52,2.34,139.3835,7.7956,140.77779,5.82164,309.0,34.0,33.0


In [31]:
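# Define the GridLayer (3D visualization of the number of workers)
layer2 = pdk.Layer(
    "GridLayer",
    df,
    pickable=True,
    extruded=True,
    cell_size=100000,  # Grid cell size in meters
    elevation_scale=15,
    elevation_range=[119, 45162],
    get_position=["longitude_prov", "latitude_prov"],
    # GridLayer aggregates the points in each cell, so the workers count is passed as the elevation weight.
    # Every regency row repeats its province's value, so MAX keeps that value instead of summing the repeats.
    get_elevation_weight="workers_count",
    elevation_aggregation=String("MAX"),
)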

# Set the viewport location
view_state = pdk.ViewState(
    longitude=117.2841,
    latitude=-2.2331,
    zoom=4,        # Initial zoom, kept within [min_zoom, max_zoom]
    min_zoom=4,
    max_zoom=15,
    pitch=40.5,
    bearing=-27.36)

# Map Rendering
deck = pdk.Deck(
    layers=[layer2],
    initial_view_state=view_state,
    tooltip={"text": "Coordinates: {position}\nCount : {count}"},
)

# Saving Render Result to HTML File
deck.to_html("grid_layer_waste.html")

GridLayer shows **the data intensity** in various provinces. Each tile on the grid is represented by a specific numerical value, and the color of the grid will reflect that intensity level and help in **analyzing data distribution patterns across the map**. This gridlayer allows **visual adjustments** as well such as color, grid density, and other styles, so we can customize the appearance according to our visual needs and preferences.

Based on figure (3.B), the provinces with **the highest number of workers/laborers in waste management (household waste, industry, food waste, water, and remediation)** are **Jawa Barat**, **Jawa Timur**, **Jawa Tengah**, **DKI Jakarta**, **Banten**, **Sumatera Utara**, and **Riau**. On Kalimantan Island, **Kalimantan Selatan** Province has the largest number of workers (2416 people from 11 regencies). On Sumatra Island, the largest workforce is in **Sumatera Utara** Province (7675 people from 18 regencies). And on Sulawesi Island, the largest workforce is in **Sulawesi Selatan** Province (3791 people from 17 regencies).
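To make the aggregation step concrete, here is a minimal pure-Python sketch of what a grid layer does conceptually: snap each point to a cell, then combine the weights that land in the same cell. The `aggregate_grid` helper, the cell size in degrees, and the three-row sample are illustrative assumptions (the coordinates and worker totals come from the provincial summary in this notebook); deck.gl's actual implementation works with cell sizes in meters.

```python
import math
from collections import defaultdict

def aggregate_grid(points, cell_deg=1.0, how="count"):
    """Group (lon, lat, weight) points into square cells and aggregate them.

    cell_deg is a hypothetical cell size in degrees, used here only to keep
    the arithmetic readable; GridLayer itself takes cell_size in meters.
    """
    cells = defaultdict(list)
    for lon, lat, weight in points:
        # Snap the point to the index of the cell that contains it
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        cells[key].append(weight)

    reducers = {"count": len, "sum": sum, "max": max}
    return {key: reducers[how](weights) for key, weights in cells.items()}

# Province coordinates repeated on each regency row, as in df
rows = [
    (107.64047, -6.88917, 45162),  # JAWA BARAT, regency row 1
    (107.64047, -6.88917, 45162),  # JAWA BARAT, regency row 2
    (106.8451, -6.2115, 14925),    # DKI JAKARTA
]

print(aggregate_grid(rows, how="count"))
print(aggregate_grid(rows, how="sum"))
print(aggregate_grid(rows, how="max"))
```

With `count`, a cell's height reflects how many rows fall inside it. With `sum`, a provincial total that is repeated on every regency row is counted several times, while `max` recovers the provincial figure.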

#### Creating a Folium map with complete information on the amount of waste per day, the number of UMKs, and the number of workers/laborers in 2022-2023

In [32]:
# Base map centered on Indonesia
indo_waste = folium.Map(location=[-2.2331, 117.2841], zoom_start=5)

def marker(placeMap, row):
    # Add one regency marker; the popup summarizes its UMKs, workers, and waste
    folium.Marker([row['latitude_reg'], row['longitude_reg']],
                  popup = f'''
                    {row['province']}, {row['regencies']}
                    There are {row['umk_count']} UMKs and {row['workers_count']} workers,
                    producing {row['wasteperday']} tons of waste per day ({row['local_perc']}% local and {row['nat_perc']}% national)''',
                  tooltip = row['regencies'],
                  icon = folium.Icon(color = 'red', icon = 'info-sign')
                  ).add_to(placeMap)


# Iterate over the rows directly instead of relying on a default integer index
for _, row in df.iterrows():
    marker(indo_waste, row)

# Add a dark basemap as an extra tile layer and display the map
folium.TileLayer('cartodbdark_matter').add_to(indo_waste)
indo_waste

In this figure, the Folium map is quite **interactive**: when we click the icon of a regency, it shows three parameters (amount of waste per day, number of UMKs, and number of workers/laborers). **Additional provincial and national percentage information** can also be added and displayed. However, **we cannot find patterns** in this visualization, because the markers only report the values configured by the user without encoding them visually.
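The popup content can also be built by a small helper, which keeps the marker code short and makes the text easy to check before rendering. The sketch below is a hypothetical refactoring (the `popup_text` name is not part of Folium), and the sample record reuses the values of one row of `df` from this notebook.

```python
def popup_text(row):
    """Return the popup string for one regency record (a dict or pandas row)."""
    return (
        f"{row['province']}, {row['regencies']}: "
        f"{row['umk_count']} UMKs and {row['workers_count']} workers, "
        f"producing {row['wasteperday']} tons of waste per day "
        f"({row['local_perc']}% local and {row['nat_perc']}% national)"
    )

# Sample record with the same column names as df
sample = {
    "province": "ACEH",
    "regencies": "KABUPATEN ACEH SELATAN",
    "umk_count": 494,
    "workers_count": 869,
    "wasteperday": 96.49,
    "local_perc": 5.88,
    "nat_perc": 0.1,
}

print(popup_text(sample))
```

The returned string can then be passed as the `popup` argument of `folium.Marker`.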

In [33]:
df.describe()

Unnamed: 0,province_id,regency_id,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,rank_wasteperday,rank_umk,rank_workers
count,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0
mean,42.383871,4267.332258,5210.225806,10407.574194,319.168806,5210.733742,11.240968,0.322645,111.877215,-3.043904,111.791181,-3.200731,155.364516,13.416129,12.803226
std,24.495645,2453.64331,6930.776692,14111.994088,399.685079,5539.670338,13.550531,0.403959,9.504734,3.792117,9.589806,3.944016,89.430739,9.592898,9.046267
min,11.0,1101.0,33.0,119.0,13.2,56.87,0.21,0.01,97.0253,-8.56568,95.31086,-10.21667,1.0,1.0,1.0
25%,18.0,1826.75,494.0,913.0,85.7325,1331.35,3.375,0.09,104.58037,-6.96851,104.646785,-7.03333,78.25,4.0,4.0
50%,35.0,3511.0,1260.0,2416.0,163.15,2413.83,6.32,0.165,111.09689,-3.12668,111.03125,-3.2281,155.5,12.0,12.0
75%,64.0,6403.75,7864.0,14925.0,388.7125,8527.07,12.9825,0.39,117.63696,-0.13224,117.233333,-0.016703,232.75,22.0,21.0
max,94.0,9471.0,21648.0,45162.0,2313.02,15500.91,84.52,2.34,139.3835,7.7956,140.77779,5.82164,309.0,34.0,33.0


#### Visualizing 3D data using PyDeck [(ColumnLayer)](https://deckgl.readthedocs.io/en/latest/gallery/column_layer.html)

In [34]:
# Define a layer to display on a map (3D visualization based on regencies)
layer3 = pdk.Layer(
    'ColumnLayer',
    data=df,
    get_position=['longitude_reg', 'latitude_reg'],
    get_elevation='wasteperday',
    auto_highlight=True,
    elevation_scale=200,
    # Column radius in meters (ColumnLayer takes `radius`, not `get_radius`; 1000 is its default)
    radius=1000,
    # RGBA channels are expected in the 0-255 range, so alpha is set to fully opaque
    get_fill_color=["wasteperday * 10", "wasteperday", "wasteperday * 10", 255],
    pickable=True,
    extruded=True,
    coverage=1
)

# Set the viewport location
view_state = pdk.ViewState(
    longitude=117.2841,
    latitude=-2.2331,
    zoom=5,  # Start within the [min_zoom, max_zoom] range
    min_zoom=5,
    max_zoom=15,
    pitch=40.5,
    bearing=-27.36)

tooltip = {
    "html": "<span style='color: blue;'>{province}</span>, <span style='color: blue;'>{regencies}</span> <br> There are <span style='color: red;'>{umk_count}</span> UMK's and <span style='color: red;'>{workers_count}</span> workers, producing <span style='color: red;'>{wasteperday}</span> (<span style='color: orange;'>{local_perc}</span>% local and <span style='color: orange;'>{nat_perc}</span>% national) tons of waste per day",
    "style": {"background": "grey", "color": "white", "font-family": '"Helvetica Neue", Arial', "z-index": "10000"},
}

r = pdk.Deck(layers=layer3, initial_view_state=view_state, tooltip=tooltip)
r.to_html('IndoWaste.html')

This layer helps users understand **how numerical values compare across locations**. For example, in figure (3.A), ColumnLayer shows the amount of waste per day, in tons, for each regency. The layer also provides **interactivity on the map**, allowing users to explore the data further by hovering over the columns. In terms of design, it allows **visual adjustments** such as color, column height, and other styles, so we can customize the appearance according to our needs and visual preferences.

Based on figure (3.A), the regencies with **the highest amount of waste** include **Jakarta Timur**, **Tangerang**, **Bekasi**, **Jakarta Barat**, and **Jakarta Selatan**. On Kalimantan Island, **Samarinda** Regency in East Kalimantan Province produces the most waste (587.25 tons/day). On Sumatra Island, the highest production is in **Medan** Regency in North Sumatra Province (1722.6 tons/day). On Sulawesi Island, the highest production is in **Bone** Regency in South Sulawesi Province (405.8 tons/day). And on Papua Island, the highest production is in **Jayapura** Regency in Papua Province (217.9 tons/day).
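The color expression used for the columns is easier to reason about with a few numbers. The sketch below evaluates the same RGB formula (`wasteperday * 10`, `wasteperday`, `wasteperday * 10`) in plain Python and clamps each channel to 0-255, on the assumption that the renderer stores colors as 8-bit values. The `fill_color` helper is illustrative and not part of PyDeck.

```python
def fill_color(wasteperday):
    """Evaluate [w * 10, w, w * 10] and clamp each channel to 0-255."""
    raw = (wasteperday * 10, wasteperday, wasteperday * 10)
    return tuple(int(min(255, max(0, round(channel)))) for channel in raw)

# Minimum, median, and maximum wasteperday from df.describe()
for value in (13.2, 163.15, 2313.02):
    print(value, fill_color(value))
```

Under this assumption, the red and blue channels saturate once a regency exceeds 25.5 tons per day, so most of the visible variation comes from the green channel and from the column height.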

#### Visualizing 3D data using PyDeck [(HeatmapLayer)](https://deckgl.readthedocs.io/en/latest/gallery/heatmap_layer.html)

In [35]:
df.describe()

Unnamed: 0,province_id,regency_id,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,rank_wasteperday,rank_umk,rank_workers
count,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0,310.0
mean,42.383871,4267.332258,5210.225806,10407.574194,319.168806,5210.733742,11.240968,0.322645,111.877215,-3.043904,111.791181,-3.200731,155.364516,13.416129,12.803226
std,24.495645,2453.64331,6930.776692,14111.994088,399.685079,5539.670338,13.550531,0.403959,9.504734,3.792117,9.589806,3.944016,89.430739,9.592898,9.046267
min,11.0,1101.0,33.0,119.0,13.2,56.87,0.21,0.01,97.0253,-8.56568,95.31086,-10.21667,1.0,1.0,1.0
25%,18.0,1826.75,494.0,913.0,85.7325,1331.35,3.375,0.09,104.58037,-6.96851,104.646785,-7.03333,78.25,4.0,4.0
50%,35.0,3511.0,1260.0,2416.0,163.15,2413.83,6.32,0.165,111.09689,-3.12668,111.03125,-3.2281,155.5,12.0,12.0
75%,64.0,6403.75,7864.0,14925.0,388.7125,8527.07,12.9825,0.39,117.63696,-0.13224,117.233333,-0.016703,232.75,22.0,21.0
max,94.0,9471.0,21648.0,45162.0,2313.02,15500.91,84.52,2.34,139.3835,7.7956,140.77779,5.82164,309.0,34.0,33.0


In [36]:
df_prov.sort_values(by='wasteperday_count', ascending=False).head(5)

Unnamed: 0,province_id,province,umk_count,workers_count,wasteperday_count,longitude_prov,latitude_prov,coordinates_prov,prov_perc
12,33,JAWA TENGAH,14176,26630,15500.91,110.00441,-7.30324,"[110.00441, -7.30324]",15.52
14,35,JAWA TIMUR,17080,35477,13801.5,113.98005,-6.96851,"[113.98005, -6.96851]",13.82
11,32,JAWA BARAT,21648,45162,13410.01,107.64047,-6.88917,"[107.64047, -6.88917]",13.42
10,31,DKI JAKARTA,7864,14925,8527.07,106.8451,-6.2115,"[106.8451, -6.2115]",8.54
15,36,BANTEN,3645,8179,7199.63,106.13756,-6.44538,"[106.13756, -6.44538]",7.21


In [37]:
# Regency-level waste per day for the five provinces with the most waste
top5_colors = {
    "JAWA TENGAH": '#3D9970',
    "JAWA TIMUR": '#FF4136',
    "JAWA BARAT": '#FF851B',
    "DKI JAKARTA": '#1FDABF',
    "BANTEN": '#E8274B',
}

# One box plot per province
fig = go.Figure()
for province, color in top5_colors.items():
    fig.add_trace(go.Box(
        x=df.loc[df['province'] == province, 'wasteperday'],
        name=province,
        marker_color=color
    ))

fig.update_layout(
    xaxis=dict(title='Indonesia Total Waste per Day in 2022-2023 (Tons per Day)', zeroline=False),
    yaxis=dict(title="Top 5 Provinces", zeroline=False)
)

fig.update_traces(orientation='h') # horizontal box plots
fig.show()

In [38]:
# Define the upper bound of the wasteperday column (Q3 + 1.5 * IQR) as the assumed maximum tolerance
q1_waste = df['wasteperday'].quantile(0.25)
q3_waste = df['wasteperday'].quantile(0.75)

iqr_waste = q3_waste - q1_waste
upp_boundWaste = q3_waste + 1.5 * iqr_waste

# Show the upper bound; values of 'wasteperday' above it are treated as outliers
print(f'The upper bound of wasteperday is {upp_boundWaste}')

The upper bound of wasteperday is 843.1825000000001
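The same bound can be checked by hand from the quartiles reported by `df.describe()`. This is only a worked arithmetic check of the Q3 + 1.5 × IQR rule, and the variable names are illustrative.

```python
q1 = 85.7325    # 25% quantile of wasteperday
q3 = 388.7125   # 75% quantile of wasteperday

iqr = q3 - q1                   # interquartile range
upper_bound = q3 + 1.5 * iqr    # values above this are treated as outliers

print(round(iqr, 4), round(upper_bound, 4))
```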


In [39]:
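# Determine the bin edges: minimum, median, IQR upper bound, and maximum of wasteperday
bins_wasteperday = [13.20,163.150,843.1825000000001,2313.020]
labels_wasteperday = ['low', 'medium', 'high']

# Create a new column and apply the binning; include_lowest=True keeps the
# regency with the minimum value (13.20) in the 'low' bin instead of NaN
df['concern_level'] = pd.cut(df['wasteperday'], bins=bins_wasteperday, labels=labels_wasteperday, include_lowest=True)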

df.head(1)

Unnamed: 0,province_id,regency_id,province,regencies,umk_count,workers_count,wasteperday,wasteperday_count,local_perc,nat_perc,longitude_prov,latitude_prov,longitude_reg,latitude_reg,coordinates_prov,coordinates_reg,rank_wasteperday,rank_umk,rank_workers,concern_level
0,11,1103,ACEH,KABUPATEN ACEH SELATAN,494,869,96.49,1640.1,5.88,0.1,97.0253,4.36855,97.41667,3.16667,"[97.0253, 4.36855]","[97.41667, 3.16667]",220,22,22,low


In [40]:
df[df['concern_level'] == 'high'].shape[0]

29
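For readers who want to see how the bin edges are applied, here is a plain-Python sketch of right-closed binning, which is the default interval convention of `pd.cut`. The `concern_level` function and the choice to keep the minimum value in the first bin are assumptions made for illustration.

```python
from bisect import bisect_left

EDGES = [13.20, 163.15, 843.1825, 2313.02]   # min, median, IQR upper bound, max
LABELS = ["low", "medium", "high"]

def concern_level(wasteperday):
    """Map a tons-per-day value to a label using intervals closed on the right."""
    if not EDGES[0] <= wasteperday <= EDGES[-1]:
        return None  # outside the binned range
    # bisect_left finds the first edge >= value, so a value equal to an edge
    # stays in the lower bin; max(..., 0) keeps the minimum in the first bin
    index = max(bisect_left(EDGES, wasteperday) - 1, 0)
    return LABELS[index]

for value in (13.2, 96.49, 163.15, 163.16, 1722.6):
    print(value, concern_level(value))
```

A value that sits exactly on an edge, such as the median 163.15, falls into the lower bin, which is why the median regency is labeled `low` rather than `medium`.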

In [41]:
df_low = df[df["concern_level"]=="low"]
df_medium = df[df["concern_level"]=="medium"]
df_high= df[df["concern_level"]=="high"]

# view_state = pdk.data_utils.compute_view(df_low[["longitude_reg", "latitude_reg"]])
# view_state.zoom = 4

low = pdk.Layer(
    "HeatmapLayer",
    df_low,
    opacity=0.9,
    get_position=["longitude_reg", "latitude_reg"],
    aggregation=pdk.types.String("MEAN"),
    color_range=[[240, 249, 232],[204, 235, 197],[168, 221, 181]],
    threshold=1,
    get_weight="wasteperday",
    pickable=True,
)

medium = pdk.Layer(
    "HeatmapLayer",
    data=df_medium,
    opacity=0.9,
    get_position=["longitude_reg", "latitude_reg"],
    threshold=0.75,
    color_range=[[123, 204, 196],[67, 162, 202],[8, 104, 172]],
    aggregation=pdk.types.String("MEAN"),
    get_weight="wasteperday",
    pickable=True,
)

high = pdk.Layer(
    "HeatmapLayer",
    data=df_high,
    opacity=0.9,
    get_position=["longitude_reg", "latitude_reg"],
    color_range=[[231,225,239],[201,148,199],[221,28,119]],
    threshold=0.75,
    aggregation=pdk.types.String("MEAN"),
    get_weight="wasteperday",
    pickable=True,
)

# Set the viewport location
view_state = pdk.ViewState(
    longitude=117.2841,
    latitude=-2.2331,
    zoom=3,  # Start within the [min_zoom, max_zoom] range
    min_zoom=3,
    max_zoom=15,
    pitch=40.5,
    bearing=-27.36)

r = pdk.Deck(
    layers=[low,medium,high],
    initial_view_state=view_state,
#     map_provider="mapbox",
#     map_style=pdk.map_styles.CARTO_ROAD,
    tooltip={"text": "Green brightness with low concern, Blue brightness with medium concern, pink brightness with high concern"},
)

r.to_html("heatmap_layer.html")

HeatmapLayer can also represent **the density or intensity of data** on a map. Each area is colored according to the level of concern derived from the 'wasteperday' value of each regency. As Figure 3.C shows, on Java Island the most intensely colored areas (those **with high levels of concern**) are in **Jawa Barat** and **Jawa Timur** Provinces, followed by the stretch from **Jawa Tengah** to **Bali** Province. Meanwhile, on Sumatra Island there are four such points: **Medan** Regency in Sumatera Utara Province, **Pekanbaru** Regency in Riau Province, **Batam** Regency in Kepulauan Riau Province, and **Palembang** Regency in Sumatera Selatan Province. These areas **require more regular handling and intensive treatment**, by **increasing the number of workers/laborers and waste management sites/banks** and by **encouraging local and national governments to issue regulations that govern waste production**, so that the environment in those areas is maintained.

### Weak points and advantages of using PyDeck in 3D data visualization

#### Advantages:
* **Captivating 3D Visualization**
<br>PyDeck provides the advantage of more engaging and detailed 3D data visualization. This allows users to more deeply understand the spatial distribution of waste.
* **Enhanced Interactivity**
<br>PyDeck allows for better interactivity, giving users the opportunity to explore data and gain further insights through the use of features such as tooltips.
* **Flexible Map Configurations**
<br>With PyDeck, users have greater control over the appearance of the map and can easily set visual properties such as color, size, and height.

#### Weak Points:
* **Likely Requires Advanced Coding Skills**
<br>PyDeck, as a powerful Python library, may require a higher level of coding expertise compared to simpler visualization tools like Folium.
* **Limitations in Some Use Cases**
<br>In some use cases, especially for simple presentation purposes or basic mapping, 2D approaches such as Folium may be effective enough without requiring the complexity of 3D visualization.

#### Brief Insights:
By using PyDeck, the spatial distribution of waste in Indonesia can be analyzed in more depth. The resulting 3D visualization **provides a richer and more contextual understanding of waste patterns and density**, which can be the basis for more effective decision making in waste management in the future. While PyDeck has its complexities, the advantages it provides in spatial **understanding can be the key to more targeted and sustainable solutions**.


# Conclusion and Recommendation

Through this 3D data exploration, the article aims **to provide detailed insight into how visualization technologies such as PyDeck can help in understanding the spatial distribution of Indonesia's waste piles in 2022**. By comparing it with 2D approaches such as Folium and choropleth maps, users gain a **better understanding of the advantages and limitations of each approach in the context of increasingly complex waste management problems in Indonesia**.

### Recommendation
* Using 3D visualization with PyDeck, we can present **routes or locations of waste storage areas in a more informative and interesting way**. For example, we can use scatterplots to show the locations of waste storage areas and depict the routes between them.
* Using spatial data, we can visualize the routes or locations of waste banks in 3D. This can **help communities and related organizations plan waste collection routes or design waste management programs** by indicating the strategic locations of waste banks.
* Leverage visualization to **increase stakeholder engagement**. This can be done by providing an **interactive platform or including features such as tooltips that give additional information** about the data displayed.
* We can **recommend effective programs by identifying areas with the highest levels of waste production whose human and organizational resources are still insufficient to reach the goals** of sustainable development and good environmental management.
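The last recommendation can be started with a simple ratio. As a rough sketch, the snippet below divides daily waste by the number of workers for the five provinces with the most waste listed earlier. The tons-per-worker indicator and the `waste_per_worker` name are assumptions for illustration, not a metric used in the analysis above.

```python
# (province, workers_count, wasteperday_count) from the top-5 province table
provinces = [
    ("JAWA TENGAH", 26630, 15500.91),
    ("JAWA TIMUR", 35477, 13801.5),
    ("JAWA BARAT", 45162, 13410.01),
    ("DKI JAKARTA", 14925, 8527.07),
    ("BANTEN", 8179, 7199.63),
]

def waste_per_worker(rows):
    """Return (province, tons per day per worker), highest load first."""
    ratios = [(name, waste / workers) for name, workers, waste in rows]
    return sorted(ratios, key=lambda item: item[1], reverse=True)

for name, ratio in waste_per_worker(provinces):
    print(f"{name}: {ratio:.2f} tons/day per worker")
```

A higher ratio suggests that a province handles more waste with fewer workers, so it may deserve a closer look when allocating human and organizational resources.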


# References

[1] [https://www.kaggle.com/discussions/general/331623](https://www.kaggle.com/discussions/general/331623)

[2] [https://deckgl.readthedocs.io/](https://deckgl.readthedocs.io/)

[3] [https://github.com/UnfoldedInc/pydeck](https://github.com/UnfoldedInc/pydeck)