# Data analysis demonstration for Thantham's master's thesis

This demonstration shows what the fused lidar metrics and multispectral features look like in the forestry data, as a basis for tree species classification.

In [71]:
!pip install pandas matplotlib scikit-learn geopandas folium mapclassify --quiet

In [72]:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

## Read pre-saved tree metrics with multispectral features

In [73]:
data = pd.read_csv('./tree_DBH50_metrics_spctrl.csv')
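The `Unnamed: 0` column in the table output below usually appears when a DataFrame was saved together with its index and then read back without `index_col`. A minimal sketch, assuming that is what happened here; the inline CSV text is made up for illustration and is not the thesis file:

```python
import io
import pandas as pd

# Hypothetical two-row CSV that mimics a file saved with its index column.
csv_text = """,Tag,SpeciesNam,DBH
0,32376,Adinandra integerrima,52.5
1,21160,Aglaia lawii,51.0
"""

# Default read: the unnamed first column is loaded as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the same column becomes the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(clean.columns))
```

If the file were read this way, every positional slice such as `iloc[:, 6:]` would shift by one column, so the two choices need to be kept consistent.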

Show the tree data as a table

In [74]:
data

Unnamed: 0,Tag,SpeciesNam,DBH,treeXCoord,treeYCoord,treeID,i_mn_fr,h_max,i_mn_sn,i_mean,...,i_iq,ndgi,src,si,b2norm,avi,rr,ndwi,ndvi,b4norm
0,32376,Adinandra integerrima,52.5,754820.29999,1.597838e+06,775,109.338321,32.623,241.574713,77.441200,...,72.0,-0.06162,2.71088,0.96677,0.23942,0.34266,7.08338,0.31263,0.75258,0.11152
1,21160,Aglaia lawii,51.0,754584.90000,1.597701e+06,498,110.581861,36.462,149.174769,73.848466,...,83.0,-0.17606,3.22549,0.97224,0.14996,0.28775,5.77324,0.26748,0.70472,0.13380
2,21453,Aglaia lawii,65.6,754589.60000,1.597405e+06,662,156.379569,33.933,234.489918,108.908314,...,138.0,-0.08789,3.00893,0.96346,0.36062,0.33848,6.23364,0.35341,0.72351,0.12388
3,22233,Aglaia lawii,82.3,754617.80000,1.597479e+06,305,163.558385,39.547,236.933900,117.176872,...,151.0,-0.10811,3.40997,0.97222,0.18918,0.30883,6.78717,0.28637,0.74317,0.11639
4,40518,Aglaia lawii,54.0,754980.29999,1.597423e+06,487,169.232859,36.785,239.587017,119.818710,...,157.5,-0.13586,2.62849,0.96760,0.25950,0.31538,5.97200,0.27964,0.71314,0.12932
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
768,45464,Toona ciliata,52.1,755078.29999,1.597863e+06,1818,133.851341,20.516,187.685306,98.305429,...,118.0,-0.07549,4.21358,0.97121,0.17081,0.33376,7.64407,0.36615,0.76863,0.10522
769,46196,Toona ciliata,65.0,755095.00000,1.597545e+06,708,171.106418,32.880,233.716504,116.341742,...,165.0,-0.08906,3.15863,0.96875,0.24237,0.33558,7.07334,0.32412,0.75227,0.11224
770,29387,Triadica cochinchinensis,51.0,754758.89999,1.597858e+06,486,119.051003,36.394,194.002604,80.183818,...,92.0,-0.04937,2.43277,0.96215,0.31517,0.33173,6.03912,0.30293,0.71587,0.12587
771,225282,Livistona jenkinsiana,53.0,754615.25000,1.597477e+06,305,163.558385,39.547,236.933900,117.176872,...,151.0,-0.09823,3.60662,0.97223,0.20450,0.30821,6.82462,0.29755,0.74440,0.11566


## Simple machine learning model to perform classification

A random forest classifier is used here as a simple and fast baseline model

In [75]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

The labels are text (tree species names). scikit-learn classifiers accept string class labels directly, so the species names can be used as they are, without encoding

In [76]:
y = data['SpeciesNam']
X = data.iloc[:, 6:]
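Positional slicing depends on the column order. If the order in the table output above matches the file, position 6 appears to be `treeID`, so the identifier may end up among the features. A minimal sketch of selecting features by name instead, on a made-up frame that borrows a few column names from the table; the list `non_features` is an assumption about which columns are not predictors:

```python
import pandas as pd

# Toy frame with a few of the column names from the table; values are illustrative.
toy = pd.DataFrame({
    "Tag": [1, 2, 3],
    "SpeciesNam": ["Aglaia lawii", "Toona ciliata", "Aglaia lawii"],
    "DBH": [51.0, 52.1, 65.6],
    "treeXCoord": [754584.9, 755078.3, 754589.6],
    "treeYCoord": [1597701.0, 1597863.0, 1597405.0],
    "treeID": [498, 1818, 662],
    "h_max": [36.462, 20.516, 33.933],
    "ndvi": [0.70472, 0.76863, 0.72351],
})

# Assumed non-predictor columns: identifiers, the label, field data, coordinates.
non_features = ["Tag", "SpeciesNam", "DBH", "treeXCoord", "treeYCoord", "treeID"]

y_toy = toy["SpeciesNam"]
X_toy = toy.drop(columns=non_features)

print(list(X_toy.columns))
```

Dropping by name keeps the feature set stable even if a column is added to or removed from the CSV.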

Split `X` and `y` into training and test sets, holding out 30% of the rows for testing

In [77]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
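Without a fixed `random_state` the split changes on every run, and species with only one or two trees can land entirely in the test set. A minimal sketch of a reproducible, stratified split after filtering out rare species, using made-up data; the threshold `min_count` and the seed are assumptions, not values from the thesis:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: two species with five trees each, one species with a single tree.
toy = pd.DataFrame({
    "h_max": [30.1, 32.5, 28.4, 35.0, 33.3, 29.9, 31.2, 36.8, 27.5, 34.1, 22.0],
    "ndvi": [0.71, 0.74, 0.70, 0.76, 0.75, 0.72, 0.73, 0.77, 0.69, 0.74, 0.68],
    "SpeciesNam": ["A"] * 5 + ["B"] * 5 + ["C"],
})

# Keep only species with at least `min_count` trees (the threshold is an assumption).
min_count = 4
counts = toy["SpeciesNam"].value_counts()
kept = toy[toy["SpeciesNam"].isin(counts[counts >= min_count].index)]

X_toy = kept[["h_max", "ndvi"]]
y_toy = kept["SpeciesNam"]

# stratify keeps species proportions similar in both parts; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=42
)

print(len(X_tr), len(X_te))
print(sorted(y_te.unique()))
```

Stratification needs at least two samples per class, which is why the single-tree species is filtered out first in this sketch.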

Initialize model

In [78]:
model = RandomForestClassifier()

Train the model with training set `X_train` and `y_train`

In [79]:
model.fit(X_train, y_train)
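Once fitted, a random forest exposes `feature_importances_`, which gives a rough view of which features the trees relied on most. A minimal sketch on synthetic data (not the thesis data); the column names `h_max` and `noise` and all values are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Made-up features: "h_max" separates the two classes, "noise" does not.
labels = np.array(["A"] * 50 + ["B"] * 50)
toy_X = pd.DataFrame({
    "h_max": np.concatenate([rng.normal(20, 1, 50), rng.normal(35, 1, 50)]),
    "noise": rng.normal(0, 1, 100),
})

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(toy_X, labels)

# Importances sum to 1; a higher value means more impurity reduction from that feature.
importances = pd.Series(rf.feature_importances_, index=toy_X.columns)
print(importances.sort_values(ascending=False))
```

On the real table the same two lines after `fit` would rank the lidar and spectral columns, though impurity-based importances can be biased and are best read as a rough guide.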

A random forest trains quickly on a dataset of this size. Next, predict the tree species for `X_test`

In [80]:
y_pred = model.predict(X_test)

Show classification report by comparing `y_test` with `y_pred`

In [81]:
print(classification_report(y_test, y_pred))

                            precision    recall  f1-score   support

  Acrocarpus fraxinifolius       0.00      0.00      0.00         1
              Aglaia lawii       0.00      0.00      0.00         2
         Alphonsea boniana       0.00      0.00      0.00         8
          Altingia excelsa       0.00      0.00      0.00         1
   Anthocephalus chinensis       0.00      0.00      0.00         2
        Antiaris toxicaria       0.00      0.00      0.00         2
         Aquilaria crassna       0.00      0.00      0.00         2
          Balakata baccata       0.14      0.12      0.13         8
    Beilschmiedia maingayi       0.00      0.00      0.00         1
             Bhesa robusta       0.00      0.00      0.00         1
    Buchanania arborescens       0.00      0.00      0.00         1
        Canarium euphyllum       0.00      0.00      0.00         1
        Carallia brachiata       0.00      0.00      0.00         3
Castanopsis acuminatissima       0.11      0.09

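A species that occurs in `y_test` but is never predicted has undefined precision; scikit-learn reports it as 0.00 and emits an `UndefinedMetricWarning`. A minimal sketch with made-up labels, passing `zero_division=0` to state that choice explicitly and silence the warning:

```python
from sklearn.metrics import classification_report

# Made-up labels: species "C" is present in the truth but never predicted.
y_true = ["A", "A", "B", "B", "C"]
y_hat = ["A", "A", "B", "A", "B"]

# zero_division=0 reports undefined precision as 0 without emitting a warning.
report = classification_report(y_true, y_hat, zero_division=0, output_dict=True)

print(report["C"]["precision"])
print(report["accuracy"])
```

`output_dict=True` is used here only to make the numbers easy to inspect; the printed text report accepts the same `zero_division` argument.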


Because the test scores look questionable, also run the report on the entire dataset (training and test rows together)

In [83]:
print(classification_report(y, model.predict(X)))

                             precision    recall  f1-score   support

   Acrocarpus fraxinifolius       0.00      0.00      0.00         1
      Adinandra integerrima       1.00      1.00      1.00         1
               Aglaia lawii       1.00      0.60      0.75         5
       Albizia attopeuensis       1.00      1.00      1.00         1
          Alphonsea boniana       0.92      0.60      0.73        20
         Alstonia scholaris       1.00      1.00      1.00         1
           Altingia excelsa       1.00      0.88      0.93         8
    Anthocephalus chinensis       1.00      0.60      0.75         5
         Antiaris toxicaria       0.00      0.00      0.00         2
       Aphananthe cuspidata       1.00      1.00      1.00         3
         Apodytes dimidiata       1.00      1.00      1.00         1
          Aquilaria crassna       0.67      0.67      0.67         6
           Balakata baccata       0.79      0.76      0.77        29
Beilschmiedia affintermedia      



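This report mixes training and test rows, so the scores are optimistic: the model has already seen about 70% of these trees during training. With that caveat, the result is acceptable. Next, append the predictions to the original dataset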
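Scoring on all rows includes trees the model was trained on, so it says little about unseen trees. One common alternative is k-fold cross-validation, where every row is held out exactly once. A minimal sketch on synthetic data from `make_classification` (a stand-in, not the thesis data); the fold count, seed, and macro F1 metric are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the feature table (not the thesis data).
X_syn, y_syn = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Each of the 5 folds is held out once while the model trains on the other 4.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(rf, X_syn, y_syn, cv=cv, scoring="f1_macro")

print(scores.mean())
```

On the real data, species with fewer trees than the number of folds would need to be merged or dropped before a stratified split like this can run.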

In [84]:
data['pred'] = model.predict(X)

Since each tree has coordinates, the table can be converted to a GeoDataFrame and displayed in this Jupyter notebook on a folium map

In [85]:
gdf = gpd.GeoDataFrame(
    data, geometry=gpd.points_from_xy(data.treeXCoord, data.treeYCoord), crs="EPSG:32648"
)

In [86]:
m = gdf.explore(
    tiles='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}',
    attr='ESRI Imagery',
    width='100%',
    height='900px',
)

In [87]:
m