# Plastic Map

The Plastic Map was created by the Remote Sensing of Coastal and Urban Environments (RESCUE) research group at the Federal University of Rio Grande do Sul (UFRGS), with contributions from researchers Bianca Matos de Barros, Cristiano Lima Hackmann, and Douglas Galimberti Barbosa. To get in touch, please send an email to bianca.matos@ufrgs.br.

## Imports

We begin by importing the Python libraries the algorithm needs. These libraries must be installed in your Python environment for the imports to work.

In [None]:
from imblearn.combine import SMOTETomek
from modules import dart_files, tiff_files, rsdata_classification, rsdata_charts
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, classification_report, confusion_matrix, f1_score, fbeta_score, jaccard_score, log_loss, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from tpot import TPOTClassifier
from xgboost import XGBClassifier
#import datetime
import matplotlib.pyplot as plt
#import openpyxl
import os
import pandas as pd
import seaborn as sns
#%matplotlib inline
#import shutil
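
Before running the import cell, it can help to check which packages are missing. The sketch below is not part of the original notebook: `missing_packages` is a hypothetical helper, and the list contains only the third-party top-level names that appear in the import cell. Note that the import name and the install name can differ (`imblearn` is installed as `imbalanced-learn`, `sklearn` as `scikit-learn`), and that `modules` is the project's own package rather than an installable library.

```python
from importlib.util import find_spec

# Third-party top-level names used by the import cell above.
REQUIRED = ["imblearn", "scipy", "sklearn", "tpot", "xgboost",
            "matplotlib", "pandas", "seaborn"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```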

## Parameters

We now define the feature names (columns) that serve as input for the classifiers.

The 'feature_names' variable contains the names of the MSI/Sentinel-2 sensor bands with a spatial resolution of 10 or 20 meters. The 'radiometric_indexes' variable contains the indices presented in [1] and [2].

Note: The NDMI index was also evaluated in [1], but with a formula identical to that of the NDWI index, so NDMI is not considered here.

In [None]:
feature_names = ['Blue', 'Green', 'Red', 'RedEdge1', 'RedEdge2', 'RedEdge3', 'NIR1', 'NIR2', 'SWIR1', 'SWIR2']
radiometric_indexes = ['NDWI', 'WRI', 'NDVI', 'AWEI', 'MNDWI', 'SR', 'PI', 'RNDVI', 'FDI', 'PWDI']
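
Several of the indices listed above belong to the normalized-difference family. As a minimal illustration, the sketch below computes NDVI with its standard definition, (NIR - Red) / (NIR + Red). The exact formulas used in [1] and [2] are not reproduced in this notebook, so treat this as a generic example: the choice of 'NIR1' as the near-infrared band and the reflectance values are assumptions, and `normalized_difference` is a hypothetical helper.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


# Hypothetical reflectance values for one pixel, keyed by the band names above.
pixel = {"Green": 0.08, "Red": 0.05, "NIR1": 0.30}

# Standard NDVI; using 'NIR1' as the NIR band is an assumption.
ndvi = normalized_difference(pixel["NIR1"], pixel["Red"])
print(f"NDVI = {ndvi:.3f}")
```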

In this repository, files are organized in the following directory structure:
- **charts**: Charts from data analysis and classification.
- **modules**: Python modules developed for this application.
- **files/csv_files**: Reflectance values from compiled datasets, with each CSV file containing a dataset.
- **files/dart_files**: DART (Discrete Anisotropic Radiative Transfer) simulations converted to ASC format.
- **files/tiff_files**: MSI/Sentinel-2 images after atmospheric correction, cropping of regions of interest, and conversion to TIFF format.
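
The layout above can be checked before running the notebook. This is a small sketch, assuming it is run from the repository root; `missing_directories` is a hypothetical helper, not a function from the project's modules.

```python
from pathlib import Path

# Directories listed in the repository layout above.
EXPECTED_DIRS = [
    "charts",
    "modules",
    "files/csv_files",
    "files/dart_files",
    "files/tiff_files",
]


def missing_directories(root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


print("Missing directories:", missing_directories("."))
```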


## Observed dataset (Copernicus / USGS)

We sought images acquired by orbital remote sensing with in situ confirmation of plastic presence in coastal regions. The selection criteria were the in situ confirmation and the availability of target georeferencing data, to avoid uncertainties or limitations in classification. We selected both datasets that matched these criteria: the 2019 [3] and 2021 [4] Plastic Litter Project (PLP) data. The PLP has carried out annual experiments to implant artificial plastic targets during the overpass of satellites over Greek beaches since 2018. The Marine Remote Sensing Group (MRSG), affiliated with the Department of Marine Sciences at the University of the Aegean, is responsible for the project. All images were acquired using the Sentinel-2/MSI sensor.

### Loading data from coast area 

In [None]:
path = "files/tiff_files/coast"
path, sources, dates = tiff_files.open_folders(path)

In [None]:
os.chdir('../')
os.chdir('../')
os.chdir('../')
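
The three `os.chdir('../')` calls suggest that `tiff_files.open_folders` leaves the working directory three levels below the project root. That is an inference from this cell, not something the notebook states. If it holds, a less fragile alternative is to restore the starting directory explicitly instead of counting levels. The sketch below uses a hypothetical `preserve_cwd` context manager built only on the standard library.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_cwd():
    """Restore the working directory on exit, whatever happened inside the block."""
    original = os.getcwd()
    try:
        yield original
    finally:
        os.chdir(original)


with preserve_cwd():
    os.chdir(os.pardir)  # stand-in for any call that changes the directory
# Back in the starting directory here.
print(os.getcwd())
```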

In [None]:
tiff_data = tiff_files.get_images(path, sources, dates)

In [None]:
for image in tiff_data:
    image.setAreaLabel(0, (int(image.getXSize()) - 1), 0, (image.getYSize() - 1), "Coast")  # -1 because indices (row and column numbers) start at zero, whereas the size counts from 1
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent(100)
        pixel.setPolymer("None")

In [None]:
tiff_coastal_dataset = tiff_files.build_dataset(tiff_data)

### Loading data from sea area

In [None]:
path = "files/tiff_files/sea"
path, sources, dates = tiff_files.open_folders(path) 

In [None]:
os.chdir('../')
os.chdir('../')
os.chdir('../')

In [None]:
tiff_data = tiff_files.get_images(path, sources, dates) 

#### Labeling the entire area as Water

In [None]:
for image in tiff_data:
    image.setAreaLabel(0, (int(image.getXSize()) - 1), 0, (image.getYSize() - 1), "Water")  # -1 because indices (row and column numbers) start at zero, whereas the size counts from 1
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent(100)
        pixel.setPolymer("None")

#### Labeling the artificial targets as plastic and wood

In [None]:
for image in tiff_data:
    if image.getDate() == "2019_04_18":
        #A1 100% Water
        image.setPixelLabel(6, 3, "Plastic")  #A2
        image.setPixelLabel(6, 4, "Plastic")  #A3 
        image.setPixelLabel(7, 2, "Plastic")  #A4
        image.setPixelLabel(7, 3, "Plastic")  #A5 
        image.setPixelLabel(7, 4, "Plastic")  #A6
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 6 and pixel.getColumn() == 3: #A2
                pixel.setCoverPercent(30) #Bags + bottles
                pixel.setPolymer("Bags and Bottles")
            elif pixel.getLine() == 6 and pixel.getColumn() == 4: #A3
                pixel.setCoverPercent(18)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 7 and pixel.getColumn() == 2: #A4
                pixel.setCoverPercent(38)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 7 and pixel.getColumn() == 3: #A5
                pixel.setCoverPercent(33) #Bags + bottles
                pixel.setPolymer("Bags and Bottles")
            elif pixel.getLine() == 7 and pixel.getColumn() == 4: #A6
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bottles")
    elif image.getDate() == "2019_05_03":
        image.setPixelLabel(1, 15, "Plastic")  #A1
        image.setPixelLabel(1, 16, "Plastic")  #A2
        image.setPixelLabel(2, 15, "Plastic")  #A3 
        image.setPixelLabel(2, 16, "Plastic")  #A4
        image.setPixelLabel(3, 10, "Plastic")  #B1
        image.setPixelLabel(3, 11, "Plastic")  #B2 
        image.setPixelLabel(4, 11, "Plastic")  #B3
        image.setPixelLabel(5, 7, "Plastic")  #C1
        image.setPixelLabel(5, 8, "Plastic")  #C2
        image.setPixelLabel(6, 7, "Plastic")  #C3 
        image.setPixelLabel(6, 8, "Plastic")  #C4
        image.setPixelLabel(5, 12, "Plastic")  #D1
        image.setPixelLabel(5, 13, "Plastic")  #D2
        image.setPixelLabel(6, 12, "Plastic")  #D3 
        image.setPixelLabel(6, 13, "Plastic")  #D4
        image.setPixelLabel(9, 2, "Plastic")  #E1
        image.setPixelLabel(9, 3, "Plastic")  #E2
        #E3 and E4 100% Water
        image.setPixelLabel(11, 7, "Plastic")  #F1
        image.setPixelLabel(11, 8, "Plastic")  #F2
        image.setPixelLabel(11, 9, "Plastic")  #F3 
        #F4 and F5 100% Water
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 1 and pixel.getColumn() == 15: #A1
                pixel.setCoverPercent(15)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 1 and pixel.getColumn() == 16: #A2
                pixel.setCoverPercent(43)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 2 and pixel.getColumn() == 15: #A3
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 2 and pixel.getColumn() == 16: #A4
                pixel.setCoverPercent(2)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 3 and pixel.getColumn() == 10: #B1
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 3 and pixel.getColumn() == 11: #B2
                pixel.setCoverPercent(38)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 4 and pixel.getColumn() == 11: #B3
                pixel.setCoverPercent(8)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 5 and pixel.getColumn() == 7: #C1
                pixel.setCoverPercent(9)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 5 and pixel.getColumn() == 8: #C2
                pixel.setCoverPercent(5)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 6 and pixel.getColumn() == 7: #C3
                pixel.setCoverPercent(18)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 6 and pixel.getColumn() == 8: #C4
                pixel.setCoverPercent(14)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 5 and pixel.getColumn() == 12: #D1
                pixel.setCoverPercent(3)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 5 and pixel.getColumn() == 13: #D2
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 6 and pixel.getColumn() == 12: #D3
                pixel.setCoverPercent(2)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 6 and pixel.getColumn() == 13: #D4
                pixel.setCoverPercent(9)
                pixel.setPolymer("Bags") #Reeds ignored
                #elif pixel.getLine() == 6 and pixel.getColumn() == 14: #Reeds ignored
            elif pixel.getLine() == 9 and pixel.getColumn() == 2: #E1
                pixel.setCoverPercent(13)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 9 and pixel.getColumn() == 3: #E2
                pixel.setCoverPercent(27)
                pixel.setPolymer("Bottles")
            #E3 and E4 100% Water
            elif pixel.getLine() == 11 and pixel.getColumn() == 7: #F1
                pixel.setCoverPercent(10)
                pixel.setPolymer("Bottles") 
            elif pixel.getLine() == 11 and pixel.getColumn() == 8: #F2
                pixel.setCoverPercent(21)
                pixel.setPolymer("Bottles") 
            elif pixel.getLine() == 11 and pixel.getColumn() == 9: #F3
                pixel.setCoverPercent(2)
                pixel.setPolymer("Bottles") 
            #F4 and F5 100% Water
    elif image.getDate() == "2019_05_18":
        image.setPixelLabel(16, 2, "Plastic")  #A1  
        #A2 100% Water
        image.setPixelLabel(17, 2, "Plastic")  #A3
        image.setPixelLabel(17, 3, "Plastic")  #A4
        #B1 100% Water
        image.setPixelLabel(12, 5, "Plastic")  #B2
        image.setPixelLabel(13, 4, "Plastic")  #B3  
        image.setPixelLabel(13, 5, "Plastic")  #B4
        image.setPixelLabel(5, 5, "Plastic")  #C1
        image.setPixelLabel(5, 6, "Plastic")  #C2
        image.setPixelLabel(6, 5, "Plastic")  #C3
        image.setPixelLabel(6, 6, "Plastic")  #C4  
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 16 and pixel.getColumn() == 2: #A1
                pixel.setCoverPercent(17)
                pixel.setPolymer("Bags")
            #A2 100% Water  
            elif pixel.getLine() == 17 and pixel.getColumn() == 2: #A3
                pixel.setCoverPercent(27)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 17 and pixel.getColumn() == 3: #A4
                pixel.setCoverPercent(3)
                pixel.setPolymer("Bags")
            #B1 100% Water
            elif pixel.getLine() == 12 and pixel.getColumn() == 5: #B2
                pixel.setCoverPercent(2)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 13 and pixel.getColumn() == 4: #B3
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 13 and pixel.getColumn() == 5: #B4
                pixel.setCoverPercent(10)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 5 and pixel.getColumn() == 5: #C1
                pixel.setCoverPercent(5)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 5 and pixel.getColumn() == 6: #C2
                pixel.setCoverPercent(6)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 6 and pixel.getColumn() == 5: #C3
                pixel.setCoverPercent(10)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 6 and pixel.getColumn() == 6: #C4
                pixel.setCoverPercent(40)
                pixel.setPolymer("Bottles")
    elif image.getDate() == "2019_05_28":
        image.setPixelLabel(0, 8, "Plastic")  #A1  
        image.setPixelLabel(1, 8, "Plastic")  #A2  
        image.setPixelLabel(1, 9, "Plastic")  #A3
        image.setPixelLabel(4, 6, "Plastic")  #B1  
        image.setPixelLabel(5, 6, "Plastic")  #B2
        image.setPixelLabel(7, 3, "Plastic")  #C1  
        #C2 100% Water 
        image.setPixelLabel(8, 3, "Plastic")  #C3
        image.setPixelLabel(8, 4, "Plastic")  #C4  
        #C5 100% Water 
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 0 and pixel.getColumn() == 8: #A1
                pixel.setCoverPercent(7)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 1 and pixel.getColumn() == 8: #A2
                pixel.setCoverPercent(10)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 1 and pixel.getColumn() == 9: #A3
                pixel.setCoverPercent(13)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 4 and pixel.getColumn() == 6: #B1
                pixel.setCoverPercent(5)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 5 and pixel.getColumn() == 6: #B2
                pixel.setCoverPercent(8)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 7 and pixel.getColumn() == 3: #C1
                pixel.setCoverPercent(2)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 8 and pixel.getColumn() == 3: #C3
                pixel.setCoverPercent(35)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 8 and pixel.getColumn() == 4: #C4
                pixel.setCoverPercent(18)
                pixel.setPolymer("Bottles")
    elif image.getDate() == "2019_06_07":
        image.setPixelLabel(1, 4, "Plastic")  #A1  
        image.setPixelLabel(1, 5, "Plastic")  #A2  
        #image.setPixelLabel(2, 5, "Plastic")  #A3
        #image.setPixelLabel(5, 5, "Plastic")  #B1  
        #image.setPixelLabel(5, 6, "Plastic")  #B2
        #image.setPixelLabel(6, 5, "Plastic")  #B3  
        #image.setPixelLabel(6, 6, "Plastic")  #B4
        image.setPixelLabel(9, 2, "Plastic")  #C1  
        image.setPixelLabel(9, 3, "Plastic")  #C2 
        image.setPixelLabel(9, 4, "Plastic")  #C3
        image.setPixelLabel(10, 2, "Plastic")  #C4  
        image.setPixelLabel(10, 3, "Plastic")  #C5 
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 1 and pixel.getColumn() == 4: #A1
                pixel.setCoverPercent(4)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 1 and pixel.getColumn() == 5: #A2
                pixel.setCoverPercent(9)
                pixel.setPolymer("Bottles")
            #A3 100% Water 
            #B1 Reeds ignored
            #B2 Reeds ignored
            #B3 Reeds ignored
            #B4 Reeds ignored
            elif pixel.getLine() == 9 and pixel.getColumn() == 2: #C1
                pixel.setCoverPercent(3)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 9 and pixel.getColumn() == 3: #C2
                pixel.setCoverPercent(55)
                pixel.setPolymer("Bags and Bottles")
            elif pixel.getLine() == 9 and pixel.getColumn() == 4: #C3
                pixel.setCoverPercent(1)
                pixel.setPolymer("Bottles")
            elif pixel.getLine() == 10 and pixel.getColumn() == 2: #C4
                pixel.setCoverPercent(4)
                pixel.setPolymer("Bags")
            elif pixel.getLine() == 10 and pixel.getColumn() == 3: #C5
                pixel.setCoverPercent(15)
                pixel.setPolymer("Bags and Bottles")      
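
The cell above repeats the same pattern for every target pixel. One possible alternative is a table-driven version, sketched here for the 2019_04_18 image only, with values copied from that branch. `apply_targets`, `StubPixel`, and `StubImage` are hypothetical: the stubs only imitate the method names used in this notebook so the example runs on its own, and the way `setLabel` consumes the labels map is an assumption about the real classes in `modules`.

```python
# (line, column) -> (cover percent, polymer), copied from the 2019_04_18 branch.
TARGETS_2019_04_18 = {
    (6, 3): (30, "Bags and Bottles"),  # A2
    (6, 4): (18, "Bottles"),           # A3
    (7, 2): (38, "Bags"),              # A4
    (7, 3): (33, "Bags and Bottles"),  # A5
    (7, 4): (1, "Bottles"),            # A6
}


def apply_targets(image, targets, label="Plastic"):
    """Label each target pixel, then set its cover percent and polymer."""
    for line, column in targets:
        image.setPixelLabel(line, column, label)
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        key = (pixel.getLine(), pixel.getColumn())
        if key in targets:
            cover, polymer = targets[key]
            pixel.setCoverPercent(cover)
            pixel.setPolymer(polymer)


class StubPixel:
    """Stand-in for the project's pixel class (assumed behavior)."""

    def __init__(self, line, column):
        self.line, self.column = line, column
        self.label, self.cover, self.polymer = "Water", 100, "None"

    def getLine(self):
        return self.line

    def getColumn(self):
        return self.column

    def setLabel(self, labels_map):
        # Assumption: the map is keyed by (line, column).
        self.label = labels_map.get((self.line, self.column), self.label)

    def setCoverPercent(self, value):
        self.cover = value

    def setPolymer(self, value):
        self.polymer = value


class StubImage:
    """Stand-in for the project's image class (assumed behavior)."""

    def __init__(self, lines, columns):
        self.pixels = [StubPixel(l, c) for l in range(lines) for c in range(columns)]
        self.labels = {}

    def setPixelLabel(self, line, column, label):
        self.labels[(line, column)] = label

    def getLabelsMap(self):
        return self.labels

    def getPixels(self):
        return self.pixels


image = StubImage(10, 10)
apply_targets(image, TARGETS_2019_04_18)
labeled = {
    (p.line, p.column): (p.label, p.cover, p.polymer)
    for p in image.getPixels()
    if p.label == "Plastic"
}
print(labeled)
```

Keeping the coordinates and attributes in one table means each target is written once, which makes it harder for the `setPixelLabel` calls and the per-pixel branches to drift apart.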

In [None]:
for image in tiff_data:
    if image.getDate() == "2021_06_21":
        image.setPixelLabel(3, 5, "Plastic") 
        image.setPixelLabel(3, 6, "Plastic")
        image.setPixelLabel(4, 4, "Plastic")
        image.setPixelLabel(4, 5, "Plastic") 
        image.setPixelLabel(4, 6, "Plastic")
        image.setPixelLabel(4, 7, "Plastic")
        image.setPixelLabel(5, 4, "Plastic")
        image.setPixelLabel(5, 5, "Plastic") 
        image.setPixelLabel(5, 6, "Plastic")
        image.setPixelLabel(8, 3, "Wood") 
        image.setPixelLabel(8, 4, "Wood")
        image.setPixelLabel(8, 5, "Wood")
        image.setPixelLabel(9, 3, "Wood") 
        image.setPixelLabel(9, 4, "Wood")
        image.setPixelLabel(9, 5, "Wood")
        image.setPixelLabel(9, 6, "Wood") 
        image.setPixelLabel(10, 2, "Wood") 
        image.setPixelLabel(10, 3, "Wood") 
        image.setPixelLabel(10, 4, "Wood")
        image.setPixelLabel(10, 5, "Wood")
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 3: 
                if pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 4: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 7: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 5: 
                if pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 8: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
            elif pixel.getLine() == 9: 
                if pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 10: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                elif pixel.getColumn() == 3 or pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
    elif image.getDate() == "2021_07_01":
        image.setPixelLabel(3, 3, "Plastic") 
        image.setPixelLabel(3, 4, "Plastic")
        image.setPixelLabel(3, 5, "Plastic")
        image.setPixelLabel(4, 3, "Plastic") 
        image.setPixelLabel(4, 4, "Plastic")
        image.setPixelLabel(4, 5, "Plastic")
        image.setPixelLabel(5, 3, "Plastic") 
        image.setPixelLabel(5, 4, "Plastic")
        image.setPixelLabel(5, 5, "Plastic")
        image.setPixelLabel(6, 3, "Plastic")
        image.setPixelLabel(6, 4, "Plastic") 
        image.setPixelLabel(6, 5, "Plastic")
        image.setPixelLabel(8, 2, "Wood")
        image.setPixelLabel(8, 3, "Wood")
        image.setPixelLabel(9, 1, "Wood") 
        image.setPixelLabel(9, 2, "Wood")
        image.setPixelLabel(9, 3, "Wood")
        image.setPixelLabel(9, 4, "Wood")
        image.setPixelLabel(10, 1, "Wood") 
        image.setPixelLabel(10, 2, "Wood")
        image.setPixelLabel(10, 3, "Wood")
        image.setPixelLabel(10, 4, "Wood")
        image.setPixelLabel(11, 2, "Wood") 
        image.setPixelLabel(11, 3, "Wood") 
        image.setPixelLabel(11, 4, "Wood") 
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 3: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 4: 
                if pixel.getColumn() == 3: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 4 or pixel.getColumn() == 5:
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 5: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 6: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 8: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 3: 
                    pixel.setCoverPercent(-1)
            elif pixel.getLine() == 9: 
                if pixel.getColumn() == 1: 
                    pixel.setCoverPercent(-1)
                elif pixel.getColumn() == 2 or pixel.getColumn() == 3 or pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 10: 
                if pixel.getColumn() == 1: 
                    pixel.setCoverPercent(-1)
                elif pixel.getColumn() == 2 or pixel.getColumn() == 3 or pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 11: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 3 or pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-1)
    elif image.getDate() == "2021_07_06":
        image.setPixelLabel(2, 5, "Plastic")
        image.setPixelLabel(2, 6, "Plastic")
        image.setPixelLabel(3, 4, "Plastic") 
        image.setPixelLabel(3, 5, "Plastic")
        image.setPixelLabel(3, 6, "Plastic")
        image.setPixelLabel(4, 4, "Plastic") 
        image.setPixelLabel(4, 5, "Plastic")
        image.setPixelLabel(4, 6, "Plastic")
        image.setPixelLabel(5, 4, "Plastic") 
        image.setPixelLabel(5, 5, "Plastic")
        image.setPixelLabel(5, 6, "Plastic") 
        image.setPixelLabel(7, 3, "Wood")
        image.setPixelLabel(7, 4, "Wood")
        image.setPixelLabel(8, 3, "Wood")
        image.setPixelLabel(8, 4, "Wood")
        image.setPixelLabel(8, 5, "Wood")
        image.setPixelLabel(8, 6, "Wood") 
        image.setPixelLabel(9, 3, "Wood")
        image.setPixelLabel(9, 4, "Wood")
        image.setPixelLabel(9, 5, "Wood")
        image.setPixelLabel(10, 3, "Wood")
        image.setPixelLabel(10, 4, "Wood")
        image.setPixelLabel(10, 5, "Wood")
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 2: 
                if pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 3: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 4: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 5: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 7: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
            elif pixel.getLine() == 8: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                if pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 9: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                elif pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 10: 
                if pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
    elif image.getDate() == "2021_07_21":
        image.setPixelLabel(2, 6, "Plastic")
        image.setPixelLabel(2, 7, "Plastic")
        image.setPixelLabel(2, 8, "Plastic")
        image.setPixelLabel(3, 6, "Plastic")
        image.setPixelLabel(3, 7, "Plastic")
        image.setPixelLabel(3, 8, "Plastic")
        image.setPixelLabel(4, 6, "Plastic")
        image.setPixelLabel(4, 7, "Plastic")
        image.setPixelLabel(4, 8, "Plastic")
        image.setPixelLabel(5, 6, "Plastic") 
        image.setPixelLabel(5, 7, "Plastic") 
        image.setPixelLabel(7, 4, "Wood")
        image.setPixelLabel(7, 5, "Wood")
        image.setPixelLabel(7, 6, "Wood")
        image.setPixelLabel(8, 4, "Wood")
        image.setPixelLabel(8, 5, "Wood")
        image.setPixelLabel(8, 6, "Wood")
        image.setPixelLabel(8, 7, "Wood")
        image.setPixelLabel(9, 4, "Wood")
        image.setPixelLabel(9, 5, "Wood")
        image.setPixelLabel(9, 6, "Wood")
        image.setPixelLabel(9, 7, "Wood") 
        image.setPixelLabel(10, 5, "Wood")
        image.setPixelLabel(10, 6, "Wood")
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 2: 
                if pixel.getColumn() == 6 or pixel.getColumn() == 7 or pixel.getColumn() == 8: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 3: 
                if pixel.getColumn() == 8: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 6 or pixel.getColumn() == 7: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 4: 
                if pixel.getColumn() == 6 or pixel.getColumn() == 8:  
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 7:  
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 5: 
                if pixel.getColumn() == 6 or pixel.getColumn() == 7:  
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 7: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
            elif pixel.getLine() == 8: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 7: 
                    pixel.setCoverPercent(-1)
                if pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 9: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 7: 
                    pixel.setCoverPercent(-1)
                if pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 10: 
                if pixel.getColumn() == 5 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
    elif image.getDate() == "2021_08_25": 
        image.setPixelLabel(1, 4, "Plastic") 
        image.setPixelLabel(1, 5, "Plastic") 
        image.setPixelLabel(2, 4, "Plastic")
        image.setPixelLabel(2, 5, "Plastic")
        image.setPixelLabel(2, 6, "Plastic")
        image.setPixelLabel(3, 4, "Plastic")
        image.setPixelLabel(3, 5, "Plastic")
        image.setPixelLabel(3, 6, "Plastic")
        image.setPixelLabel(4, 4, "Plastic")
        image.setPixelLabel(4, 5, "Plastic")
        image.setPixelLabel(4, 6, "Plastic")
        image.setPixelLabel(6, 3, "Wood") 
        image.setPixelLabel(7, 2, "Wood") 
        image.setPixelLabel(7, 3, "Wood") 
        image.setPixelLabel(7, 4, "Wood")
        image.setPixelLabel(7, 5, "Wood")
        image.setPixelLabel(8, 2, "Wood") 
        image.setPixelLabel(8, 3, "Wood")
        image.setPixelLabel(8, 4, "Wood")
        image.setPixelLabel(8, 5, "Wood")
        image.setPixelLabel(9, 2, "Wood") 
        image.setPixelLabel(9, 3, "Wood")
        image.setPixelLabel(9, 4, "Wood")
        image.setPixelLabel(9, 5, "Wood")
        for pixel in image.getPixels():
            pixel.setLabel(image.getLabelsMap())
            if pixel.getLine() == 1: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 2: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 3: 
                if pixel.getColumn() == 6: 
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
                elif pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-100)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 4: 
                if pixel.getColumn() == 4 or pixel.getColumn() == 5 or pixel.getColumn() == 6:  
                    pixel.setCoverPercent(-1)
                    pixel.setPolymer("HDPE mesh")
            elif pixel.getLine() == 6: 
                if pixel.getColumn() == 3: 
                    pixel.setCoverPercent(-1)
            elif pixel.getLine() == 7: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 4 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                elif pixel.getColumn() == 3: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 8: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 5: 
                    pixel.setCoverPercent(-1)
                if pixel.getColumn() == 3 or pixel.getColumn() == 4: 
                    pixel.setCoverPercent(-100)
            elif pixel.getLine() == 9: 
                if pixel.getColumn() == 2 or pixel.getColumn() == 3 or pixel.getColumn() == 4 or pixel.getColumn() == 5:
                    pixel.setCoverPercent(-1)

In [None]:
tiff_marine_dataset = tiff_files.build_dataset(tiff_data)

In [None]:
tiff_dataset = pd.concat([tiff_marine_dataset, tiff_coastal_dataset], ignore_index=True)
tiff_dataset

In [None]:
tiff_dataset.to_csv("files/csv_files/dataset_usgs.csv")

### Building subdataframes

In [None]:
usgs_2021_acolite_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_usgs.csv'), 'USGS 2019/2021 Acolite 10m', feature_names, radiometric_indexes)
usgs_2021_acolite_10m_subdatasets = dart_files.get_subdatasets(usgs_2021_acolite_10m)

## Simulated dataset (DART)

DART files are organized within a directory structure that includes folders identifying the simulations and sensor bands, and files with standard names in MPR or MP# format. To compile reflectance values from the simulated data, you need to convert the files to ASC format. This conversion can be performed using GIS software (such as QGIS). Additionally, you need to rename the files to indicate the simulation and the specific band they contain. In this project, there are two sets of simulated data, one produced in 2021 and another in 2023. 
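
As a rough illustration of what the converted files contain, the sketch below parses a small grid in the Esri ASCII raster layout. It assumes that the `.asc` files exported from the GIS software follow that layout (a few `key value` header lines followed by rows of numbers). The `read_asc` helper and the sample values are hypothetical and are not part of the `dart_files` module.

```python
import io

import numpy as np


def read_asc(text):
    """Parse an Esri ASCII grid held in a string.

    Returns (header, data): a dict of header values and a 2-D float array.
    """
    lines = text.strip().splitlines()
    header = {}
    n_header = 0
    for line in lines:
        parts = line.split()
        # Header lines start with a keyword such as "ncols"; data rows start with a number.
        if not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
        n_header += 1
    data = np.loadtxt(io.StringIO("\n".join(lines[n_header:])), ndmin=2)
    return header, data


# Hypothetical 2 x 3 reflectance grid with a 10 m cell size.
sample = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 10
NODATA_value -9999
0.021 0.022 0.020
0.019 0.350 0.023
"""

header, grid = read_asc(sample)
print(header["cellsize"], grid.shape)
```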

### 2021 Simulated dataset (DART 2021)

The 2021 dataset was configured with simpler simulations, featuring little variation in elements and environmental conditions. Plastic objects were created in a square shape and distributed on the surface of water bodies with areas of 100 m² (covering 100% of the MSI sensor pixel with a 10 m spatial resolution), 64 m² (80%), 36 m² (60%), 16 m² (40%), and 4 m² (20%). The spectral signatures of LDPE, Nylon, PET, PP, and PVC [5], and of MicroNapo [6,7] (a mixed signature of microplastics collected in the Pacific Ocean), were applied to these objects in different simulated scenes. The spectral signatures of sand and water were provided by DART itself. The bands with a spatial resolution of 10 and 20 meters were considered (Blue, Green, Red, RedEdge1, RedEdge2, RedEdge3, NIR1, NIR2, SWIR1, SWIR2). The ASC files were stored in a folder structure organized according to the relevant information: polymer type, coverage percentage, and wavelength band. For example, the simulation of the Blue band for LDPE in objects with 100% pixel coverage over water is stored in 'files/dart_files/1200x1200/LDPE/100/LDPE_Blue_100.asc'.
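
The folder convention described above can be decoded with a few lines of standard-library code. This is only a sketch: `parse_dart_2021_path` is a hypothetical helper, and it assumes every file follows the `polymer/cover/polymer_band_cover.asc` pattern of the example path.

```python
from pathlib import PurePosixPath


def parse_dart_2021_path(path):
    """Extract polymer, band and cover percentage from a 2021 ASC file path."""
    p = PurePosixPath(path)
    polymer, band, cover = p.stem.split("_")
    folder_polymer, folder_cover = p.parts[-3], p.parts[-2]
    # The folder names and the file name are expected to agree.
    if (folder_polymer, folder_cover) != (polymer, cover):
        raise ValueError(f"Inconsistent folder and file names: {path}")
    return {"polymer": polymer, "band": band, "cover_percent": int(cover)}


info = parse_dart_2021_path("files/dart_files/1200x1200/LDPE/100/LDPE_Blue_100.asc")
print(info)
```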

The DART files were converted to the ASC format using QGIS. The following source code compiles the information from all ASC files into a CSV file by:  
1) Extracting relevant information from the directory structure and files;  
2) Labeling pixels (water, sand, or plastic);  
3) Merging the data into a CSV table;  
4) Processing and exporting the data;
5) Resampling (upscaling only) using three different methods for evaluation: nearest neighbor, bilinear interpolation, and cubic interpolation.

In [None]:
path, polymers, cover_percent = dart_files.open_folders('files/dart_files/1200x1200')

#### Resampling 

DART simulations consist of files containing numerical matrices representing reflectance values in each sensor band. Due to the varied spatial resolutions of the simulated sensor bands, the files exhibit matrices of different sizes. Therefore, compiling simulated information necessitates a resampling strategy.

In this section, we create datasets by compiling the 2021 DART simulated database, while experimenting with various resampling strategies for subsequent evaluation.
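
To make the three strategies concrete, the following sketch upscales a toy 20 m band to a 10 m grid. It is an illustration under stated assumptions rather than the code used by `dart_files`: it relies on `scipy.ndimage.zoom`, whose spline orders 0, 1, and 3 stand in for nearest neighbor, bilinear, and cubic interpolation, and the array values are made up.

```python
import numpy as np
from scipy import ndimage

# Hypothetical 4 x 4 band at 20 m resolution (reflectance values are made up).
band_20m = np.arange(16, dtype=float).reshape(4, 4) / 100

# Nearest neighbor by hand: each 20 m pixel becomes a 2 x 2 block of 10 m pixels.
nearest = np.repeat(np.repeat(band_20m, 2, axis=0), 2, axis=1)

# Spline-based upscaling by a factor of 2 (orders 0, 1 and 3).
orders = {"nearest": 0, "bilinear": 1, "cubic": 3}
resampled = {name: ndimage.zoom(band_20m, 2, order=order) for name, order in orders.items()}

for name, array in resampled.items():
    print(name, array.shape)
```

Every output has twice as many rows and columns as the input, so it can be stacked with the native 10 m bands.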

Resampling 20 meter bands by nearest neighbor

In [None]:
nn_data = dart_files.get_old_images(path, polymers, cover_percent, "nearest")

In [None]:
for image in nn_data:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (int(image.getXSize() / 2) - 1), 0, (image.getYSize() - 1), "Water")
    image.setAreaLabel(int(image.getXSize() / 2), (image.getXSize() - 1), 0, (image.getYSize() - 1), "Sand")
    image.setGridLabel(29, 89, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart_nn = dart_files.build_old_dataset(nn_data)

In [None]:
dart_nn['Polymer'] = [x.split('/')[0] for x in dart_nn['Path']]

In [None]:
os.chdir('../../../')

In [None]:
dart_nn.to_csv("files/csv_files/dataset_dart_2021_nn.csv")

Resampling 20 meter bands by bilinear interpolation

In [None]:
os.chdir(path)
bilinear_data = dart_files.get_old_images(path, polymers, cover_percent, "bilinear")

In [None]:
for image in bilinear_data:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (int(image.getXSize() / 2) - 1), 0, (image.getYSize() - 1), "Water")
    image.setAreaLabel(int(image.getXSize() / 2), (image.getXSize() - 1), 0, (image.getYSize() - 1), "Sand")
    image.setGridLabel(29, 89, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart_bilinear = dart_files.build_old_dataset(bilinear_data)

In [None]:
dart_bilinear['Polymer'] = [x.split('/')[0] for x in dart_bilinear['Path']]

In [None]:
os.chdir('../../../')

In [None]:
dart_bilinear.to_csv('files/csv_files/dataset_dart_2021_bilinear.csv')

Resampling 20 meter bands by cubic interpolation

In [None]:
os.chdir(path)
cubic_data = dart_files.get_old_images(path, polymers, cover_percent, "cubic")

In [None]:
for image in cubic_data:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (int(image.getXSize() / 2) - 1), 0, (image.getYSize() - 1), "Water")
    image.setAreaLabel(int(image.getXSize() / 2), (image.getXSize() - 1), 0, (image.getYSize() - 1), "Sand")
    image.setGridLabel(29, 89, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart_cubic = dart_files.build_old_dataset(cubic_data)

In [None]:
dart_cubic['Polymer'] = [x.split('/')[0] for x in dart_cubic['Path']]

In [None]:
os.chdir('../../../')

In [None]:
dart_cubic.to_csv("files/csv_files/dataset_dart_2021_cubic.csv")

#### Preprocessing

Among the three resampling methods, we chose bilinear interpolation, which showed a slightly lower amplitude compared to the other two methods, making it closer to the observed data distribution. The simulated dataset resampled using bilinear interpolation was used for all subsequent steps.
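
One way to quantify the amplitude mentioned above is the per-band range (maximum minus minimum) of each resampled dataset. The sketch below uses made-up reflectances and a hypothetical `band_amplitude` helper; it does not reproduce the actual comparison.

```python
import pandas as pd


def band_amplitude(df, bands):
    """Range (max - min) of each band column."""
    return df[bands].max() - df[bands].min()


bands = ["Blue", "SWIR1"]
# Toy datasets standing in for two resampling outputs.
toy_nearest = pd.DataFrame({"Blue": [0.125, 0.5, 0.25], "SWIR1": [0.0, 0.75, 0.5]})
toy_bilinear = pd.DataFrame({"Blue": [0.25, 0.5, 0.375], "SWIR1": [0.125, 0.625, 0.5]})

amplitudes = pd.DataFrame({
    "nearest": band_amplitude(toy_nearest, bands),
    "bilinear": band_amplitude(toy_bilinear, bands),
})
print(amplitudes)
```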

Labels translation

In [None]:
dataset_dart = pd.read_csv("files/csv_files/dataset_dart_2021_bilinear.csv")
dataset_dart.drop('Unnamed: 0', axis=1, inplace=True)

dart_label = []
for i in range(len(dataset_dart)):
    if dataset_dart.at[i, 'Label'] == 'Water':
        dart_label.append('Água')
    elif dataset_dart.at[i, 'Label'] == 'Sand':
        dart_label.append('Areia')
    elif dataset_dart.at[i, 'Label'] == 'Plastic':
        dart_label.append('Plástico')

dataset_dart['Classe'] = dart_label
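
The same translation can be written without an explicit loop. The sketch below applies `Series.map` to a toy frame. Unlike the loop, a label missing from the dictionary becomes `NaN` instead of shortening the list and breaking the column assignment.

```python
import pandas as pd

# English label -> Portuguese class name, as used in the cell above.
translation = {"Water": "Água", "Sand": "Areia", "Plastic": "Plástico"}

toy = pd.DataFrame({"Label": ["Water", "Sand", "Plastic", "Water"]})
toy["Classe"] = toy["Label"].map(translation)
print(toy)
```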

Radiometric indexes

In [None]:
dataset_dart['NDWI'] = (dataset_dart['Green'] - dataset_dart['NIR1']) / (dataset_dart['Green'] + dataset_dart['NIR1'])
dataset_dart['WRI'] = (dataset_dart['Green'] + dataset_dart['Red']) / (dataset_dart['NIR1'] + dataset_dart['SWIR2'])
dataset_dart['NDVI'] = (dataset_dart['NIR1'] - dataset_dart['Red']) / (dataset_dart['NIR1'] + dataset_dart['Red'])
dataset_dart['AWEI'] = 4 * (dataset_dart['Green'] - dataset_dart['SWIR2']) - (0.25 * dataset_dart['NIR1'] + 2.75 * dataset_dart['SWIR1'])
dataset_dart['MNDWI'] = (dataset_dart['Green'] - dataset_dart['SWIR2']) / (dataset_dart['Green'] + dataset_dart['SWIR2'])
dataset_dart['SR'] = dataset_dart['NIR1'] / dataset_dart['Red']
dataset_dart['PI'] = dataset_dart['NIR1'] / (dataset_dart['NIR1'] + dataset_dart['Red'])
dataset_dart['RNDVI'] = (dataset_dart['Red'] - dataset_dart['NIR1']) / (dataset_dart['Red'] + dataset_dart['NIR1'])
dataset_dart['FDI'] = dataset_dart['NIR1'] - (dataset_dart['RedEdge2'] + (dataset_dart['SWIR1'] - dataset_dart['RedEdge2']) * ((dataset_dart['NIR1'] - dataset_dart['Red']) / (dataset_dart['SWIR1'] - dataset_dart['Red'])) * 10)
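
Several of these indexes share the normalized-difference form `(a - b) / (a + b)`, which is undefined when the denominator is zero. A minimal sketch of a guarded helper follows; `normalized_difference` and the toy values are assumptions for illustration and are not part of the project's modules.

```python
import pandas as pd


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN wherever a + b equals zero."""
    total = a + b
    denominator = total.where(total != 0)  # zeros become NaN
    return (a - b) / denominator


toy = pd.DataFrame({"Green": [0.75, 0.0, 0.25], "NIR1": [0.25, 0.0, 0.75]})
toy["NDWI"] = normalized_difference(toy["Green"], toy["NIR1"])
print(toy)
```

Rows with a zero denominator end up as `NaN`, which can then be dropped or inspected explicitly.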

Data cleaning

In [None]:
dividers_dart = dict()
dividers_dart.update({"Green + NIR1": dataset_dart['Green'] + dataset_dart['NIR1']})
dividers_dart.update({"SWIR2 + NIR1": dataset_dart['NIR1'] + dataset_dart['SWIR2']})
dividers_dart.update({"Red + NIR1": dataset_dart['NIR1'] + dataset_dart['Red']})
dividers_dart.update({"0.25 * NIR1 + 2.75 * SWIR1": 0.25 * dataset_dart['NIR1'] + 2.75 * dataset_dart['SWIR1']})
dividers_dart.update({"SWIR2 + Green": dataset_dart['Green'] + dataset_dart['SWIR2']})
dividers_dart.update({"Red": dataset_dart['Red']})
dividers_dart.update({"SWIR1 - Red": dataset_dart['SWIR1'] - dataset_dart['Red']})

zeros_dart = dict()

for key in dividers_dart.keys():
    i = 0
    for value in dividers_dart[key]:
        if value == 0:
            i += 1
    zeros_dart.update({key:i})
zeros_dart
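
Counting zero denominators does not require nested loops: comparing a Series with zero and summing the resulting booleans gives the same count. A small sketch on toy data (the column values are invented):

```python
import pandas as pd

toy = pd.DataFrame({
    "Green": [0.2, 0.0, 0.1],
    "NIR1": [0.1, 0.0, 0.3],
    "Red": [0.0, 0.0, 0.2],
})

dividers = {
    "Green + NIR1": toy["Green"] + toy["NIR1"],
    "Red + NIR1": toy["Red"] + toy["NIR1"],
    "Red": toy["Red"],
}

# (series == 0) is a boolean Series; its sum is the number of zeros.
zeros = {name: int((series == 0).sum()) for name, series in dividers.items()}
print(zeros)
```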

In [None]:
indexes = dataset_dart.query('FDI < -1000').index  # drop samples whose FDI is -inf because of a division by zero
dataset_dart.drop(indexes,  axis=0, inplace=True)
dataset_dart
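
The threshold `FDI < -1000` removes `-inf` values, but it would not catch `+inf` or `NaN`. If those can occur, a more general filter keeps only finite values. The sketch below is an alternative under that assumption, shown on toy data; it is not the filter applied in this notebook.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"FDI": [0.02, -np.inf, np.inf, np.nan, -0.01]})

# Keep only rows whose FDI is a finite number.
cleaned = toy[np.isfinite(toy["FDI"])].reset_index(drop=True)
print(cleaned)
```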

Preprocessing is also applied to the other subsets for comparison with the subset selected for the subsequent steps.

Labels translation, radiometric indexes and cleaning data

In [None]:
dataset_dart_cubic = pd.read_csv("files/csv_files/dataset_dart_2021_cubic.csv")
dataset_dart_cubic.drop('Unnamed: 0', axis=1, inplace=True)

dart_label = []
for i in range(len(dataset_dart_cubic)):
    if dataset_dart_cubic.at[i, 'Label'] == 'Water':
        dart_label.append('Água')
    elif dataset_dart_cubic.at[i, 'Label'] == 'Sand':
        dart_label.append('Areia')
    elif dataset_dart_cubic.at[i, 'Label'] == 'Plastic':
        dart_label.append('Plástico')

dataset_dart_cubic['Classe'] = dart_label

In [None]:
dataset_dart_cubic['NDWI'] = (dataset_dart_cubic['Green'] - dataset_dart_cubic['NIR1']) / (dataset_dart_cubic['Green'] + dataset_dart_cubic['NIR1'])
dataset_dart_cubic['WRI'] = (dataset_dart_cubic['Green'] + dataset_dart_cubic['Red']) / (dataset_dart_cubic['NIR1'] + dataset_dart_cubic['SWIR2'])
dataset_dart_cubic['NDVI'] = (dataset_dart_cubic['NIR1'] - dataset_dart_cubic['Red']) / (dataset_dart_cubic['NIR1'] + dataset_dart_cubic['Red'])
dataset_dart_cubic['AWEI'] = 4 * (dataset_dart_cubic['Green'] - dataset_dart_cubic['SWIR2']) - (0.25 * dataset_dart_cubic['NIR1'] + 2.75 * dataset_dart_cubic['SWIR1'])
dataset_dart_cubic['MNDWI'] = (dataset_dart_cubic['Green'] - dataset_dart_cubic['SWIR2']) / (dataset_dart_cubic['Green'] + dataset_dart_cubic['SWIR2'])
dataset_dart_cubic['SR'] = dataset_dart_cubic['NIR1'] / dataset_dart_cubic['Red']
dataset_dart_cubic['PI'] = dataset_dart_cubic['NIR1'] / (dataset_dart_cubic['NIR1'] + dataset_dart_cubic['Red'])
dataset_dart_cubic['RNDVI'] = (dataset_dart_cubic['Red'] - dataset_dart_cubic['NIR1']) / (dataset_dart_cubic['Red'] + dataset_dart_cubic['NIR1'])
dataset_dart_cubic['FDI'] = dataset_dart_cubic['NIR1'] - (dataset_dart_cubic['RedEdge2'] + (dataset_dart_cubic['SWIR1'] - dataset_dart_cubic['RedEdge2']) * ((dataset_dart_cubic['NIR1'] - dataset_dart_cubic['Red']) / (dataset_dart_cubic['SWIR1'] - dataset_dart_cubic['Red'])) * 10)

In [None]:
dividers_dart_cubic = dict()
dividers_dart_cubic.update({"Green + NIR1": dataset_dart_cubic['Green'] + dataset_dart_cubic['NIR1']})
dividers_dart_cubic.update({"SWIR2 + NIR1": dataset_dart_cubic['NIR1'] + dataset_dart_cubic['SWIR2']})
dividers_dart_cubic.update({"Red + NIR1": dataset_dart_cubic['NIR1'] + dataset_dart_cubic['Red']})
dividers_dart_cubic.update({"0.25 * NIR1 + 2.75 * SWIR1": 0.25 * dataset_dart_cubic['NIR1'] + 2.75 * dataset_dart_cubic['SWIR1']})
dividers_dart_cubic.update({"SWIR2 + Green": dataset_dart_cubic['Green'] + dataset_dart_cubic['SWIR2']})
dividers_dart_cubic.update({"Red": dataset_dart_cubic['Red']})
dividers_dart_cubic.update({"SWIR1 - Red": dataset_dart_cubic['SWIR1'] - dataset_dart_cubic['Red']})

zeros_dart_cubic = dict()

for key in dividers_dart_cubic.keys():
    i = 0
    for value in dividers_dart_cubic[key]:
        if value == 0:
            i += 1
    zeros_dart_cubic.update({key:i})
zeros_dart_cubic

In [None]:
indexes = dataset_dart_cubic.query('FDI < -1000').index  # drop samples whose FDI is -inf because of a division by zero
dataset_dart_cubic.drop(indexes,  axis=0, inplace=True)
dataset_dart_cubic

In [None]:
dataset_dart_nn = pd.read_csv("files/csv_files/dataset_dart_2021_nn.csv")
dataset_dart_nn.drop('Unnamed: 0', axis=1, inplace=True)

dart_label = []
for i in range(len(dataset_dart_nn)):
    if dataset_dart_nn.at[i, 'Label'] == 'Water':
        dart_label.append('Água')
    elif dataset_dart_nn.at[i, 'Label'] == 'Sand':
        dart_label.append('Areia')
    elif dataset_dart_nn.at[i, 'Label'] == 'Plastic':
        dart_label.append('Plástico')

dataset_dart_nn['Classe'] = dart_label

In [None]:
dataset_dart_nn['NDWI'] = (dataset_dart_nn['Green'] - dataset_dart_nn['NIR1']) / (dataset_dart_nn['Green'] + dataset_dart_nn['NIR1'])
dataset_dart_nn['WRI'] = (dataset_dart_nn['Green'] + dataset_dart_nn['Red']) / (dataset_dart_nn['NIR1'] + dataset_dart_nn['SWIR2'])
dataset_dart_nn['NDVI'] = (dataset_dart_nn['NIR1'] - dataset_dart_nn['Red']) / (dataset_dart_nn['NIR1'] + dataset_dart_nn['Red'])
dataset_dart_nn['AWEI'] = 4 * (dataset_dart_nn['Green'] - dataset_dart_nn['SWIR2']) - (0.25 * dataset_dart_nn['NIR1'] + 2.75 * dataset_dart_nn['SWIR1'])
dataset_dart_nn['MNDWI'] = (dataset_dart_nn['Green'] - dataset_dart_nn['SWIR2']) / (dataset_dart_nn['Green'] + dataset_dart_nn['SWIR2'])
dataset_dart_nn['SR'] = dataset_dart_nn['NIR1'] / dataset_dart_nn['Red']
dataset_dart_nn['PI'] = dataset_dart_nn['NIR1'] / (dataset_dart_nn['NIR1'] + dataset_dart_nn['Red'])
dataset_dart_nn['RNDVI'] = (dataset_dart_nn['Red'] - dataset_dart_nn['NIR1']) / (dataset_dart_nn['Red'] + dataset_dart_nn['NIR1'])
dataset_dart_nn['FDI'] = dataset_dart_nn['NIR1'] - (dataset_dart_nn['RedEdge2'] + (dataset_dart_nn['SWIR1'] - dataset_dart_nn['RedEdge2']) * ((dataset_dart_nn['NIR1'] - dataset_dart_nn['Red']) / (dataset_dart_nn['SWIR1'] - dataset_dart_nn['Red'])) * 10)

In [None]:
dividers_dart_nn = dict()
dividers_dart_nn.update({"Green + NIR1": dataset_dart_nn['Green'] + dataset_dart_nn['NIR1']})
dividers_dart_nn.update({"SWIR2 + NIR1": dataset_dart_nn['NIR1'] + dataset_dart_nn['SWIR2']})
dividers_dart_nn.update({"Red + NIR1": dataset_dart_nn['NIR1'] + dataset_dart_nn['Red']})
dividers_dart_nn.update({"0.25 * NIR1 + 2.75 * SWIR1": 0.25 * dataset_dart_nn['NIR1'] + 2.75 * dataset_dart_nn['SWIR1']})
dividers_dart_nn.update({"SWIR2 + Green": dataset_dart_nn['Green'] + dataset_dart_nn['SWIR2']})
dividers_dart_nn.update({"Red": dataset_dart_nn['Red']})
dividers_dart_nn.update({"SWIR1 - Red": dataset_dart_nn['SWIR1'] - dataset_dart_nn['Red']})

zeros_dart_nn = dict()

for key in dividers_dart_nn.keys():
    i = 0
    for value in dividers_dart_nn[key]:
        if value == 0:
            i += 1
    zeros_dart_nn.update({key:i})
zeros_dart_nn

#### Building subdataframes 

In [None]:
dart_subdatasets = dict()
dart_subdatasets['plastic'] = dataset_dart.loc[dataset_dart['Label'] == "Plastic"].copy()
dart_subdatasets['water'] = dataset_dart.loc[dataset_dart['Label'] == "Water"].copy()
dart_subdatasets['sand'] = dataset_dart.loc[dataset_dart['Label'] == "Sand"].copy()
dart_subdatasets['plastic_and_water'] = dataset_dart.query("Label == 'Water' or Label == 'Plastic'").copy()

In [None]:
dart_plastic_in_water, dart_plastic_in_sand = [], []

for i in dart_subdatasets['plastic'].index:
    if dataset_dart.at[i - 1, 'Label'] == 'Water':
        dart_plastic_in_water.append(dart_subdatasets['plastic'].loc[i])
    elif dataset_dart.at[i - 1, 'Label'] == 'Sand':
        dart_plastic_in_sand.append(dart_subdatasets['plastic'].loc[i])
    else:
        print("Error")
        
dart_subdatasets['plastic_in_water'], dart_subdatasets['plastic_in_sand'] = pd.DataFrame(dart_plastic_in_water, columns=dart_subdatasets['plastic'].columns), pd.DataFrame(dart_plastic_in_sand, columns=dart_subdatasets['plastic'].columns)

In [None]:
dart_subdatasets['plastic_20'] = dart_subdatasets['plastic'].query("Cover_percent == 20")
dart_subdatasets['plastic_40'] = dart_subdatasets['plastic'].query("Cover_percent == 40")
dart_subdatasets['plastic_60'] = dart_subdatasets['plastic'].query("Cover_percent == 60")
dart_subdatasets['plastic_80'] = dart_subdatasets['plastic'].query("Cover_percent == 80")
dart_subdatasets['plastic_100'] = dart_subdatasets['plastic'].query("Cover_percent == 100")

dart_subdatasets['plastic_ldpe'] = dart_subdatasets['plastic'].query("Polymer == 'LDPE'")
dart_subdatasets['plastic_micronapo'] = dart_subdatasets['plastic'].query("Polymer == 'MicroNapo'")
dart_subdatasets['plastic_nylon'] = dart_subdatasets['plastic'].query("Polymer == 'Nylon'")
dart_subdatasets['plastic_pet'] = dart_subdatasets['plastic'].query("Polymer == 'PET'")
dart_subdatasets['plastic_pp'] = dart_subdatasets['plastic'].query("Polymer == 'PP'")
dart_subdatasets['plastic_pvc'] = dart_subdatasets['plastic'].query("Polymer == 'PVC'")

In [None]:
dart_nn_subdatasets = dict()
dart_nn_subdatasets['plastic'] = dataset_dart_nn.loc[dataset_dart_nn['Label'] == "Plastic"].copy()
dart_nn_subdatasets['water'] = dataset_dart_nn.loc[dataset_dart_nn['Label'] == "Water"].copy()
dart_nn_subdatasets['sand'] = dataset_dart_nn.loc[dataset_dart_nn['Label'] == "Sand"].copy()
dart_nn_subdatasets['plastic_water'] = dataset_dart_nn.query("Label == 'Water' or Label == 'Plastic'").copy()

In [None]:
dart_cubic_subdatasets = dict()
dart_cubic_subdatasets['plastic'] = dataset_dart_cubic.loc[dataset_dart_cubic['Label'] == "Plastic"].copy()
dart_cubic_subdatasets['water'] = dataset_dart_cubic.loc[dataset_dart_cubic['Label'] == "Water"].copy()
dart_cubic_subdatasets['sand'] = dataset_dart_cubic.loc[dataset_dart_cubic['Label'] == "Sand"].copy()
dart_cubic_subdatasets['plastic_water'] = dataset_dart_cubic.query("Label == 'Water' or Label == 'Plastic'").copy()

#### Observing differences between resampling methods

Here, we generate graphs that show how the data distributions produced by each resampling method differ, which supports the choice of bilinear interpolation. To save the graphs, the folders `charts/english/exploratory_analysis/descriptive_statistics` and `charts/portugues/analise_exploratoria/estatisticas_descritivas` must already exist.
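
The output folders can be created from Python before exporting. A minimal sketch with `os.makedirs` follows; the `ensure_chart_folders` helper and its `base_dir` argument are assumptions added so the example can target any root folder.

```python
import os

CHART_FOLDERS = [
    "charts/english/exploratory_analysis/descriptive_statistics",
    "charts/portugues/analise_exploratoria/estatisticas_descritivas",
]


def ensure_chart_folders(base_dir="."):
    """Create the chart output folders under base_dir if they do not exist yet."""
    created = []
    for folder in CHART_FOLDERS:
        full_path = os.path.join(base_dir, *folder.split("/"))
        os.makedirs(full_path, exist_ok=True)  # no error if the folder already exists
        created.append(full_path)
    return created
```

Calling `ensure_chart_folders()` from the project root creates both folders, and calling it again is harmless.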

In [None]:
datasets_names = ["Means per resampling method", "Standard deviation (STD) per resampling method"]
traces = [
            [
                [dataset_dart_nn[feature].mean() for feature in feature_names], #Nearest neighbor
                [dataset_dart[feature].mean() for feature in feature_names],#Bilinear Interpolation
                [dataset_dart_cubic[feature].mean() for feature in feature_names], #Cubic Interpolation
                [usgs_2021_acolite_10m[feature].mean() for feature in feature_names] #Acolite
             ],
             [
                [dataset_dart_nn[feature].std() for feature in feature_names], #Nearest neighbor
                [dataset_dart[feature].std() for feature in feature_names],#Bilinear Interpolation
                [dataset_dart_cubic[feature].std() for feature in feature_names], #Cubic Interpolation
                [usgs_2021_acolite_10m[feature].std() for feature in feature_names] #Acolite 
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names], [feature_names, feature_names, feature_names, feature_names]]
legends = [['DART nearest neighbor mean', 'DART bilinear interpolation mean', 'DART cubic interpolation mean', 'USGS acolite mean'],
           ['DART nearest neighbor std', 'DART bilinear interpolation std', 'DART cubic interpolation std', 'USGS acolite std']]
modes = [['markers+lines', 'markers+lines', 'markers+lines', 'markers+lines'],
         ['dash', 'dash', 'dash', 'dash']]
colors = [['#c9b207', '#008000', '#5425ff', '#FF0000'], ['#c9b207', '#008000', '#5425ff', '#FF0000']]
chart_title = "Statistics per resampling method"
x_title = "Band"
y_title = "Reflectance"
height = 1400
width = 3600
guidance = "horizontal"

export_name = "charts/english/exploratory_analysis/descriptive_statistics/mean_std_resampling_2021"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
datasets_names = ["Média por método de reamostragem", "Desvio padrão (STD) por método de reamostragem"]

legends = [['DART vizinho mais próx. média', 'DART interpolação bilinear média', 'DART interpolação cúbica média', 'USGS acolite média'],
           ['DART vizinho mais próx. std', 'DART interpolação bilinear std', 'DART interpolação cúbica std', 'USGS acolite std']]

chart_title = "Estatísticas por métodos de reamostragem"
x_title = "Banda"
y_title = "Reflectância"

export_name = "charts/portugues/analise_exploratoria/estatisticas_descritivas/media_dpadrao_reamostragem_2021"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

### 2023 Simulated dataset (DART 2023)

The 2023 dataset was built with a greater variety of element conditions, including different colors and degrees of polymer submersion. The simulations were constructed using the spectral signatures of LDPE, PET, and PP under various conditions, as provided by [8]. The spectral signature of water was obtained from the DART signature library [9], and a whitecap signature was obtained from [10]. The simulations depict scenes of 1200 x 600 m with grid-shaped plastic objects and whitecaps distributed over the surface of water bodies, with areas of 100 m² (covering 100% of the MSI sensor pixel with a 10 m spatial resolution), 64 m² (80%), 36 m² (60%), and 16 m² (40%). Plastics were placed at the water surface as well as at depths of 2.5 and 5.0 cm. Bands with 10 and 20 m spatial resolution were considered (Blue, Green, Red, RedEdge1, RedEdge2, RedEdge3, NIR1, NIR2, SWIR1, SWIR2).
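
The relationship between the object areas and the percentages quoted above can be checked with a little arithmetic. In the sketch below (plain Python, hypothetical names), each percentage matches the ratio between the side of the square object and the 10 m pixel side; the fraction of the pixel area that is covered is listed alongside for reference.

```python
import math

PIXEL_SIDE_M = 10  # MSI pixel side for the 10 m bands


def coverage_summary(areas_m2):
    """Return (area, side, side ratio in %, area ratio in %) for square objects."""
    rows = []
    for area in areas_m2:
        side = math.isqrt(area)  # square object, so side = sqrt(area)
        side_ratio = 100 * side / PIXEL_SIDE_M
        area_ratio = 100 * area / PIXEL_SIDE_M ** 2
        rows.append((area, side, side_ratio, area_ratio))
    return rows


summary_rows = coverage_summary([100, 64, 36, 16])
for area, side, side_ratio, area_ratio in summary_rows:
    print(f"{area} m2: side {side} m, side ratio {side_ratio:.0f}%, area ratio {area_ratio:.0f}%")
```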

The nearest neighbor, bilinear interpolation, and cubic interpolation methods were tested again, this time for both upscaling and downscaling. After analyzing the data, bilinear interpolation for upscaling was once again selected. The simulations were also converted to .ASC format in QGIS and are compiled into .CSV files in the following steps.

The 'source_folders' variable holds a list of directories containing DART files. 

Each folder should adhere to the following structure:
source_folder/polymer/submersion_depth/color/status (Dry, Wet, or Submerged)/cover_percent/dart .asc files representing each of the sensor bands.
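
As an illustration of this layout, the sketch below splits a relative path into its fields. The helper name, the example path, and values such as `White` are hypothetical; only the order of the folders is taken from the description above.

```python
from pathlib import PurePosixPath

FIELDS = ("polymer", "submersion_depth", "color", "status", "cover_percent")


def parse_dart_2023_path(relative_path):
    """Map each folder level below the source folder to its field name."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != len(FIELDS) + 1:  # five folders plus the file name
        raise ValueError(f"Unexpected folder depth: {relative_path}")
    record = dict(zip(FIELDS, parts[:-1]))
    record["cover_percent"] = int(record["cover_percent"])
    record["file"] = parts[-1]
    return record


example = parse_dart_2023_path("LDPE/S0/White/Dry/100/Blue.asc")
print(example)
```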

In [None]:
source_folders = [
            'files/dart_files/Sentinel2_Artificial/Limpa/',
            'files/dart_files/Sentinel2_Artificial/LimpaEspuma/',
            'files/dart_files/Sentinel2_Artificial/Whitecaps/'
            ]
paths = dict()

for folder in source_folders:
    tree = dart_files.get_directory_tree(folder)
    paths[folder] = tree
    os.chdir('../../../../')  # go back up four levels to the starting directory
    
paths

The 'paths' variable represents the whole directory structure that will be used as input for compiling the simulated information.
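
For readers without access to `dart_files`, a nested dictionary of this kind can be built with the standard library alone. The sketch below is an assumption about what such a structure might look like; the actual return value of `dart_files.get_directory_tree` may differ. It works with full paths, so it never changes the working directory.

```python
import os
import tempfile


def directory_tree(root):
    """Return nested dicts of sub-folders; a folder without sub-folders maps to its .asc files."""
    entries = sorted(os.listdir(root))
    subfolders = [e for e in entries if os.path.isdir(os.path.join(root, e))]
    if not subfolders:
        return [e for e in entries if e.lower().endswith(".asc")]
    return {name: directory_tree(os.path.join(root, name)) for name in subfolders}


# Build a tiny throw-away folder structure to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp:
    leaf = os.path.join(tmp, "LDPE", "S0", "White", "Dry", "100")
    os.makedirs(leaf)
    open(os.path.join(leaf, "Blue.asc"), "w").close()
    tree = directory_tree(tmp)

print(tree)
```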

#### Resampling

DART simulations consist of files containing numerical matrices representing reflectance values in each sensor band. Due to the varied spatial resolutions of the simulated sensor bands, the files exhibit matrices of different sizes. Therefore, compiling simulated information necessitates a resampling strategy.

In this section, we create datasets by compiling the 2023 DART simulated database, while experimenting with various resampling strategies for subsequent evaluation.

Resampling 20 meter bands by nearest neighbor

In [None]:
nn_data2023_10 = dart_files.get_images(paths, "nearest", "up")

In [None]:
nn_data2023_20 = dart_files.get_images(paths, "nearest", "down")

In [None]:
for image in nn_data2023_10:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Whitecap")
    else:
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
for image in nn_data2023_20:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Whitecap")
    else:
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart2023_nn_10 = dart_files.build_dataset(nn_data2023_10)

In [None]:
dart2023_nn_20 = dart_files.build_dataset(nn_data2023_20)

In [None]:
dart2023_nn_10['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_nn_10['Submergence']]

dart2023_nn_10

In [None]:
dart2023_nn_20['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_nn_20['Submergence']]

dart2023_nn_20

In [None]:
dart2023_nn_10.to_csv('files/csv_files/dataset_dart_2023_nn_10.csv')

dart2023_nn_20.to_csv('files/csv_files/dataset_dart_2023_nn_20.csv')

Resampling 20 meter bands by bilinear interpolation

In [None]:
bilinear_data2023_10 = dart_files.get_images(paths, "bilinear", "up")

In [None]:
bilinear_data2023_20 = dart_files.get_images(paths, "bilinear", "down")

In [None]:
for image in bilinear_data2023_10:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Whitecap")
    else:
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
for image in bilinear_data2023_20:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Whitecap")
    else:
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart2023_bilinear_10 = dart_files.build_dataset(bilinear_data2023_10)

In [None]:
dart2023_bilinear_20 = dart_files.build_dataset(bilinear_data2023_20)

In [None]:
dart2023_bilinear_10['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_bilinear_10['Submergence']]

dart2023_bilinear_10

In [None]:
dart2023_bilinear_20['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_bilinear_20['Submergence']]

dart2023_bilinear_20

In [None]:
dart2023_bilinear_10.to_csv('files/csv_files/dataset_dart_2023_bilinear_10.csv')

dart2023_bilinear_20.to_csv('files/csv_files/dataset_dart_2023_bilinear_20.csv')

Resampling 20 meter bands by cubic interpolation

In [None]:
cubic_data2023_10 = dart_files.get_images(paths, "cubic", "up")

In [None]:
cubic_data2023_20 = dart_files.get_images(paths, "cubic", "down")

In [None]:
for image in cubic_data2023_10:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Whitecap")
    else:
        image.setGridLabel(29, 57, 4, 27, 87, 4, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
for image in cubic_data2023_20:
    # Subtract 1 because row and column indices start at zero, whereas the sizes count from 1.
    image.setAreaLabel(0, (image.getXSize() - 1), 0, (image.getYSize() - 1), "Water")
    if 'Whitecaps' in image.getPath():
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Whitecap")
    else:
        image.setGridLabel(14, 28, 2, 13, 43, 2, "Plastic")
    for pixel in image.getPixels():
        pixel.setLabel(image.getLabelsMap())
        pixel.setCoverPercent("Plastic", image.getPlasticCoverPercent())

In [None]:
dart2023_cubic_10 = dart_files.build_dataset(cubic_data2023_10)

In [None]:
dart2023_cubic_20 = dart_files.build_dataset(cubic_data2023_20)

In [None]:
dart2023_cubic_10['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_cubic_10['Submergence']]

dart2023_cubic_10

In [None]:
dart2023_cubic_20['Submergence'] = ['0' if value == 'S0' else '2 cm' if value == 'S2' else '5 cm' for value in dart2023_cubic_20['Submergence']]

dart2023_cubic_20

In [None]:
dart2023_cubic_10.to_csv('files/csv_files/dataset_dart_2023_cubic_10.csv')

dart2023_cubic_20.to_csv('files/csv_files/dataset_dart_2023_cubic_20.csv')

### Building data frames

Formatted datasets for each resampling method (bilinear interpolation is the default)

In [None]:
dart_2023_bilinear_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_bilinear_10.csv'), 'DART 2023 Bilinear 10m', feature_names, radiometric_indexes)
dart_2023_bilinear_20m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_bilinear_20.csv'), 'DART 2023 Bilinear 20m', feature_names, radiometric_indexes)
dart_2023_cubic_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_cubic_10.csv'), 'DART 2023 Cubic 10m', feature_names, radiometric_indexes)
dart_2023_cubic_20m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_cubic_20.csv'), 'DART 2023 Cubic 20m', feature_names, radiometric_indexes)
dart_2023_nn_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_nn_10.csv'), 'DART 2023 NN 10m', feature_names, radiometric_indexes)
dart_2023_nn_20m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2023_nn_20.csv'), 'DART 2023 NN 20m', feature_names, radiometric_indexes)
dart_2021_bilinear_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2021_bilinear.csv'), 'DART 2021 Bilinear 10m', feature_names, radiometric_indexes)
dart_2021_cubic_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2021_cubic.csv'), 'DART 2021 Cubic 10m', feature_names, radiometric_indexes)
dart_2021_nn_10m = dart_files.format_dataset(pd.read_csv('files/csv_files/dataset_dart_2021_nn.csv'), 'DART 2021 NN 10m', feature_names, radiometric_indexes)

Building the subdatasets

In [None]:
dart_2023_bilinear_10m_subdatasets = dart_files.get_subdatasets(dart_2023_bilinear_10m)
dart_2023_bilinear_20m_subdatasets = dart_files.get_subdatasets(dart_2023_bilinear_20m)
dart_2023_cubic_10m_subdatasets = dart_files.get_subdatasets(dart_2023_cubic_10m)
dart_2023_cubic_20m_subdatasets = dart_files.get_subdatasets(dart_2023_cubic_20m)
dart_2023_nn_10m_subdatasets = dart_files.get_subdatasets(dart_2023_nn_10m)
dart_2023_nn_20m_subdatasets = dart_files.get_subdatasets(dart_2023_nn_20m)
dart_2021_bilinear_10m_subdatasets = dart_files.get_subdatasets(dart_2021_bilinear_10m)
dart_2021_cubic_10m_subdatasets = dart_files.get_subdatasets(dart_2021_cubic_10m)
dart_2021_nn_10m_subdatasets = dart_files.get_subdatasets(dart_2021_nn_10m)

#### Observing differences between resampling methods

Here, we generate charts that show how the data distributions differ across resampling methods, which supports the choice of bilinear interpolation. To save the charts, the folders `charts/english/exploratory_analysis/descriptive_statistics` and `charts/portugues/analise_exploratoria/estatisticas_descritivas` must already exist.
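The chart cells in this section call `rsdata_charts.check_path`, whose implementation is not shown in this notebook. If the output folders have to be created by hand, a minimal sketch using only the standard library could look like the following. `ensure_chart_dirs` and `CHART_DIRS` are hypothetical names introduced for illustration; they are not part of the project's modules.

```python
import os

# Output folders used by the chart cells in this section.
CHART_DIRS = [
    'charts/english/exploratory_analysis/descriptive_statistics',
    'charts/portugues/analise_exploratoria/estatisticas_descritivas',
]

def ensure_chart_dirs(base_dir='.', chart_dirs=CHART_DIRS):
    """Create each chart folder under base_dir if it does not exist yet.

    Hypothetical helper: returns the list of folder paths it ensured.
    """
    ensured = []
    for relative in chart_dirs:
        # Split on '/' so the path is built with the OS-specific separator.
        full_path = os.path.join(base_dir, *relative.split('/'))
        os.makedirs(full_path, exist_ok=True)  # no error if already present
        ensured.append(full_path)
    return ensured
```

Calling `ensure_chart_dirs()` once before the chart cells should be enough; repeated calls are harmless because of `exist_ok=True`.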

In [None]:
dataset_dart = dart_2023_bilinear_10m
dataset_dart_subdatasets = dart_files.get_subdatasets(dataset_dart)
dart = dart_2023_bilinear_10m
dart_subdatasets = dart_files.get_subdatasets(dart)
dataset_usgs = usgs_2021_acolite_10m
dataset_usgs_subdatasets = dart_files.get_subdatasets(dataset_usgs)
usgs = usgs_2021_acolite_10m
usgs_subdatasets = dart_files.get_subdatasets(usgs)

old_dart = dart_2021_bilinear_10m
old_dart_subdatasets = dart_files.get_subdatasets(old_dart)

In [None]:
datasets = [
    dart_2023_bilinear_10m,
    dart_2023_bilinear_20m,
    dart_2023_cubic_10m,
    dart_2023_cubic_20m,
    dart_2023_nn_10m,
    dart_2023_nn_20m,
    dart_2021_bilinear_10m,
    dart_2021_cubic_10m,
    dart_2021_nn_10m,
    usgs_2021_acolite_10m
]

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_std_resampling_2021-2023'

rsdata_charts.check_path(path)

export_name = path+chart_name

datasets_names = ["Means per resampling method", "Standard deviation per resampling method"]

trace_mean = []
trace_std = []
label = []

for dataset in datasets: 
    trace_mean.append([dataset[feature].mean() for feature in feature_names])
    trace_std.append([dataset[feature].std() for feature in feature_names])
traces = [trace_mean, trace_std]

for i in range(len(datasets)):
    label.append(feature_names)
labels = [label, label]

legends = [
            ['DART 2023 bilinear interpolation 10m mean', 
                'DART 2023 bilinear interpolation 20m mean', 
                'DART 2023 cubic interpolation 10m mean', 
                'DART 2023 cubic interpolation 20m mean',
                'DART 2023 nearest neighbor 10m mean', 
                'DART 2023 nearest neighbor 20m mean',
                'DART 2021 bilinear interpolation 10m mean', 
                'DART 2021 cubic interpolation 10m mean', 
                'DART 2021 nearest neighbor 10m mean', 
                'USGS acolite mean'],
            ['DART 2023 bilinear interpolation 10m std', 
                'DART 2023 bilinear interpolation 20m std', 
                'DART 2023 cubic interpolation 10m std', 
                'DART 2023 cubic interpolation 20m std',
                'DART 2023 nearest neighbor 10m std', 
                'DART 2023 nearest neighbor 20m std',
                'DART 2021 bilinear interpolation 10m std', 
                'DART 2021 cubic interpolation 10m std', 
                'DART 2021 nearest neighbor 10m std', 
                'USGS acolite std'] 
          ]

modes = [['dash', 'dot', 'dash', 'dot', 'dash', 'dot', 'dash', 'dash', 'dash',  'markers+lines'],
         ['dash', 'dot', 'dash', 'dot', 'dash', 'dot', 'dash', 'dash', 'dash',  'markers+lines']]

colors = [['#008000', '#008000', '#5425ff','#5425ff', '#c9b207', '#c9b207', '#003c00', '#2500ad', '#877805', '#FF0000'], 
          ['#008000', '#008000', '#5425ff','#5425ff', '#c9b207', '#c9b207', '#003c00', '#2500ad', '#877805', '#FF0000']]

chart_title = "Statistics per resampling method"
x_title = "Band"
y_title = "Reflectance"
height = 1400
width = 3600
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'media_dpadrao_reamostragem_2021-2023'

rsdata_charts.check_path(path)

export_name = path+chart_name

datasets_names = ["Médias por método de reamostragem", "Desvio padrão por método de reamostragem"]
chart_title = "Estatísticas por método de reamostragem"
x_title = "Banda"
y_title = "Reflectância"

legends = [
            ['Média DART 2023 interpolação bilinear 10m', 
                'Média DART 2023 interpolação bilinear 20m', 
                'Média DART 2023 interpolação cúbica 10m', 
                'Média DART 2023 interpolação cúbica 20m',
                'Média DART 2023 vizinho mais próximo 10m', 
                'Média DART 2023 vizinho mais próximo 20m',
                'Média DART 2021 interpolação bilinear 10m', 
                'Média DART 2021 interpolação cúbica 10m', 
                'Média DART 2021 vizinho mais próximo 10m', 
                'Média USGS Acolite 10m'],
            ['Desvio padrão DART 2023 interpolação bilinear 10m', 
                'Desvio padrão DART 2023 interpolação bilinear 20m', 
                'Desvio padrão DART 2023 interpolação cúbica 10m', 
                'Desvio padrão DART 2023 interpolação cúbica 20m',
                'Desvio padrão DART 2023 vizinho mais próximo 10m', 
                'Desvio padrão DART 2023 vizinho mais próximo 20m',
                'Desvio padrão DART 2021 interpolação bilinear 10m', 
                'Desvio padrão DART 2021 interpolação cúbica 10m', 
                'Desvio padrão DART 2021 vizinho mais próximo 10m', 
                'Desvio padrão USGS Acolite 10m'] 
          ]

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)
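The mean and standard deviation traces above are computed band by band with `Series.mean()` and `Series.std()`. As a sketch of an equivalent formulation, the same values can be obtained in one pass with `DataFrame.agg`. The helper name `band_statistics` and the toy data are assumptions made for illustration and do not come from the project's modules.

```python
import pandas as pd

def band_statistics(dataset, bands):
    """Return per-band means and sample standard deviations as two lists.

    Hypothetical helper; 'std' uses pandas' default ddof=1, the same
    default as Series.std().
    """
    stats = dataset[bands].agg(['mean', 'std'])  # rows: mean, std
    return stats.loc['mean'].tolist(), stats.loc['std'].tolist()

# Toy data standing in for a formatted dataset (values are made up).
toy = pd.DataFrame({'Blue': [0.1, 0.3], 'Green': [0.2, 0.6]})
means, stds = band_statistics(toy, ['Blue', 'Green'])
```

With the notebook's variables, `band_statistics(dataset, feature_names)` would return the two lists that the loop appends to `trace_mean` and `trace_std`.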

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'resampling_method_quartiles_2021-2023'

rsdata_charts.check_path(path)

export_name = path+chart_name

datasets_names = ['DART bilinear interpolation quartiles', 
                  'DART cubic interpolation quartiles',
                  'DART nearest neighbor quartiles', 
                  'USGS quartiles']

       
traces = [
            [
                [dart_2023_bilinear_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_bilinear_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_bilinear_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_bilinear_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_bilinear_10m[feature].describe()['max'] for feature in feature_names],
                [dart_2023_bilinear_20m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_bilinear_20m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_bilinear_20m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_bilinear_20m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_bilinear_20m[feature].describe()['max'] for feature in feature_names],
                [dart_2021_bilinear_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2021_bilinear_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2021_bilinear_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2021_bilinear_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2021_bilinear_10m[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dart_2023_cubic_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_cubic_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_cubic_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_cubic_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_cubic_10m[feature].describe()['max'] for feature in feature_names],
                [dart_2023_cubic_20m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_cubic_20m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_cubic_20m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_cubic_20m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_cubic_20m[feature].describe()['max'] for feature in feature_names],
                [dart_2021_cubic_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2021_cubic_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2021_cubic_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2021_cubic_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2021_cubic_10m[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dart_2023_nn_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_nn_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_nn_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_nn_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_nn_10m[feature].describe()['max'] for feature in feature_names],
                [dart_2023_nn_20m[feature].describe()['min'] for feature in feature_names],
                [dart_2023_nn_20m[feature].describe()['25%'] for feature in feature_names],
                [dart_2023_nn_20m[feature].describe()['50%'] for feature in feature_names],
                [dart_2023_nn_20m[feature].describe()['75%'] for feature in feature_names],
                [dart_2023_nn_20m[feature].describe()['max'] for feature in feature_names],
                [dart_2021_nn_10m[feature].describe()['min'] for feature in feature_names],
                [dart_2021_nn_10m[feature].describe()['25%'] for feature in feature_names],
                [dart_2021_nn_10m[feature].describe()['50%'] for feature in feature_names],
                [dart_2021_nn_10m[feature].describe()['75%'] for feature in feature_names],
                [dart_2021_nn_10m[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dataset_usgs[feature].describe()['min'] for feature in feature_names],
                [dataset_usgs[feature].describe()['25%'] for feature in feature_names],
                [dataset_usgs[feature].describe()['50%'] for feature in feature_names],
                [dataset_usgs[feature].describe()['75%'] for feature in feature_names],
                [dataset_usgs[feature].describe()['max'] for feature in feature_names]
             ]
      ]

# One list of band names per trace, mirroring the shape of `traces` (15, 15, 15 and 5 traces).
labels = [[feature_names] * len(group) for group in traces]

legends = [
            ['DART 2023 bilinear interpolation (10m) min', 
             'DART 2023 bilinear interpolation (10m) 25%', 
             'DART 2023 bilinear interpolation (10m) 50%', 
             'DART 2023 bilinear interpolation (10m) 75%', 
             'DART 2023 bilinear interpolation (10m) max',
             'DART 2023 bilinear interpolation (20m) min', 
             'DART 2023 bilinear interpolation (20m) 25%', 
             'DART 2023 bilinear interpolation (20m) 50%', 
             'DART 2023 bilinear interpolation (20m) 75%', 
             'DART 2023 bilinear interpolation (20m) max',
             'DART 2021 bilinear interpolation (10m) min', 
             'DART 2021 bilinear interpolation (10m) 25%', 
             'DART 2021 bilinear interpolation (10m) 50%', 
             'DART 2021 bilinear interpolation (10m) 75%', 
             'DART 2021 bilinear interpolation (10m) max'],
            ['DART 2023 cubic interpolation (10m) min', 
             'DART 2023 cubic interpolation (10m) 25%', 
             'DART 2023 cubic interpolation (10m) 50%', 
             'DART 2023 cubic interpolation (10m) 75%', 
             'DART 2023 cubic interpolation (10m) max',
             'DART 2023 cubic interpolation (20m) min', 
             'DART 2023 cubic interpolation (20m) 25%', 
             'DART 2023 cubic interpolation (20m) 50%', 
             'DART 2023 cubic interpolation (20m) 75%', 
             'DART 2023 cubic interpolation (20m) max',
             'DART 2021 cubic interpolation (10m) min', 
             'DART 2021 cubic interpolation (10m) 25%', 
             'DART 2021 cubic interpolation (10m) 50%', 
             'DART 2021 cubic interpolation (10m) 75%', 
             'DART 2021 cubic interpolation (10m) max'],
            ['DART 2023 nearest neighbor (10m) min', 
             'DART 2023 nearest neighbor (10m) 25%', 
             'DART 2023 nearest neighbor (10m) 50%', 
             'DART 2023 nearest neighbor (10m) 75%', 
             'DART 2023 nearest neighbor (10m) max',
             'DART 2023 nearest neighbor (20m) min', 
             'DART 2023 nearest neighbor (20m) 25%', 
             'DART 2023 nearest neighbor (20m) 50%', 
             'DART 2023 nearest neighbor (20m) 75%', 
             'DART 2023 nearest neighbor (20m) max',
             'DART 2021 nearest neighbor (10m) min', 
             'DART 2021 nearest neighbor (10m) 25%', 
             'DART 2021 nearest neighbor (10m) 50%', 
             'DART 2021 nearest neighbor (10m) 75%', 
             'DART 2021 nearest neighbor (10m) max'],
            ['USGS min', 'USGS 25%', 'USGS 50%', 'USGS 75%', 'USGS max']
          ]

modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines', 'dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']]

colors = [['#008000', '#008000', '#008000', '#008000', '#008000', '#008000', '#008000', '#008000', '#008000', '#008000', '#003c00', '#003c00', '#003c00', '#003c00', '#003c00'], 
          ['#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#5425ff', '#2500ad', '#2500ad', '#2500ad', '#2500ad', '#2500ad'],
          ['#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#c9b207', '#877805', '#877805', '#877805', '#877805', '#877805'],
          ['#f44', '#f44', '#d00', '#b00', '#b00']]

chart_title = "Statistics per resampling method"
x_title = "Band"
y_title = "Reflectance"
height = 3100
width = 6500
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'quartis_metodo_reamostragem_2021-2023'

rsdata_charts.check_path(path)

export_name = path+chart_name

datasets_names = ['Quartis DART - interpolação bilinear', 
                  'Quartis DART - interpolação cúbica', 
                  'Quartis DART - vizinho mais próximo', 
                  'Quartis USGS']

legends = [
            ['DART 2023 interpolação bilinear (10m) mín', 
             'DART 2023 interpolação bilinear (10m) 25%', 
             'DART 2023 interpolação bilinear (10m) 50%', 
             'DART 2023 interpolação bilinear (10m) 75%', 
             'DART 2023 interpolação bilinear (10m) máx',
             'DART 2023 interpolação bilinear (20m) mín', 
             'DART 2023 interpolação bilinear (20m) 25%', 
             'DART 2023 interpolação bilinear (20m) 50%', 
             'DART 2023 interpolação bilinear (20m) 75%', 
             'DART 2023 interpolação bilinear (20m) máx',
             'DART 2021 interpolação bilinear (10m) mín', 
             'DART 2021 interpolação bilinear (10m) 25%', 
             'DART 2021 interpolação bilinear (10m) 50%', 
             'DART 2021 interpolação bilinear (10m) 75%', 
             'DART 2021 interpolação bilinear (10m) máx'],
            ['DART 2023 interpolação cúbica (10m) mín', 
             'DART 2023 interpolação cúbica (10m) 25%', 
             'DART 2023 interpolação cúbica (10m) 50%', 
             'DART 2023 interpolação cúbica (10m) 75%', 
             'DART 2023 interpolação cúbica (10m) máx',
             'DART 2023 interpolação cúbica (20m) mín', 
             'DART 2023 interpolação cúbica (20m) 25%', 
             'DART 2023 interpolação cúbica (20m) 50%', 
             'DART 2023 interpolação cúbica (20m) 75%', 
             'DART 2023 interpolação cúbica (20m) máx',
             'DART 2021 interpolação cúbica (10m) mín', 
             'DART 2021 interpolação cúbica (10m) 25%', 
             'DART 2021 interpolação cúbica (10m) 50%', 
             'DART 2021 interpolação cúbica (10m) 75%', 
             'DART 2021 interpolação cúbica (10m) máx'],
            ['DART 2023 vizinho mais próximo (10m) mín', 
             'DART 2023 vizinho mais próximo (10m) 25%', 
             'DART 2023 vizinho mais próximo (10m) 50%', 
             'DART 2023 vizinho mais próximo (10m) 75%', 
             'DART 2023 vizinho mais próximo (10m) máx',
             'DART 2023 vizinho mais próximo (20m) mín', 
             'DART 2023 vizinho mais próximo (20m) 25%', 
             'DART 2023 vizinho mais próximo (20m) 50%', 
             'DART 2023 vizinho mais próximo (20m) 75%', 
             'DART 2023 vizinho mais próximo (20m) máx',
             'DART 2021 vizinho mais próximo (10m) mín', 
             'DART 2021 vizinho mais próximo (10m) 25%', 
             'DART 2021 vizinho mais próximo (10m) 50%', 
             'DART 2021 vizinho mais próximo (10m) 75%', 
             'DART 2021 vizinho mais próximo (10m) máx'],
            ['USGS mín', 'USGS 25%', 'USGS 50%', 'USGS 75%', 'USGS máx']
          ]

chart_title = "Estatísticas por método de reamostragem"
x_title = "Banda"
y_title = "Reflectância"
height = 3100
width = 6500
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)
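The quartile traces above call `describe()` once per statistic and per band. A more compact way to build the same five traces (min, 25%, 50%, 75%, max) is sketched below with `DataFrame.quantile`, which uses the same default linear interpolation as `describe()`. The names `quartile_traces` and `QUANTILES`, and the toy data, are hypothetical and introduced only for this example.

```python
import pandas as pd

QUANTILES = [0.0, 0.25, 0.5, 0.75, 1.0]  # min, 25%, 50%, 75%, max

def quartile_traces(dataset, bands, quantiles=QUANTILES):
    """Return one list per quantile, each holding a value for every band.

    Hypothetical helper mirroring the order min, 25%, 50%, 75%, max.
    """
    table = dataset[bands].quantile(quantiles)  # rows: quantiles, columns: bands
    return [table.loc[q].tolist() for q in quantiles]

# Toy data standing in for a formatted dataset (values are made up).
toy = pd.DataFrame({'Blue': [0.0, 1.0, 2.0, 3.0, 4.0],
                    'Green': [10.0, 20.0, 30.0, 40.0, 50.0]})
traces_toy = quartile_traces(toy, ['Blue', 'Green'])
```

Under these assumptions, one group of `traces` could be assembled by concatenating `quartile_traces(d, feature_names)` for each dataset `d` in that group.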

## Exploratory analysis

### Kolmogorov-Smirnov

In [None]:
print("DART x USGS")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dataset_dart[feature], dataset_usgs[feature]))
print("*******************")
print("                   ")

print("DART (only Plastic and water) x USGS (only Plastic and water)")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dart_subdatasets['plastic_and_water'][feature], usgs_subdatasets['plastic_and_water'][feature]))
print("*******************")
print("                   ")

print("DART (only Plastic and water) x USGS 2019 (only Plastic and water)")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dart_subdatasets['plastic_and_water'][feature], usgs_subdatasets['plp2019_plastic_water'][feature]))
print("*******************")
print("                   ")

print("DART (only Plastic and water) x USGS 2021 (only Plastic and water)")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dart_subdatasets['plastic_and_water'][feature], usgs_subdatasets['plp2021_plastic_water'][feature]))
print("*******************")
print("                   ")


print("USGS 2019 x USGS 2021")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(usgs_subdatasets['plp2019'][feature], usgs_subdatasets['plp2021'][feature]))
print("*******************")
print("                   ")

In [None]:
print("DART Plastic x USGS Plastic")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dart_subdatasets['plastic'][feature], usgs_subdatasets['plastic'][feature]))
print("*******************")
print("                   ")

print("DART Water x USGS Water")
for feature in feature_names + radiometric_indexes:
    print(feature, ks_2samp(dart_subdatasets['water'][feature], usgs_subdatasets['water'][feature]))
print("*******************")
print("                   ")
print("                   ")
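The cells above print the raw `ks_2samp` result for each feature. The result exposes a `statistic` (the largest distance between the two empirical distribution functions) and a `pvalue`. As a sketch, the results can be collected into a table that is easier to scan than printed output. The helper `ks_table`, the 0.05 significance level, and the synthetic samples are assumptions made for illustration, not values taken from this study.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def ks_table(sample_a, sample_b, columns, alpha=0.05):
    """Run a two-sample KS test per column and tabulate the results.

    Hypothetical helper; alpha=0.05 is an assumed significance level.
    """
    rows = []
    for column in columns:
        result = ks_2samp(sample_a[column], sample_b[column])
        rows.append({'feature': column,
                     'statistic': result.statistic,
                     'p_value': result.pvalue,
                     # True when the samples likely come from different distributions
                     'reject_h0': result.pvalue < alpha})
    return pd.DataFrame(rows)

# Synthetic samples with clearly different means (values are made up).
rng = np.random.default_rng(0)
a = pd.DataFrame({'Blue': rng.normal(0.0, 1.0, 500)})
b = pd.DataFrame({'Blue': rng.normal(3.0, 1.0, 500)})
table = ks_table(a, b, ['Blue'])
```

With the notebook's variables, a call such as `ks_table(dataset_dart, dataset_usgs, feature_names + radiometric_indexes)` would produce one row per feature.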

### Descriptive statistics

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'classes'

rsdata_charts.check_path(path)

export_name = path+chart_name

rsdata_charts.pie_chart(
    [
        pd.concat([dart_2021_bilinear_10m_subdatasets['plastic'], 
                   dart_2021_bilinear_10m_subdatasets['sand'], 
                   dart_2021_bilinear_10m_subdatasets['water']], 
                   ignore_index=True),
        pd.concat([dart_subdatasets['plastic'], 
                   dart_subdatasets['whitecap'], 
                   dart_subdatasets['water']], 
                   ignore_index=True),
        pd.concat([usgs_subdatasets['plp2019'].query('Label=="Plastic"'), 
                   usgs_subdatasets['plp2019'].query('Label=="Coast"'), 
                   usgs_subdatasets['plp2019'].query('Label=="Water"')], 
                   ignore_index=True),
        pd.concat([usgs_subdatasets['plp2021'].query('Label=="Plastic"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Coast"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Water"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Wood"')], 
                   ignore_index=True)
    ], 
    ['Label', 'Label', 'Label', 'Label'], 
    ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021"], 
    "DART x USGS classes", 1300, 2700, 
    ['#FF69B4', '#FFD700', '#1E90FF'], export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'classes'

rsdata_charts.check_path(path)

export_name = path+chart_name


rsdata_charts.pie_chart(
    [
        pd.concat([dart_2021_bilinear_10m_subdatasets['plastic'], 
                   dart_2021_bilinear_10m_subdatasets['sand'], 
                   dart_2021_bilinear_10m_subdatasets['water']], 
                   ignore_index=True),
        pd.concat([dart_subdatasets['plastic'], 
                   dart_subdatasets['whitecap'], 
                   dart_subdatasets['water']], 
                   ignore_index=True),
        pd.concat([usgs_subdatasets['plp2019'].query('Label=="Plastic"'), 
                   usgs_subdatasets['plp2019'].query('Label=="Coast"'), 
                   usgs_subdatasets['plp2019'].query('Label=="Water"')], 
                   ignore_index=True),
        pd.concat([usgs_subdatasets['plp2021'].query('Label=="Plastic"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Coast"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Water"'), 
                   usgs_subdatasets['plp2021'].query('Label=="Wood"')], 
                   ignore_index=True)
    ], 
    ['Label', 'Label', 'Label', 'Label'], 
    ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021"], 
    "Classes DART x USGS", 1300, 2700, 
    ['#FF69B4', '#FFD700', '#1E90FF'], export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'polymers_dart'

rsdata_charts.check_path(path)

export_name = path+chart_name

rsdata_charts.pie_chart([dart_2021_bilinear_10m_subdatasets['plastic'], dart_subdatasets['plastic']], 
                        ['Polymer', 'Polymer'], 
                        ["Polymers in DART 2021 data", "Polymers in DART 2023 data"], 
                        " ", 1000, 2200, 
                        ['#2f1b70', '#D81F88', '#FF8825', '#F1C800', '#ADF224', '#1AB1B1'], 
                        export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'polimeros_dart'

rsdata_charts.check_path(path)

export_name = path+chart_name

rsdata_charts.pie_chart([dart_2021_bilinear_10m_subdatasets['plastic'], dart_subdatasets['plastic']], 
                        ['Polymer', 'Polymer'], ["Polímeros DART 2021", "Polímeros DART 2023"], 
                        " ", 1000, 2200, 
                        ['#2f1b70', '#D81F88', '#FF8825', '#F1C800', '#ADF224', '#1AB1B1'], 
                        export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'polymers_usgs'

rsdata_charts.check_path(path)

export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plp2019'].query('Polymer!="None"'), usgs_subdatasets['plp2021'].query('Polymer!="None"')], 
                        ['Polymer', 'Polymer'], ["Polymers in USGS 2019 data", "Polymers in USGS 2021 data"], 
                        " ", 1200, 1400, 
                        ['#49C658','#8945AB', '#FF675F', '#FCFE5E'], 
                        export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'polimeros_usgs'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plp2019'].query('Polymer!="None"'), usgs_subdatasets['plp2021'].query('Polymer!="None"')], 
                        ['Polymer', 'Polymer'], ["Polímeros nos dados USGS 2019", "Polímeros nos dados USGS 2021"], 
                        " ", 1200, 1400, 
                        ['#49C658', '#8945AB', '#FF675F', '#FCFE5E'], 
                        export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'usgs_sources'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([dataset_usgs], ['Year'], [" "], "USGS data sources", 1200, 1200, ['#c20', '#ff8d77'], export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'fontes_usgs'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([dataset_usgs], ['Year'], [" "], "Fontes dos dados USGS", 1200, 1200, ['#c20', '#ff8d77'], export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'usgs_dates'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plp2019'], usgs_subdatasets['plp2021']], ['Path', 'Path'], 
                        ["Days in 2019", "Days in 2021"], "USGS acquisition dates", 1000, 2000, ['#991900', '#c20', '#f53', '#ff9c88', '#ffc6bb'], 
                        export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'datas_usgs'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plp2019'], usgs_subdatasets['plp2021']], ['Path', 'Path'], 
                        ["Dias em 2019", "Dias em 2021"], "Datas de aquisição USGS", 1000, 2000, ['#991900', '#c20', '#f53', '#ff9c88', '#ffc6bb'], 
                        export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'usgs_sources_per_class'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plastic'], usgs_subdatasets['water'], usgs_subdatasets['coast'], usgs_subdatasets['wood']], ['Year', 'Year', 'Year', 'Year'], 
                        ["Plastic", "Water", "Coast", "Wood"], "USGS data sources - per class", 1200, 3600, ['#c20', '#ff8d77'], 
                        export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'fontes_usgs_por_classe'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.pie_chart([usgs_subdatasets['plastic'], usgs_subdatasets['water'], usgs_subdatasets['coast'], usgs_subdatasets['wood']], ['Year', 'Year', 'Year', 'Year'], 
                        ["Plástico", "Água", "Costa", "Madeira"], "Fontes dos dados USGS - por classe", 1200, 3600, ['#c20', '#ff8d77'], 
                        export_name)

In [None]:
datasets = [dart_2021_bilinear_10m_subdatasets, 
            dart_subdatasets]

datasets_names = ['DART 2021', 'DART 2023']
y = []

for dataset in datasets:
    y.append([
                [len(dataset['plastic_ldpe'].query("Cover_percent == 20")), 
                 len(dataset['plastic_ldpe'].query("Cover_percent == 40")), 
                 len(dataset['plastic_ldpe'].query("Cover_percent == 60")), 
                 len(dataset['plastic_ldpe'].query("Cover_percent == 80")), 
                 len(dataset['plastic_ldpe'].query("Cover_percent == 100"))],
                [len(dataset['plastic_pet'].query("Cover_percent == 20")), 
                 len(dataset['plastic_pet'].query("Cover_percent == 40")), 
                 len(dataset['plastic_pet'].query("Cover_percent == 60")), 
                 len(dataset['plastic_pet'].query("Cover_percent == 80")), 
                 len(dataset['plastic_pet'].query("Cover_percent == 100"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Wet' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Wet' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Wet' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Wet' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Wet' and Color == 'White'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Submerged 2cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Submerged 2cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Submerged 2cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Submerged 2cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Submerged 2cm' and Color == 'White'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Submerged 5cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Submerged 5cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Submerged 5cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Submerged 5cm' and Color == 'White'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Submerged 5cm' and Color == 'White'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Wet' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Wet' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Wet' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Wet' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Wet' and Color == 'Orange'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Submerged 2cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Submerged 2cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Submerged 2cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Submerged 2cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Submerged 2cm' and Color == 'Orange'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Submerged 5cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Submerged 5cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Submerged 5cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Submerged 5cm' and Color == 'Orange'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Submerged 5cm' and Color == 'Orange'"))],
                [len(dataset['plastic_pp'].query("Cover_percent == 20 and Detailed_status == 'Dry' and Color == 'Transparent'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 40 and Detailed_status == 'Dry' and Color == 'Transparent'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 60 and Detailed_status == 'Dry' and Color == 'Transparent'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 80 and Detailed_status == 'Dry' and Color == 'Transparent'")), 
                 len(dataset['plastic_pp'].query("Cover_percent == 100 and Detailed_status == 'Dry' and Color == 'Transparent'"))],
                [len(dataset['plastic_nylon'].query("Cover_percent == 20")), 
                 len(dataset['plastic_nylon'].query("Cover_percent == 40")), 
                 len(dataset['plastic_nylon'].query("Cover_percent == 60")), 
                 len(dataset['plastic_nylon'].query("Cover_percent == 80")), 
                 len(dataset['plastic_nylon'].query("Cover_percent == 100"))],
                [len(dataset['plastic_pvc'].query("Cover_percent == 20")), 
                 len(dataset['plastic_pvc'].query("Cover_percent == 40")), 
                 len(dataset['plastic_pvc'].query("Cover_percent == 60")), 
                 len(dataset['plastic_pvc'].query("Cover_percent == 80")), 
                 len(dataset['plastic_pvc'].query("Cover_percent == 100"))],
                [len(dataset['plastic_micronapo'].query("Cover_percent == 20")), 
                 len(dataset['plastic_micronapo'].query("Cover_percent == 40")), 
                 len(dataset['plastic_micronapo'].query("Cover_percent == 60")), 
                 len(dataset['plastic_micronapo'].query("Cover_percent == 80")), 
                 len(dataset['plastic_micronapo'].query("Cover_percent == 100"))]
            ])
    
labels_group = ["DART 2021 polymers", "DART 2023 polymers"]

labels = ['Dry LDPE (transparent)', 
          'Dry PET (transparent)', 
          'Wet PP (white)', 
          'Sub 2cm PP (white)', 
          'Sub 5cm PP (white)',
          'Wet PP (orange)', 
          'Sub 2cm PP (orange)', 
          'Sub 5cm PP (orange)',
          'Dry PP (transparent)',
          'Dry Nylon (transparent)',
          'Dry PVC (transparent)',
          'Wet Micronapo (mixed colors)']

legend_x = ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%']
legends_x = []
for i in range(len(y[0])):
    legends_x.append(legend_x)

path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'dart_polymers_detailed'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.stacked_bar_chart(datasets_names, #names_datasets
                            #x,
                            legends_x,
                            y, 
                            labels, #names, 
                            ['#ADF224',
                             '#3a228b',
                             '#666ce0',
                             '#3c43d7',
                             '#2329ac',
                             '#FFAD69',
                             '#FF9A47',
                             '#FF7F14',
                             '#1AB1B1',
                             '#D81F88',
                             '#FFDD36',
                             '#DC0000'], #colors,
                            'DART - Detailed polymers', #chart_title, 
                            'Cover percents', #x_title, 
                            'Number of pixels', #y_title, 
                            1600, #height, 
                            2200, #width, 
                            labels_group, 
                            'h', #orientation, 
                            'horizontal', #guidance, 
                            export_name
                           )

labels_group = [
        "Polímeros DART 2021", 
        "Polímeros DART 2023"
    ]

labels = ['LDPE seco (transparente)', 
          'PET seco (transparente)', 
          'PP úmido (branco)', 
          'PP sub 2cm (branco)', 
          'PP sub 5cm (branco)',
          'PP úmido (laranja)', 
          'PP sub 2cm (laranja)', 
          'PP sub 5cm (laranja)',
          'PP seco (transparente)',
          'Nylon seco (transparente)',
          'PVC seco (transparente)',
          'Micronapo úmido (cores mistas)']

legend_x = ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%']
legends_x = []
for i in range(len(y[0])):
    legends_x.append(legend_x)

path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'dart_polimeros_detalhado'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.stacked_bar_chart(datasets_names, #names_datasets
                            #x,
                            legends_x,
                            y, 
                            labels, #names, 
                            ['#ADF224',
                             '#3a228b',
                             '#666ce0',
                             '#3c43d7',
                             '#2329ac',
                             '#FFAD69',
                             '#FF9A47',
                             '#FF7F14',
                             '#1AB1B1',
                             '#D81F88',
                             '#FFDD36',
                             '#DC0000'], #colors, 
                            'DART - polímeros em detalhes', #chart_title, 
                            'Percentuais de cobertura', #x_title, 
                            'Número de pixels', #y_title, 
                            1600, #height, 
                            2200, #width, 
                            labels_group, 
                            'h', #orientation, 
                            'horizontal', #guidance, 
                            export_name)
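
The cell above repeats the same `query` call for every cover level and polymer variant. As a minimal sketch only, the same per-level counts could be produced by a small helper. The helper name `count_by_cover`, the constant `COVER_LEVELS`, and the toy data frame are hypothetical and not part of the notebook's modules; the column names are the ones used in the queries above.

```python
import pandas as pd

# Toy stand-in for one entry of a DART sub-dataset dictionary (hypothetical data).
plastic_pp = pd.DataFrame({
    "Cover_percent": [20, 20, 40, 100, 100, 100],
    "Detailed_status": ["Wet", "Wet", "Wet", "Dry", "Wet", "Wet"],
    "Color": ["White", "Orange", "White", "Transparent", "White", "White"],
})

# Cover levels used by the DART queries in the cell above.
COVER_LEVELS = [20, 40, 60, 80, 100]

def count_by_cover(df, extra_filter=None):
    """Count rows at each cover level, optionally restricted by a query string."""
    subset = df.query(extra_filter) if extra_filter else df
    return [int((subset["Cover_percent"] == level).sum()) for level in COVER_LEVELS]

# One inner list of `y`, equivalent to five separate len(df.query(...)) calls.
wet_white = count_by_cover(plastic_pp, "Detailed_status == 'Wet' and Color == 'White'")
all_rows = count_by_cover(plastic_pp)
print(wet_white)
print(all_rows)
```

With such a helper, each inner list of `y` would reduce to one call per polymer variant, which makes it easier to keep the order of `labels` and the order of the counts in sync.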

In [None]:
datasets = [usgs_subdatasets['plp2019'], 
            usgs_subdatasets['plp2021']
           ]

datasets_names = ['USGS 2019', 'USGS 2021']
y = []

for dataset in datasets:
    y.append([
                [len(dataset.query("Polymer == 'Bottles' and Cover_percent >= 0 and Cover_percent <= 20")), 
                 len(dataset.query("Polymer == 'Bottles' and Cover_percent > 20 and Cover_percent <= 40")), 
                 len(dataset.query("Polymer == 'Bottles' and Cover_percent > 40 and Cover_percent <= 60")), 
                 len(dataset.query("Polymer == 'Bottles' and Cover_percent > 60 and Cover_percent <= 80")), 
                 len(dataset.query("Polymer == 'Bottles' and Cover_percent > 80"))],
                [len(dataset.query("Polymer == 'Bags' and Cover_percent >= 0 and Cover_percent <= 20")), 
                 len(dataset.query("Polymer == 'Bags' and Cover_percent > 20 and Cover_percent <= 40")), 
                 len(dataset.query("Polymer == 'Bags' and Cover_percent > 40 and Cover_percent <= 60")), 
                 len(dataset.query("Polymer == 'Bags' and Cover_percent > 60 and Cover_percent <= 80")), 
                 len(dataset.query("Polymer == 'Bags' and Cover_percent > 80"))],
                [len(dataset.query("Polymer == 'Bags and Bottles' and Cover_percent >= 0 and Cover_percent <= 20")), 
                 len(dataset.query("Polymer == 'Bags and Bottles' and Cover_percent > 20 and Cover_percent <= 40")), 
                 len(dataset.query("Polymer == 'Bags and Bottles' and Cover_percent > 40 and Cover_percent <= 60")), 
                 len(dataset.query("Polymer == 'Bags and Bottles' and Cover_percent > 60 and Cover_percent <= 80")), 
                 len(dataset.query("Polymer == 'Bags and Bottles' and Cover_percent > 80"))],
                [len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Floating' and Cover_percent >= 0 and Cover_percent <= 20")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Floating' and Cover_percent > 20 and Cover_percent <= 40")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Floating' and Cover_percent > 40 and Cover_percent <= 60")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Floating' and Cover_percent > 60 and Cover_percent <= 80")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Floating' and Cover_percent > 80"))],
                [len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Partially submerged' and Cover_percent >= 0 and Cover_percent <= 20")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Partially submerged' and Cover_percent > 20 and Cover_percent <= 40")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Partially submerged' and Cover_percent == -1")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Partially submerged' and Cover_percent > 60 and Cover_percent <= 80")), 
                 len(dataset.query("Polymer == 'HDPE mesh' and Detailed_status == 'Partially submerged' and Cover_percent == -100"))]
            ])
    
labels_group = [
        "USGS 2019 polymers", 
        "USGS 2021 polymers"
    ]

labels = ['Floating Bottles (transparent)', 
          'Floating Bags (blue)', 
          'Floating Bags and Bottles (mixed colors)', 
          'Floating HDPE mesh (white) (estimated cover percent)',
          'Partially submerged HDPE mesh (white) (estimated cover percent)']

legend_x = ['Plastic 0%-20%', 'Plastic 21%-40%', 'Plastic 41%-60%', 'Plastic 61%-80%', 'Plastic 81%-100%']
legends_x = []
for i in range(len(y[0])):
    legends_x.append(legend_x)

path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'usgs_polymers_detailed'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.stacked_bar_chart(datasets_names, #names_datasets
                            #x,
                            legends_x,
                            y, 
                            labels, #names, 
                            ['#ADF224',
                             '#3a228b',
                             '#DC0000',
                             '#1AB1B1',
                             '#FFDD36'], #colors,
                            'USGS - Detailed polymers', #chart_title, 
                            'Cover percents', #x_title, 
                            'Number of pixels', #y_title, 
                            1600, #height, 
                            2800, #width, 
                            labels_group, 
                            'h', #orientation, 
                            'horizontal', #guidance, 
                            export_name
                           )

labels_group = [
        "Polímeros USGS 2019", 
        "Polímeros USGS 2021"
    ]

labels = ['Garrafas em flutuação (transparente)', 
          'Sacolas em flutuação (azul)', 
          'Garrafas e sacolas em flutuação (cores mistas)', 
          'Malha de HDPE em flutuação (branca) (percentual estimado)',
          'Malha de HDPE parcialmente submersa (branca) (percentual estimado)']

legend_x = ['Plástico 0%-20%', 'Plástico 21%-40%', 'Plástico 41%-60%', 'Plástico 61%-80%', 'Plástico 81%-100%']
legends_x = []

for i in range(len(y[0])):
    legends_x.append(legend_x)

path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'usgs_polimeros_detalhado'
rsdata_charts.check_path(path)
export_name = path+chart_name

rsdata_charts.stacked_bar_chart(datasets_names, #names_datasets
                            #x,
                            legends_x,
                            y, 
                            labels, #names, 
                            ['#ADF224',
                             '#3a228b',
                             '#DC0000',
                             '#1AB1B1',
                             '#FFDD36'], #colors,
                            'USGS - polímeros em detalhes', #chart_title, 
                            'Percentuais de cobertura', #x_title, 
                            'Número de pixels', #y_title, 
                            1600, #height, 
                            2800, #width, 
                            labels_group, 
                            'h', #orientation, 
                            'horizontal', #guidance, 
                            export_name)
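
The USGS counts above group `Cover_percent` into ranges (0 to 20, above 20 to 40, and so on, with the last range open-ended). A possible alternative, sketched here under the assumption that the same boundaries are wanted, is to bin once with `pandas.cut` and then count per polymer. The toy data, the `cover_bin` column, and the helper `binned_counts` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a USGS sub-dataset (hypothetical values).
usgs_toy = pd.DataFrame({
    "Polymer": ["Bottles", "Bottles", "Bags", "Bags", "Bottles", "Bags"],
    "Cover_percent": [0, 20, 21, 55, 95, 80],
})

# Right-closed bins: [0, 20], (20, 40], (40, 60], (60, 80], (80, inf).
bins = [0, 20, 40, 60, 80, np.inf]
bin_labels = ["0-20", "20-40", "40-60", "60-80", "80+"]

usgs_toy["cover_bin"] = pd.cut(
    usgs_toy["Cover_percent"], bins=bins, labels=bin_labels, include_lowest=True
)

def binned_counts(df, polymer):
    """Return the number of rows per cover bin for one polymer, in bin order."""
    selected = df.loc[df["Polymer"] == polymer, "cover_bin"].astype(str)
    return selected.value_counts().reindex(bin_labels, fill_value=0).tolist()

bottles = binned_counts(usgs_toy, "Bottles")
bags = binned_counts(usgs_toy, "Bags")
print(bottles)
print(bags)
```

Note that `include_lowest=True` is what keeps a cover of exactly 0 in the first bin, matching the `Cover_percent >= 0` condition in the queries.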

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_std'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Mean","Std"]
traces = [
            [
                [old_dart[feature].mean() for feature in feature_names],
                [dataset_dart[feature].mean() for feature in feature_names],
                [dataset_usgs[feature].mean() for feature in feature_names]
             ],
             [
                [old_dart[feature].std() for feature in feature_names],
                [dataset_dart[feature].std() for feature in feature_names],
                [dataset_usgs[feature].std() for feature in feature_names]
             ]
          ]
labels = [[feature_names, feature_names, feature_names], [feature_names, feature_names, feature_names]]
legends = [['DART 2021 mean', 'DART 2023 mean', 'USGS mean'],
           ['DART 2021 std', 'DART 2023 std', 'USGS std']]
modes = [['markers+lines', 'markers+lines', 'markers+lines'],
         ['dash', 'dash', 'dash']]
colors = [['#0000FF', '#FF0000', '#00FF00'], ['#0000FF', '#FF0000', '#00FF00']]
chart_title = "DART x USGS statistics"
x_title = "Band"
y_title = "Reflectance"
height = 1250
width = 2500
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'media_dpadrao'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Média","Desvio padrão"]
legends = [['Média DART 2021', 'Média DART 2023', 'Média USGS'],
           ['Desvio padrão DART 2021', 'Desvio padrão DART 2023', 'Desvio padrão USGS']]
chart_title = "Estatísticas DART x USGS"
x_title = "Banda"
y_title = "Reflectância"
width = 2700

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_std_plastic_water'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Mean (only plastic and water)","Std (only plastic and water)"]
traces = [
            [
                [old_dart_subdatasets['plastic_and_water'][feature].mean() for feature in feature_names],
                [dart_subdatasets['plastic_and_water'][feature].mean() for feature in feature_names], 
                [usgs_subdatasets['plastic_and_water'][feature].mean() for feature in feature_names]
             ],
             [
                [old_dart_subdatasets['plastic_and_water'][feature].std() for feature in feature_names],
                [dart_subdatasets['plastic_and_water'][feature].std() for feature in feature_names], 
                [usgs_subdatasets['plastic_and_water'][feature].std() for feature in feature_names]
             ]
          ]
labels = [[feature_names, feature_names, feature_names], [feature_names, feature_names, feature_names]]
legends = [['DART 2021 mean (only plastic and water)', 'DART 2023 mean (only plastic and water)', 'USGS mean (only plastic and water)'],
           ['DART 2021 std (only plastic and water)', 'DART 2023 std (only plastic and water)', 'USGS std (only plastic and water)']]
modes = [['markers+lines', 'markers+lines', 'markers+lines'],
         ['dash', 'dash', 'dash']]
colors = [['#0000FF', '#FF0000', '#00FF00'], ['#0000FF', '#FF0000', '#00FF00']]
chart_title = "DART x USGS statistics (only plastic and water)"
x_title = "Band"
y_title = "Reflectance"
height = 1000
width = 3500
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'media_dpadrao_plastico_agua'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Média (apenas plástico e água)","Desvio padrão (apenas plástico e água)"]
legends = [['Média DART 2021', 'Média DART 2023', 'Média USGS'],
           ['Desvio padrão DART 2021', 'Desvio padrão DART 2023', 'Desvio padrão USGS']]
chart_title = "Estatísticas DART x USGS (apenas plástico e água)"
x_title = "Banda"
y_title = "Reflectância"
width = 3600

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_std_plastic_water_per_year'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Mean (only plastic and water)","Std (only plastic and water)"]
traces = [
            [
                [old_dart_subdatasets['plastic_and_water'][feature].mean() for feature in feature_names],
                [dart_subdatasets['plastic_and_water'][feature].mean() for feature in feature_names], 
                [usgs_subdatasets['plp2019_plastic_water'][feature].mean() for feature in feature_names],
                [usgs_subdatasets['plp2021_plastic_water'][feature].mean() for feature in feature_names]
             ],
             [
                [old_dart_subdatasets['plastic_and_water'][feature].std() for feature in feature_names],
                [dart_subdatasets['plastic_and_water'][feature].std() for feature in feature_names], 
                [usgs_subdatasets['plp2019_plastic_water'][feature].std() for feature in feature_names],
                [usgs_subdatasets['plp2021_plastic_water'][feature].std() for feature in feature_names],
             ]
          ]
labels = [[feature_names, feature_names, feature_names, feature_names], [feature_names, feature_names, feature_names, feature_names]]
legends = [['DART 2021 mean (only plastic and water)', 'DART 2023 mean (only plastic and water)', 'USGS 2019 mean (only plastic and water)', 'USGS 2021 mean (only plastic and water)'],
           ['DART 2021 std (only plastic and water)', 'DART 2023 std (only plastic and water)', 'USGS 2019 std (only plastic and water)', 'USGS 2021 std (only plastic and water)']]
modes = [['markers+lines', 'markers+lines', 'markers+lines', 'markers+lines'],
         ['dash', 'dash', 'dash', 'dash']]
colors = [['#0F0', '#F00', '#00F', '#FF0'], ['#0F0', '#F00', '#00F', '#FF0']]
chart_title = "DART x USGS statistics (only plastic and water)"
x_title = "Band"
y_title = "Reflectance"
height = 1300
width = 2800
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'media_dpadrao_plastico_agua_por_ano'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Média (apenas plástico e água)","Desvio padrão (apenas plástico e água)"]
legends = [['Média DART 2021', 'Média DART 2023', 'Média USGS 2019', 'Média USGS 2021'],
           ['Desvio padrão DART 2021', 'Desvio padrão DART 2023', 'Desvio padrão USGS 2019', 'Desvio padrão USGS 2021']]
chart_title = "Estatísticas DART x USGS (apenas plástico e água)"
x_title = "Banda"
y_title = "Reflectância"
width = 2900

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'quartiles'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021 (simulated data)", "DART 2023 (simulated data)", 
                  "USGS 2019 (observed data)", "USGS 2021 (observed data)"]

traces = [
             [
                [old_dart[feature].describe()['min'] for feature in feature_names],
                [old_dart[feature].describe()['25%'] for feature in feature_names],
                [old_dart[feature].describe()['50%'] for feature in feature_names],
                [old_dart[feature].describe()['75%'] for feature in feature_names],
                [old_dart[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dataset_dart[feature].describe()['min'] for feature in feature_names],
                [dataset_dart[feature].describe()['25%'] for feature in feature_names],
                [dataset_dart[feature].describe()['50%'] for feature in feature_names],
                [dataset_dart[feature].describe()['75%'] for feature in feature_names],
                [dataset_dart[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['plp2019'][feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['plp2019'][feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['plp2019'][feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['plp2019'][feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['plp2019'][feature].describe()['max'] for feature in feature_names]
             ], 
             [
                [usgs_subdatasets['plp2021'][feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['plp2021'][feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['plp2021'][feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['plp2021'][feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['plp2021'][feature].describe()['max'] for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names],
          [feature_names, feature_names, feature_names, feature_names, feature_names],
          [feature_names, feature_names, feature_names, feature_names, feature_names],
          [feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [['DART 2021 min', 'DART 2021 25%', 'DART 2021 50%', 'DART 2021 75%', 'DART 2021 max'],
           ['DART 2023 min', 'DART 2023 25%', 'DART 2023 50%', 'DART 2023 75%', 'DART 2023 max'],
           ['USGS 2019 min', 'USGS 2019 25%', 'USGS 2019 50%', 'USGS 2019 75%', 'USGS 2019 max'],
           ['USGS 2021 min', 'USGS 2021 25%', 'USGS 2021 50%', 'USGS 2021 75%', 'USGS 2021 max']]

modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']
        ]

colors = [['#0F0', '#0F0', '#0F0', '#0F0', '#0F0'],
          ['#F00', '#F00', '#F00', '#F00', '#F00'],
          ['#00F', '#00F', '#00F', '#00F', '#00F'],
          ['#FF0', '#FF0', '#FF0', '#FF0', '#FF0']]

chart_title = "DART x USGS quartiles"
x_title = "Band"
y_title = "Reflectance"
height = 1800
width = 4200
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)
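
The quartile traces above call `describe()` once per feature and per statistic. As a rough sketch, the same five rows (min, 25%, 50%, 75%, max) could be computed in a single `DataFrame.quantile` call per dataset. The toy data frame and the names `bands` and `quartile_traces` are hypothetical; in the notebook, `feature_names` would take the place of `bands`.

```python
import pandas as pd

# Toy reflectance table with two bands (hypothetical values).
toy = pd.DataFrame({
    "Blue": [0.1, 0.2, 0.3, 0.4, 0.5],
    "Green": [0.2, 0.4, 0.6, 0.8, 1.0],
})
bands = ["Blue", "Green"]

# One row per statistic (min, 25%, 50%, 75%, max), one column per band.
summary = toy[bands].quantile([0.0, 0.25, 0.5, 0.75, 1.0])

# Nested list with the same layout as one dataset's entry in `traces`.
quartile_traces = summary.values.tolist()
print(summary)
```

Both `describe()` and `quantile` use linear interpolation by default, so the values should agree with the per-feature `describe()` calls.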

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'quartis_ano'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021 (dados simulados)", "DART 2023 (dados simulados)", 
                  "USGS 2019 (dados observados)", "USGS 2021 (dados observados)"]

chart_title = "Quartis DART x USGS"
x_title = "Banda"
y_title = "Reflectância"
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'class_quartiles_dart_2023'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2023 Plastic (simulated)", "DART 2023 Water (simulated)", "DART 2023 Whitecap (simulated)"]

traces = [
            [
                [dart_subdatasets['plastic'][feature].describe()['min'] for feature in feature_names],
                [dart_subdatasets['plastic'][feature].describe()['25%'] for feature in feature_names],
                [dart_subdatasets['plastic'][feature].describe()['50%'] for feature in feature_names],
                [dart_subdatasets['plastic'][feature].describe()['75%'] for feature in feature_names],
                [dart_subdatasets['plastic'][feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dart_subdatasets['water'][feature].describe()['min'] for feature in feature_names],
                [dart_subdatasets['water'][feature].describe()['25%'] for feature in feature_names],
                [dart_subdatasets['water'][feature].describe()['50%'] for feature in feature_names],
                [dart_subdatasets['water'][feature].describe()['75%'] for feature in feature_names],
                [dart_subdatasets['water'][feature].describe()['max'] for feature in feature_names]
             ],
             [
                [dart_subdatasets['whitecap'][feature].describe()['min'] for feature in feature_names],
                [dart_subdatasets['whitecap'][feature].describe()['25%'] for feature in feature_names],
                [dart_subdatasets['whitecap'][feature].describe()['50%'] for feature in feature_names],
                [dart_subdatasets['whitecap'][feature].describe()['75%'] for feature in feature_names],
                [dart_subdatasets['whitecap'][feature].describe()['max'] for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [
            ['DART Plastic min', 'DART Plastic 25%', 'DART Plastic 50%', 'DART Plastic 75%', 'DART Plastic max'],
            ['DART Water min', 'DART Water 25%', 'DART Water 50%', 'DART Water 75%', 'DART Water max'],
            ['DART Whitecap min', 'DART Whitecap 25%', 'DART Whitecap 50%', 'DART Whitecap 75%', 'DART Whitecap max']
          ]

modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']
        ]

colors = [['#ffadd6', '#ffadd6', '#FF69B4', '#ff148a', '#ff148a'], 
          ['#73baff', '#73baff', '#1E90FF', '#005db7', '#005db7'],
          #['#ffe766', '#ffe766', '#FFD700', '#ddba00', '#ddba00']
          ['#4ff', '#4ff', '#0dd', '#099', '#099']]

chart_title = "Class statistics - DART 2023 quartiles"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 3600
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'
chart_name = 'quartis_por_classe_dart_2023'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2023 Plástico (simulado)", "DART 2023 Água (simulado)", "DART 2023 Espuma (simulado)"]

legends = [
            ['DART Plástico mín', 'DART Plástico 25%', 'DART Plástico 50%', 'DART Plástico 75%', 'DART Plástico máx'],
            ['DART Água mín', 'DART Água 25%', 'DART Água 50%', 'DART Água 75%', 'DART Água máx'],
            ['DART Espuma mín', 'DART Espuma 25%', 'DART Espuma 50%', 'DART Espuma 75%', 'DART Espuma máx']
          ]

chart_title = "Estatísticas por classe - Quartis DART 2023"
x_title = "Banda"
y_title = "Reflectância"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'class_quartiles_dart_2021'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021 Plastic (simulated)", "DART 2021 Water (simulated)", "DART 2021 Sand (simulated)"]

# One group per class; each group holds the per-band min, 25%, 50%, 75% and max from describe().
traces = [
            [
                [old_dart_subdatasets['plastic'][feature].describe()['min'] for feature in feature_names],
                [old_dart_subdatasets['plastic'][feature].describe()['25%'] for feature in feature_names],
                [old_dart_subdatasets['plastic'][feature].describe()['50%'] for feature in feature_names],
                [old_dart_subdatasets['plastic'][feature].describe()['75%'] for feature in feature_names],
                [old_dart_subdatasets['plastic'][feature].describe()['max'] for feature in feature_names]
             ],
             [
                [old_dart_subdatasets['water'][feature].describe()['min'] for feature in feature_names],
                [old_dart_subdatasets['water'][feature].describe()['25%'] for feature in feature_names],
                [old_dart_subdatasets['water'][feature].describe()['50%'] for feature in feature_names],
                [old_dart_subdatasets['water'][feature].describe()['75%'] for feature in feature_names],
                [old_dart_subdatasets['water'][feature].describe()['max'] for feature in feature_names]
             ],
             [
                [old_dart_subdatasets['sand'][feature].describe()['min'] for feature in feature_names],
                [old_dart_subdatasets['sand'][feature].describe()['25%'] for feature in feature_names],
                [old_dart_subdatasets['sand'][feature].describe()['50%'] for feature in feature_names],
                [old_dart_subdatasets['sand'][feature].describe()['75%'] for feature in feature_names],
                [old_dart_subdatasets['sand'][feature].describe()['max'] for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [
            ['DART Plastic min', 'DART Plastic 25%', 'DART Plastic 50%', 'DART Plastic 75%', 'DART Plastic max'],
            ['DART Water min', 'DART Water 25%', 'DART Water 50%', 'DART Water 75%', 'DART Water max'],
            ['DART Sand min', 'DART Sand 25%', 'DART Sand 50%', 'DART Sand 75%', 'DART Sand max']
          ]

modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']
        ]

colors = [['#ffadd6', '#ffadd6', '#FF69B4', '#ff148a', '#ff148a'], 
          ['#73baff', '#73baff', '#1E90FF', '#005db7', '#005db7'],
          ['#ffe766', '#ffe766', '#FFD700', '#ddba00', '#ddba00']]

chart_title = "Class statistics - DART 2021 quartiles"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 3000
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
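# Portuguese version of the DART 2021 quartile chart: traces, labels, modes, colors,
# height, width and guidance are reused from the previous cell.
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'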
chart_name = 'quartis_por_classe_dart_2021'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021 Plástico (simulado)", "DART 2021 Água (simulado)", "DART 2021 Areia (simulado)"]

legends = [
            ['DART Plástico mín', 'DART Plástico 25%', 'DART Plástico 50%', 'DART Plástico 75%', 'DART Plástico máx'],
            ['DART Água mín', 'DART Água 25%', 'DART Água 50%', 'DART Água 75%', 'DART Água máx'],
            ['DART Areia mín', 'DART Areia 25%', 'DART Areia 50%', 'DART Areia 75%', 'DART Areia máx']
          ]

chart_title = "Estatísticas por classe - Quartis DART 2021"
x_title = "Banda"
y_title = "Reflectância"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'class_quartiles_usgs_2019'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["USGS 2019 Plastic (observed)", "USGS 2019 Water (observed)", "USGS 2019 Coast (observed)"]

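# One group per class, restricted to the 2019 samples; each group holds the per-band
# min, 25%, 50%, 75% and max from describe().
traces = [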
             [
                [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['water'].query('Year == "2019"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2019"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2019"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2019"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2019"')[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['coast'].query('Year == "2019"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2019"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2019"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2019"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2019"')[feature].describe()['max'] for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [
            ['USGS Plastic min', 'USGS Plastic 25%', 'USGS Plastic 50%', 'USGS Plastic 75%', 'USGS Plastic max'],
            ['USGS Water min', 'USGS Water 25%', 'USGS Water 50%', 'USGS Water 75%', 'USGS Water max'],
            ['USGS Coast min', 'USGS Coast 25%', 'USGS Coast 50%', 'USGS Coast 75%', 'USGS Coast max']
          ]


modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']]

colors = [['#ffadd6', '#ffadd6', '#FF69B4', '#ff148a', '#ff148a'], 
          ['#73baff', '#73baff', '#1E90FF', '#005db7', '#005db7'],
          ['#ffe766', '#ffe766', '#FFD700', '#ddba00', '#ddba00']]

chart_title = "Class statistics - USGS 2019 quartiles"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 3000
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
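# Portuguese version of the USGS 2019 quartile chart: traces, labels, modes, colors,
# height, width and guidance are reused from the previous cell.
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'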
chart_name = 'quartis_por_classe_usgs_2019'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["USGS 2019 Plástico (observado)", "USGS 2019 Água (observado)", "USGS 2019 Costa (observado)"]

legends = [
            ['USGS Plástico mín', 'USGS Plástico 25%', 'USGS Plástico 50%', 'USGS Plástico 75%', 'USGS Plástico máx'],
            ['USGS Água mín', 'USGS Água 25%', 'USGS Água 50%', 'USGS Água 75%', 'USGS Água máx'],
            ['USGS Costa mín', 'USGS Costa 25%', 'USGS Costa 50%', 'USGS Costa 75%', 'USGS Costa máx']]

chart_title = "Estatísticas por classe - Quartis USGS 2019"
x_title = "Banda"
y_title = "Reflectância"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'class_quartiles_usgs_2021'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["USGS 2021 Plastic (observed)", "USGS 2021 Water (observed)", "USGS 2021 Coast (observed)", "USGS 2021 Wood (observed)"]

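# One group per class, restricted to the 2021 samples; each group holds the per-band
# min, 25%, 50%, 75% and max from describe().
traces = [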
             [
                [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['water'].query('Year == "2021"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2021"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2021"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2021"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['water'].query('Year == "2021"')[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['coast'].query('Year == "2021"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2021"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2021"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2021"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['coast'].query('Year == "2021"')[feature].describe()['max'] for feature in feature_names]
             ],
             [
                [usgs_subdatasets['wood'].query('Year == "2021"')[feature].describe()['min'] for feature in feature_names],
                [usgs_subdatasets['wood'].query('Year == "2021"')[feature].describe()['25%'] for feature in feature_names],
                [usgs_subdatasets['wood'].query('Year == "2021"')[feature].describe()['50%'] for feature in feature_names],
                [usgs_subdatasets['wood'].query('Year == "2021"')[feature].describe()['75%'] for feature in feature_names],
                [usgs_subdatasets['wood'].query('Year == "2021"')[feature].describe()['max'] for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [
            ['USGS Plastic min', 'USGS Plastic 25%', 'USGS Plastic 50%', 'USGS Plastic 75%', 'USGS Plastic max'],
            ['USGS Water min', 'USGS Water 25%', 'USGS Water 50%', 'USGS Water 75%', 'USGS Water max'],
            ['USGS Coast min', 'USGS Coast 25%', 'USGS Coast 50%', 'USGS Coast 75%', 'USGS Coast max'],
            ['USGS Wood min', 'USGS Wood 25%', 'USGS Wood 50%', 'USGS Wood 75%', 'USGS Wood max']]


modes = [['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines'],
         ['dot', 'dash', 'dash', 'dash', 'markers+lines']
        ]

colors = [['#ffadd6', '#ffadd6', '#FF69B4', '#ff148a', '#ff148a'], 
          ['#73baff', '#73baff', '#1E90FF', '#005db7', '#005db7'],
          ['#ffe766', '#ffe766', '#FFD700', '#ddba00', '#ddba00'],
          ['#a38fd3', '#a38fd3', '#7152bb', '#533990', '#533990']
         ]

chart_title = "Class statistics - USGS 2021 quartiles"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 3100
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
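# Portuguese version of the USGS 2021 quartile chart: traces, labels, modes, colors,
# height, width and guidance are reused from the previous cell.
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'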
chart_name = 'quartis_por_classe_usgs_2021'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["USGS 2021 Plástico (observado)", "USGS 2021 Água (observado)", "USGS 2021 Costa (observado)", "USGS 2021 Madeira (observado)"]

legends = [
            ['USGS Plástico mín', 'USGS Plástico 25%', 'USGS Plástico 50%', 'USGS Plástico 75%', 'USGS Plástico máx'],
            ['USGS Água mín', 'USGS Água 25%', 'USGS Água 50%', 'USGS Água 75%', 'USGS Água máx'],
            ['USGS Costa mín', 'USGS Costa 25%', 'USGS Costa 50%', 'USGS Costa 75%', 'USGS Costa máx'],
            ['USGS Madeira mín', 'USGS Madeira 25%', 'USGS Madeira 50%', 'USGS Madeira 75%', 'USGS Madeira máx']]

chart_title = "Estatísticas por classe - Quartis USGS 2021"
x_title = "Banda"
y_title = "Reflectância"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_spectral_signatures'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021", "DART 2023", "USGS", "USGS (minimum plastic coverage: 20%)", "USGS (minimum plastic coverage: 50%)"]

# One group per dataset; each trace is the per-band mean reflectance of one class.
traces = [
             [
                 [old_dart_subdatasets['plastic'][feature].mean() for feature in feature_names],
                 [old_dart_subdatasets['water'][feature].mean() for feature in feature_names],
                 [old_dart_subdatasets['sand'][feature].mean() for feature in feature_names]
             ],
             [
                 [dart_subdatasets['plastic'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['water'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['whitecap'][feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['coast'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['wood'][feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic_min_20'][feature].mean() for feature in feature_names], 
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic_min_50'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names],
          [feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names], 
          [feature_names, feature_names]]

legends = [['Plastic (DART 2021 mean)', 'Water (DART 2021 mean)', 'Sand (DART 2021 mean)'],
           ['Plastic (DART 2023 mean)', 'Water (DART 2023 mean)', 'Whitecap (DART 2023 mean)'],
           ['Plastic (USGS mean)', 'Water (USGS mean)', 'Coast (USGS mean)', 'Wood (USGS mean)'],
           ['Plastic - min 20% (USGS mean)', 'Water (USGS mean)'],
           ['Plastic - min 50% (USGS mean)', 'Water (USGS mean)']]

modes = [['lines', 'lines', 'lines'],
         ['lines', 'lines', 'lines'],
         ['dot', 'dot', 'dot', 'dot'],
         ['dash', 'dash'],
         ['markers+lines', 'markers+lines']]

colors = [['#FF69B4', '#1E90FF', '#FFD700'], 
          ['#FF69B4', '#1E90FF', '#0ff'], 
          ['#FF69B4', '#1E90FF', '#FFD700', '#7152bb'],
          ['#FF69B4', '#1E90FF'],
          ['#FF69B4', '#1E90FF']]

chart_title = "Class statistics - Mean spectral signatures"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 4000
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
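# Portuguese version of the mean spectral signatures chart: traces, labels, modes, colors
# and guidance are reused from the previous cell; height and width are overridden below.
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'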
chart_name = 'assinaturas_espectrais_medias'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021", "DART 2023", "USGS", "USGS (cobertura plástica mínima: 20%)", "USGS (cobertura plástica mínima: 50%)"]

legends = [['Plástico (média DART 2021)', 'Água (média DART 2021)', 'Areia (média DART 2021)'],
           ['Plástico (média DART 2023)', 'Água (média DART 2023)', 'Espuma (média DART 2023)'],
           ['Plástico (média USGS)', 'Água (média USGS)', 'Costa (média USGS)', 'Madeira (média USGS)'],
           ['Plástico - mín 20% (média USGS)', 'Água (média USGS)'],
           ['Plástico - mín 50% (média USGS)', 'Água (média USGS)']]

chart_title = "Estatísticas por classe - Assinaturas espectrais médias"
x_title = "Banda"
y_title = "Reflectância"
height = 1600
width = 4500

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
path = 'charts/english/exploratory_analysis/descriptive_statistics/'
chart_name = 'mean_spectral_signatures_per_year'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021", "USGS (minimum plastic coverage: 50%)"]

traces = [
             [
                 [old_dart_subdatasets['plastic'][feature].mean() for feature in feature_names],
                 [old_dart_subdatasets['water'][feature].mean() for feature in feature_names],
                 [old_dart_subdatasets['sand'][feature].mean() for feature in feature_names]
             ],         
             [
                 [dart_subdatasets['plastic'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['water'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['whitecap'][feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic'].query('Year == "2019"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'].query('Year == "2019"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['coast'].query('Year == "2019"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['wood'].query('Year == "2019"')[feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic'].query('Year == "2021"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'].query('Year == "2021"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['coast'].query('Year == "2021"')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['wood'].query('Year == "2021"')[feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic_min_50'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names]]

legends = [['Plastic (DART 2021 mean)', 'Water (DART 2021 mean)', 'Sand (DART 2021 mean)'],
           ['Plastic (DART 2023 mean)', 'Water (DART 2023 mean)', 'Whitecap (DART 2023 mean)'],
           ['Plastic (USGS 2019 mean)', 'Water (USGS 2019 mean)', 'Coast (USGS 2019 mean)', 'Wood (USGS 2019 mean)'],
           ['Plastic (USGS 2021 mean)', 'Water (USGS 2021 mean)', 'Coast (USGS 2021 mean)', 'Wood (USGS 2021 mean)'],
           ['Plastic - min 50% (USGS mean)', 'Water (USGS mean)']]

modes = [['lines', 'lines', 'lines'],
         ['lines', 'lines', 'lines'],
         ['dot', 'dot', 'dot', 'dot'],
         ['dash', 'dash', 'dash', 'dash'],
         ['markers+lines', 'markers+lines']]

colors = [['#FF69B4', '#1E90FF', '#FFD700'], 
          ['#FF69B4', '#1E90FF', '#0ff'],
          ['#FF69B4', '#1E90FF', '#FFD700', '#7152bb'],
          ['#FF69B4', '#1E90FF', '#FFD700', '#7152bb'],
          ['#FF69B4', '#1E90FF']]

chart_title = "Class statistics - Mean spectral signatures"
x_title = "Band"
y_title = "Reflectance"
height = 1500
width = 3400
guidance = "horizontal"

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
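# Portuguese version of the per-year mean spectral signatures chart: traces, labels, modes,
# colors, height and guidance are reused from the previous cell; width is overridden below.
path = 'charts/portugues/analise_exploratoria/estatisticas_descritivas/'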
chart_name = 'assinaturas_espectrais_medias_por_ano'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021", "USGS (cobertura plástica mínima: 50%)"]

legends = [['Plástico (média DART 2021)', 'Água (média DART 2021)', 'Areia (média DART 2021)'],
           ['Plástico (média DART 2023)', 'Água (média DART 2023)', 'Espuma (média DART 2023)'],
           ['Plástico (média USGS 2019)', 'Água (média USGS 2019)', 'Costa (média USGS 2019)', 'Madeira (média USGS 2019)'],
           ['Plástico (média USGS 2021)', 'Água (média USGS 2021)', 'Costa (média USGS 2021)', 'Madeira (média USGS 2021)'],
           ['Plástico - mín 50% (média USGS)', 'Água (média USGS)']]

chart_title = "Estatísticas por classe - Assinaturas espectrais médias"
x_title = "Banda"
y_title = "Reflectância"
width = 3700

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

### Correlation statistics
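
The heatmaps in this section are built from pairwise correlation matrices obtained with pandas `DataFrame.corr()` (Pearson by default) and rounded to two decimals. As a minimal sketch of that step, the snippet below uses a small made-up table; the values and the three-band selection are illustrative assumptions, not data from the DART or USGS datasets.

```python
import pandas as pd

# Toy reflectance table; the values are invented for illustration only.
toy = pd.DataFrame({
    "Blue":  [0.10, 0.20, 0.30, 0.40],
    "Green": [0.20, 0.40, 0.60, 0.80],  # increases linearly with Blue
    "NIR1":  [0.40, 0.30, 0.20, 0.10],  # decreases linearly with Blue
})

# Same pattern as the cells in this section: select the columns,
# compute the pairwise Pearson correlation, round to two decimals.
toy_corr = round(toy[["Blue", "Green", "NIR1"]].corr(), 2)
print(toy_corr)
```

The result is a square, symmetric DataFrame indexed by column name on both axes, which is why the same list of names is passed as both the x and y labels of each heatmap.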

In [None]:
path = 'charts/english/exploratory_analysis/correlation/'
chart_name = 'features_correlation'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["DART 2021 features correlation", 
                  "DART 2023 features correlation", 
                  "USGS 2019 features correlation",
                  "USGS 2021 features correlation"]

# Pairwise correlation between all bands and radiometric indices, rounded to two decimals.
datasets = [round(old_dart[feature_names + radiometric_indexes].corr(),2),
            round(dart[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2019'][feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2021'][feature_names + radiometric_indexes].corr(),2)]

x_labels = [feature_names + radiometric_indexes, feature_names + radiometric_indexes,
            feature_names + radiometric_indexes, feature_names + radiometric_indexes]

y_labels = [feature_names + radiometric_indexes, feature_names + radiometric_indexes,
            feature_names + radiometric_indexes, feature_names + radiometric_indexes]
   
colorscale = "RdBu"

chart_title = "Features correlation"

height = 6000

width = 2300

guidance = "vertical"

rsdata_charts.heatmap_chart(datasets_names, datasets, x_labels, y_labels, colorscale, chart_title, height, width,  guidance=guidance, export_name=export_name)

In [None]:
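# Portuguese version of the correlation chart: datasets, x_labels, y_labels, colorscale,
# height and width are reused from the previous cell.
path = 'charts/portugues/analise_exploratoria/correlacao/'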
chart_name = 'correlacao'
rsdata_charts.check_path(path)
export_name = path+chart_name

datasets_names = ["Dados DART 2021", "Dados DART 2023", "Dados USGS 2019", "Dados USGS 2021"]

chart_title = "Correlação"

guidance = "vertical"

rsdata_charts.heatmap_chart(datasets_names, datasets, x_labels, y_labels, colorscale, chart_title, height, width,  guidance=guidance, export_name=export_name)

In [None]:
# Per-class correlation heatmaps for the simulated DART datasets (English labels).
rsdata_charts.heatmap_chart(["DART 2021 plastic features correlation"], 
                                round(old_dart_subdatasets['plastic'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/english/exploratory_analysis/correlation/dart_plastic_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2021 water features correlation"], 
                                round(old_dart_subdatasets['water'][feature_names + radiometric_indexes].corr(),2),
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical",
                                export_name = "charts/english/exploratory_analysis/correlation/dart_water_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2021 sand features correlation"], 
                                round(old_dart_subdatasets['sand'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes,
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/english/exploratory_analysis/correlation/dart_sand_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2023 plastic features correlation"], 
                                round(dart_subdatasets['plastic'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/english/exploratory_analysis/correlation/dart_plastic_2023"
                                )

rsdata_charts.heatmap_chart(["DART 2023 water features correlation"], 
                                round(dart_subdatasets['water'][feature_names + radiometric_indexes].corr(),2),
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical",
                                export_name = "charts/english/exploratory_analysis/correlation/dart_water_2023"
                                )

rsdata_charts.heatmap_chart(["DART 2023 whitecap features correlation"], 
                                round(dart_subdatasets['whitecap'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes,
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/english/exploratory_analysis/correlation/dart_whitecap_2023"
                                )

In [None]:
# Per-class correlation heatmaps for the simulated DART datasets (Portuguese labels).
rsdata_charts.heatmap_chart(["DART 2021 - Correlação no plástico"], 
                                round(old_dart_subdatasets['plastic'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_plastico_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2021 - Correlação na água"], 
                                round(old_dart_subdatasets['water'][feature_names + radiometric_indexes].corr(),2),
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical",
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_agua_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2021 - Correlação na areia"], 
                                round(old_dart_subdatasets['sand'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_areia_2021"
                                )

rsdata_charts.heatmap_chart(["DART 2023 - Correlação no plástico"], 
                                round(dart_subdatasets['plastic'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_plastico_2023"
                                )

rsdata_charts.heatmap_chart(["DART 2023 - Correlação na água"], 
                                round(dart_subdatasets['water'][feature_names + radiometric_indexes].corr(),2),
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical",
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_agua_2023"
                                )

rsdata_charts.heatmap_chart(["DART 2023 - Correlação na espuma"], 
                                round(dart_subdatasets['whitecap'][feature_names + radiometric_indexes].corr(),2), 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name = "charts/portugues/analise_exploratoria/correlacao/dart_espuma_2023"
                                )

In [None]:
datasets_names = [["USGS 2019 plastic features correlation"], 
                  ["USGS 2019 water features correlation"], 
                  ["USGS 2019 coast features correlation"], 
                  ["USGS 2021 plastic features correlation"], 
                  ["USGS 2021 water features correlation"], 
                  ["USGS 2021 coast features correlation"], 
                  ["USGS 2021 wood features correlation"]]

datasets = [round(usgs_subdatasets['plp2019'].query("Label == 'Plastic'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2019'].query("Label == 'Water'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2019'].query("Label == 'Coast'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2021'].query("Label == 'Plastic'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2021'].query("Label == 'Water'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2021'].query("Label == 'Coast'")[feature_names + radiometric_indexes].corr(),2),
            round(usgs_subdatasets['plp2021'].query("Label == 'Wood'")[feature_names + radiometric_indexes].corr(),2)]

export_names = ['charts/english/exploratory_analysis/correlation/usgs_plastic_2019', 
                'charts/english/exploratory_analysis/correlation/usgs_water_2019', 
                'charts/english/exploratory_analysis/correlation/usgs_coast_2019', 
                'charts/english/exploratory_analysis/correlation/usgs_plastic_2021', 
                'charts/english/exploratory_analysis/correlation/usgs_water_2021', 
                'charts/english/exploratory_analysis/correlation/usgs_coast_2021', 
                'charts/english/exploratory_analysis/correlation/usgs_wood_2021'] 
                
for name, dataset, export_name in zip(datasets_names, datasets, export_names):
    rsdata_charts.heatmap_chart(name, 
                                dataset, 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name)

In [None]:
datasets_names = [["USGS 2019 - Correlação no plástico"], 
                  ["USGS 2019 - Correlação na água"], 
                  ["USGS 2019 - Correlação na costa"], 
                  ["USGS 2021 - Correlação no plástico"], 
                  ["USGS 2021 - Correlação na água"], 
                  ["USGS 2021 - Correlação na costa"], 
                  ["USGS 2021 - Correlação na madeira"]]

export_names = ['charts/portugues/analise_exploratoria/correlacao/usgs_plastico_2019',
                'charts/portugues/analise_exploratoria/correlacao/usgs_agua_2019',
                'charts/portugues/analise_exploratoria/correlacao/usgs_costa_2019',
                'charts/portugues/analise_exploratoria/correlacao/usgs_plastico_2021',
                'charts/portugues/analise_exploratoria/correlacao/usgs_agua_2021',
                'charts/portugues/analise_exploratoria/correlacao/usgs_costa_2021',
                'charts/portugues/analise_exploratoria/correlacao/usgs_madeira_2021'] 
                     

# Reuses the correlation matrices in `datasets` computed in the previous cell.
for name, dataset, export_name in zip(datasets_names, datasets, export_names):
    rsdata_charts.heatmap_chart(name, 
                                dataset, 
                                feature_names + radiometric_indexes, 
                                feature_names + radiometric_indexes, 
                                "RdBu", 
                                " ", 
                                1800, 
                                1800, 
                                "vertical", 
                                export_name)
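
Every heatmap in this section receives the same kind of input: the pairwise correlation matrix of one class's features, rounded to two decimals. The minimal sketch below reproduces that step on synthetic data. The helper `class_correlation`, the toy columns, and the random values are assumptions made for illustration only; they are not part of the project's modules or datasets.

```python
import numpy as np
import pandas as pd

def class_correlation(df, label, columns, decimals=2):
    """Correlation between `columns`, restricted to rows whose Label equals `label`.

    Hypothetical helper: it mirrors the query(...)[...].corr() + round(..., 2)
    pattern used in the cells above.
    """
    subset = df.loc[df["Label"] == label, columns]
    return subset.corr().round(decimals)

# Synthetic stand-in for one of the datasets (values are random, not real reflectances).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Blue": rng.random(50),
    "Green": rng.random(50),
    "Label": ["Plastic"] * 25 + ["Water"] * 25,
})
demo["ToyIndex"] = demo["Green"] - demo["Blue"]  # toy derived column, not a real index

plastic_corr = class_correlation(demo, "Plastic", ["Blue", "Green", "ToyIndex"])
print(plastic_corr)
```

The result is a square, symmetric DataFrame whose row and column labels are the selected features, which is why the cells above pass the same feature list for both heatmap axes.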

### Histograms, boxplots, and scatterplots

In [None]:
path = 'charts/english/exploratory_analysis/histograms/'
rsdata_charts.check_path(path)


datasets_names = ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021"]

traces = [
            [old_dart_subdatasets['plastic'], old_dart_subdatasets['water'], old_dart_subdatasets['sand']],
            [dart_subdatasets['plastic'], dart_subdatasets['water'], dart_subdatasets['whitecap']],
            [usgs_subdatasets['plp2019'].query("Label == 'Plastic'"), 
             usgs_subdatasets['plp2019'].query("Label == 'Water'"), 
             usgs_subdatasets['plp2019'].query("Label == 'Coast'")], 
            [usgs_subdatasets['plp2021'].query("Label == 'Plastic'"), 
             usgs_subdatasets['plp2021'].query("Label == 'Water'"), 
             usgs_subdatasets['plp2021'].query("Label == 'Coast'"),
             usgs_subdatasets['plp2021'].query("Label == 'Wood'")]
            ]
              
labels = [
            ["Plastic", "Water", "Sand"],
            ["Plastic", "Water", "Whitecap"],
            ["Plastic", "Water", "Coast"],
            ["Plastic", "Water", "Coast", "Wood"]
         ]

labels_group = [
            "DART 2021 classes", 
            "DART 2023 classes", 
            "USGS 2019 classes",
            "USGS 2021 classes"
        ]

n_bins = 12

colors = [
            ['#ffadd6', '#73baff', '#ffe766'],        
            ['#ffadd6', '#73baff', '#0ff'],
            ['#ffadd6', '#73baff', '#ffe766'], 
            ['#ffadd6', '#73baff', '#ffe766', '#a38fd3'], 
         ]

y_title = "Reflectance"
height = 1350
width = 4500

for feature in feature_names + radiometric_indexes:
    bands = [feature, feature]
    
    chart_title = "Frequency percent distribution in "+feature
    x_title = feature    
    
    chart_name = feature
    export_name = path+chart_name

    rsdata_charts.stackedbars_chart(datasets_names, traces, bands, n_bins, labels, labels_group, colors, chart_title, x_title, y_title, height, width, export_name)  

In [None]:
labels = [
            ["Plástico", "Água", "Areia"],
            ["Plástico", "Água", "Espuma"],
            ["Plástico", "Água", "Costa"],
            ["Plástico", "Água", "Costa", "Madeira"]
         ]

labels_group = [
            "Classes DART 2021",
            "Classes DART 2023",
            "Classes USGS 2019",
            "Classes USGS 2021",
        ]

y_title = "Reflectância"

path = 'charts/portugues/analise_exploratoria/histogramas/'
rsdata_charts.check_path(path)

for feature in feature_names + radiometric_indexes:
    bands = [feature, feature]  # TODO: change to a single band
    
    chart_title = "Distribuição percentual de frequências no atributo "+feature
    x_title = feature    
    
    chart_name = feature
    export_name = path+chart_name

    rsdata_charts.stackedbars_chart(datasets_names, traces, bands, n_bins, labels, labels_group, colors, chart_title, x_title, y_title, height, width, export_name)  
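
The internals of `rsdata_charts.stackedbars_chart` are not shown in this notebook, so the following is only a sketch of one common way to obtain a "frequency percent distribution" with `n_bins = 12`: bin the values into equal-width intervals and normalize the counts to percentages. The function name `frequency_percent`, the fixed value range, and the random sample are assumptions.

```python
import numpy as np

def frequency_percent(values, n_bins=12, value_range=None):
    """Return (percent per bin, bin edges) for equal-width bins.

    Hypothetical helper; the project's chart function may bin differently.
    """
    counts, edges = np.histogram(values, bins=n_bins, range=value_range)
    percent = 100.0 * counts / counts.sum()
    return percent, edges

# Synthetic values in [0, 1), standing in for one band of one class.
rng = np.random.default_rng(1)
reflectance = rng.random(200)

percent, edges = frequency_percent(reflectance, n_bins=12, value_range=(0.0, 1.0))
print(percent.round(1))
```

Normalizing each class separately makes classes with very different sample counts comparable on the same axis, which is presumably why the charts report percentages rather than raw counts.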

In [None]:
datasets_names = ["DART 2021 Plastic in Sand", "DART 2021 Plastic in Water", "USGS Plastic in Water"]

labels = [
            ["Sand", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic"],    
            ["Water", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic"],
            ["Water", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic", "Unknown percent"]
         ]

colors = [
            ['#ffe766', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],             
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6']            
         ]


y_title = "Reflectance"
height = 2400
width = 3600
guidance="horizontal"

path = 'charts/english/exploratory_analysis/boxplots/2021/'
rsdata_charts.check_path(path)

for feature in feature_names:
    traces = [
                [
                    old_dart_subdatasets['sand'][feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 100")[feature]
                ],
                [
                    old_dart_subdatasets['water'][feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = feature+" band scattering"

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
y_title = "Values"

for feature in radiometric_indexes:
    traces = [
                [
                    old_dart_subdatasets['sand'][feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 100")[feature]
                ],
                [
                    old_dart_subdatasets['water'][feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = feature+" index scattering"

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
datasets_names = ["DART 2021 - Plástico na areia", "DART 2021 - Plástico na água", "USGS - Plástico na água"]

labels = [
            ["Areia", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico"],    
            ["Água", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico"],
            ["Água", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico", "Percentual desconhecido"]
         ]


y_title = "Reflectância"
height = 1800
width = 3000
guidance="horizontal"

path = 'charts/portugues/analise_exploratoria/boxplots/2021/'
rsdata_charts.check_path(path)

for feature in feature_names:
    traces = [
                [
                    old_dart_subdatasets['sand'][feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 100")[feature]
                ],
                [
                    old_dart_subdatasets['water'][feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]


    x_title = feature
    
    chart_title = "Espalhamento na banda "+feature

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
y_title = "Valores"

for feature in radiometric_indexes:
    traces = [
                [
                    old_dart_subdatasets['sand'][feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_sand'].query("Cover_percent == 100")[feature]
                ],
                [
                    old_dart_subdatasets['water'][feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    old_dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = "Espalhamento no índice "+feature

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
datasets_names = ["DART 2023 Plastic in Whitecap", "DART 2023 Plastic in Water", "USGS Plastic in Water"]

labels = [
            ["Whitecap", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic"],    
            ["Water", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic"],
            ["Water", "20% plastic", "40% plastic", "60% plastic", "80% plastic", "100% plastic", "Unknown percent"]
         ]

colors = [
            ['#0ff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6']            
         ]


y_title = "Reflectance"
height = 1800
width = 3100
guidance="horizontal"

path = 'charts/english/exploratory_analysis/boxplots/2023/'
rsdata_charts.check_path(path)

for feature in feature_names:
    traces = [
                [
                    dart_subdatasets['whitecap'][feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 100")[feature]
                ],
                [
                    dart_subdatasets['water'][feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = feature+" band scattering"

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
y_title = "Values"

for feature in radiometric_indexes:
    traces = [
                [
                    dart_subdatasets['whitecap'][feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 100")[feature]
                ],
                [
                    dart_subdatasets['water'][feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = feature+" index scattering"

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
datasets_names = ["DART 2023 - Plástico na espuma", "DART 2023 - Plástico na água", "USGS - Plástico na água"]

labels = [
            ["Espuma", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico"],    
            ["Água", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico"],
            ["Água", "20% plástico", "40% plástico", "60% plástico", "80% plástico", "100% plástico", "Percentual desconhecido"]
         ]

colors = [
            ['#0ff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6'],
            ['#73baff', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6', '#ffadd6']            
         ]


y_title = "Reflectância"
height = 1800
width = 3100
guidance="horizontal"

path = 'charts/portugues/analise_exploratoria/boxplots/2023/'
rsdata_charts.check_path(path)

for feature in feature_names:
    traces = [
                [
                    dart_subdatasets['whitecap'][feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 100")[feature]
                ],
                [
                    dart_subdatasets['water'][feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = "Espalhamento na banda "+feature

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)

In [None]:
y_title = "Valores"

for feature in radiometric_indexes:
    traces = [
                [
                    dart_subdatasets['whitecap'][feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_whitecap'].query("Cover_percent == 100")[feature]
                ],
                [
                    dart_subdatasets['water'][feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 20")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 40")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 60")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 80")[feature],
                    dart_subdatasets['plastic_in_water'].query("Cover_percent == 100")[feature]
                ],
                [
                    usgs_subdatasets['water'][feature],
                    usgs_subdatasets['plastic_20'][feature],
                    usgs_subdatasets['plastic_40'][feature],
                    usgs_subdatasets['plastic_60'][feature],
                    usgs_subdatasets['plastic_80'][feature],
                    usgs_subdatasets['plastic_100'][feature],
                    usgs_subdatasets['plastic_unknownpercent'][feature]
                ]
        ]

    x_title = feature
    
    chart_title = "Espalhamento no índice "+feature

    export_name = path+feature

    rsdata_charts.boxplot_chart(datasets_names, traces, labels, colors, chart_title, x_title, y_title, height, width, guidance, export_name=export_name)
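
The boxplot cells above build each trace group the same way: the pure background class first, followed by one series per plastic cover percentage. As a sketch of how that repetition could be factored out, the hypothetical helper `cover_traces` below does the same selection with a boolean mask (equivalent to the `query("Cover_percent == ...")` calls). The small DataFrames are made-up values, not the DART or USGS data.

```python
import pandas as pd

COVER_PERCENTS = (20, 40, 60, 80, 100)

def cover_traces(background, mixed, feature, percents=COVER_PERCENTS):
    """Return [background series, one series per cover percentage] for `feature`.

    Hypothetical helper: `background` is the pure class (e.g. sand or water) and
    `mixed` holds the plastic-covered pixels with a `Cover_percent` column.
    """
    traces = [background[feature]]
    for percent in percents:
        traces.append(mixed.loc[mixed["Cover_percent"] == percent, feature])
    return traces

# Made-up values for illustration only.
background = pd.DataFrame({"Blue": [0.10, 0.12, 0.11]})
mixed = pd.DataFrame({
    "Cover_percent": [20, 20, 40, 60, 80, 100],
    "Blue": [0.15, 0.16, 0.20, 0.25, 0.30, 0.35],
})

traces = cover_traces(background, mixed, "Blue")
print([len(t) for t in traces])
```

With such a helper, the first two groups of each `traces` list would reduce to one call per background class, while the USGS group, which is stored under separate keys, would still be listed explicitly.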

In [None]:
datasets_names = ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021"]

traces = [
            [old_dart_subdatasets['plastic'], old_dart_subdatasets['water'], old_dart_subdatasets['sand']],
            [dart_subdatasets['plastic'], dart_subdatasets['water'], dart_subdatasets['whitecap']],
            [usgs_subdatasets['plastic'].query('Year == "2019"'), 
             usgs_subdatasets['water'].query('Year == "2019"'), 
             usgs_subdatasets['coast'].query('Year == "2019"')],
            [usgs_subdatasets['plastic'].query('Year == "2021"'), 
             usgs_subdatasets['water'].query('Year == "2021"'), 
             usgs_subdatasets['coast'].query('Year == "2021"'), 
             usgs_subdatasets['wood'].query('Year == "2021"')]
         ]
    
labels = [
            ["Plastic", "Water", "Sand"],
            ["Plastic", "Water", "Whitecap"],
            ["Plastic", "Water", "Coast"],
            ["Plastic", "Water", "Coast", "Wood"]
         ]

labels_group = [
                "DART 2021 classes",
                "DART 2023 classes",
                "USGS 2019 classes",
                "USGS 2021 classes",
                ]


colors = [
            ['#ffadd6', '#73baff', '#ffe766'], 
            ['#ffadd6', '#73baff', '#0ff'],
            ['#ffadd6', '#73baff', '#ffe766'], 
            ['#ffadd6', '#73baff', '#ffe766', '#a38fd3']
         ]


chart_title = "Classes scattering"
height = 1200
width = 4200

x_title = "Pixel Ids"
path = 'charts/english/exploratory_analysis/scatter/'
rsdata_charts.check_path(path)

for feature in feature_names + radiometric_indexes:
    y = feature
    y_title = feature
    
    rsdata_charts.check_path(path+feature+"/")
    export_name = path+feature+"/"+y

    rsdata_charts.scatter_chart_und(datasets_names, traces, y, labels, labels_group, colors, chart_title, x_title, y_title, height, width, legend_orientation="v", guidance="horizontal", export_name=export_name) 

In [None]:
labels = [
            ["Plástico", "Água", "Areia"],
            ["Plástico", "Água", "Espuma"],
            ["Plástico", "Água", "Costa"],
            ["Plástico", "Água", "Costa", "Madeira"]
         ]

labels_group = [
                "Classes DART 2021", 
                "Classes DART 2023", 
                "Classes USGS 2019",
                "Classes USGS 2021"
                ]

chart_title = "Dispersão das classes"
height = 1200
width = 4200

path = 'charts/portugues/analise_exploratoria/dispersao/'
rsdata_charts.check_path(path)

for feature in feature_names + radiometric_indexes:
    y = feature
    y_title = feature
    
    rsdata_charts.check_path(path+feature+"/")
    export_name = path+feature+"/"+y

    rsdata_charts.scatter_chart_und(datasets_names, traces, y, labels, labels_group, colors, chart_title, x_title, y_title, height, width, legend_orientation="v", guidance="horizontal", export_name=export_name) 

In [None]:
datasets_names = ["DART 2021", "DART 2023", "USGS 2019", "USGS 2021"]

traces = [
            [old_dart_subdatasets['plastic'], old_dart_subdatasets['water'], old_dart_subdatasets['sand']],
            [dart_subdatasets['plastic'], dart_subdatasets['water'], dart_subdatasets['whitecap']],
            [usgs_subdatasets['plastic'].query('Year == "2019"'), 
             usgs_subdatasets['water'].query('Year == "2019"'), 
             usgs_subdatasets['coast'].query('Year == "2019"')],
            [usgs_subdatasets['plastic'].query('Year == "2021"'), 
             usgs_subdatasets['water'].query('Year == "2021"'), 
             usgs_subdatasets['coast'].query('Year == "2021"'), 
             usgs_subdatasets['wood'].query('Year == "2021"')]
         ]
    
labels = [
            ["Plastic", "Water", "Sand"],
            ["Plastic", "Water", "Whitecap"],
            ["Plastic", "Water", "Coast"],
            ["Plastic", "Water", "Coast", "Wood"]
         ]

labels_group = [
                "DART 2021 classes",
                "DART 2023 classes",
                "USGS 2019 classes",
                "USGS 2021 classes",
                ]


colors = [
            ['#ffadd6', '#73baff', '#ffe766'], 
            ['#ffadd6', '#73baff', '#0ff'],
            ['#ffadd6', '#73baff', '#ffe766'], 
            ['#ffadd6', '#73baff', '#ffe766', '#a38fd3']
         ]

chart_title = "Classes scattering"
height = 1200
width = 4600

path = 'charts/english/exploratory_analysis/scatter/'

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b
        rsdata_charts.check_path(path+feature_a+"/")
        export_name = path+feature_a+"/"+x+"_x_"+y

        rsdata_charts.scatter_chart(datasets_names, traces, x, y, labels, labels_group, colors, chart_title, x_title, y_title, height, width, legend_orientation="v", guidance="horizontal", export_name=export_name) 
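
The nested loops above draw one chart for every ordered pair of features, including each feature against itself and both orientations of every pair. If the mirrored and self-paired charts are not needed, a sketch like the following would enumerate each unordered pair once. The short feature lists are stand-ins for illustration (the real 'radiometric_indexes' list is defined earlier in the notebook), and dropping the mirrored charts is an assumption about how they are used.

```python
from itertools import combinations

# Stand-in lists for illustration only; the notebook defines the real ones.
feature_names = ['Blue', 'Green', 'Red']
radiometric_indexes = ['NDWI']

features = feature_names + radiometric_indexes

# Each unordered pair appears exactly once, and no feature is paired with itself.
pairs = list(combinations(features, 2))

for feature_a, feature_b in pairs:
    # Same naming scheme as the export_name built in the loops above.
    export_name = feature_a + "/" + feature_a + "_x_" + feature_b
    print(export_name)

print(len(pairs), "charts instead of", len(features) ** 2)
```

With n features this produces n(n-1)/2 charts rather than n², which matters once the band list and the index list are combined.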

In [None]:
labels = [
            ["Plástico", "Água", "Areia"],
            ["Plástico", "Água", "Espuma"],
            ["Plástico", "Água", "Costa"],
            ["Plástico", "Água", "Costa", "Madeira"]
         ]

labels_group = [
                "Classes DART 2021", 
                "Classes DART 2023", 
                "Classes USGS 2019",
                "Classes USGS 2021"
                ]

chart_title = "Dispersão das classes"
height = 1200
width = 4600

path = "charts/portugues/analise_exploratoria/dispersao/"

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b
        
        rsdata_charts.check_path(path+feature_a+"/")
        export_name = path+feature_a+"/"+x+"_x_"+y

        rsdata_charts.scatter_chart(datasets_names, traces, x, y, labels, labels_group, colors, chart_title, x_title, y_title, height, width, legend_orientation="v", guidance="horizontal", export_name=export_name) 

We add descriptive columns to the DART 2021 dataset ('Status', 'Color', 'Detailed_status', and 'Detailed_label') so that the charts that follow can show finer-grained information.

In [None]:
# Status assigned to each polymer; any other value is flagged with 'erro'.
status_by_polymer = {'LDPE': 'Dry', 'Nylon': 'Dry', 'PET': 'Dry', 'PP': 'Dry',
                     'PVC': 'Dry', 'MicroNapo': 'Wet'}
old_dart['Status'] = [status_by_polymer.get(polymer, 'erro') for polymer in old_dart['Polymer']]

In [None]:
# These polymers are labeled as transparent; MicroNapo is labeled as mixed colors.
# Any other polymer value is flagged with 'erro'.
transparent_polymers = ('LDPE', 'Nylon', 'PET', 'PP', 'PVC')
old_dart['Color'] = ['Transparent' if polymer in transparent_polymers
                     else 'Mixed' if polymer == 'MicroNapo'
                     else 'erro' for polymer in old_dart['Polymer']]
    
old_dart['Detailed_status'] = ['Dry' if status == 'Dry' 
                                   else 'Wet' if status == 'Wet' or status == '-'
                                   else 'Submerged 2cm' if status == 'Submerged' 
                                   and submergence == '2 cm'
                                   else 'Submerged 5cm' if status == 'Submerged' 
                                   and submergence == '5 cm'
                                   else 'erro' for status, submergence in 
                                   zip(old_dart['Status'], old_dart['Submergence'])]

def detailed_label(label, percent, polymer, color):
    """Build the detailed label for one row; 'erro' marks an unmapped combination."""
    # Non-plastic classes keep their original label.
    if label in ('Water', 'Sand', 'Whitecap'):
        return label
    if label == 'Plastic' and percent in (100, 80, 60, 40, 20):
        # Dry, transparent polymers: e.g. 'Dry LDPE 100% Transparent'
        if polymer in ('LDPE', 'PET', 'PP', 'Nylon', 'PVC') and color == 'Transparent':
            return f'Dry {polymer} {int(percent)}% Transparent'
        # Wet MicroNapo with mixed colors: e.g. 'Wet MicroNapo 80% Mixed colors'
        if polymer == 'MicroNapo' and color == 'Mixed':
            return f'Wet MicroNapo {int(percent)}% Mixed colors'
    return 'erro'

old_dart['Detailed_label'] = [detailed_label(label, percent, polymer, color)
                              for label, percent, polymer, color
                              in zip(old_dart['Label'], old_dart['Cover_percent'],
                                     old_dart['Polymer'], old_dart['Color'])]
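
Because every derived column falls back to 'erro' when a row matches none of the conditions, it may be worth counting those rows before plotting. The following is a minimal sketch on a small hand-made table: the rows are invented for illustration, and `count_unmapped` is a hypothetical helper, not part of the notebook's modules. Rows that are expected to be unmapped (for example non-plastic classes, if they carry no polymer value) would be counted as well.

```python
def count_unmapped(table, columns, sentinel='erro'):
    """Return {column: number of rows equal to the sentinel} for the given columns."""
    return {column: sum(1 for value in table[column] if value == sentinel)
            for column in columns}

# Invented rows for illustration. A pandas DataFrame such as old_dart can be
# passed the same way, because table[column] only needs to be iterable.
sample = {
    'Status': ['Dry', 'Wet', 'erro'],
    'Color': ['Transparent', 'Mixed', 'erro'],
    'Detailed_label': ['Dry PET 80% Transparent', 'erro', 'erro'],
}

unmapped = count_unmapped(sample, ['Status', 'Color', 'Detailed_label'])
print(unmapped)
```

A non-zero count for 'Detailed_label' points to a combination of label, polymer, cover percentage, and color that the mapping does not handle.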

In [None]:
# Colors for the 'Detailed_label' classes; each comment names the intended class.
palette_dart_2021 = ['#2772ff',  # blue: water
                     '#ffe500',  # yellow: LDPE 100%
                     '#000',  # black: sand
                     '#fff599', '#ffef66', '#ffea33', '#ffe822',  # yellow: LDPE
                     '#060', '#0f0', '#0c0', '#0a0', '#080',  # dark green: MicroNapo
                     '#3d2475', '#7956cb', '#653cc3', '#5734a9', '#4a2c8f',  # purple: Nylon
                     '#da0068', '#ff0e81', '#ff96c8', '#ff74b6', '#ff52a5',  # pink: PET
                     '#b95700', '#ff9b42', '#ff8920', '#fd7700', '#db6700',  # orange: PP
                     '#800', '#f44', '#f11', '#d00', '#a00']  # red: PVC

path = 'charts/english/exploratory_analysis/scatter/detailed_label/dart_2021/'

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    rsdata_charts.check_path(path+feature_a+"/")
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b

        plt.figure(figsize=(30, 10))

        # Left panel: linear axes
        plt.subplot(1,2,1)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=old_dart,
                        palette=palette_dart_2021, legend='auto')

        # Right panel: same data with a logarithmic x axis
        plt.subplot(1,2,2)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=old_dart,
                        palette=palette_dart_2021, legend='auto').set(xscale="log")

        export_name = path+feature_a+"/"+x+"_x_"+y
        plt.savefig(export_name+".png", format='png')
        plt.close()  # close the figure so open figures do not accumulate across iterations
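
When 'palette' is a list, seaborn assigns the colors to the hue levels in the order in which the levels are encountered, so the class named in each comment only receives that color if the data happen to be ordered accordingly. Passing a dictionary keyed by class name makes the assignment explicit. The sketch below builds such a dictionary for the three classes whose colors are stated unambiguously in the comments; the gray fallback color and the `build_palette` helper are assumptions made for illustration.

```python
# Colors whose class is stated in the palette comments.
known_colors = {
    'Water': '#2772ff',
    'Sand': '#000',
    'Dry LDPE 100% Transparent': '#ffe500',
}

def build_palette(levels, known, fallback='#999999'):
    """Map every hue level to a color, using the fallback for levels without an entry."""
    return {level: known.get(level, fallback) for level in levels}

# Invented hue levels for illustration; with the real data they would come from
# old_dart['Detailed_label'].unique().
levels = ['Sand', 'Dry LDPE 100% Transparent', 'Water', 'Dry PET 80% Transparent']
palette = build_palette(levels, known_colors)
print(palette)

# The dictionary could then replace the list, for example:
# sns.scatterplot(x=x, y=y, hue="Detailed_label", data=old_dart, palette=palette)
```

A dictionary palette has to contain every hue level present in the data, which is why the helper fills the remaining levels with a fallback instead of leaving them out.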

In [None]:
# Colors for the 'Detailed_label' classes; each comment names the intended class.
palette_dart_2021 = ['#2772ff',  # blue: water
                     '#ffe500',  # yellow: LDPE 100%
                     '#000',  # black: sand
                     '#fff599', '#ffef66', '#ffea33', '#ffe822',  # yellow: LDPE
                     '#060', '#0f0', '#0c0', '#0a0', '#080',  # dark green: MicroNapo
                     '#3d2475', '#7956cb', '#653cc3', '#5734a9', '#4a2c8f',  # purple: Nylon
                     '#da0068', '#ff0e81', '#ff96c8', '#ff74b6', '#ff52a5',  # pink: PET
                     '#b95700', '#ff9b42', '#ff8920', '#fd7700', '#db6700',  # orange: PP
                     '#800', '#f44', '#f11', '#d00', '#a00']  # red: PVC

# Same charts as the previous cell, exported to the Portuguese folder tree.
path = 'charts/portugues/analise_exploratoria/dispersao/rotulo_detalhado/dart_2021/'

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    rsdata_charts.check_path(path+feature_a+"/")
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b

        plt.figure(figsize=(30, 10))

        # Left panel: linear axes
        plt.subplot(1,2,1)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=old_dart,
                        palette=palette_dart_2021, legend='auto')

        # Right panel: same data with a logarithmic x axis
        plt.subplot(1,2,2)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=old_dart,
                        palette=palette_dart_2021, legend='auto').set(xscale="log")

        export_name = path+feature_a+"/"+x+"_x_"+y
        plt.savefig(export_name+".png", format='png')
        plt.close()  # close the figure so open figures do not accumulate across iterations

In [None]:
# One color for the first class, then four shades per color family, plus a final cyan.
palette_dart_2023 = ['#2772ff',
                     '#4a2c8f', '#7956cb', '#653cc3', '#5734a9',  # purple
                     '#ff0e81', '#ff96c8', '#ff74b6', '#ff52a5',  # pink
                     '#db6700', '#ff9b42', '#ff8920', '#fd7700',  # orange
                     '#ffe500', '#fff599', '#ffef66', '#ffea33',  # yellow
                     '#a00', '#f44', '#f11', '#d00',  # red
                     '#0f0', '#afa', '#6f6', '#9f9',  # light green
                     '#544221', '#9d7b3e', '#846835', '#6c552b',  # brown
                     '#060', '#0c0', '#0a0', '#080',  # dark green
                     '#04ffff']

path = 'charts/english/exploratory_analysis/scatter/detailed_label/dart_2023/'

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    rsdata_charts.check_path(path+feature_a+"/")
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b

        plt.figure(figsize=(30, 10))

        # Left panel: linear axes
        plt.subplot(1,2,1)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=dataset_dart,
                        palette=palette_dart_2023, legend='auto')

        # Right panel: same data with a logarithmic x axis
        plt.subplot(1,2,2)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=dataset_dart,
                        palette=palette_dart_2023, legend='auto').set(xscale="log")

        export_name = path+feature_a+"/"+x+"_x_"+y
        plt.savefig(export_name+".png", format='png')
        plt.close()  # close the figure so open figures do not accumulate across iterations

In [None]:
# One color for the first class, then four shades per color family, plus a final cyan.
palette_dart_2023 = ['#2772ff',
                     '#4a2c8f', '#7956cb', '#653cc3', '#5734a9',  # purple
                     '#ff0e81', '#ff96c8', '#ff74b6', '#ff52a5',  # pink
                     '#db6700', '#ff9b42', '#ff8920', '#fd7700',  # orange
                     '#ffe500', '#fff599', '#ffef66', '#ffea33',  # yellow
                     '#a00', '#f44', '#f11', '#d00',  # red
                     '#0f0', '#afa', '#6f6', '#9f9',  # light green
                     '#544221', '#9d7b3e', '#846835', '#6c552b',  # brown
                     '#060', '#0c0', '#0a0', '#080',  # dark green
                     '#04ffff']

# Same charts as the previous cell, exported to the Portuguese folder tree.
path = 'charts/portugues/analise_exploratoria/dispersao/rotulo_detalhado/dart_2023/'

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    rsdata_charts.check_path(path+feature_a+"/")
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b

        plt.figure(figsize=(30, 10))

        # Left panel: linear axes
        plt.subplot(1,2,1)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=dataset_dart,
                        palette=palette_dart_2023, legend='auto')

        # Right panel: same data with a logarithmic x axis
        plt.subplot(1,2,2)
        sns.scatterplot(x=x, y=y, hue="Detailed_label", data=dataset_dart,
                        palette=palette_dart_2023, legend='auto').set(xscale="log")

        export_name = path+feature_a+"/"+x+"_x_"+y
        plt.savefig(export_name+".png", format='png')
        plt.close()  # close the figure so open figures do not accumulate across iterations
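
The next cell repeats one export block for every combination of dataset and facet column. A table-driven layout is one way to keep those combinations in a single place. The sketch below only builds the output folders, file names, and titles, using the English paths and suffixes that appear in that cell. `chart_specs` is a hypothetical helper, and the plotting call is left as a comment because it depends on the notebook's data frames.

```python
base = 'charts/english/exploratory_analysis/scatter/'

# (folder, chart title) per dataset and (folder, facet column, file suffix) per facet,
# copied from the export blocks in the notebook.
datasets = [('dart_2021', 'DART 2021 (simulated)'),
            ('dart_2023', 'DART 2023 (simulated)'),
            ('usgs', 'USGS (observed)')]
facets = [('detailed_polymer', 'Polymer', '__polymer'),
          ('detailed_status', 'Detailed_status', '__status'),
          ('detailed_color', 'Color', '__color')]

def chart_specs(feature_a, feature_b):
    """Yield one dict per chart with its folder, export name, facet column and title."""
    for facet_folder, facet_col, suffix in facets:
        for dataset_folder, title in datasets:
            folder = base + facet_folder + '/' + dataset_folder + '/' + feature_a
            yield {'folder': folder,
                   'export_name': folder + '/' + feature_a + '_x_' + feature_b + suffix,
                   'facet_col': facet_col,
                   'title': title}

specs = list(chart_specs('Blue', 'Green'))
for spec in specs:
    print(spec['export_name'])
    # With the data frame for this dataset at hand, the chart itself would be drawn with
    # px.scatter(df, x='Blue', y='Green', color='Cover_percent', facet_col=spec['facet_col'])
```

Keeping the dataset and facet tables side by side makes it easier to see that every chart is drawn from the data frame its title names.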

In [None]:
import plotly.express as px

for feature_a in feature_names + radiometric_indexes:
    x = feature_a
    x_title = feature_a
    for feature_b in feature_names + radiometric_indexes:
        y = feature_b
        y_title = feature_b
        
        
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_polymer/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polymer'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))
            

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_polymer/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polymer'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dataset_dart, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_polymer/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polymer'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observed)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_status/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        path = 'charts/english/exploratory_analysis/scatter/detailed_status/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dart, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        path = 'charts/english/exploratory_analysis/scatter/detailed_status/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observed)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_color/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__color'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        path = 'charts/english/exploratory_analysis/scatter/detailed_color/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__color'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dart, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulated)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())

        path = 'charts/english/exploratory_analysis/scatter/detailed_color/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__color'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observed)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
            
            
            
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_polimero/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polimero'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_polimero/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polimero'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dataset_dart, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_polimero/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__polimero'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Polymer")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        
        
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_status/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_status/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dart, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_status/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__status'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Detailed_status")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
            
        
        
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_cor/dart_2021/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__cor'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(old_dart, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2021 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())
        
        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_cor/dart_2023/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__cor'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(dart, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='DART 2023 (simulado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())

        path = 'charts/portugues/analise_exploratoria/dispersao/detalhes_cor/usgs/'
        rsdata_charts.check_path(path)
        export_name = path+feature_a+'/'+feature_a+'_x_'+feature_b+'__cor'
        rsdata_charts.check_path(path+feature_a)
        
        with open(export_name+".html", 'a', encoding='utf-8') as f:
            fig = px.scatter(usgs, x=x, y=y, color="Cover_percent", facet_col="Color")#, trendline="ols"

            fig.update_layout(height=1800, width=2400, title_text='USGS (observado)', template = 'plotly_white')
            fig.update_layout(font=dict(size=36))
            fig.update_layout(annotations=[dict(font=dict(size=36))])
            fig.update_layout(margin=dict(t=150))

            fig.write_image(export_name+".jpeg")
            f.write(fig.to_html())

In [None]:
# 21 and USGS
water = dataset_dart.query("Label == 'Water' and Blue == 0.0324 and NIR2 == 0.0207")[feature_names + radiometric_indexes] # The overwhelming majority of water pixels have this Blue reflectance (the resampled bands showed small variations, so the NIR2 value was chosen arbitrarily)
water = water.drop_duplicates()
whitecap = dataset_dart.query("Label == 'Whitecap'")[feature_names + radiometric_indexes]
whitecap = whitecap.drop_duplicates()
pet_dry_100 = dataset_dart.query("Polymer == 'PET' and Cover_percent == 100")[feature_names + radiometric_indexes]
pet_dry_100 = pet_dry_100.drop_duplicates()
ldpe_dry_100 = dataset_dart.query("Polymer == 'LDPE' and Cover_percent == 100")[feature_names + radiometric_indexes]
ldpe_dry_100 = ldpe_dry_100.drop_duplicates()
pp_wet_orange_100 = dataset_dart.query("Polymer == 'PP' and Color == 'Orange' and Status == 'Wet' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_wet_orange_100 = pp_wet_orange_100.drop_duplicates()
pp_wet_white_100 = dataset_dart.query("Polymer == 'PP' and Color == 'White' and Status == 'Wet' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_wet_white_100 = pp_wet_white_100.drop_duplicates()
pp_sub2_orange_100 = dataset_dart.query("Polymer == 'PP' and Color == 'Orange' and Status == 'Submerged' and Submergence == '2 cm' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_sub2_orange_100 = pp_sub2_orange_100.drop_duplicates()
pp_sub2_white_100 = dataset_dart.query("Polymer == 'PP' and Color == 'White' and Status == 'Submerged' and Submergence == '2 cm' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_sub2_white_100 = pp_sub2_white_100.drop_duplicates()
pp_sub5_orange_100 = dataset_dart.query("Polymer == 'PP' and Color == 'Orange' and Status == 'Submerged' and Submergence == '5 cm' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_sub5_orange_100 = pp_sub5_orange_100.drop_duplicates()
pp_sub5_white_100 = dataset_dart.query("Polymer == 'PP' and Color == 'White' and Status == 'Submerged' and Submergence == '5 cm' and Cover_percent == 100")[feature_names + radiometric_indexes]
pp_sub5_white_100 = pp_sub5_white_100.drop_duplicates()

In [None]:
datasets_names = ["DART polymers", "DART polymers"]

traces = [
             [
                 [water[feature].values[0] for feature in feature_names],
                 [whitecap[feature].values[0] for feature in feature_names],
                 [pet_dry_100[feature].values[0] for feature in feature_names],
                 [ldpe_dry_100[feature].values[0] for feature in feature_names],
                 [pp_wet_orange_100[feature].values[0] for feature in feature_names],
                 [pp_wet_white_100[feature].values[0] for feature in feature_names],
                 [pp_sub2_orange_100[feature].values[0] for feature in feature_names],
                 [pp_sub2_white_100[feature].values[0] for feature in feature_names],
                 [pp_sub5_orange_100[feature].values[0] for feature in feature_names],
                 [pp_sub5_white_100[feature].values[0] for feature in feature_names]
             ],
             [
                 [water[ind].values[0] for ind in radiometric_indexes],
                 [whitecap[ind].values[0] for ind in radiometric_indexes],
                 [pet_dry_100[ind].values[0] for ind in radiometric_indexes],
                 [ldpe_dry_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_wet_orange_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_wet_white_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_sub2_orange_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_sub2_white_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_sub5_orange_100[ind].values[0] for ind in radiometric_indexes],
                 [pp_sub5_white_100[ind].values[0] for ind in radiometric_indexes]
             ]
          ]

labels = [
            [feature_names, feature_names, feature_names, feature_names, feature_names, 
             feature_names, feature_names, feature_names, feature_names, feature_names],
            [radiometric_indexes, radiometric_indexes, radiometric_indexes, radiometric_indexes, radiometric_indexes, 
             radiometric_indexes, radiometric_indexes, radiometric_indexes, radiometric_indexes, radiometric_indexes]
            ]

legends = [['Water', 'Whitecap', 'Dry PET (100%)', 'Dry LDPE (100%)', 'Wet Orange PP (100%)',
           'Wet White PP (100%)', 'Sub 2cm Orange PP (100%)', 'Sub 2cm White PP (100%)', 
            'Sub 5cm Orange PP (100%)', 'Sub 5cm White PP (100%)'],
           ['Water', 'Whitecap', 'Dry PET (100%)', 'Dry LDPE (100%)', 'Wet Orange PP (100%)',
           'Wet White PP (100%)', 'Sub 2cm Orange PP (100%)', 'Sub 2cm White PP (100%)', 
            'Sub 5cm Orange PP (100%)', 'Sub 5cm White PP (100%)']]

modes = [
            ['markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 
             'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines'],
            ['markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 
             'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines', 'markers+lines']
        ]

colors = [
            ['#0049d1', '#00f9ed', '#ff3093', '#0f0', '#ff9b42', 
             '#6f7376', '#fd7700', '#46484a', '#b95700','#0c0d0d'],
            ['#0049d1', '#00f9ed', '#ff3093', '#0f0', '#ff9b42', 
             '#6f7376', '#fd7700', '#46484a', '#b95700','#0c0d0d']
         ]


chart_title = "Spectral signatures by polymer"
x_title = "Band"
y_title = "Reflectance"
height = 1800
width = 3900
guidance = "horizontal"

rsdata_charts.check_path('charts/english/exploratory_analysis/artificial_band/')
export_name = 'charts/english/exploratory_analysis/artificial_band/polymers_signature'

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

## Unsupervised classification

We used the KMeans algorithm for unsupervised classification to look for patterns of association between the elements present in simulated and observed scenes. We chose the feature sets and hyperparameters based on the results of the preceding exploratory analysis, and generated visualizations for both statistical and spatial evaluation of the resulting clusters. The folders `'charts/english/unsupervised_classification/kmeans/'` and `'charts/portugues/classificacao_nao_supervisionada/kmeans/'` must already exist to store the visualizations.
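The two output folders can be created ahead of time with the standard library. The following is a minimal sketch, not part of the original notebook: the helper name `ensure_chart_dirs` and the `base_dir` parameter are assumptions introduced for illustration, and the behavior of `rsdata_charts.check_path` is not shown here, so this does not claim to reproduce it.

```python
import os

# Output folders named in the text above.
KMEANS_CHART_DIRS = [
    'charts/english/unsupervised_classification/kmeans/',
    'charts/portugues/classificacao_nao_supervisionada/kmeans/',
]

def ensure_chart_dirs(base_dir='.', dirs=KMEANS_CHART_DIRS):
    """Hypothetical helper: make sure each folder exists under base_dir.

    Returns the list of full paths that were checked or created.
    """
    ensured = []
    for d in dirs:
        full_path = os.path.join(base_dir, d)
        # exist_ok=True makes the call safe to repeat on later runs.
        os.makedirs(full_path, exist_ok=True)
        ensured.append(full_path)
    return ensured
```

Calling `ensure_chart_dirs()` once from the notebook's working directory before running the cells below should be enough; intermediate folders such as `charts/english/` are created as needed.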

#### DART 2021

In [None]:
old_dart['Simple_Path'] = old_dart['Path']
paths = list(set(old_dart['Simple_Path']))

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes
                ]

n_clusters = [4]   

ids = ["both", "bands", "indexes"]

identificadores = ["ambos", "bandas", "indices"]

In [None]:
for i in range(len(feature_sets)):

    for n in n_clusters:
        path = 'charts/english/unsupervised_classification/kmeans/'+ids[i]+'/dart_2021/k'+str(n)+'/'
        rsdata_charts.check_path(path)

        caminho = 'charts/portugues/classificacao_nao_supervisionada/kmeans/'+identificadores[i]+'/dart_2021/k'+str(n)+'/'
        rsdata_charts.check_path(caminho)
        
        names_clusters = [] 
        classified_data = rsdata_classification.k_means(old_dart, feature_sets[i], n, 123)
        
        for k in range(n): 
            names_clusters.append("Cluster "+str(k))
        
        labels_group = names_clusters
        y = []
        
        for k in range(n):
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Label'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000, 
                ['#1E90FF', '#FF69B4', '#FFD700'], path+'cl'+str(k))
            

            
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Classe'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000,
                ['#1E90FF', '#FF69B4', '#FFD700'], caminho+'/cl'+str(k))
                
            # Pixels in cluster k per polymer (outer list) and cover percentage (inner list)
            cluster_data = classified_data.query('Cluster == '+str(k))
            y.append(
                [
                    [len(cluster_data.query("Cover_percent == "+str(cover)+" and Polymer == '"+polymer+"'")) for cover in [20, 40, 60, 80, 100]]
                    for polymer in ['LDPE', 'PVC', 'PP', 'PET', 'MicroNapo', 'Nylon']
                ]
            )
            rsdata_charts.stacked_bar_chart(names_clusters, 
                                        #x,
                                        [
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%'],
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%'],
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%'],
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%'],
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%'],
                                            ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%']
                                        ],
                                        y, 
                                        ['LDPE', 'PVC', 'PP', 'PET', 'MicroNapo', 'Nylon'], #names, 
                                        ['#ADF224','#1AB1B1', '#2f1b70', '#D81F88', '#FF8825', '#F1C800'], #colors, 
                                        'Polymers by cluster', #chart_title, 
                                        'Cover percents', #x_title, 
                                        'Number of pixels', #y_title, 
                                        1300, #height, 
                                        2000, #width, 
                                        labels_group, 
                                        'h', #orientation, 
                                        'horizontal', #guidance, 
                                        path+"/polymers" #export_name
                                       )


            rsdata_charts.stacked_bar_chart(names_clusters, 
                                        #x,
                                        [
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%'],
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%'],
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%'],
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%'],
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%'],
                                            ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%']
                                        ],
                                        y, 
                                        ['LDPE', 'PVC', 'PP', 'PET', 'MicroNapo', 'Nylon'], #names, 
                                        ['#ADF224','#1AB1B1', '#2f1b70', '#D81F88', '#FF8825', '#F1C800'], #colors, 
                                        'Polímeros por cluster', #chart_title, 
                                        'Percentuais de cobertura', #x_title, 
                                        'Número de pixels', #y_title, 
                                        1300, #height, 
                                        2000, #width, 
                                        labels_group, 
                                        'h', #orientation, 
                                        'horizontal', #guidance, 
                                        caminho+"/polimeros"#export_name
                                       )
            
        for p in paths: 
            rsdata_classification.map_kmeans(p, old_dart, classified_data, path, caminho, 3000, 5000)

#### DART 2023 

In [None]:
dart['Simple_Path'] = [ "/".join(path.split('/')[3:]) for path in dart['Path']]
paths = list(set(dart['Simple_Path']))

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes
                ]

n_clusters = [4]   

ids = ["both", "bands", "indexes"]

identificadores = ["ambos", "bandas", "indices"]

In [None]:
for i in range(len(feature_sets)):
    for n in n_clusters:
        path = 'charts/english/unsupervised_classification/kmeans/'+ids[i]+'/dart_2023/k'+str(n)+'/'
        rsdata_charts.check_path(path)

        caminho = 'charts/portugues/classificacao_nao_supervisionada/kmeans/'+identificadores[i]+'/dart_2023/k'+str(n)+'/'
        rsdata_charts.check_path(caminho)
        
        names_clusters = [] 
        classified_data = rsdata_classification.k_means(dataset_dart, feature_sets[i], n, 123)
        
        for k in range(n): 
            names_clusters.append("Cluster "+str(k))
        
        labels_group = names_clusters
        y = []
        
        for k in range(n):
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Label'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000, 
                ['#1E90FF', '#FF69B4', '#FFD700'], path+'cl'+str(k))
            

            
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Classe'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000, 
                ['#1E90FF', '#FF69B4', '#FFD700'], caminho+'/cl'+str(k))
               
            # Pixels in cluster k per plastic type (outer list) and cover percentage (inner list)
            cluster_data = classified_data.query('Cluster == '+str(k))
            plastic_filters = [
                "Polymer == 'LDPE'",
                "Polymer == 'PET'",
                "Polymer == 'PP' and Detailed_status == 'Wet' and Color == 'White'",
                "Polymer == 'PP' and Detailed_status == 'Submerged 2cm' and Color == 'White'",
                "Polymer == 'PP' and Detailed_status == 'Submerged 5cm' and Color == 'White'",
                "Polymer == 'PP' and Detailed_status == 'Wet' and Color == 'Orange'",
                "Polymer == 'PP' and Detailed_status == 'Submerged 2cm' and Color == 'Orange'",
                "Polymer == 'PP' and Detailed_status == 'Submerged 5cm' and Color == 'Orange'"#,
                # Disabled series (previously counted from dataset['plastic_pp'], dataset['plastic_nylon'],
                # dataset['plastic_pvc'] and dataset['plastic_micronapo']):
                #"Polymer == 'PP' and Detailed_status == 'Dry' and Color == 'Transparent'",
                #"Polymer == 'Nylon'",
                #"Polymer == 'PVC'",
                #"Polymer == 'MicroNapo'"
            ]
            y.append([
                [len(cluster_data.query(plastic+" and Cover_percent == "+str(cover))) for cover in [20, 40, 60, 80, 100]]
                for plastic in plastic_filters
            ])
    
            labels = ['Dry LDPE (transparent)', 
                      'Dry PET (transparent)', 
                      'Wet PP (white)', 
                      'Sub 2cm PP (white)', 
                      'Sub 5cm PP (white)',
                      'Wet PP (orange)', 
                      'Sub 2cm PP (orange)', 
                      'Sub 5cm PP (orange)'#,
                      #'Dry PP (transparent)',
                      #'Dry Nylon (transparent)',
                      #'Dry PVC (transparent)',
                      #'Wet Micronapo (mixed colors)'
                     ]

            legend_x = ['Plastic 20%', 'Plastic 40%', 'Plastic 60%', 'Plastic 80%', 'Plastic 100%']
            legends_x = []
            for _ in range(len(y[0])):  # avoid shadowing the outer loop index i
                legends_x.append(legend_x)
                      
            colors = ['#ADF224', '#ff74b6', '#666ce0', '#3c43d7', '#2329ac', '#FFAD69',
                      '#FF9A47', '#FF7F14'#,
                      #'#1AB1B1',
                      #'#D81F88',
                      #'#FFDD36',
                      #'#DC0000'
                      ] #colors,
            
                        
            rsdata_charts.stacked_bar_chart(names_clusters, 
                                        #x,
                                        legends_x,
                                        y, 
                                        labels,
                                        colors,
                                        'Polymers by cluster', #chart_title, 
                                        'Cover percents', #x_title, 
                                        'Number of pixels', #y_title, 
                                        1300, #height, 
                                        2000, #width, 
                                        labels_group, 
                                        'h', #orientation, 
                                        'horizontal', #guidance, 
                                        path+"/polymers" )

            # Keep one label per series in y (same eight series as the English chart above)
            labels = ['LDPE seco (transparente)', 
                      'PET seco (transparente)', 
                      'PP úmido (branco)', 
                      'PP sub 2cm (branco)', 
                      'PP sub 5cm (branco)',
                      'PP úmido (laranja)', 
                      'PP sub 2cm (laranja)', 
                      'PP sub 5cm (laranja)'#,
                      #'PP seco (transparente)',
                      #'Nylon seco (transparente)',
                      #'PVC seco (transparente)',
                      #'Micronapo úmido (cores mistas)'
                     ]

            legend_x = ['Plástico 20%', 'Plástico 40%', 'Plástico 60%', 'Plástico 80%', 'Plástico 100%']
            legends_x = []
            for _ in range(len(y[0])):  # avoid shadowing the outer loop index i
                legends_x.append(legend_x)

            rsdata_charts.stacked_bar_chart(names_clusters, 
                                    #x,
                                    legends_x,
                                    y, 
                                    labels,
                                    colors, 
                                    'DART - polímeros em detalhes', #chart_title, 
                                    'Percentuais de cobertura', #x_title, 
                                    'Número de pixels', #y_title, 
                                    1400, #height, 
                                    2000, #width, 
                                    labels_group, 
                                    'h', #orientation, 
                                    'horizontal', #guidance, 
                                    caminho+"/polymers")
            
        for p in paths: 
            #print(p)
            #print(dart.query("Simple_path == '"+date+"'"))
            rsdata_classification.map_kmeans(p, dart, classified_data, path, caminho, 3000, 5000)

#### USGS

In [None]:
usgs['Simple_Path'] = usgs['Path']
dates = list(set(usgs['Simple_Path']))

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes
                ]

n_clusters = [4]   

ids = ["both", "bands", "indexes"]

identificadores = ["ambos", "bandas", "indices"]

In [None]:

#dates = ['2019_04_18', '2019_05_03', '2019_05_18', '2019_05_28', '2019_06_07', '2021_06_21', '2021_07_01', '2021_07_06', '2021_07_21', '2021_08_25']


for i in range(len(feature_sets)):
    for n in n_clusters:
        path = 'charts/english/unsupervised_classification/kmeans/'+ids[i]+'/usgs/k'+str(n)+'/'
        rsdata_charts.check_path(path)

        caminho = 'charts/portugues/classificacao_nao_supervisionada/kmeans/'+identificadores[i]+'/usgs/k'+str(n)+'/'
        rsdata_charts.check_path(caminho)
        
        names_clusters = [] 
        classified_data = rsdata_classification.k_means(usgs, feature_sets[i], n, 123)
        
        for k in range(n): 
            names_clusters.append("Cluster "+str(k))
        
        labels_group = names_clusters
        x = []
        y = []
        z = []
        
        
        for k in range(n):
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Label'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000, 
                ['#1E90FF', '#FF69B4', '#FFD700'], path+'cl'+str(k))
            
            
            rsdata_charts.pie_chart(
                [classified_data.query('Cluster == '+str(k))], 
                ['Classe'], 
                ["Cluster "+str(k)], 
                "", 1500, 1000, 
                ['#1E90FF', '#FF69B4', '#FFD700'], caminho+'/cl'+str(k))
            
            
            # Select this cluster's pixels once and reuse the selection below.
            cluster_data = classified_data.query('Cluster == '+str(k))

            # Number of pixels of this cluster on each acquisition date.
            usgs_dates = ['2019_04_18', '2019_05_03', '2019_05_18', '2019_05_28', '2019_06_07', '2021_06_21', '2021_07_01', '2021_07_06', '2021_07_21', '2021_08_25']
            x.append(
                [
                    [len(cluster_data.query("Path == '"+d+"'")) for d in usgs_dates]
                ]
                )

            # Cover-percent bins, in the order of the chart legend. The values -100 and -1
            # are the sentinels charted as "100%" and "Unknown percent"; the lower bound on
            # the first bin keeps those negative sentinels from also being counted there.
            cover_filters = [
                "Cover_percent > 0 and Cover_percent <= 20",
                "Cover_percent > 20 and Cover_percent <= 40",
                "Cover_percent > 40 and Cover_percent <= 60",
                "Cover_percent > 60 and Cover_percent <= 80",
                "Cover_percent > 80 and Cover_percent <= 99",
                "Cover_percent == -100",
                "Cover_percent == -1"
            ]

            # One row of bin counts per polymer type.
            y.append(
                [
                    [len(cluster_data.query(f+" and Polymer == '"+polymer+"'")) for f in cover_filters]
                    for polymer in ['Bags', 'Bags and Bottles', 'Bottles', 'HDPE mesh']
                ]
                )

            # Wood: the percent bins are matched on Polymer, the two sentinel bins on Label.
            z.append(
                [
                    [len(cluster_data.query(f+" and Polymer == 'Wood'")) for f in cover_filters[:5]]
                    + [len(cluster_data.query(f+" and Label == 'Wood'")) for f in cover_filters[5:]]
                ]
                )
            
            rsdata_charts.stacked_bar_chart(names_clusters, 
                                        #x,
                                        [
                                            ['Plastic 1% - 20%', 'Plastic 20% - 40%', 'Plastic 40% - 60%', 'Plastic 60% - 80%', 'Plastic 80% - 99%', 'Plastic 100%', 'Unknown percent'],
                                            ['Plastic 1% - 20%', 'Plastic 20% - 40%', 'Plastic 40% - 60%', 'Plastic 60% - 80%', 'Plastic 80% - 99%', 'Plastic 100%', 'Unknown percent'],
                                            ['Plastic 1% - 20%', 'Plastic 20% - 40%', 'Plastic 40% - 60%', 'Plastic 60% - 80%', 'Plastic 80% - 99%', 'Plastic 100%', 'Unknown percent'],
                                            ['Plastic 1% - 20%', 'Plastic 20% - 40%', 'Plastic 40% - 60%', 'Plastic 60% - 80%', 'Plastic 80% - 99%', 'Plastic 100%', 'Unknown percent']
                                        ],
                                        y, 
                                        ['Bags', 'Bags and Bottles', 'Bottles', 'HDPE mesh'], #names, 
                                        ['#49C658','#8945AB', '#FF675F', '#FCFE5E'], #colors, 
                                        'Polymers by cluster', #chart_title, 
                                        'Cover percents', #x_title, 
                                        'Number of pixels', #y_title, 
                                        1300, #height, 
                                        2000, #width, 
                                        labels_group, 
                                        'h', #orientation, 
                                        'horizontal', #guidance, 
                                        path+"/polymers"#export_name
                                           )


            rsdata_charts.stacked_bar_chart(names_clusters, 
                                        #x,
                                        [
                                            ['Plástico 1% a 20%', 'Plástico 20% a 40%', 'Plástico 40% a 60%', 'Plástico 60% a 80%', 'Plástico 80% a 99%', 'Plástico 100%', 'Percentual desconhecido'],
                                            ['Plástico 1% a 20%', 'Plástico 20% a 40%', 'Plástico 40% a 60%', 'Plástico 60% a 80%', 'Plástico 80% a 99%', 'Plástico 100%', 'Percentual desconhecido'],
                                            ['Plástico 1% a 20%', 'Plástico 20% a 40%', 'Plástico 40% a 60%', 'Plástico 60% a 80%', 'Plástico 80% a 99%', 'Plástico 100%', 'Percentual desconhecido'],
                                            ['Plástico 1% a 20%', 'Plástico 20% a 40%', 'Plástico 40% a 60%', 'Plástico 60% a 80%', 'Plástico 80% a 99%', 'Plástico 100%', 'Percentual desconhecido']
                                        ],
                                        y, 
                                        ['Sacolas', 'Sacolas e garrafas', 'Garrafas', 'Malha de HDPE'], #names, 
                                        ['#49C658','#8945AB', '#FF675F', '#FCFE5E'], #colors, 
                                        'Polímeros por cluster', #chart_title, 
                                        'Percentuais de cobertura', #x_title, 
                                        'Número de pixels', #y_title, 
                                        1300, #height, 
                                        2000, #width, 
                                        labels_group, 
                                        'h', #orientation, 
                                        'horizontal', #guidance, 
                                        caminho+"/polimeros"#export_name
                                           )
        
            rsdata_charts.stacked_bar_chart(names_clusters, 
                                            #x,
                                            [
                                                ['Wood 1% - 20%', 'Wood 20% - 40%', 'Wood 40% - 60%', 'Wood 60% - 80%', 'Wood 80% - 99%', 'Wood 100%', 'Unknown percent']
                                            ],
                                            z, 
                                            ['Wood'], #names, 
                                            ['#82431d'], #colors, 
                                            'Wood by cluster', #chart_title, 
                                            'Cover percents', #x_title, 
                                            'Number of pixels', #y_title, 
                                            1300, #height, 
                                            2000, #width, 
                                            labels_group, 
                                            'h', #orientation, 
                                            'horizontal', #guidance, 
                                            path+"/wood"#export_name
                                               )


            rsdata_charts.stacked_bar_chart(names_clusters, 
                                            #x,
                                            [
                                                ['Madeira 1% a 20%', 'Madeira 20% a 40%', 'Madeira 40% a 60%', 'Madeira 60% a 80%', 'Madeira 80% a 99%', 'Madeira 100%', 'Percentual desconhecido']
                                            ],
                                            z, 
                                            ['Madeira'], #names, 
                                            ['#82431d'], #colors, 
                                            'Madeira por cluster', #chart_title, 
                                            'Percentuais de cobertura', #x_title, 
                                            'Número de pixels', #y_title, 
                                            1300, #height, 
                                            2000, #width, 
                                            labels_group, 
                                            'h', #orientation, 
                                            'horizontal', #guidance, 
                                            caminho+"/madeira"#export_name
                                               )

            rsdata_charts.stacked_bar_chart(names_clusters, 
                                            #x,
                                            [
                                                ['18/04/2019', '03/05/2019', '18/05/2019', '28/05/2019', '07/06/2019', '21/06/2021', '01/07/2021', '06/07/2021', '21/07/2021', '25/08/2021']
                                            ],
                                            x, 
                                            ['Dates'], #names, 
                                            ['#49C658','#8945AB', '#FF675F', '#FCFE5E'], #colors, 
                                            'Dates by cluster', #chart_title, 
                                            'Dates', #x_title, 
                                            'Number of pixels', #y_title, 
                                            1300, #height, 
                                            2000, #width, 
                                            labels_group, 
                                            'h', #orientation, 
                                            'horizontal', #guidance, 
                                            path+"/dates"#export_name
                                               )


            rsdata_charts.stacked_bar_chart(names_clusters, 
                                            #x,
                                            [
                                                ['18/04/2019', '03/05/2019', '18/05/2019', '28/05/2019', '07/06/2019', '21/06/2021', '01/07/2021', '06/07/2021', '21/07/2021', '25/08/2021']
                                            ],
                                            x, 
                                            ['Datas'], #names, 
                                            ['#49C658','#8945AB', '#FF675F', '#FCFE5E'], #colors,  
                                            'Datas por cluster', #chart_title, 
                                            'Datas', #x_title, 
                                            'Número de pixels', #y_title, 
                                            1300, #height, 
                                            2000, #width, 
                                            labels_group, 
                                            'h', #orientation, 
                                            'horizontal', #guidance, 
                                            caminho+"/datas"#export_name
                                               )
        
        for d in dates: 
            rsdata_classification.map_kmeans(d, dataset_usgs, classified_data, path, caminho, 1500, 2100)
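
The cover-percent counting used for the stacked bar charts can be sketched on toy data without pandas. The snippet below is only an illustration: the `cover_bin` helper and the toy values are hypothetical, while the bin edges and the sentinel values -100 (charted as 100%) and -1 (charted as unknown percent) follow the queries and legends in the cell above. It assumes that non-positive values other than the two sentinels belong to no bin.

```python
from collections import Counter

# Bin labels in the order of the chart legend.
BIN_LABELS = ['1% - 20%', '20% - 40%', '40% - 60%', '60% - 80%', '80% - 99%',
              '100%', 'Unknown percent']

def cover_bin(cover_percent):
    """Return the legend label for one Cover_percent value, or None if it fits no bin."""
    if cover_percent == -100:  # sentinel charted as 100%
        return '100%'
    if cover_percent == -1:    # sentinel charted as unknown percent
        return 'Unknown percent'
    if cover_percent <= 0:     # assumption: other non-positive values are not binned
        return None
    for upper, label in zip([20, 40, 60, 80, 99], BIN_LABELS):
        if cover_percent <= upper:
            return label
    return None

# Toy values standing in for the Cover_percent column of one cluster.
toy_cover = [5, 20, 21, 60, 99, -100, -100, -1]
counts = Counter(cover_bin(v) for v in toy_cover)

# One row of counts, ordered like the legend.
row = [counts.get(label, 0) for label in BIN_LABELS]
print(row)
```

Checking the sentinels before the numeric ranges matters here, because a plain `<= 20` test would also match -100 and -1.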

## Supervised classification

For supervised classification, we first evaluated Random Forest in several configurations. We then trained an Artificial Neural Network (Multilayer Perceptron) and, finally, used an AutoML approach to identify a third model for comparison with the first two. Only the classes Plastic and Water were considered, because they were the only ones present in both the simulations and the observed data.
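
As a minimal sketch of this class restriction, the snippet below keeps only the labels that occur in both a simulated and an observed table. The two toy DataFrames, their values, and the `Sand` class are made up for illustration; only the `Label` column name and the Plastic, Water, and Wood classes come from this notebook.

```python
import pandas as pd

# Toy stand-ins for the simulated (DART) and observed (USGS) tables.
simulated = pd.DataFrame({
    "Label": ["Plastic", "Water", "Sand", "Water"],
    "Blue": [0.12, 0.05, 0.30, 0.06],
})
observed = pd.DataFrame({
    "Label": ["Plastic", "Water", "Wood", "Plastic"],
    "Blue": [0.10, 0.04, 0.20, 0.11],
})

# Classes present in both tables.
shared_classes = sorted(set(simulated["Label"]) & set(observed["Label"]))

# Keep only rows with a shared label: train on simulated data, test on observed data.
train = simulated[simulated["Label"].isin(shared_classes)]
test = observed[observed["Label"].isin(shared_classes)]

print(shared_classes)
print(len(train), len(test))
```

The cells below apply the same restriction with `query("Label == 'Water' or Label == 'Plastic'")`.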

### Random Forest

#### DART 2021 (training) x USGS (testing)

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes
                ]

ids = ["both", "bands", "indexes"]

identificadores = ["ambos", "bandas", "indices"]

dates = list(set(usgs['Simple_Path']))

In [None]:
for feature_set in feature_sets:
    X = old_dart.query("Label == 'Water' or Label == 'Plastic'")[feature_set]
    y = old_dart.query("Label == 'Water' or Label == 'Plastic'")['Label']
    X_real = usgs.query("Label == 'Water' or Label == 'Plastic'")[feature_set]
    y_real = usgs.query("Label == 'Water' or Label == 'Plastic'")['Label']
    
    y_pred, assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted, feature_importances = rsdata_classification.random_forest(X, y, X_real, y_real, feature_set, 100)
    
    path = 'charts/english/supervised_classification/random_forest/dart_2021/'+ids[feature_sets.index(feature_set)]+'/'
    rsdata_charts.check_path(path)
    
    caminho = 'charts/portugues/classificacao_supervisionada/random_forest/dart_2021/'+identificadores[feature_sets.index(feature_set)]+'/'
    rsdata_charts.check_path(caminho)
    
    #METRICS TO EXCEL
    acc_ = []
    b_acc_ = []
    f1_macro_ = []
    f1_weighted_ = []
    fbeta_macro_ = []
    fbeta_weighted_ = []
    pr_macro_ = []
    rc_macro_ = []
    pr_weighted_ = []
    rc_weighted_ = []
    
    best_acc_value = 0
    best_acc_index = 0
    best_bac_value = 0
    best_bac_index = 0

    for key in acc.keys():
        acc_.append(acc[key])
        b_acc_.append(b_acc[key])
        f1_macro_.append(f1_macro[key])
        f1_weighted_.append(f1_weighted[key])
        fbeta_macro_.append(fbeta_macro[key])
        fbeta_weighted_.append(fbeta_weighted[key])
        pr_macro_.append(pr_macro[key])
        rc_macro_.append(rc_macro[key])
        pr_weighted_.append(pr_weighted[key])
        rc_weighted_.append(rc_weighted[key])
        
        if acc[key] > best_acc_value:
            best_acc_value = acc[key]
            best_acc_index = key
        if b_acc[key] > best_bac_value:
            best_bac_value = b_acc[key]
            best_bac_index = key
    
    acc_ = pd.DataFrame(acc_, columns = ['Acc'])
    b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
    f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
    f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
    fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
    fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
    pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
    rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
    pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
    rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])

    mt = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=[ids[feature_sets.index(feature_set)]])
    mt.to_excel(path+'evaluation_metrics.xlsx')
    
    stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs.query("Label == 'Water' or Label == 'Plastic'"))
    stats_by_polymer.to_excel(path+'evaluation_metrics_by_polymers.xlsx')
    stats_by_label_year.to_excel(path+'evaluation_metrics_by_year.xlsx')
    stats_by_plastic_cover_percent.to_excel(path+'evaluation_metrics_by_cover_percent.xlsx')
    stats_by_date.to_excel(path+'evaluation_metrics_by_date.xlsx')
    
    datasets_names = ["Best Overall Accuracy - Index " + str(best_acc_index), "Best Balanced Accuracy - Index " + str(best_bac_index)]
    traces = [feature_importances[best_acc_index], feature_importances[best_bac_index]]
    labels = [feature_importances[best_acc_index].index, feature_importances[best_bac_index].index]
    color = 'rgba(50, 171, 96, 0.6)'
    line_color = 'rgba(50, 171, 96, 1.0)'
    chart_title = 'Feature importances for USGS data'
    x_title = "Importance"
    y_title = "Feature"
    height = 1200
    width = 2400
    orientation='h'
    guidance="horizontal"
    export_name = path+'feature_importances'
    rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)
    #Portugues
    
    #MAPS
    rsdata_charts.check_path(path+'maps/o_acc/')
    rsdata_charts.check_path(path+'maps/b_acc/')
    rsdata_charts.check_path(caminho+'mapas/acc_geral')
    rsdata_charts.check_path(caminho+'mapas/acc_balanceada')
    
    classified_data = usgs.query("Label == 'Water' or Label == 'Plastic'").copy()
    classified_data['Predicted_class'] = y_pred[best_acc_index]
    for d in dates:
        rsdata_charts.map_nn(d, usgs.query("Label == 'Water' or Label == 'Plastic'"), classified_data, path+'maps/o_acc/', caminho+'mapas/acc_geral', 1000, 2000)
        
    classified_data = usgs.query("Label == 'Water' or Label == 'Plastic'").copy()
    classified_data['Predicted_class'] = y_pred[best_bac_index]
    for d in dates:
        rsdata_charts.map_nn(d, usgs.query("Label == 'Water' or Label == 'Plastic'"), classified_data, path+'maps/b_acc/', caminho+'mapas/acc_balanceada', 1000, 2000)
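
The selection of the best runs in the cell above can be sketched more compactly. This is an illustration only: it assumes that `acc` and `b_acc` are dictionaries keyed by run index, as the `acc.keys()` loop suggests, and the toy values are invented.

```python
# Toy per-run metrics keyed by run index, standing in for acc and b_acc.
acc = {0: 0.81, 1: 0.86, 2: 0.84}
b_acc = {0: 0.79, 1: 0.77, 2: 0.83}

# Index of the run with the highest overall and balanced accuracy.
# max() returns the first key holding the maximum, like the strict ">" test in the loop.
best_acc_index = max(acc, key=acc.get)
best_bac_index = max(b_acc, key=b_acc.get)

# Mean over all runs, the value written to the metrics spreadsheet.
mean_acc = sum(acc.values()) / len(acc)

print(best_acc_index, best_bac_index, round(mean_acc, 4))
```

The two indices can differ, which is why the notebook produces one set of maps for each.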

#### DART 2023 (training) x USGS (testing)

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes
                ]

ids = ["both", "bands", "indexes"]

identificadores = ["ambos", "bandas", "indices"]

dates = list(set(usgs['Simple_Path']))

In [None]:
for feature_set in feature_sets:
    X = dart.query("Label == 'Water' or Label == 'Plastic'")[feature_set]
    y = dart.query("Label == 'Water' or Label == 'Plastic'")['Label']
    X_real = usgs.query("Label == 'Water' or Label == 'Plastic'")[feature_set]
    y_real = usgs.query("Label == 'Water' or Label == 'Plastic'")['Label']
    
    y_pred, assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted, feature_importances = rsdata_classification.random_forest(X, y, X_real, y_real, feature_set, 100)
    
    path = 'charts/english/supervised_classification/random_forest/dart_2023/'+ids[feature_sets.index(feature_set)]+'/'
    rsdata_charts.check_path(path)
    
    caminho = 'charts/portugues/classificacao_supervisionada/random_forest/dart_2023/'+identificadores[feature_sets.index(feature_set)]+'/'
    rsdata_charts.check_path(caminho)
    
    #METRICS TO EXCEL
    acc_ = []
    b_acc_ = []
    f1_macro_ = []
    f1_weighted_ = []
    fbeta_macro_ = []
    fbeta_weighted_ = []
    pr_macro_ = []
    rc_macro_ = []
    pr_weighted_ = []
    rc_weighted_ = []
    
    best_acc_value = 0
    best_acc_index = 0
    best_bac_value = 0
    best_bac_index = 0

    for key in acc.keys():
        acc_.append(acc[key])
        b_acc_.append(b_acc[key])
        f1_macro_.append(f1_macro[key])
        f1_weighted_.append(f1_weighted[key])
        fbeta_macro_.append(fbeta_macro[key])
        fbeta_weighted_.append(fbeta_weighted[key])
        pr_macro_.append(pr_macro[key])
        rc_macro_.append(rc_macro[key])
        pr_weighted_.append(pr_weighted[key])
        rc_weighted_.append(rc_weighted[key])
        
        if acc[key] > best_acc_value:
            best_acc_value = acc[key]
            best_acc_index = key
        if b_acc[key] > best_bac_value:
            best_bac_value = b_acc[key]
            best_bac_index = key
    
    acc_ = pd.DataFrame(acc_, columns = ['Acc'])
    b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
    f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
    f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
    fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
    fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
    pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
    rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
    pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
    rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])

    mt = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=[ids[feature_sets.index(feature_set)]])
    mt.to_excel(path+'evaluation_metrics.xlsx')
    
    stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs.query("Label == 'Water' or Label == 'Plastic'"))
    stats_by_polymer.to_excel(path+'evaluation_metrics_by_polymers.xlsx')
    stats_by_label_year.to_excel(path+'evaluation_metrics_by_year.xlsx')
    stats_by_plastic_cover_percent.to_excel(path+'evaluation_metrics_by_cover_percent.xlsx')
    stats_by_date.to_excel(path+'evaluation_metrics_by_date.xlsx')
    
    datasets_names = ["Best Overall Accuracy - Index " + str(best_acc_index), "Best Balanced Accuracy - Index " + str(best_bac_index)]
    traces = [feature_importances[best_acc_index], feature_importances[best_bac_index]]
    labels = [feature_importances[best_acc_index].index, feature_importances[best_bac_index].index]
    color = 'rgba(50, 171, 96, 0.6)'
    line_color = 'rgba(50, 171, 96, 1.0)'
    chart_title = 'Feature importances for USGS data'
    x_title = "Importance"
    y_title = "Feature"
    height = 1200
    width = 2400
    orientation='h'
    guidance="horizontal"
    export_name = path+'feature_importances'
    rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)
    #Portugues
    
    #MAPS
    rsdata_charts.check_path(path+'maps/o_acc/')
    rsdata_charts.check_path(path+'maps/b_acc/')
    rsdata_charts.check_path(caminho+'mapas/acc_geral')
    rsdata_charts.check_path(caminho+'mapas/acc_balanceada')
    
    classified_data = usgs.query("Label == 'Water' or Label == 'Plastic'").copy()
    classified_data['Predicted_class'] = y_pred[best_acc_index]
    for d in dates:
        rsdata_charts.map_nn(d, usgs.query("Label == 'Water' or Label == 'Plastic'"), classified_data, path+'maps/o_acc/', caminho+'mapas/acc_geral', 1000, 2000)
        
    classified_data = usgs.query("Label == 'Water' or Label == 'Plastic'").copy()
    classified_data['Predicted_class'] = y_pred[best_bac_index]
    for d in dates:
        rsdata_charts.map_nn(d, usgs.query("Label == 'Water' or Label == 'Plastic'"), classified_data, path+'maps/b_acc/', caminho+'mapas/acc_balanceada', 1000, 2000)
        

In [None]:
X = dart.query("Label == 'Water' or Label == 'Plastic'")[feature_names]
y = dart.query("Label == 'Water' or Label == 'Plastic'")['Label']
X_real = usgs.query("Label == 'Water' or Label == 'Plastic'")[feature_names]
y_real = usgs.query("Label == 'Water' or Label == 'Plastic'")['Label']

y_pred, assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted, feature_importances = rsdata_classification.random_forest(X, y, X_real, y_real, feature_names, 100)

confusion_matrices

### Artificial Neural Network

#### Feature selection (Random Forest)

DART data: all classes

In [None]:
# create dataset
X = dataset_dart[feature_names + radiometric_indexes]
y = dataset_dart['Label']

# holdout
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

# Consider the correlation and heatmap of the datasets in the analysis

#RandomForest
random_forest = RandomForestClassifier(max_depth=3, random_state=123)
random_forest.fit(X_train, y_train)
y_true = y_test
y_pred = random_forest.predict(X_test)
print(f"Accuracy: {round(accuracy_score(y_true, y_pred), 4)}")

dart_importances = pd.Series(data=random_forest.feature_importances_, index=feature_names + radiometric_indexes)

DART data: only plastic and water

In [None]:
# create dataset
X = dart_subdatasets['plastic_and_water'][feature_names + radiometric_indexes]
y = dart_subdatasets['plastic_and_water']['Label']

# holdout
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

# Consider the correlation and heatmap of the datasets in the analysis

#RandomForest
random_forest = RandomForestClassifier(max_depth=3, random_state=123)
random_forest.fit(X_train, y_train)
y_true = y_test
y_pred = random_forest.predict(X_test)
print(f"Accuracy: {round(accuracy_score(y_true, y_pred), 4)}")

dart_ap_importances = pd.Series(data=random_forest.feature_importances_, index=feature_names + radiometric_indexes)

In [None]:
datasets_names = ["DART (all classes)", "DART (only water and plastic)"]
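# Keep the traces in the same order as datasets_names: all classes first, then water and plastic only.
traces = [dart_importances, dart_ap_importances]
labels = [dart_importances.index, dart_ap_importances.index]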
color = 'rgba(50, 171, 96, 0.6)'
line_color = 'rgba(50, 171, 96, 1.0)'
chart_title = 'Feature importances for DART data'
x_title = "Importance"
y_title = "Feature"
height = 600
width = 900
orientation='h'
guidance="horizontal"

rsdata_charts.check_path("charts/english/pre_processing/feature_selection/")
export_name = "charts/english/pre_processing/feature_selection/dart"

rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)



datasets_names = ["DART (todas as classes)", "DART (apenas água e plástico)"]
chart_title = 'Importância de cada feature nos dados DART'
x_title = "Importância"
y_title = "Feature"

rsdata_charts.check_path("charts/portugues/pre_processamento/selecao_atributos/")
export_name = "charts/portugues/pre_processamento/selecao_atributos/dart"
                        

rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)
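
The importances plotted above can also be read as a ranking. The sketch below shows one way to turn such a Series into a shortlist of features. The importance values are invented, and the cutoff of three features is an arbitrary assumption, since this notebook does not state a selection rule.

```python
import pandas as pd

# Toy importances; the values are invented, the feature names come from feature_names.
importances = pd.Series(
    [0.05, 0.30, 0.10, 0.40, 0.15],
    index=["Blue", "Green", "Red", "NIR1", "SWIR1"],
)

# Rank features from most to least important and keep the top three.
ranked = importances.sort_values(ascending=False)
top_features = list(ranked.index[:3])

print(top_features)
```

With real data, `importances` would be a Series such as `dart_importances`, built from `random_forest.feature_importances_`.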

USGS data: all classes

In [None]:
# create dataset
X = dataset_usgs[feature_names + radiometric_indexes]
y = dataset_usgs['Label']

# holdout
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

# Consider the correlation and heatmap of the datasets in the analysis

#RandomForest
random_forest = RandomForestClassifier(max_depth=3, random_state=123)
random_forest.fit(X_train, y_train)
y_true = y_test
y_pred = random_forest.predict(X_test)
print(f"Accuracy: {round(accuracy_score(y_true, y_pred), 4)}")

usgs_importances = pd.Series(data=random_forest.feature_importances_, index=feature_names + radiometric_indexes)

USGS data: only plastic and water

In [None]:
# create dataset
X = usgs_subdatasets['plastic_and_water'][feature_names + radiometric_indexes]
y = usgs_subdatasets['plastic_and_water']['Label']

# holdout
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.25)

# TODO: consider the datasets' correlation and heatmap in the analysis

#RandomForest
random_forest = RandomForestClassifier(max_depth=3, random_state=123)
random_forest.fit(X_train, y_train)
y_true = y_test
y_pred = random_forest.predict(X_test)
print(f"Accuracy: {round(accuracy_score(y_true, y_pred), 4)}")

usgs_ap_importances = pd.Series(data=random_forest.feature_importances_, index=feature_names + radiometric_indexes)

In [None]:
datasets_names = ["USGS (all classes)", "USGS (only water and plastic)"]
# order the traces to match datasets_names: all classes first, then water and plastic only
traces = [usgs_importances, usgs_ap_importances]
labels = [usgs_importances.index, usgs_ap_importances.index]
color = 'rgba(50, 171, 96, 0.6)'
line_color = 'rgba(50, 171, 96, 1.0)'
chart_title = 'Feature importances for USGS data'
x_title = "Importance"
y_title = "Feature"
height = 600
width = 900
orientation='h'
guidance="horizontal"

export_name = "charts/english/pre_processing/feature_selection/usgs"

rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)



datasets_names = ["USGS (todas as classes)", "USGS (apenas água e plástico)"]
chart_title = 'Importância de cada feature nos dados USGS'
x_title = "Importância"
y_title = "Feature"

export_name = "charts/portugues/pre_processamento/selecao_atributos/usgs"
                        

rsdata_charts.bar_chart(datasets_names, traces, labels, color, line_color, chart_title, x_title, y_title, height, width, orientation, guidance=guidance, export_name=export_name)

#### GridSearchCV

In [None]:
feature_sets = [
                    feature_names + radiometric_indexes,
                    feature_names,
                    radiometric_indexes,
                    ['NIR1', 'SR', 'WRI', 'FDI']
                ]

In [None]:
import datetime  # used for the timestamped progress messages below

for feature_set in feature_sets:
    print("Starting best params search for MLPClassifier for feature set ", feature_set)
    # create dataset
    X = dart_subdatasets['plastic_and_water'][feature_set]
    y = dart_subdatasets['plastic_and_water']['Label']
    
    # configure the cross-validation procedure
    cv_outer = StratifiedKFold(n_splits=4, shuffle=True, random_state=123)
    
    # scoring metrics used to select the best hyperparameters
    metrics = ['accuracy', # subset accuracy: the label predicted for each sample must exactly match the label in y_true
               'balanced_accuracy', # mean of the recall obtained on each class; suited to imbalanced datasets
               'f1_micro', # computed globally from the total true positives, false negatives and false positives
               'f1_weighted', # per-label F1 averaged with weights equal to the support (number of true instances of each label); accounts for class imbalance and can yield an F-score that is not between precision and recall
               'precision_micro', # tp / (tp + fp), computed globally
               'precision_weighted', # tp / (tp + fp), per-label average weighted by support (number of true instances of each label); accounts for class imbalance
               'recall_micro', # tp / (tp + fn), computed globally; the macro variants were dropped because they are the unweighted mean over labels
               'recall_weighted', # tp / (tp + fn), per-label average weighted by support (number of true instances of each label); accounts for class imbalance
               'roc_auc_ovr', # area under the ROC curve computed from prediction scores, one-vs-rest: the AUC of each class against the rest [3] [4]; treats the multiclass case like the multilabel case and is sensitive to class imbalance
               'roc_auc_ovr_weighted' # same as 'roc_auc_ovr', but averaged with weights equal to the support (number of true instances of each label)
              ]
    
    # define search space
    space = dict()
    space['hidden_layer_sizes'] = [(20,), (30,30), (50,50,50)]  # (20,) is a one-layer tuple; (20) would be a plain int
    space['solver'] = ['lbfgs', 'sgd', 'adam']
    space['alpha'] = [0.00001, 0.0001, 0.001]
    space['max_iter'] = [100, 250, 500]
    space['activation'] = ['identity', 'logistic', 'tanh', 'relu']
               
    best_models = []
    results = []
    
    for train_ix, test_ix in cv_outer.split(X, y):
        # select rows
        train_X, test_X = X.iloc[train_ix], X.iloc[test_ix]  # TODO: later, train on 100% DART and test on 100% USGS (no cross-validation in that case)
        train_y, test_y = y.iloc[train_ix], y.iloc[test_ix]
        
        # configure the cross-validation procedure
        cv_inner = StratifiedKFold(n_splits=4, shuffle=True, random_state=123)
        
        bests = dict()       
        rslt = []
               
        for metric in metrics:     
            model = MLPClassifier(random_state=123)
            # define search
            search = GridSearchCV(model, space, scoring=metric, cv=cv_inner, refit=True)
            # execute search
            result = search.fit(train_X, train_y)
            # get the best performing model fit on the whole training set
            best_model = result.best_estimator_
            # evaluate model on the hold out dataset
            y_pred = best_model.predict(test_X)
            #tn, fp, fn, tp = confusion_matrix(test_y, yhat).ravel()
            # evaluate the model
            assessment = {
                          'feature_set': feature_set,
                          'metric': metric,
                          'y_true': test_y,
                          'y_pred': y_pred,
                          'confusion_matrix': confusion_matrix(test_y, y_pred, labels=['Sand', 'Water', 'Plastic']),
                          'accuracy': accuracy_score(test_y, y_pred),
                          'balanced_accuracy': balanced_accuracy_score(test_y, y_pred),
                          'f1_micro': f1_score(test_y, y_pred, average='micro'),
                          'f1_macro': f1_score(test_y, y_pred, average='macro'),
                          'f1_weighted': f1_score(test_y, y_pred, average='weighted'),
                          'fbeta_micro': fbeta_score(test_y, y_pred, average='micro', beta=0.5),
                          'fbeta_macro': fbeta_score(test_y, y_pred, average='macro', beta=0.5),
                          'fbeta_weighted': fbeta_score(test_y, y_pred, average='weighted', beta=0.5),
                          'jaccard_micro': jaccard_score(test_y, y_pred, average='micro'),
                          'jaccard_macro': jaccard_score(test_y, y_pred, average='macro'), 
                          'jaccard_weighted': jaccard_score(test_y, y_pred, average='weighted'), 
                          'precision_micro': precision_score(test_y, y_pred, average='micro'),
                          'precision_macro': precision_score(test_y, y_pred, average='macro'),
                          'precision_weighted': precision_score(test_y, y_pred, average='weighted'),  
                          'recall_micro': recall_score(test_y, y_pred, average='micro'),
                          'recall_macro': recall_score(test_y, y_pred, average='macro'),
                          'recall_weighted': recall_score(test_y, y_pred, average='weighted')
                        }
               
            print(datetime.datetime.now().strftime("%d-%b-%Y %A %I:%M"), ' - best model: ' , best_model ,' - assessment: ', assessment)
            bests.update({best_model:assessment})
            rslt.append(result)
        print("Ending loop CV")
        
        # store the result
        best_models.append(bests)
        results.append(rslt)
        # report progress
               
    print("Ending best params search for MLPClassifier for feature set ", feature_set)
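
The loop above follows a nested cross-validation pattern: an outer stratified split estimates performance, while an inner split drives the hyperparameter search. As a minimal, self-contained sketch of that pattern, the snippet below uses synthetic data from `make_classification` and a deliberately small grid. Both are illustrative assumptions, not the notebook's data or search space, and only one scoring metric is used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the plastic/water samples (not the real data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, random_state=123
)

outer_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=123)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=123)

# Deliberately tiny search space so the sketch runs quickly.
small_space = {"hidden_layer_sizes": [(10,), (10, 10)], "alpha": [1e-4, 1e-3]}

outer_scores = []
for train_ix, test_ix in outer_cv.split(X_demo, y_demo):
    # The inner search only ever sees the outer training fold.
    search = GridSearchCV(
        MLPClassifier(max_iter=300, random_state=123),
        small_space,
        scoring="balanced_accuracy",
        cv=inner_cv,
        refit=True,
    )
    search.fit(X_demo[train_ix], y_demo[train_ix])

    # The refit best model is scored on the untouched outer test fold.
    y_hat = search.best_estimator_.predict(X_demo[test_ix])
    outer_scores.append(balanced_accuracy_score(y_demo[test_ix], y_hat))

print(f"Outer-fold balanced accuracy: {np.mean(outer_scores):.3f} +/- {np.std(outer_scores):.3f}")
```

Keeping the search inside the outer loop means the outer test fold never influences the choice of hyperparameters, so the outer scores are not optimistically biased by the tuning.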

### Results for feature set A (bands and indices)

#### Classification

In [None]:
assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted = rsdata_classification.multilayer_perceptron(dart_subdatasets['plastic_and_water'][feature_sets[0]], 
                                                                                            dart_subdatasets['plastic_and_water']['Label'],
                                                                                            usgs_subdatasets['plastic_and_water'][feature_sets[0]],
                                                                                            usgs_subdatasets['plastic_and_water']['Label'],
                                                                                            100)

#### Stats

In [None]:
acc_ = []
b_acc_ = []
f1_macro_ = []
f1_weighted_ = []
fbeta_macro_ = []
fbeta_weighted_ = []
pr_macro_ = []
rc_macro_ = []
pr_weighted_ = []
rc_weighted_ = []

for key in acc.keys():
    acc_.append(acc[key])
    b_acc_.append(b_acc[key])
    f1_macro_.append(f1_macro[key])
    f1_weighted_.append(f1_weighted[key])
    fbeta_macro_.append(fbeta_macro[key])
    fbeta_weighted_.append(fbeta_weighted[key])
    pr_macro_.append(pr_macro[key])
    rc_macro_.append(rc_macro[key])
    pr_weighted_.append(pr_weighted[key])
    rc_weighted_.append(rc_weighted[key])
    
acc_ = pd.DataFrame(acc_, columns = ['Acc'])
b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])
    
mt_a = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=['Feature set A'])
mt_a
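
The summary table above is rebuilt by hand for every feature set. A more compact way to get the same table might look like the sketch below. It assumes, as the loops over `acc.keys()` suggest, that each metric is a dict mapping a run id to a score; `summarize_metrics` is a hypothetical helper and the numbers are made up for illustration.

```python
import pandas as pd

def summarize_metrics(metric_runs, column_name):
    """Average each metric over its runs and return a one-column table.

    metric_runs maps a display name (e.g. 'Overall accuracy') to a dict
    of {run_id: score}.
    """
    means = {
        name: pd.Series(list(runs.values()), dtype=float).mean()
        for name, runs in metric_runs.items()
    }
    return pd.Series(means, name=column_name).to_frame()

# Toy scores for two runs (illustrative values only).
acc_demo = {0: 0.90, 1: 0.80}
f1w_demo = {0: 0.70, 1: 0.90}

table = summarize_metrics(
    {"Overall accuracy": acc_demo, "F1 weighted": f1w_demo},
    "Feature set A",
)
print(table)
```

With the real outputs, the same call could take the ten metric dicts returned by `rsdata_classification.multilayer_perceptron`, which would avoid repeating the block for feature sets B, C and D.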

In [None]:
stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs_subdatasets['plastic_and_water'])
stats_by_polymer

In [None]:
stats_by_label_year

In [None]:
stats_by_plastic_cover_percent

In [None]:
stats_by_date

#### Maps

In [None]:
dates = list(set(usgs_subdatasets['plastic_and_water'].loc[assessment[0].index]['Path']))

max_score = f1_weighted_.max()['F1W']  # highest weighted F1 score

best_score = f1_weighted_.loc[f1_weighted_['F1W'] >= max_score].index[0]  # index of the run with the highest weighted F1 score

classified_data_a = assessment[best_score]

In [None]:
usgs_subdatasets['plastic_and_water']['Simple_Path'] = usgs_subdatasets['plastic_and_water']['Path']
classified_data_a['Simple_Path'] = usgs_subdatasets['plastic_and_water']['Simple_Path']
classified_data_a['Line'] = usgs_subdatasets['plastic_and_water']['Line']
classified_data_a['Column'] = usgs_subdatasets['plastic_and_water']['Column']

rsdata_charts.check_path("charts/english/supervised_classification/ann/feature_set/A/")
usgs_path = "charts/english/supervised_classification/ann/feature_set/A/"

rsdata_charts.check_path("charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/A/")
usgs_caminho = "charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/A/"

for d in dates:
    rsdata_charts.map_nn(d, usgs_subdatasets['plastic_and_water'], classified_data_a, usgs_path, usgs_caminho, 750, 1500)

### Results for feature set B (bands)

In [None]:
assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted = rsdata_classification.multilayer_perceptron(dart_subdatasets['plastic_and_water'][feature_sets[1]], 
                                                                                            dart_subdatasets['plastic_and_water']['Label'],
                                                                                            usgs_subdatasets['plastic_and_water'][feature_sets[1]],
                                                                                            usgs_subdatasets['plastic_and_water']['Label'],
                                                                                            100)


In [None]:
acc_ = []
b_acc_ = []
f1_macro_ = []
f1_weighted_ = []
fbeta_macro_ = []
fbeta_weighted_ = []
pr_macro_ = []
rc_macro_ = []
pr_weighted_ = []
rc_weighted_ = []

for key in acc.keys():
    acc_.append(acc[key])
    b_acc_.append(b_acc[key])
    f1_macro_.append(f1_macro[key])
    f1_weighted_.append(f1_weighted[key])
    fbeta_macro_.append(fbeta_macro[key])
    fbeta_weighted_.append(fbeta_weighted[key])
    pr_macro_.append(pr_macro[key])
    rc_macro_.append(rc_macro[key])
    pr_weighted_.append(pr_weighted[key])
    rc_weighted_.append(rc_weighted[key])
    
acc_ = pd.DataFrame(acc_, columns = ['Acc'])
b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])
    
mt_b = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=['Feature set B'])
mt_b
#mt_b['Feature set B']


In [None]:
stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs_subdatasets['plastic_and_water'])
stats_by_polymer

In [None]:
confusion_matrices

In [None]:
stats_by_label_year

In [None]:
stats_by_plastic_cover_percent

In [None]:
stats_by_date

#### Maps

In [None]:
dates = list(set(usgs_subdatasets['plastic_and_water'].loc[assessment[0].index]['Path']))

max_score = f1_weighted_.max()['F1W']  # highest weighted F1 score

best_score = f1_weighted_.loc[f1_weighted_['F1W'] >= max_score].index[0]  # index of the run with the highest weighted F1 score

classified_data_b = assessment[best_score]

In [None]:
classified_data_b['Simple_Path'] = usgs_subdatasets['plastic_and_water']['Simple_Path']
classified_data_b['Line'] = usgs_subdatasets['plastic_and_water']['Line']
classified_data_b['Column'] = usgs_subdatasets['plastic_and_water']['Column']

rsdata_charts.check_path("charts/english/supervised_classification/ann/feature_set/B/")
usgs_path = "charts/english/supervised_classification/ann/feature_set/B/"

rsdata_charts.check_path("charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/B/")
usgs_caminho = "charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/B/"

for d in dates:
    rsdata_charts.map_nn(d, usgs_subdatasets['plastic_and_water'], classified_data_b, usgs_path, usgs_caminho, 750, 1500)

### Results for feature set C (indices)

In [None]:
assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted = rsdata_classification.multilayer_perceptron(dart_subdatasets['plastic_and_water'][feature_sets[2]], 
                                                                                            dart_subdatasets['plastic_and_water']['Label'],
                                                                                            usgs_subdatasets['plastic_and_water'][feature_sets[2]],
                                                                                            usgs_subdatasets['plastic_and_water']['Label'],
                                                                                            100)


In [None]:
acc_ = []
b_acc_ = []
f1_macro_ = []
f1_weighted_ = []
fbeta_macro_ = []
fbeta_weighted_ = []
pr_macro_ = []
rc_macro_ = []
pr_weighted_ = []
rc_weighted_ = []

for key in acc.keys():
    acc_.append(acc[key])
    b_acc_.append(b_acc[key])
    f1_macro_.append(f1_macro[key])
    f1_weighted_.append(f1_weighted[key])
    fbeta_macro_.append(fbeta_macro[key])
    fbeta_weighted_.append(fbeta_weighted[key])
    pr_macro_.append(pr_macro[key])
    rc_macro_.append(rc_macro[key])
    pr_weighted_.append(pr_weighted[key])
    rc_weighted_.append(rc_weighted[key])
    
acc_ = pd.DataFrame(acc_, columns = ['Acc'])
b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])
    
mt_c = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=['Feature set C'])
mt_c
#mt_c['Feature set C']


In [None]:
stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs_subdatasets['plastic_and_water'])
stats_by_polymer


In [None]:
stats_by_label_year

In [None]:
stats_by_plastic_cover_percent

In [None]:
stats_by_date

#### Maps

In [None]:
dates = list(set(usgs_subdatasets['plastic_and_water'].loc[assessment[0].index]['Path']))

max_score = f1_weighted_.max()['F1W']  # highest weighted F1 score

best_score = f1_weighted_.loc[f1_weighted_['F1W'] >= max_score].index[0]  # index of the run with the highest weighted F1 score

classified_data_c = assessment[best_score]

In [None]:
classified_data_c['Simple_Path'] = usgs_subdatasets['plastic_and_water']['Simple_Path']
classified_data_c['Line'] = usgs_subdatasets['plastic_and_water']['Line']
classified_data_c['Column'] = usgs_subdatasets['plastic_and_water']['Column']

rsdata_charts.check_path("charts/english/supervised_classification/ann/feature_set/C/")
usgs_path = "charts/english/supervised_classification/ann/feature_set/C/"

rsdata_charts.check_path("charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/C/")
usgs_caminho = "charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/C/"

for d in dates:
    rsdata_charts.map_nn(d, usgs_subdatasets['plastic_and_water'], classified_data_c, usgs_path, usgs_caminho, 750, 1500)

### Results for feature set D (NIR1, SR, WRI, FDI)

In [None]:
assessment, errors, hits, confusion_matrices, acc, b_acc, f1_macro, f1_weighted, fbeta_macro, fbeta_weighted, pr_macro, rc_macro, pr_weighted, rc_weighted = rsdata_classification.multilayer_perceptron(dart_subdatasets['plastic_and_water'][feature_sets[3]], 
                                                                                            dart_subdatasets['plastic_and_water']['Label'],
                                                                                            usgs_subdatasets['plastic_and_water'][feature_sets[3]],
                                                                                            usgs_subdatasets['plastic_and_water']['Label'],
                                                                                            100)

In [None]:
acc_ = []
b_acc_ = []
f1_macro_ = []
f1_weighted_ = []
fbeta_macro_ = []
fbeta_weighted_ = []
pr_macro_ = []
rc_macro_ = []
pr_weighted_ = []
rc_weighted_ = []

for key in acc.keys():
    acc_.append(acc[key])
    b_acc_.append(b_acc[key])
    f1_macro_.append(f1_macro[key])
    f1_weighted_.append(f1_weighted[key])
    fbeta_macro_.append(fbeta_macro[key])
    fbeta_weighted_.append(fbeta_weighted[key])
    pr_macro_.append(pr_macro[key])
    rc_macro_.append(rc_macro[key])
    pr_weighted_.append(pr_weighted[key])
    rc_weighted_.append(rc_weighted[key])
    
acc_ = pd.DataFrame(acc_, columns = ['Acc'])
b_acc_ = pd.DataFrame(b_acc_, columns = ['B_acc'])
f1_macro_ = pd.DataFrame(f1_macro_, columns = ['F1M'])
f1_weighted_ = pd.DataFrame(f1_weighted_, columns = ['F1W'])
fbeta_macro_ = pd.DataFrame(fbeta_macro_, columns = ['FBM'])
fbeta_weighted_ = pd.DataFrame(fbeta_weighted_, columns = ['FBW'])
pr_macro_ = pd.DataFrame(pr_macro_, columns = ['PrM'])
rc_macro_ = pd.DataFrame(rc_macro_, columns = ['RcM'])
pr_weighted_ = pd.DataFrame(pr_weighted_, columns = ['PrW'])
rc_weighted_ = pd.DataFrame(rc_weighted_, columns = ['RcW'])
    
mt_d = pd.DataFrame([acc_['Acc'].mean(), b_acc_['B_acc'].mean(), f1_macro_['F1M'].mean(), f1_weighted_['F1W'].mean(), fbeta_macro_['FBM'].mean(), fbeta_weighted_['FBW'].mean(), pr_macro_['PrM'].mean(), pr_weighted_['PrW'].mean(), rc_macro_['RcM'].mean(), rc_weighted_['RcW'].mean()], index=['Overall accuracy', 'Balanced accuracy', 'F1 macro', 'F1 weighted', 'Fbeta macro', 'Fbeta weighted', 'Precision macro', 'Precision weighted', 'Recall macro', 'Recall weighted'], columns=['Feature set D'])
mt_d
#mt_d['Feature set D']

In [None]:
stats_by_polymer, stats_by_label_year, stats_by_plastic_cover_percent, stats_by_date = rsdata_classification.stats_classification(assessment, usgs_subdatasets['plastic_and_water'])
stats_by_polymer


In [None]:
stats_by_label_year

In [None]:
stats_by_plastic_cover_percent

In [None]:
stats_by_date

#### Maps

In [None]:
dates = list(set(usgs_subdatasets['plastic_and_water'].loc[assessment[0].index]['Path']))

max_score = f1_weighted_.max()['F1W']  # highest weighted F1 score

best_score = f1_weighted_.loc[f1_weighted_['F1W'] >= max_score].index[0]  # index of the run with the highest weighted F1 score

classified_data_d = assessment[best_score]

In [None]:
classified_data_d['Simple_Path'] = usgs_subdatasets['plastic_and_water']['Simple_Path']
classified_data_d['Line'] = usgs_subdatasets['plastic_and_water']['Line']
classified_data_d['Column'] = usgs_subdatasets['plastic_and_water']['Column']

rsdata_charts.check_path("charts/english/supervised_classification/ann/feature_set/D/")
usgs_path = "charts/english/supervised_classification/ann/feature_set/D/"

rsdata_charts.check_path("charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/D/")
usgs_caminho = "charts/portugues/classificacao_supervisionada/rna/conjunto_atributos/D/"

for d in dates:
    rsdata_charts.map_nn(d, usgs_subdatasets['plastic_and_water'], classified_data_d, usgs_path, usgs_caminho, 750, 1500)

## AutoML

We employed an AutoML approach using the TPOT package, which uses genetic programming to search for the best-performing pipelines for a given dataset. TPOT was applied to the DART 2021, DART 2023, and USGS datasets. Because the results obtained with KMeans, Random Forest, and Multilayer Perceptron indicated that using indices as input features degraded performance, we removed the indices at this stage and used only the raw band values from here on. The pipelines identified by TPOT were exported as .py files. From each exported pipeline we kept only the model and its hyperparameters; the remaining pipeline components were set to match those used in the other supervised-model experiments.

When we used AutoML to select the model with the highest overall accuracy on a single dataset (simulated or observed) for both training and testing, the selected model was XGBoost, with small variations in hyperparameters depending on the dataset. However, when XGBoost was trained on simulated data and tested on observed data, it labeled almost all pixels as water. This yields high overall accuracy but low recall for the plastic class, which defeats the main purpose of the application, so these models were discarded. Finally, we chose the model with the highest F1-score when trained on simulated DART 2023 data and tested on observed data.
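
To illustrate why overall accuracy can be misleading here, the sketch below scores a classifier that predicts water for every pixel. The 95/5 class split is a hypothetical example, not the proportion found in the USGS data, and the label strings are assumed.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical imbalanced scene: 95 water pixels and 5 plastic pixels.
y_true = np.array(["Water"] * 95 + ["Plastic"] * 5)

# A degenerate classifier that labels every pixel as water.
y_all_water = np.array(["Water"] * 100)

acc = accuracy_score(y_true, y_all_water)
plastic_recall = recall_score(y_true, y_all_water, pos_label="Plastic")
# zero_division=0 silences the warning raised when no plastic is predicted.
plastic_f1 = f1_score(y_true, y_all_water, pos_label="Plastic", zero_division=0)

print(f"Overall accuracy: {acc:.2f}")             # 0.95
print(f"Plastic recall:   {plastic_recall:.2f}")  # 0.00
print(f"Plastic F1:       {plastic_f1:.2f}")      # 0.00
```

The accuracy looks excellent even though not a single plastic pixel is found, which is why an F1-based criterion is a more suitable selection target for this application.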

#### DART 2023 (training set) X USGS (test set)

Initial Model Selection and Testing

In [None]:
feature_names = ['Blue', 'Green', 'Red', 'RedEdge1', 'RedEdge2', 'RedEdge3', 'NIR1', 'NIR2', 'SWIR1', 'SWIR2']
radiometric_indexes = ['NDWI', 'WRI', 'NDVI', 'AWEI', 'MNDWI', 'SR', 'PI', 'RNDVI', 'FDI', 'PWDI']

In [None]:
from sklearn.metrics import make_scorer, f1_score

# Create the checkpoint directory if it does not exist yet
checkpoint_dir = "automl/checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)

# Train on all DART samples; split the USGS samples into test (2/3) and validation (1/3)
X_train, y_train =  dart_subdatasets['plastic_and_water'][feature_names], dart_subdatasets['plastic_and_water']['Label']

X_test, X_val, y_test, y_val = train_test_split(
    usgs_subdatasets['plastic_and_water'][feature_names], usgs_subdatasets['plastic_and_water']['Label'], test_size=1/3, random_state=42
)

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)
y_val_encoded = label_encoder.transform(y_val)  # held-out validation set

# Apply SMOTETomek to balance the training set
smotetomek = SMOTETomek(random_state=42)
X_train_resampled, y_train_resampled = smotetomek.fit_resample(X_train, y_train_encoded)

tpot = TPOTClassifier(
    generations=5,
    population_size=50,
    scoring=make_scorer(f1_score, average='binary', pos_label=1),  # assumes code 1 is plastic; LabelEncoder numbers classes in sorted order, so confirm with label_encoder.classes_
    verbosity=3,
    random_state=42,
    periodic_checkpoint_folder=checkpoint_dir
)

# Fit the model
tpot.fit(X_train_resampled, y_train_resampled)

# Evaluate the model on the test data (the validation set is not used yet)
score = tpot.score(X_test, y_test_encoded)
print(f"Score on the test data: {score}")

# Export the best pipeline found
tpot.export('automl/dartxusgs_best_pipeline_withoutindexes_f1score.py')
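
Because the scorer in the cell above depends on which integer code stands for plastic, it can help to check how `LabelEncoder` numbers the classes. The sketch below assumes the label strings are `'Plastic'` and `'Water'`, as suggested by the `confusion_matrix` call earlier in the notebook; `plastic_code` and `plastic_f1_scorer` are illustrative names.

```python
from sklearn.metrics import f1_score, make_scorer
from sklearn.preprocessing import LabelEncoder

# LabelEncoder sorts the classes, so codes follow alphabetical order.
encoder = LabelEncoder().fit(["Water", "Plastic", "Water"])
print(list(encoder.classes_))  # ['Plastic', 'Water']

# Look up the code of the plastic class instead of hard-coding it.
plastic_code = int(encoder.transform(["Plastic"])[0])
print(plastic_code)  # 0

# A scorer that measures F1 on the plastic class, whatever its code is.
plastic_f1_scorer = make_scorer(f1_score, average="binary", pos_label=plastic_code)
```

Under these assumed label names, plastic is encoded as 0 and water as 1, so a scorer built with `pos_label=1` would measure F1 on the water class.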

In [None]:
X_train, y_train =  dart_subdatasets['plastic_and_water'][feature_names], dart_subdatasets['plastic_and_water']['Label']

X_test, y_test = usgs_subdatasets['plastic_and_water'][feature_names], usgs_subdatasets['plastic_and_water']['Label']

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Train the model
classifier = ExtraTreesClassifier(bootstrap=False, criterion="entropy", max_features=0.35000000000000003, 
                                  min_samples_leaf=18, min_samples_split=9, n_estimators=100, random_state=100).fit(X_train, y_train_encoded)

y_test_pred = classifier.predict(X_test)

# Confusion matrix
conf_matrix = confusion_matrix(y_test_encoded, y_test_pred)
conf_matrix
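
Per-class recall and precision can be read directly off a confusion matrix such as the one above. The sketch below uses made-up counts (not results from this notebook) and follows scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical 2x2 confusion matrix: rows = true class, columns = predicted class.
# Row/column 0 is plastic, row/column 1 is water (assumed ordering).
cm = np.array([[3, 2],
               [10, 85]])

# Recall: correct predictions divided by the number of true samples per class.
recall_per_class = cm.diagonal() / cm.sum(axis=1)

# Precision: correct predictions divided by the number of predictions per class.
precision_per_class = cm.diagonal() / cm.sum(axis=0)

print("Recall per class:   ", np.round(recall_per_class, 3))
print("Precision per class:", np.round(precision_per_class, 3))
```

In this toy matrix, plastic recall is 3/5 while plastic precision is only 3/13, the kind of trade-off that a single accuracy figure hides.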

In [None]:
# TODO: run SHAP on whichever model performs best

In [None]:
# TODO: check the model and the metrics obtained

In [None]:
# Train on all DART samples and test on all USGS samples
X_train, y_train =  dart_subdatasets['plastic_and_water'][feature_names], dart_subdatasets['plastic_and_water']['Label']

X_test, y_test = usgs_subdatasets['plastic_and_water'][feature_names], usgs_subdatasets['plastic_and_water']['Label']

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Train the model
classifier = ExtraTreesClassifier(bootstrap=False, criterion="entropy", max_features=0.35000000000000003, 
                                  min_samples_leaf=18, min_samples_split=9, n_estimators=100, random_state=1).fit(X_train, y_train_encoded)

y_test_pred = classifier.predict(X_test)

# Compute performance metrics
accuracy = accuracy_score(y_test_encoded, y_test_pred)

# 📌 Weighted metrics: account for the size of each class
precision_weighted = precision_score(y_test_encoded, y_test_pred, average="weighted")
recall_weighted = recall_score(y_test_encoded, y_test_pred, average="weighted")
f1_weighted = f1_score(y_test_encoded, y_test_pred, average="weighted")

# 📌 Macro metrics: treat all classes equally
precision_macro = precision_score(y_test_encoded, y_test_pred, average="macro")
recall_macro = recall_score(y_test_encoded, y_test_pred, average="macro")
f1_macro = f1_score(y_test_encoded, y_test_pred, average="macro")

# Classification report
class_report = classification_report(y_test_encoded, y_test_pred, target_names=label_encoder.classes_)

# Confusion matrix
conf_matrix = confusion_matrix(y_test_encoded, y_test_pred)

# Display the results (computed on the USGS test set)
print(f"\n🔹 **Metrics on the test set:**\n")
print(f"✅ **Accuracy:** {accuracy:.4f}")

print("\n📌 **Weighted metrics:**")
print(f"   ✅ **Precision:** {precision_weighted:.4f}")
print(f"   ✅ **Recall:** {recall_weighted:.4f}")
print(f"   ✅ **F1-Score:** {f1_weighted:.4f}")

print("\n📌 **Macro metrics:**")
print(f"   ✅ **Precision:** {precision_macro:.4f}")
print(f"   ✅ **Recall:** {recall_macro:.4f}")
print(f"   ✅ **F1-Score:** {f1_macro:.4f}")

print("\n🔹 **Classification report:**\n")
print(class_report)

# Plot the confusion matrix
plt.figure(figsize=(8,6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion matrix")
plt.show()

In [None]:
X_train, y_train =  dart_subdatasets['plastic_and_water'][feature_names], dart_subdatasets['plastic_and_water']['Label']

X_test, y_test = usgs_subdatasets['plastic_and_water'][feature_names], usgs_subdatasets['plastic_and_water']['Label']

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Train the model
classifier = ExtraTreesClassifier(bootstrap=False, criterion="entropy", max_features=0.35000000000000003, 
                                  min_samples_leaf=18, min_samples_split=9, n_estimators=100, random_state=9).fit(X_train, y_train_encoded)

y_test_pred = classifier.predict(X_test)

# Confusion matrix
conf_matrix = confusion_matrix(y_test_encoded, y_test_pred)
conf_matrix
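
`LabelEncoder` assigns integer codes following the sorted order of the class names, so the rows and columns of a confusion matrix built from encoded labels follow `label_encoder.classes_`. A minimal sketch with illustrative labels (the names 'Plastic' and 'Water' are assumed here from the subset name, and the toy predictions are invented):

```python
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Illustrative labels; the real ones come from the 'Label' column
toy_true = ["Water", "Plastic", "Water", "Plastic", "Water"]
toy_pred = ["Water", "Plastic", "Plastic", "Plastic", "Water"]

toy_encoder = LabelEncoder()
toy_true_encoded = toy_encoder.fit_transform(toy_true)  # classes are sorted alphabetically
toy_pred_encoded = toy_encoder.transform(toy_pred)

print(toy_encoder.classes_)  # order of the integer codes
print(toy_true_encoded)

# Rows are true classes and columns are predicted classes, both in classes_ order
toy_matrix = confusion_matrix(toy_true_encoded, toy_pred_encoded)
print(toy_matrix)

# Map integer predictions back to the original class names
toy_decoded = toy_encoder.inverse_transform(toy_pred_encoded)
```

Reading the matrix with `classes_` at hand avoids mixing up which row corresponds to which material.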

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt

# Create the SHAP explainer from the trained model
explainer = shap.Explainer(classifier.predict, X_val)

# Compute SHAP values for the validation set
shap_values = explainer(X_val)

# 🔹 **Global importance summary plot**
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_val, show=True)

# 🔹 **Beeswarm plot (impact of each feature on the predictions)**
plt.figure(figsize=(10, 6))
shap.plots.beeswarm(shap_values)

# 🔹 **Dependence plot for a key feature (set 'feature_name' to another feature if desired)**
feature_name = X_val.columns[np.argmax(np.abs(shap_values.values).mean(axis=0))]  # Picks the most important feature
plt.figure(figsize=(10, 6))
shap.plots.scatter(shap_values[:, feature_name], color=shap_values)

# 🔹 **Local explanation (single sample)**
sample_idx = np.random.randint(0, len(X_val))  # Picks a random sample
shap.plots.waterfall(shap_values[sample_idx])


In [None]:
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

# Create the LIME explainer
explainer = LimeTabularExplainer(
    training_data=X_train.values,  # Training data used by the model
    feature_names=X_train.columns.tolist(),
    class_names=label_encoder.classes_.tolist(),
    mode="classification"
)

# Select a random sample to explain
sample_idx = np.random.randint(0, len(X_val))
sample_instance = X_val.iloc[sample_idx].values

# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    sample_instance,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()


In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[0].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# WATER

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[7400].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# SAND

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[4003].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[32843].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 40% X WATER 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[36183].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 40% X SAND 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[76003].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[104823].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 40% X Water 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[108667].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 40% X Sand 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names + radiometric_indexes].iloc[366387].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PVC 100%

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt

# Create the SHAP explainer from the trained model
explainer = shap.Explainer(classifier.predict, X_val)

# Compute SHAP values for the validation set
shap_values = explainer(X_val)

# 🔹 **Global importance summary plot**
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_val, show=True)

# 🔹 **Beeswarm plot (impact of each feature on the predictions)**
plt.figure(figsize=(10, 6))
shap.plots.beeswarm(shap_values)

# 🔹 **Dependence plot for a key feature (set 'feature_name' to another feature if desired)**
feature_name = X_val.columns[np.argmax(np.abs(shap_values.values).mean(axis=0))]  # Picks the most important feature
plt.figure(figsize=(10, 6))
shap.plots.scatter(shap_values[:, feature_name], color=shap_values)

# 🔹 **Local explanation (single sample)**
sample_idx = np.random.randint(0, len(X_val))  # Picks a random sample
shap.plots.waterfall(shap_values[sample_idx])


In [None]:
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

# Create the LIME explainer
explainer = LimeTabularExplainer(
    training_data=X_train.values,  # Training data used by the model
    feature_names=X_train.columns.tolist(),
    class_names=label_encoder.classes_.tolist(),
    mode="classification"
)

# Select a random sample to explain
sample_idx = np.random.randint(0, len(X_val))
sample_instance = X_val.iloc[sample_idx].values

# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    sample_instance,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()


In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[0].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# WATER

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[7400].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# SAND

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[4003].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[32843].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 40% X WATER 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[36183].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 40% X SAND 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[76003].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[104823].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 40% X Water 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[108667].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# MicroNapo 40% X Sand 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    old_dart[feature_names].iloc[366387].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PVC 100%

In [None]:
# Confirm XGBoost
# Run XGBoost with the same protocol as the neural networks and random forest to evaluate statistics (for a more direct comparison with previous results)
# Leave interpretability/explainability for last (for model comparison)
# Remove the radiometric indices

In [None]:
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from imblearn.combine import SMOTETomek  # SMOTE oversampling followed by Tomek-link cleaning
import os

# Create the checkpoint directory if it does not exist yet
checkpoint_dir = "automl/checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)

# Split the data into training (70%), test (20%), and validation (10%) sets
X_train, X_temp, y_train, y_temp = train_test_split(
    dart[feature_names], dart['Label'],
    test_size=0.3, random_state=42
)

X_test, X_val, y_test, y_val = train_test_split(
    X_temp, y_temp, test_size=1/3, random_state=42
)

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)
y_val_encoded = label_encoder.transform(y_val)  # Held-out validation set

# Apply SMOTETomek to balance the training set
smotetomek = SMOTETomek(random_state=42)
X_train_resampled, y_train_resampled = smotetomek.fit_resample(X_train, y_train_encoded)

# Initialize and run TPOT (fewer generations and a smaller population to speed things up)
tpot = TPOTClassifier(
    generations=5,  # Reduced to speed up the search
    population_size=50,  # Reduced to speed up the search
    verbosity=3,  # More detailed logs
    random_state=42,
    periodic_checkpoint_folder=checkpoint_dir  # Save checkpoints
)

# Train the model
tpot.fit(X_train_resampled, y_train_resampled)

# Evaluate the model on the test data (the validation set is not used yet)
score = tpot.score(X_test, y_test_encoded)
print(f"Score on the test data: {score}")

# Export the best pipeline found
tpot.export('automl/dart_best_pipeline_withoutindexes.py')
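
The two-step split in this cell yields 70/20/10 proportions because the second call takes one third of the 30% hold-out. A minimal sketch on synthetic data, assuming nothing about the real datasets. The `stratify` argument is an optional addition that is not part of the original cell; it keeps the class proportions equal across the subsets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature table: 100 samples with an 80/20 class imbalance
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 80 + [1] * 20)

# First split: 70% training, 30% temporary hold-out
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Second split: one third of the hold-out becomes validation (10% overall),
# and the remaining two thirds become the test set (20% overall)
X_te, X_va, y_te, y_va = train_test_split(
    X_tmp, y_tmp, test_size=1/3, random_state=42, stratify=y_tmp
)

print(len(X_tr), len(X_te), len(X_va))
print(y_tr.sum(), y_te.sum(), y_va.sum())  # minority-class counts per subset
```

Without `stratify`, a rare class could end up under-represented in the small validation subset purely by chance.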


In [None]:
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import (
    classification_report, confusion_matrix, accuracy_score, 
    precision_score, recall_score, f1_score
)

# Add the directory to the import path
sys.path.append("automl")

# Import only the exported pipeline
from dart_best_pipeline_withoutindexes import exported_pipeline


# Create a LabelEncoder and fit it on the training labels
label_encoder = LabelEncoder()
training_target = label_encoder.fit_transform(y_train)  # Converts ['Plastic', 'Sand', 'Water'] → [0, 1, 2]

# Train the pipeline
classifier = exported_pipeline.fit(X_train, training_target)

# Apply the same encoding used for training to the validation labels
validation_target = label_encoder.transform(y_val)

# Predict on the validation set
y_val_pred = classifier.predict(X_val)

# Compute performance metrics
accuracy = accuracy_score(validation_target, y_val_pred)

# 📌 Weighted metrics: account for the size of each class
precision_weighted = precision_score(validation_target, y_val_pred, average="weighted")
recall_weighted = recall_score(validation_target, y_val_pred, average="weighted")
f1_weighted = f1_score(validation_target, y_val_pred, average="weighted")

# 📌 Macro metrics: treat all classes equally
precision_macro = precision_score(validation_target, y_val_pred, average="macro")
recall_macro = recall_score(validation_target, y_val_pred, average="macro")
f1_macro = f1_score(validation_target, y_val_pred, average="macro")

# Classification report
class_report = classification_report(validation_target, y_val_pred, target_names=label_encoder.classes_)

# Confusion matrix
conf_matrix = confusion_matrix(validation_target, y_val_pred)

# Display the results
print("\n🔹 **Metrics on the validation set:**\n")
print(f"✅ **Accuracy:** {accuracy:.4f}")

print("\n📌 **Weighted metrics:**")
print(f"   ✅ **Precision:** {precision_weighted:.4f}")
print(f"   ✅ **Recall:** {recall_weighted:.4f}")
print(f"   ✅ **F1-Score:** {f1_weighted:.4f}")

print("\n📌 **Macro metrics:**")
print(f"   ✅ **Precision:** {precision_macro:.4f}")
print(f"   ✅ **Recall:** {recall_macro:.4f}")
print(f"   ✅ **F1-Score:** {f1_macro:.4f}")

print("\n🔹 **Classification report:**\n")
print(class_report)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt

# Create the SHAP explainer from the trained model
explainer = shap.Explainer(classifier.predict, X_val)

# Compute SHAP values for the validation set
shap_values = explainer(X_val)

# 🔹 **Global importance summary plot**
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_val, show=True)

# 🔹 **Beeswarm plot (impact of each feature on the predictions)**
plt.figure(figsize=(10, 6))
shap.plots.beeswarm(shap_values)

# 🔹 **Dependence plot for a key feature (set 'feature_name' to another feature if desired)**
feature_name = X_val.columns[np.argmax(np.abs(shap_values.values).mean(axis=0))]  # Picks the most important feature
plt.figure(figsize=(10, 6))
shap.plots.scatter(shap_values[:, feature_name], color=shap_values)

# 🔹 **Local explanation (single sample)**
sample_idx = np.random.randint(0, len(X_val))  # Picks a random sample
shap.plots.waterfall(shap_values[sample_idx])


In [None]:
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

# Create the LIME explainer
explainer = LimeTabularExplainer(
    training_data=X_train.values,  # Training data used by the model
    feature_names=X_train.columns.tolist(),
    class_names=label_encoder.classes_.tolist(),
    mode="classification"
)

# Select a random sample to explain
sample_idx = np.random.randint(0, len(X_val))
sample_instance = X_val.iloc[sample_idx].values

# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    sample_instance,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()


In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[0].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# WATER

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[3527].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE 100% S0 Transparent

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[12627].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# LDPE Dry S0 Transparent 40% X WATER 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[49587].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PET Dry S0 Transparent 60% X WATER 40%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[112499].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PP Wet S0 White 80% X WATER 20%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[125943].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PP Submerged (2cm) S2 Orange 40% X WATER 60%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[206547].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PP Submerged (5cm) S5 White 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    dart[feature_names].iloc[407247].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# Whitecap 100%

In [None]:
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from imblearn.combine import SMOTETomek  # SMOTE oversampling followed by Tomek-link cleaning
import os

# Create the checkpoint directory if it does not exist yet
checkpoint_dir = "automl/checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)

# Split the data into training (70%), test (20%), and validation (10%) sets
X_train, X_temp, y_train, y_temp = train_test_split(
    usgs[feature_names], usgs['Label'],
    test_size=0.3, random_state=42
)

X_test, X_val, y_test, y_val = train_test_split(
    X_temp, y_temp, test_size=1/3, random_state=42
)

# Encode the class labels as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)
y_val_encoded = label_encoder.transform(y_val)  # Held-out validation set

# Apply SMOTETomek to balance the training set
smotetomek = SMOTETomek(random_state=42)
X_train_resampled, y_train_resampled = smotetomek.fit_resample(X_train, y_train_encoded)

# Initialize and run TPOT (fewer generations and a smaller population to speed things up)
tpot = TPOTClassifier(
    generations=5,  # Reduced to speed up the search
    population_size=50,  # Reduced to speed up the search
    verbosity=3,  # More detailed logs
    random_state=42,
    periodic_checkpoint_folder=checkpoint_dir  # Save checkpoints
)

# Train the model
tpot.fit(X_train_resampled, y_train_resampled)

# Evaluate the model on the test data (the validation set is not used yet)
score = tpot.score(X_test, y_test_encoded)
print(f"Score on the test data: {score}")

# Export the best pipeline found
tpot.export('automl/usgs_best_pipeline_withoutindexes.py')


In [None]:
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import (
    classification_report, confusion_matrix, accuracy_score, 
    precision_score, recall_score, f1_score
)

# Add the directory to the import path
sys.path.append("automl")

# Import only the exported pipeline
from usgs_best_pipeline_withoutindexes import exported_pipeline


# Create a LabelEncoder and fit it on the training labels
label_encoder = LabelEncoder()
training_target = label_encoder.fit_transform(y_train)  # Converts ['Plastic', 'Sand', 'Water'] → [0, 1, 2]

# Train the pipeline
classifier = exported_pipeline.fit(X_train, training_target)

# Apply the same encoding used for training to the validation labels
validation_target = label_encoder.transform(y_val)

# Predict on the validation set
y_val_pred = classifier.predict(X_val)

# Compute performance metrics
accuracy = accuracy_score(validation_target, y_val_pred)

# 📌 Weighted metrics: account for the size of each class
precision_weighted = precision_score(validation_target, y_val_pred, average="weighted")
recall_weighted = recall_score(validation_target, y_val_pred, average="weighted")
f1_weighted = f1_score(validation_target, y_val_pred, average="weighted")

# 📌 Macro metrics: treat all classes equally
precision_macro = precision_score(validation_target, y_val_pred, average="macro")
recall_macro = recall_score(validation_target, y_val_pred, average="macro")
f1_macro = f1_score(validation_target, y_val_pred, average="macro")

# Classification report
class_report = classification_report(validation_target, y_val_pred, target_names=label_encoder.classes_)

# Confusion matrix
conf_matrix = confusion_matrix(validation_target, y_val_pred)

# Display the results
print("\n🔹 **Metrics on the validation set:**\n")
print(f"✅ **Accuracy:** {accuracy:.4f}")

print("\n📌 **Weighted metrics:**")
print(f"   ✅ **Precision:** {precision_weighted:.4f}")
print(f"   ✅ **Recall:** {recall_weighted:.4f}")
print(f"   ✅ **F1-Score:** {f1_weighted:.4f}")

print("\n📌 **Macro metrics:**")
print(f"   ✅ **Precision:** {precision_macro:.4f}")
print(f"   ✅ **Recall:** {recall_macro:.4f}")
print(f"   ✅ **F1-Score:** {f1_macro:.4f}")

print("\n🔹 **Classification report:**\n")
print(class_report)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt

# Create the SHAP explainer from the trained model
explainer = shap.Explainer(classifier.predict, X_val)

# Compute SHAP values for the validation set
shap_values = explainer(X_val)

# 🔹 **Global importance summary plot**
plt.figure(figsize=(10, 6))
shap.summary_plot(shap_values, X_val, show=True)

# 🔹 **Beeswarm plot (impact of each feature on the predictions)**
plt.figure(figsize=(10, 6))
shap.plots.beeswarm(shap_values)

# 🔹 **Dependence plot for a key feature (set 'feature_name' to another feature if desired)**
feature_name = X_val.columns[np.argmax(np.abs(shap_values.values).mean(axis=0))]  # Picks the most important feature
plt.figure(figsize=(10, 6))
shap.plots.scatter(shap_values[:, feature_name], color=shap_values)

# 🔹 **Local explanation (single sample)**
sample_idx = np.random.randint(0, len(X_val))  # Picks a random sample
shap.plots.waterfall(shap_values[sample_idx])


In [None]:
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

# Create the LIME explainer
explainer = LimeTabularExplainer(
    training_data=X_train.values,  # Training data used by the model
    feature_names=X_train.columns.tolist(),
    class_names=label_encoder.classes_.tolist(),
    mode="classification"
)

# Select a random sample to explain
sample_idx = np.random.randint(0, len(X_val))
sample_instance = X_val.iloc[sample_idx].values

# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    sample_instance,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()


In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    usgs[feature_names].iloc[0].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# WATER

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    usgs[feature_names].iloc[1737].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# HDPE MESH ~100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    usgs[feature_names].iloc[3085].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# COAST 100%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    usgs[feature_names].iloc[1965].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# WOOD UNKNOWN%

In [None]:
# Generate an explanation for the prediction on the chosen sample
exp = explainer.explain_instance(
    usgs[feature_names].iloc[954].values,
    classifier.predict_proba  # Function that returns the model's class probabilities
)

# Print the explanation to the console
print(exp.as_list())

# Show LIME's interactive visualization
exp.show_in_notebook()
# PLASTIC BAGS 27% X WATER 73%

Option 1: H2O AutoML
H2O AutoML can handle classification and regression tasks automatically.
Note: it requires Java, so it was left for later.

In [None]:
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Convert the data to H2O's frame format
data = h2o.H2OFrame(df1)

# Identify the predictors and the target
target = 'classe_alvo'
predictors = data.columns[:-1]  # Assumes the target is the last column

# Mark the target as categorical so that AutoML treats this as classification
data[target] = data[target].asfactor()

# Split into training and test sets
train, test = data.split_frame(ratios=[0.8], seed=42)

# Train the AutoML model
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=predictors, y=target, training_frame=train)

# Evaluate the best model
perf = aml.leader.model_performance(test)
print(perf)


Option 2: TPOT
If you want genetically optimized pipelines, TPOT is an excellent choice.

In [None]:
old_dart.columns

In [None]:
feature_names + radiometric_indexes

In [None]:
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(old_dart[feature_names + radiometric_indexes], old_dart['Label'], test_size=0.2, random_state=42)

# Encode the class labels as integers
label_encoder = LabelEncoder()

# Transform the training and test labels
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Run TPOT on the processed data
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train_encoded)

# Evaluate the model on the test data
score = tpot.score(X_test, y_test_encoded)
print(f"Score on the test data: {score}")

# Export the trained pipeline
tpot.export('automl/olddart_best_pipeline.py')
# Generate a new dataset with numeric data only (features + target)

In [None]:
# ✅ 2️⃣ **SHAP: global feature importance**
booster = exported_pipeline.get_booster()
booster.save_model("temp_model.json")

explainer = shap.Explainer(exported_pipeline)
shap_values = explainer(X_test)  # Same test features that are passed to the summary plot

plt.figure(figsize=(10, 5))
shap.summary_plot(shap_values, X_test, feature_names=feature_names + radiometric_indexes)
plt.show()

In [None]:
print(exported_pipeline)

In [None]:
# ✅ 3️⃣ **LIME: local explanation for a single sample**
from lime.lime_tabular import LimeTabularExplainer
import numpy as np

explainer_lime = LimeTabularExplainer(
    training_data=np.array(X_train),
    feature_names=feature_names + radiometric_indexes,
    class_names=label_encoder.classes_,
    mode="classification"
)

# Pick a random example to explain
idx = np.random.randint(0, X_test.shape[0])
exp = explainer_lime.explain_instance(X_test.iloc[idx].values, exported_pipeline.predict_proba)
exp.show_in_notebook()  # Renders the explanation in the Jupyter notebook

In [None]:
# idx is a position within X_test, so look up the original row by its index label
old_dart.loc[X_test.index[idx]]

In [None]:
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns,
    class_names=tpot.classes_,
    mode='classification'
)

exp = explainer.explain_instance(X_test.iloc[0].values, tpot.predict_proba)
exp.show_in_notebook()


In [None]:
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from imblearn.over_sampling import SMOTE
import os

# Create the directory for saving checkpoints, if it does not exist yet
checkpoint_dir = "automl/checkpoints"
os.makedirs(checkpoint_dir, exist_ok=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    old_dart[feature_names + radiometric_indexes], old_dart['Label'], test_size=0.2, random_state=42
)

# Encode the classes as integers
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)

# Apply SMOTE to balance the classes in the training set only
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train_encoded)

# Initialize and run TPOT
tpot = TPOTClassifier(
    generations=10,  # Increased to search for better pipelines
    population_size=100,  # Increased for more diversity
    verbosity=3,  # More detailed logs
    random_state=42,
    periodic_checkpoint_folder=checkpoint_dir  # Save checkpoints
)

# Train the model
tpot.fit(X_train_resampled, y_train_resampled)

# Evaluate the model
score = tpot.score(X_test, y_test_encoded)
print(f"Score on the test data: {score}")

# Export the best pipeline found
tpot.export('automl/olddart_best_pipeline_optimized.py')


In [None]:
import numpy as np
import pandas as pd
import shap
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from olddart_best_pipeline import exported_pipeline  # Import the saved model

# Reload the data in the same format
#X_train, X_test, y_train, y_test = train_test_split(
#    old_dart[feature_names + radiometric_indexes], old_dart[['Label']], test_size=0.2, random_state=42
#)

# Fix the shape of y_train and y_test
#label_encoder = LabelEncoder()
#y_train_encoded = label_encoder.fit_transform(y_train.values.ravel())  # Convert to a 1D array
#y_test_encoded = label_encoder.transform(y_test.values.ravel())

# Make predictions
#y_pred = exported_pipeline.predict(X_test)

# 1) Performance metrics
#print("Classification report:")
#print(classification_report(y_test_encoded, y_pred, target_names=label_encoder.classes_))

#print("Confusion matrix:")
#print(confusion_matrix(y_test_encoded, y_pred))


In [None]:
import joblib
import shap
from lime.lime_tabular import LimeTabularExplainer
import pandas as pd
import numpy as np

# Load the data
#X_train = ...  # Replace with your training data
#X_test = ...   # Replace with your test data
#y_train = ...  # Replace with your training classes
#y_test = ...   # Replace with your test classes
X_train, X_test, y_train, y_test = train_test_split(old_dart[feature_names + radiometric_indexes], old_dart['Label'], test_size=0.2, random_state=42)

# Step 1: Load the tuned pipeline
from olddart_best_pipeline import exported_pipeline

# Fit the pipeline on the training data
exported_pipeline.fit(X_train, y_train_encoded)

# Step 2: SHAP - generate explanations
explainer = shap.Explainer(exported_pipeline.predict, X_test)
shap_values = explainer(X_test)

# Draw the SHAP summary plot
shap.summary_plot(shap_values, X_test)

# Step 3: LIME - generate explanations
lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns,
    class_names=np.unique(y_train_encoded),
    mode="classification"
)

# Explanation for a single instance
i = 0  # Index of the instance
exp = lime_explainer.explain_instance(X_test.iloc[i].values, exported_pipeline.predict_proba)
exp.show_in_notebook()


In [None]:
dart

In [None]:
usgs

Option 3: PyCaret
Ideal for exploratory analysis and rapid experimentation.
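
As a rough sketch of this option, the snippet below wraps PyCaret's classification workflow (`setup`, `compare_models`, `pull`) in one helper. The helper name `compare_with_pycaret`, the default `'Label'` target, the seed and the fold count are assumptions made for illustration, not settings taken from this notebook. PyCaret must be installed separately.

```python
def compare_with_pycaret(df, target="Label", seed=42, folds=5):
    """Rank PyCaret's candidate classifiers on `df` (hypothetical helper).

    `df` is expected to hold the predictor columns plus the `target` column.
    """
    # Imported lazily so the notebook still loads when PyCaret is missing.
    from pycaret.classification import setup, compare_models, pull

    # setup() prepares the experiment: train/test split and preprocessing.
    setup(data=df, target=target, session_id=seed, fold=folds, verbose=False)

    # compare_models() cross-validates the model library and returns the best one.
    best_model = compare_models()

    # pull() returns the most recent score grid as a DataFrame.
    leaderboard = pull()
    return best_model, leaderboard


# Example call (assumes `old_dart`, `feature_names` and `radiometric_indexes`
# are defined as in the TPOT cells):
# columns = feature_names + radiometric_indexes + ["Label"]
# best_model, leaderboard = compare_with_pycaret(old_dart[columns])
```

The leaderboard gives a quick baseline to set against the TPOT score before committing to a long genetic search.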

# References

[1] Themistocleous, K., Papoutsa, C., Michaelides, S., & Hadjimitsis, D. (2020). Investigating detection of floating plastic litter from space using Sentinel-2 imagery. Remote Sensing, 12(16), 2648. <https://www.mdpi.com/2072-4292/12/16/2648>

[2] Biermann, L., Clewley, D., Martinez-Vicente, V., & Topouzelis, K. (2020). Finding plastic patches in coastal waters using optical satellite data. Scientific Reports, 10(1), 5364. <https://www.nature.com/articles/s41598-020-62298-z>. (Accessed on 08/07/2021).

[3] Topouzelis, K. 2020. Plastic litter project 2019 dataset, https://zenodo.org/record/3752719#668.ZFK8BXbMK3B. (Accessed on 05/03/2023)

[4] Papageorgiou, D. & Topouzelis, K. 2022. Plastic litter project 2021 dataset, https://zenodo.org/record/6577085112#.ZFLCyHbMK3B. (Accessed on 05/03/2023).

[5] Garaba, S. P.; Dierssen, H. M. Spectral reference library of 11 types of virgin plastic pellets common in marine plastic debris. 2017. <https://tinyurl.com/y7de3cup>. (Accessed on 08/07/2021).

[6] Garaba, S. P.; Dierssen, H. M. Spectral reflectance of dry and wet marine harvested microplastics from Kamilo Point, Pacific Ocean. 2019. <https://tinyurl.com/2amnd89u>. (Accessed on 08/07/2021).

[7] Garaba, S. P.; Dierssen, H. M. Spectral reflectance of washed ashore macroplastics. 2019. <https://ecosis.org/package/spectralreflectanceofwashedashoremacroplastics>. (Accessed on 08/07/2021).

[8] M. Moshtaghi, E. Knaeps, S. Sterckx, S. Garaba, and D. Meire. Spectral reflectance of marine macroplastics in the VNIR and SWIR measured in a controlled environment. Scientific Reports, 11(1):1–12, 2021.

[9] S. K. Meerdink, S. J. Hook, D. A. Roberts, and E. A. Abbott. The ECOSTRESS spectral library version 1.0. Remote Sensing of Environment, 230:111196, 2019.

[10] H. M. Dierssen. Hyperspectral measurements, parameterizations, and atmospheric correction of whitecaps and foam from visible to shortwave infrared for ocean color remote sensing. Frontiers in Earth Science, 7:14, 2019.

In [None]:
# Also split by color and degree of submersion
datasets_names = ["DART", "DART (100% cover percent)"]

traces = [
             [
                 [dart_subdatasets['plastic_ldpe'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_micronapo'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_nylon'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pet'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pp'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pvc'][feature].mean() for feature in feature_names],
                 [dart_subdatasets['water'][feature].mean() for feature in feature_names]
                 #[dart_subdatasets['sand'][feature].mean() for feature in feature_names]
             ],
             [
                 [dart_subdatasets['plastic_ldpe'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_micronapo'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_nylon'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pet'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pp'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['plastic_pvc'].query('Cover_percent == 100')[feature].mean() for feature in feature_names],
                 [dart_subdatasets['water'][feature].mean() for feature in feature_names]
                 #[dart_subdatasets['sand'][feature].mean() for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names, feature_names, feature_names, feature_names], 
          [feature_names, feature_names, feature_names, feature_names, feature_names, feature_names, feature_names, feature_names]]

legends = [['LDPE (DART)', 'MicroNapo (DART)', 'Nylon (DART)', 'PET (DART)', 'PP (DART)', 'PVC (DART)', 'Water (DART)', 'Whitecap (DART)'],
           ['LDPE (DART 100%)', 'MicroNapo (DART 100%)', 'Nylon (DART 100%)', 'PET (DART 100%)', 'PP (DART 100%)', 'PVC (DART 100%)', 'Water (DART)', 'Whitecap (DART)']] # For bags and bottles there were few mixed pixels with more than 100% cover

modes = [['dot', 'dot', 'dot', 'dot', 'dot', 'dot', 'dot', 'dot'],
         ['lines', 'lines', 'lines', 'lines', 'lines', 'lines', 'lines', 'lines']]

colors = [['#ADF224','#1AB1B1', '#6342d1', '#D81F88', '#FF8825', '#F1C800', '#1E90FF', '#000'], #Plastic, Water, Sand
          ['#ADF224','#1AB1B1', '#6342d1', '#D81F88', '#FF8825', '#F1C800', '#1E90FF', '#000']]

chart_title = "DART elements statistics - Mean spectral signatures"
x_title = "Band"
y_title = "Reflectance"
height = 500
width = 800
guidance = "horizontal"

export_name = str(input("Path for chart of mean signatures of DART elements: ")) 
#For example: charts/english/exploratory_analysis/descriptive_statistics/elements_mean_signatures_dart

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)
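
`rsdata_charts.line_chart` receives five parallel nested lists (`traces`, `labels`, `legends`, `modes`, `colors`), one inner list per panel. In the cell above each panel holds seven traces while the other four lists hold eight entries; whether `line_chart` tolerates that is not visible from this notebook. A minimal sketch of a length check that could run before the call is given below. The helper name `check_panels` is hypothetical, and it assumes all five lists contain the same number of panels.

```python
def check_panels(traces, labels, legends, modes, colors):
    """Return (panel_index, counts) for every panel whose list lengths disagree.

    Hypothetical helper: it only compares list lengths and knows nothing
    about how rsdata_charts.line_chart consumes its arguments.
    """
    groups = {"traces": traces, "labels": labels, "legends": legends,
              "modes": modes, "colors": colors}
    problems = []
    for i in range(len(traces)):
        # Number of entries each argument provides for panel i.
        counts = {name: len(values[i]) for name, values in groups.items()}
        if len(set(counts.values())) > 1:
            problems.append((i, counts))
    return problems


# Toy input: the first panel has 2 traces but 3 legend entries.
demo_traces = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6]]]
demo_labels = [[["Blue", "Green"], ["Blue", "Green"]], [["Blue", "Green"]]]
demo_legends = [["LDPE", "Water", "Whitecap"], ["LDPE"]]
demo_modes = [["dot", "dot"], ["lines"]]
demo_colors = [["#ADF224", "#1E90FF"], ["#ADF224"]]

problems = check_panels(demo_traces, demo_labels, demo_legends, demo_modes, demo_colors)
for panel, counts in problems:
    print(f"Panel {panel}: {counts}")
```

An empty result means every panel has matching list lengths.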

In [None]:
datasets_names = ["DART", "DART (100% de cobertura)"]

legends = [['LDPE (DART)', 'MicroNapo (DART)', 'Nylon (DART)', 'PET (DART)', 'PP (DART)', 'PVC (DART)', 'Água (DART)', 'Espuma (DART)'],
           ['LDPE (DART 100%)', 'MicroNapo (DART 100%)', 'Nylon (DART 100%)', 'PET (DART 100%)', 'PP (DART 100%)', 'PVC (DART 100%)', 'Água (DART)', 'Espuma (DART)']]

chart_title = "Estatísticas por elemento no DART - Assinaturas espectrais médias"
x_title = "Banda"
y_title = "Reflectância"
width = 850

export_name = str(input("Caminhos para gráficos de assinatura média por elemento dos dados simulados (DART): ")) 
#For example: charts/portugues/analise_exploratoria/estatisticas_descritivas/elementos_assinaturas_medias_dart

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)

In [None]:
datasets_names = ["USGS", "USGS (100% cover percent)"]

traces = [
             [
                 [usgs_subdatasets['plastic_bags'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['plastic_bottles'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['plastic_mix'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['plastic_mesh'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['coast'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['wood'][feature].mean() for feature in feature_names]
             ],
             [
                 [usgs_subdatasets['plastic_mesh'].query('Cover_percent == -100')[feature].mean() for feature in feature_names],
                 [usgs_subdatasets['water'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['coast'][feature].mean() for feature in feature_names],
                 [usgs_subdatasets['wood_-100'][feature].mean() for feature in feature_names]
             ]
          ]

labels = [[feature_names, feature_names, feature_names, feature_names, feature_names, feature_names, feature_names],
          [feature_names, feature_names, feature_names, feature_names]]

legends = [['Bags (USGS)', 'Bottles (USGS)', 'Bags and Bottles (USGS)', 'HDPE mesh (USGS)', 'Water (USGS)', 'Coast (USGS)', 'Wood (USGS)'],
           ['HDPE mesh (USGS 100%)', 'Water (USGS)', 'Coast (USGS)', 'Wood (USGS 100%)']] # For bags and bottles there were few mixed pixels with more than 100% cover

modes = [['dot', 'dot', 'dot', 'dot', 'dot', 'dot', 'dot'],
         ['lines', 'lines', 'lines', 'lines']]

colors = [['#49C658','#8945AB', '#FF675F', '#FCFE5E', '#1E90FF', '#FFD700', '#82431d'],
          ['#FCFE5E', '#1E90FF', '#FFD700', '#82431d']]

chart_title = "USGS elements statistics - Mean spectral signatures"
x_title = "Band"
y_title = "Reflectance"
height = 500
width = 800
guidance = "horizontal"

export_name = str(input("Path for chart of mean signatures of USGS elements: ")) 
#For example: charts/english/exploratory_analysis/descriptive_statistics/elements_mean_signatures_usgs

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)
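
The `traces` lists in these chart cells repeat one pattern: for each sub-dataset, optionally filter by cover percentage, then take the mean of every band. One possible way to express that pattern once is sketched below. The helper name `mean_signatures` and the toy values are assumptions for illustration; the real inputs would be `dart_subdatasets` or `usgs_subdatasets` together with `feature_names`.

```python
import pandas as pd


def mean_signatures(subdatasets, keys, bands, query=None):
    """Mean value per band for each sub-dataset key (hypothetical helper)."""
    traces = []
    for key in keys:
        frame = subdatasets[key]
        if query is not None:
            # Keep only the rows matching the filter, e.g. fully covered pixels.
            frame = frame.query(query)
        traces.append([frame[band].mean() for band in bands])
    return traces


# Toy data standing in for usgs_subdatasets; the values are made up.
bands = ["Blue", "Green"]
toy = {
    "water": pd.DataFrame({"Blue": [0.02, 0.04], "Green": [0.03, 0.05], "Cover_percent": [0, 0]}),
    "plastic_mesh": pd.DataFrame({"Blue": [0.10, 0.30], "Green": [0.20, 0.40], "Cover_percent": [50, 100]}),
}

all_pixels = mean_signatures(toy, ["plastic_mesh", "water"], bands)
full_cover = mean_signatures(toy, ["plastic_mesh"], bands, query="Cover_percent == 100")
print(all_pixels)
print(full_cover)
```

Because the filter applies to every key passed in, classes that should stay unfiltered (water, for example) can be computed in a second call without `query` and concatenated onto the panel.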

In [None]:
datasets_names = ["USGS", "USGS (100% de cobertura)"]

legends = [['Sacolas (USGS)', 'Garrafas (USGS)', 'Sacolas e garrafas (USGS)', 'Malha de HDPE (USGS)', 'Água (USGS)', 'Costa (USGS)', 'Madeira (USGS)'],
           ['Malha de HDPE (USGS 100%)', 'Água (USGS)', 'Costa (USGS)', 'Madeira (USGS 100%)']]

chart_title = "Estatísticas por elemento no USGS - Assinaturas espectrais médias"
x_title = "Banda"
y_title = "Reflectância"
width = 850

export_name = str(input("Caminhos para gráficos de assinatura média por elemento dos dados observados (USGS): ")) 
#For example: charts/portugues/analise_exploratoria/estatisticas_descritivas/elementos_assinaturas_medias_usgs

rsdata_charts.line_chart(datasets_names, traces, labels, legends, modes, colors, chart_title, x_title, y_title, height, width, legend_orientation = "v", guidance=guidance, export_name=export_name)