<a href="https://colab.research.google.com/github/valerio-unifei/ecom01/blob/main/ECOM01-DS03-ArgumentosValidos.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dataset

http://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset

Seven different types of **dry beans** were used in this research, taking into account features such as form, shape, type, and structure, in light of market conditions. A computer vision system was developed to distinguish seven registered varieties of dry beans with similar features, in order to obtain a uniform seed classification. For the classification model, images of 13,611 grains from the 7 registered bean varieties were taken with a high-resolution camera. The bean images obtained by the computer vision system went through segmentation and feature extraction stages, yielding 16 features per grain: 12 dimensional features and 4 shape features.

In [None]:
# Download the file from the web
!wget http://archive.ics.uci.edu/ml/machine-learning-databases/00602/DryBeanDataset.zip -O DryBeanDataset.zip
# Extract the files from the zip archive
!unzip -q DryBeanDataset.zip

In [None]:
import pandas as pd
# Load the extracted Excel file containing the dry bean dataset
df = pd.read_excel('/content/DryBeanDataset/Dry_Bean_Dataset.xlsx')
df

In [None]:
import seaborn as sns
sns.set(rc={'figure.figsize':(12,7)})

_ = sns.histplot(data=df, x='Area', kde=True, hue='Class')

In [None]:
sns.kdeplot(data=df, x='MajorAxisLength',y='MinorAxisLength',hue='Class')

In [None]:
sns.jointplot(data=df, x='MajorAxisLength',y='MinorAxisLength',hue='Class')

# Knowledge Extractor

In [None]:
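# Features: every column except the last one
X = df[df.columns[:-1]].values
# Labels: the last column (Class), encoded as integer codes
y, class_names = pd.factorize(df[df.columns[-1]])
# All column names, including the label column in the last position
columns_names = list(df.columns)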
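
`pd.factorize` turns the text labels of the last column into integer codes and also returns the distinct labels in order of first appearance, which is why `class_names[code]` later recovers the original class name. A minimal sketch on a toy series (the values below are invented for illustration and are not rows from the dataset):

```python
import pandas as pd

# Toy labels, invented for illustration only
labels = pd.Series(['SEKER', 'SIRA', 'SEKER', 'BOMBAY'])

codes, uniques = pd.factorize(labels)
print(codes)          # one integer code per row
print(list(uniques))  # distinct labels, in order of first appearance

# A code indexes back into the list of distinct labels
print(uniques[codes[1]])
```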

Building the Decision Tree

In [None]:
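from sklearn import tree

# A shallow tree (at most 5 levels) keeps the extracted rules short
clf = tree.DecisionTreeClassifier(max_depth=5)
clf.fit(X,y)
# Accuracy measured on the same data used for fitting
print('Score =',clf.score(X,y))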

In [None]:
from sklearn.tree import _tree
import numpy as np

def tree_to_code(tree, columns_names, class_names):
    """Print the fitted tree as a depth-first trace of premises and conclusions."""
    tree_ = tree.tree_
    undef = _tree.TREE_UNDEFINED
    # Name of the feature tested at each node; leaves test no feature
    feature_name = [columns_names[i] if i != undef else "undefined!" for i in tree_.feature]

    def recurse(node, depth):
        if tree_.feature[node] != undef:
            # Internal node: print the condition, then descend into each branch
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print(f'({name} <= {threshold}), ',end='')
            recurse(tree_.children_left[node], depth + 1)
            print(f'({name} > {threshold}), ',end='')
            recurse(tree_.children_right[node], depth + 1)
        else:
            # Leaf: the conclusion is the majority class among its samples
            value_name = class_names[np.argmax(tree_.value[node])]
            # n_node_samples counts the training samples that reached this node,
            # regardless of how tree_.value is scaled
            rows_number = int(tree_.n_node_samples[node])
            print(f'∴ ({columns_names[-1]} == {value_name})\n')
            print(f'samples: {rows_number}\n\n')

    recurse(0, 1)

tree_to_code(clf,columns_names,list(class_names))
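
The function above prints each condition once, at the moment the traversal reaches it, so a leaf visited after backtracking shows only the tail of its path. The sketch below is one possible variant that collects the complete list of premises for every leaf. The name `extract_rules`, the feature name `width`, the class names `A` and `B`, and the toy data are all hypothetical and serve only as illustration.

```python
import numpy as np
from sklearn import tree
from sklearn.tree import _tree

def extract_rules(clf, feature_names, class_names):
    """Return one (premises, conclusion, n_samples) tuple per leaf."""
    tree_ = clf.tree_
    rules = []

    def recurse(node, premises):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_names[tree_.feature[node]]
            threshold = tree_.threshold[node]
            # Each branch gets its own copy of the path plus one new premise
            recurse(tree_.children_left[node], premises + [f'({name} <= {threshold})'])
            recurse(tree_.children_right[node], premises + [f'({name} > {threshold})'])
        else:
            # Leaf: conclude the majority class of the samples that reached it
            conclusion = class_names[np.argmax(tree_.value[node])]
            rules.append((premises, conclusion, int(tree_.n_node_samples[node])))

    recurse(0, [])
    return rules

# Toy data, invented for illustration: one feature, two classes
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([0, 0, 1, 1])
toy_clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_toy, y_toy)

rules = extract_rules(toy_clf, ['width'], ['A', 'B'])
for premises, conclusion, n in rules:
    print(', '.join(premises), '∴', f'(Class == {conclusion})', f'[samples: {n}]')
```

With the full path attached to each leaf, every printed argument can be read on its own, without looking back at earlier lines of the trace.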

# Valid Arguments

Check whether the arguments below are valid.

```
(MajorAxisLength <= 280.7041931152344), (ShapeFactor1 <= 0.006819602334871888), (ShapeFactor3 <= 0.7274537682533264), 
(roundness <= 0.9265730679035187), (Perimeter <= 745.8924865722656) ∴ (Class == DERMASON)

samples: 43

```





```
(roundness > 0.9265730679035187), (ShapeFactor4 <= 0.9987176060676575) ∴ (Class == DERMASON)

samples: 48
```





```
(Perimeter > 745.8924865722656) ∴ (Class == SIRA)

samples: 131
```





```
(Compactness > 0.8845791220664978), (ShapeFactor1 <= 0.007053160108625889) ∴ (Class == SEKER)

samples: 63
```
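
One way to test an argument like those above against the data is to look for counterexamples: rows that satisfy every premise but belong to a different class. This is only an empirical check on the available rows, not a formal proof of validity. Because a tree leaf concludes the majority class of its samples, a leaf that mixes classes will produce counterexamples. The helper name `find_counterexamples`, the threshold `745.0`, and the toy table below are hypothetical and are not taken from the dataset.

```python
import pandas as pd

def find_counterexamples(data, premises, conclusion_class, label_col='Class'):
    """Rows that satisfy all premises but do not have the concluded class."""
    # Join the premises into one boolean expression understood by DataFrame.query
    condition = ' and '.join(f'({p})' for p in premises)
    matching = data.query(condition)
    return matching[matching[label_col] != conclusion_class]

# Toy table, invented for illustration only
toy = pd.DataFrame({
    'Perimeter': [700.0, 720.0, 760.0, 800.0],
    'roundness': [0.95, 0.91, 0.93, 0.88],
    'Class': ['DERMASON', 'DERMASON', 'SIRA', 'DERMASON'],
})

# Argument 1: (Perimeter > 745.0) ∴ (Class == SIRA)
bad_1 = find_counterexamples(toy, ['Perimeter > 745.0'], 'SIRA')
print(len(bad_1), 'counterexample(s) for argument 1')

# Argument 2: (Perimeter <= 745.0), (roundness > 0.9) ∴ (Class == DERMASON)
bad_2 = find_counterexamples(toy, ['Perimeter <= 745.0', 'roundness > 0.9'], 'DERMASON')
print(len(bad_2), 'counterexample(s) for argument 2')
```

An empty result means no row in the table contradicts the argument. A non-empty result shows the premises do not guarantee the conclusion.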



## Remaining conditions



```
(ShapeFactor3 > 0.7274537682533264), (Compactness <= 0.8695324659347534), 
(ShapeFactor4 <= 0.9966754019260406) ∴ (Class == SIRA)

samples: 27


(ShapeFactor4 > 0.9966754019260406) ∴ (Class == SEKER)

samples: 142


(Compactness > 0.8695324659347534), (Extent <= 0.8283936977386475) ∴ (Class == SEKER)

samples: 1628


(Extent > 0.8283936977386475) ∴ (Class == DERMASON)

samples: 1


(ShapeFactor1 > 0.006819602334871888), (Perimeter <= 711.3135070800781), (Compactness <= 0.8845791220664978), 
(ShapeFactor1 <= 0.007065028417855501) ∴ (Class == DERMASON)

samples: 95


(ShapeFactor1 > 0.007065028417855501) ∴ (Class == DERMASON)

samples: 2797


(Compactness > 0.8845791220664978), (ShapeFactor1 <= 0.007053160108625889) ∴ (Class == SEKER)

samples: 63


(ShapeFactor1 > 0.007053160108625889) ∴ (Class == DERMASON)

samples: 22


(Perimeter > 711.3135070800781), (Perimeter <= 739.5480041503906), 
(roundness <= 0.9030214250087738) ∴ (Class == DERMASON)

samples: 247


(roundness > 0.9030214250087738) ∴ (Class == DERMASON)

samples: 304


(Perimeter > 739.5480041503906), (MinorAxisLength <= 175.9747085571289) ∴ (Class == DERMASON)

samples: 16


(MinorAxisLength > 175.9747085571289) ∴ (Class == SIRA)

samples: 163


(MajorAxisLength > 280.7041931152344), (ShapeFactor3 <= 0.5304427742958069), (MinorAxisLength <= 215.31133270263672), 
(ShapeFactor1 <= 0.006129891611635685), (Extent <= 0.793764054775238) ∴ (Class == HOROZ)

samples: 15


(Extent > 0.793764054775238) ∴ (Class == CALI)

samples: 4


(ShapeFactor1 > 0.006129891611635685), (Eccentricity <= 0.8643796145915985) ∴ (Class == HOROZ)

samples: 495


(Eccentricity > 0.8643796145915985) ∴ (Class == HOROZ)

samples: 1225


(MinorAxisLength > 215.31133270263672), (MajorAxisLength <= 583.6639404296875), 
(roundness <= 0.7935212254524231) ∴ (Class == HOROZ)

samples: 20


(roundness > 0.7935212254524231) ∴ (Class == CALI)

samples: 89


(MajorAxisLength > 583.6639404296875) ∴ (Class == BOMBAY)

samples: 7


(ShapeFactor3 > 0.5304427742958069), (Perimeter <= 897.3164978027344), (roundness <= 0.9222025275230408), 
(Perimeter <= 764.1730041503906) ∴ (Class == SIRA)

samples: 322


(Perimeter > 764.1730041503906) ∴ (Class == SIRA)

samples: 2169


(roundness > 0.9222025275230408), (Area <= 45473.0) ∴ (Class == SIRA)

samples: 25


(Area > 45473.0) ∴ (Class == SEKER)

samples: 92


(Perimeter > 897.3164978027344), (MinorAxisLength <= 313.8231506347656), 
(Compactness <= 0.7847835123538971) ∴ (Class == CALI)

samples: 1855


(Compactness > 0.7847835123538971) ∴ (Class == BARBUNYA)

samples: 1035


(MinorAxisLength > 313.8231506347656), (roundness <= 0.7931346297264099) ∴ (Class == BARBUNYA)

samples: 1


(roundness > 0.7931346297264099) ∴ (Class == BOMBAY)

samples: 515


```

