## Naive Bayes Implementation

The Naive Bayes algorithm is based on conditional probability and is one of the simplest classification algorithms. First, the examples are separated by class; then, for each _feature_, the probability of each of its values within each class is estimated.
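
In symbols, the decision rule can be sketched as follows. The notation is an assumption made here, not taken from the notebook: $c$ is a class, $x_1, \dots, x_n$ are the feature values of one example, and $\hat{c}$ is the predicted class.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{n} P(x_j \mid c)
```

The product form relies on the "naive" assumption that the features are conditionally independent given the class.
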
### Define the Settings

In [82]:
import numpy as np
import pandas as pd
import math


diabetes = pd.read_csv("./diabetes.csv")

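# Column holding the class label; every other column is a feature.
target = 'Outcome'
features = diabetes.columns[diabetes.columns != target]
classes = diabetes[target].unique()
data_length = len(diabetes.index)

# Number of folds and number of examples in each test fold.
k = 10
num_examples = math.floor(data_length/k)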

In [None]:
import numpy as np
import pandas as pd
import math


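# Alternative configuration for the wine dataset. It rebinds target, features,
# classes and data_length, while the cells below still read from `diabetes`.
wine = pd.read_csv("./wine.csv")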

target = 'Alcohol'
features = wine.columns[wine.columns != target]
classes = wine[target].unique()
data_length = len(wine.index)

k = 10
num_examples = math.floor(data_length/k)    

### Build the Datasets

In [83]:
train = []
test = []

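# Fold i uses rows [i*num_examples, (i+1)*num_examples) as the test set
# and every remaining row as the training set.
for i in range(0,k):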
    test.append(diabetes[i*num_examples : (i+1)*num_examples])
    train.append(diabetes.drop(test[i].index))
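
Because `num_examples` is obtained with a floor division, the rows beyond `k * num_examples` never land in a test fold. The sketch below reproduces the split on a hypothetical 23-row frame (the `toy` data and the names `tested` and `never_tested` are assumptions for illustration) to make that visible.

```python
import math
import pandas as pd

# Toy frame standing in for the real dataset (hypothetical data).
toy = pd.DataFrame({"f": range(23), "Outcome": [0, 1] * 11 + [0]})

k = 10
num_examples = math.floor(len(toy.index) / k)  # 2 rows per test fold

test, train = [], []
for i in range(k):
    fold = toy[i * num_examples:(i + 1) * num_examples]
    test.append(fold)
    train.append(toy.drop(fold.index))

# Rows that appear in some test fold.
tested = sorted(idx for fold in test for idx in fold.index)
# Rows left over by the floor division; they only ever appear in training sets.
never_tested = sorted(set(toy.index) - set(tested))

print(len(test[0]), len(train[0]))
print(never_tested)
```

With the fold sizes reported later in this notebook (692 training rows and 76 test rows per fold, so 768 rows in total), the same arithmetic would leave 8 rows outside every test fold.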

### Computing the Probabilities

Here the probabilities for each class are computed and stored in a nested dictionary structure:

```
dict: 
  keys: class
  values: dict: 
        keys: attribute
        values: dict:
              keys: value
              values: probability of the value
```

This way, the probability of any attribute value given a class can be looked up directly.
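
As a minimal sketch of that layout, the snippet below builds the nested dictionary for a hypothetical six-row frame. The `toy` data and the column names `weather` and `play` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary label.
toy = pd.DataFrame({
    "weather": ["sun", "sun", "rain", "rain", "sun", "rain"],
    "play":    ["yes", "yes", "no",   "yes",  "no",  "no"],
})
target = "play"
features = toy.columns[toy.columns != target]

probs = {}
for cl in toy[target].unique():
    # Rows of this class, restricted to the feature columns.
    rows = toy[toy[target] == cl][features]
    # attribute -> value -> relative frequency within the class
    probs[cl] = {
        col: (rows[col].value_counts() / len(rows)).to_dict()
        for col in rows.columns
    }

# Lookup order: class -> attribute -> value -> probability
print(probs["yes"]["weather"]["sun"])
```

A lookup for a value that never occurred within a class raises `KeyError`, which is the case the classifier has to handle explicitly.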

In [80]:
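probs = []   # probs[i][cl][col][val] = P(col == val | class cl) for fold i
probcl = []  # probcl[i][cl] = prior P(class cl) for fold i

for i in range(0,k):
    probs.append({})
    probcl.append({})

    for x in classes:
        # The rows come from the full `diabetes` frame, not from train[i],
        # so the tables are identical for every fold.
        diabetescl = diabetes[diabetes[target]==x][features]
        clsp = {}
        tot = len(diabetescl)
        for col in diabetescl.columns:
            colp = {}
            # Relative frequency of each observed value within the class.
            # Series.items() replaces iteritems(), which pandas 2.0 removed.
            for val,cnt in diabetescl[col].value_counts().items():
                pr = cnt/tot
                colp[val] = pr
            clsp[col] = colp
        probs[i][x] = clsp
        probcl[i][x] = len(diabetescl)/len(diabetes)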
        
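def probabs(i, x):
    # i - K-fold iteration (fold index)
    # x - pandas Series with the feature values of one example

    if not isinstance(x,pd.Series):
        raise TypeError("Arg must be of type Series")

    probab = {}
    for cl in classes:
        # Start from the class prior and multiply by each conditional probability.
        pr = probcl[i][cl]
        for col,val in x.items():
            try:
                pr *= probs[i][cl][col][val]
            except KeyError:
                # Value never observed for this class: the product becomes 0.
                pr = 0
        probab[cl] = pr
    return probab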

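def classify(i, x):
    # i - K-fold iteration (fold index)
    # x - pandas Series with the feature values of one example
    # Returns the class with the largest score, or '' if every score is 0.
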
    probab = probabs(i, x)
    mx = 0
    mxcl = ''
    for cl,pr in probab.items():
        if pr > mx:
            mx = pr
            mxcl = cl
            
    return mxcl
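
The loop above indexes `probs` and `probcl` by fold but reads its rows from the full `diabetes` frame, so the rows of each test fold also contribute to the tables used to classify them. A minimal sketch of estimating the same tables from a single training frame might look like the following. `fit_tables` is a hypothetical helper, not part of the notebook, and the `toy` data is invented for illustration.

```python
import pandas as pd

def fit_tables(frame, target):
    """Estimate priors and per-class value frequencies from `frame` only.

    Hypothetical helper, not part of the notebook.
    """
    features = frame.columns[frame.columns != target]
    priors, cond = {}, {}
    for cl in frame[target].unique():
        rows = frame[frame[target] == cl][features]
        priors[cl] = len(rows) / len(frame)
        # attribute -> value -> relative frequency within the class
        cond[cl] = {
            col: (rows[col].value_counts() / len(rows)).to_dict()
            for col in features
        }
    return priors, cond

# Hypothetical toy data standing in for one train/test split.
toy = pd.DataFrame({
    "f": [1, 1, 2, 2, 3, 3],
    "Outcome": [0, 0, 1, 1, 0, 1],
})
test_fold = toy[4:6]
train_fold = toy.drop(test_fold.index)

priors, cond = fit_tables(train_fold, "Outcome")
print(priors)
print(cond)
```

In this sketch the value `3` occurs only in the held-out rows, so it is absent from the fitted tables. Applying such a helper to `train[i]` would presumably change the accuracies recorded below, since those were produced with tables built from every row.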

### Training

In [81]:
b = []

for l in range(0,k):
    # One list of per-example hits (True/False) for fold l.
    b.append([])
    for j in train[l].index:
        # print(classify(l, diabetes.loc[j,features]), diabetes.loc[j])
        # print('')
        b[l].append(classify(l, train[l].loc[j,features]) == train[l].loc[j,target])

    print(sum(b[l]),"correct of",len(train[l]))
    print("Accuracy:", sum(b[l])/len(train[l]))
    print('')

677 correct of 692
Accuracy: 0.9783236994219653

678 correct of 692
Accuracy: 0.9797687861271677

677 correct of 692
Accuracy: 0.9783236994219653

678 correct of 692
Accuracy: 0.9797687861271677

677 correct of 692
Accuracy: 0.9783236994219653

678 correct of 692
Accuracy: 0.9797687861271677

677 correct of 692
Accuracy: 0.9783236994219653

678 correct of 692
Accuracy: 0.9797687861271677

677 correct of 692
Accuracy: 0.9783236994219653

679 correct of 692
Accuracy: 0.9812138728323699



### Testing

In [84]:
a = []

for l in range(0,k):
    # One list of per-example hits (True/False) for fold l.
    a.append([])
    for j in test[l].index:
        # print(classify(l, diabetes.loc[j,features]), diabetes.loc[j])
        # print('')
        a[l].append(classify(l, test[l].loc[j,features]) == test[l].loc[j,target])

    print(sum(a[l]),"correct of",len(test[l]))
    print("Accuracy:", sum(a[l])/len(test[l]))
    print('')

75 correct of 76
Accuracy: 0.9868421052631579

74 correct of 76
Accuracy: 0.9736842105263158

75 correct of 76
Accuracy: 0.9868421052631579

74 correct of 76
Accuracy: 0.9736842105263158

75 correct of 76
Accuracy: 0.9868421052631579

74 correct of 76
Accuracy: 0.9736842105263158

75 correct of 76
Accuracy: 0.9868421052631579

74 correct of 76
Accuracy: 0.9736842105263158

75 correct of 76
Accuracy: 0.9868421052631579

73 correct of 76
Accuracy: 0.9605263157894737
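
The ten per-fold results can be condensed into a single figure. The sketch below simply re-enters the counts printed above; the variable names are assumptions, and in the notebook itself the same totals could be taken from the list `a`.

```python
# Per-fold counts copied from the test output above.
correct = [75, 74, 75, 74, 75, 74, 75, 74, 75, 73]
fold_size = 76

total_correct = sum(correct)
total_examples = fold_size * len(correct)
overall = total_correct / total_examples

print(total_correct, "correct of", total_examples)
print("Overall accuracy:", round(overall, 4))
```

Since every fold has the same size, this pooled accuracy equals the mean of the ten per-fold accuracies.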

