# CIS 678 Machine Learning: Naive Bayes Intro
Tyler Reed

NOTE: To view all output, use the nbviewer link below.
https://nbviewer.org/github/treed8887/ML/blob/main/CIS678-Project2-Part1.ipynb

### Overview
--------------
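A small labeled data set of fishing conditions is used to build a basic classifier. Each instance records the wind, air temperature, water, and sky conditions for a day, along with whether that day was ideal for fishing. The classifier is an implementation of the Naive Bayes algorithm.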

In [506]:
import numpy as np
import pandas as pd
df = pd.read_csv('/Users/Study/Downloads/CIS678/fishing-revised.data', 
                 sep=" ", 
                 names=('Ideal', 'Wind', 'AirTemp', 'Water', 'Sky'))
print(df)

   Ideal    Wind  AirTemp     Water     Sky
0    Yes  Strong  WarmAir      Warm   Sunny
1     No    Weak  WarmAir      Warm   Sunny
2    Yes  Strong  WarmAir      Warm  Cloudy
3    Yes  Strong  WarmAir  Moderate   Rainy
4     No  Strong  ColdAir      Cold  Cloudy
5     No    Weak  ColdAir      Cold   Rainy
6     No    Weak  ColdAir      Cold   Sunny
7    Yes  Strong  WarmAir  Moderate   Sunny
8    Yes  Strong  ColdAir      Cold   Sunny
9     No  Strong  ColdAir  Moderate   Rainy
10   Yes    Weak  ColdAir  Moderate   Sunny
11   Yes    Weak  WarmAir  Moderate   Sunny
12   Yes  Strong  ColdAir      Warm   Sunny
13    No    Weak  WarmAir  Moderate   Rainy


### Learning Probabilities
<ins>Simplifying Assumption</ins>: all attributes are conditionally independent of each other given the class.
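
Under this assumption, the decision rule used in the rest of the notebook can be written compactly. The symbols are notation introduced here for illustration, not taken from the original: $c$ is a class label, $a_1, \dots, a_4$ are the attribute values of the new instance (Wind, AirTemp, Water, Sky), and $\hat{c}$ is the predicted class.

$$
\hat{c} \;=\; \underset{c \,\in\, \{\text{Yes},\, \text{No}\}}{\arg\max}\; P(c) \prod_{i=1}^{4} P(a_i \mid c)
$$

Each factor is estimated by counting: $P(c)$ is the fraction of training instances with class $c$, and $P(a_i \mid c)$ is the fraction of class-$c$ instances whose $i$-th attribute equals $a_i$.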

The following function takes the class column (here the first column, `Ideal`) and two strings, one for each class label. It returns the prior probability of each class, formatted as a string with two decimal places.

In [507]:
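# Estimate the prior probability of each class for a k=2 problem
def classp(labels, cval1, cval2):
    # count how many labels belong to each of the two classes
    count_cval1 = 0
    count_cval2 = 0
    for clss1 in labels:
        if clss1 == cval1:
            count_cval1+=1
        elif clss1 == cval2:
            count_cval2+=1
    # relative frequencies are the prior estimates
    pcval1 = count_cval1 / len(labels)
    pcval2 = count_cval2 / len(labels)
    # return the priors as strings rounded to 2 decimal places
    fpcval1 = "{:.2f}".format(pcval1)
    fpcval2 = "{:.2f}".format(pcval2)
    return fpcval1, fpcval2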

class_prob = classp(df["Ideal"], "Yes", "No")
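
As a cross-check on the priors, the same estimate can be sketched with pandas' built-in `value_counts`. This is only an illustrative alternative, not the notebook's method; the `ideal` series below is a hypothetical stand-in that copies the `Ideal` column from the table printed above so the snippet runs on its own.

```python
import pandas as pd

# Class labels copied from the 'Ideal' column of the printed table (rows 0-13).
ideal = pd.Series(
    ["Yes", "No", "Yes", "Yes", "No", "No", "No",
     "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No"],
    name="Ideal",
)

# normalize=True turns the counts into relative frequencies, i.e. the priors.
priors = ideal.value_counts(normalize=True)
print(priors.round(2).to_dict())
```

Keeping the priors as floats and rounding only when printing avoids feeding rounded values into later multiplications.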

#### Conditional Probabilities
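The next two functions estimate the conditional probability of each attribute value given the class, P(value | class): `cond_proby` handles the first class (`Yes`) and `cond_probn` the second (`No`). Each returns a dictionary that maps an attribute name to a dictionary of value probabilities.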

In [508]:
import collections as col
def cond_proby(df, cval1):
    # separate classes
    df_y = df[df.iloc[:,0] == cval1]
    # convert to dictionary and remove class variable
    ydict = df_y.to_dict()
    del ydict["Ideal"]
    # cond probabilities for 'Yes' class
    i=-1
    pr_ydict = dict.fromkeys(ydict.keys(), 0)
    for key in ydict.values():
        i = i+1
        jval = col.Counter(key.values())
        count_y = np.array(list(jval.keys()))
        values_y = np.array(list(jval.values()))
        # divide by the number of rows in this class (8 for 'Yes' here)
        probs_y = values_y / len(df_y)
        val_ydict = dict(zip(count_y, probs_y))
        if i<len(pr_ydict):
            pr_ydict[list(ydict.keys())[i]] = val_ydict
    return pr_ydict

In [509]:
def cond_probn(df, cval2):
    # separate classes
    df_n = df[df.iloc[:,0] == cval2]
    # convert to dictionary and remove class variable
    ndict = df_n.to_dict()
    del ndict["Ideal"]
    # cond probabilities for 'No' class
    i=-1
    pr_ndict = dict.fromkeys(ndict.keys(), 0)
    for key in ndict.values():
        i = i+1
        jval = col.Counter(key.values())
        count_n = np.array(list(jval.keys()))
        values_n = np.array(list(jval.values()))
        # divide by the number of rows in this class (6 for 'No' here)
        probs_n = values_n / len(df_n)
        val_ndict = dict(zip(count_n, probs_n))
        if i<len(pr_ndict):
            pr_ndict[list(ndict.keys())[i]] = val_ndict
    return pr_ndict
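
Since the two functions differ only in the class they filter on, they could be folded into one. The following is a minimal sketch under that assumption; `cond_probs` and the inline `data` string are hypothetical names introduced here, with the rows copied from the table printed above so the snippet is self-contained.

```python
import pandas as pd

# The 14 training rows, in the same space-separated layout as the data file.
data = """Yes Strong WarmAir Warm Sunny
No Weak WarmAir Warm Sunny
Yes Strong WarmAir Warm Cloudy
Yes Strong WarmAir Moderate Rainy
No Strong ColdAir Cold Cloudy
No Weak ColdAir Cold Rainy
No Weak ColdAir Cold Sunny
Yes Strong WarmAir Moderate Sunny
Yes Strong ColdAir Cold Sunny
No Strong ColdAir Moderate Rainy
Yes Weak ColdAir Moderate Sunny
Yes Weak WarmAir Moderate Sunny
Yes Strong ColdAir Warm Sunny
No Weak WarmAir Moderate Rainy"""
rows = [line.split() for line in data.splitlines()]
df = pd.DataFrame(rows, columns=["Ideal", "Wind", "AirTemp", "Water", "Sky"])


def cond_probs(df, class_col, class_value):
    """Return {attribute: {value: P(value | class_value)}} (hypothetical helper)."""
    # Keep only the rows of the requested class, then drop the class column.
    subset = df[df[class_col] == class_value].drop(columns=class_col)
    # Relative frequency of each value within the class, per attribute.
    return {
        attr: subset[attr].value_counts(normalize=True).to_dict()
        for attr in subset.columns
    }


cy = cond_probs(df, "Ideal", "Yes")
cn = cond_probs(df, "Ideal", "No")
print(cy["Wind"])
print(cn["Water"])
```

Because the denominator comes from the filtered subset itself, no class size needs to be hard-coded.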

#### Naive Bayes Classifier
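The function below combines the outputs of the functions above into a Naive Bayes classifier. It prompts the user for the new instance to classify, entered as four space-separated attribute values in the order Wind, AirTemp, Water, Sky. It then prints the priors, the conditional probabilities for the `Yes` class, and a score for each class. The printed "class probabilities" are the unnormalized products of the prior and the conditional probabilities, so they do not sum to 1.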

In [510]:
def nb_class():
    new_inst = input("Enter new instance: ")
    # place input into list 
    l = new_inst.split(" ")
    print("\n")

    print("Prior Probabilities\n" + "P(Yes): " + str(class_prob[0]) + "\n" + "P(No): " + str(class_prob[1]))
    print("\n")
    # compute conditional probability of instance for class "Yes"
    cy = cond_proby(df, "Yes")
    print("Conditional probabilities")
    i = -1
    CnbY = 1
    for val in cy.values():
        i = i+1
        print(val[l[i]])
        CnbY = CnbY*val[l[i]]

    # compute conditional probability of instance for class "No"
    cn = cond_probn(df, "No")
    i = -1
    CnbN = 1
    for val in cn.values():
        i = i+1
        CnbN = CnbN*val[l[i]]
    # format probabilities to 3 decimal places
    fcY = "{:.3f}".format(CnbY*float(class_prob[0]))
    fcN = "{:.3f}".format(CnbN*float(class_prob[1]))
    print("\n")
    print("Class probabilities: \n" + "Yes: " + fcY + "\n" + "No: " + fcN)
    print("\n")
    
    # choose the class with the higher score (prior times product of conditionals)
    if CnbY*float(class_prob[0]) > CnbN*float(class_prob[1]):
        print("Classify: Yes")
    else:
        print("Classify: No")
    return
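
For testing without typing at a prompt, the same computation can be sketched as a pure function that takes the instance as an argument. This is an illustrative variant built on plain dictionaries and `collections.Counter`, not the notebook's implementation; `train`, `classify`, and `DATA` are hypothetical names. Two choices here are assumptions of the sketch: an attribute value never seen with a class gets probability 0 instead of raising a `KeyError`, and the scores are also normalized so they sum to 1.

```python
from collections import Counter, defaultdict

# Training rows copied from the printed table: Ideal Wind AirTemp Water Sky.
DATA = """Yes Strong WarmAir Warm Sunny
No Weak WarmAir Warm Sunny
Yes Strong WarmAir Warm Cloudy
Yes Strong WarmAir Moderate Rainy
No Strong ColdAir Cold Cloudy
No Weak ColdAir Cold Rainy
No Weak ColdAir Cold Sunny
Yes Strong WarmAir Moderate Sunny
Yes Strong ColdAir Cold Sunny
No Strong ColdAir Moderate Rainy
Yes Weak ColdAir Moderate Sunny
Yes Weak WarmAir Moderate Sunny
Yes Strong ColdAir Warm Sunny
No Weak WarmAir Moderate Rainy"""
ROWS = [line.split() for line in DATA.splitlines()]


def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[0] for row in rows)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row in rows:
        for i, value in enumerate(row[1:]):
            value_counts[(row[0], i)][value] += 1
    return class_counts, value_counts


def classify(instance, class_counts, value_counts):
    """Return (predicted class, unnormalized score per class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(label)
        for i, value in enumerate(instance):
            # P(value | label); a Counter returns 0 for unseen values.
            score *= value_counts[(label, i)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores


class_counts, value_counts = train(ROWS)
label, scores = classify("Strong WarmAir Cold Sunny".split(), class_counts, value_counts)

# Normalizing is only valid when at least one score is non-zero.
norm = sum(scores.values())
posterior = {c: s / norm for c, s in scores.items()}
print(label, scores, posterior)
```

Normalizing does not change which class wins; it only rescales the scores so they can be read as probabilities.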

In [511]:
nb_class()

Enter new instance: Strong WarmAir Cold Sunny


Prior Probabilities
P(Yes): 0.57
P(No): 0.43


Conditional probabilities
0.75
0.625
0.125
0.75


Class probabilities: 
Yes: 0.025
No: 0.008


Classify: Yes
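
The scores in this first run can be checked by hand. The `Yes` factors are the four conditional probabilities printed above. The `No` factors are not printed by `nb_class`; they are counted here from the six `No` rows of the table (Strong: 2, WarmAir: 2, Cold: 3, Sunny: 2).

$$
\begin{aligned}
\text{Yes}: &\quad 0.57 \times 0.75 \times 0.625 \times 0.125 \times 0.75 \approx 0.025 \\
\text{No}: &\quad 0.43 \times \tfrac{2}{6} \times \tfrac{2}{6} \times \tfrac{3}{6} \times \tfrac{2}{6} \approx 0.008
\end{aligned}
$$

Since $0.025 > 0.008$, the instance is classified as `Yes`, matching the output.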


In [512]:
nb_class()

Enter new instance: Weak ColdAir Cold Sunny


Prior Probabilities
P(Yes): 0.57
P(No): 0.43


Conditional probabilities
0.25
0.375
0.125
0.75


Class probabilities: 
Yes: 0.005
No: 0.032


Classify: No


In [513]:
nb_class()

Enter new instance: Strong WarmAir Moderate Cloudy


Prior Probabilities
P(Yes): 0.57
P(No): 0.43


Conditional probabilities
0.75
0.625
0.5
0.125


Class probabilities: 
Yes: 0.017
No: 0.003


Classify: Yes


### Conclusion
----------------
Overall, the classifier performs as expected when checked against the tutorial. The prior probabilities function could have been built around a dictionary instead of a pandas DataFrame, and all of the functions could probably have been simpler with more experience using Python and dictionaries.