### Problem 4
In this question you will import data from the CSV file created in the Setup section above.

This data represents the condition of an electrical mini-substation, based on readings from voltage, current and temperature sensors. A condition of ‘0’ represents a properly functioning device, while a condition of ‘1’ represents failure.

#### a) Which is worse for this use case, a false positive or a false negative? What value of β would be suitable for an Fβ score?

Let positive (a condition of "1") denote a device failure and negative (a condition of "0") denote a properly functioning device.
For this use case, a false negative is worse.

A false positive triggers an alert to the engineers, who check the electrical mini-substation and find that it is functioning properly. The check uses some resources, but that cost is acceptable.

A false negative hides a device failure and lets the engineers believe that the device is functioning properly, so they cannot fix the failure in time. As a result, the electrical mini-substation may fail entirely and the loss can be huge.

Recall measures the fraction of actual positive points that the model predicted correctly, so recall is low when the model produces many false negatives. This makes recall the metric to prioritise when false negatives are costly. Since values of β above 1 give recall more importance, and values of β below 1 give precision more importance, we choose a high value of β. A value of 10 for β would be suitable for an Fβ score.
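To make the effect of β concrete, here is a minimal sketch of the standard Fβ formula, Fβ = (1 + β²)·precision·recall / (β²·precision + recall). The function name `f_beta` and the precision/recall values are illustrative assumptions, not results from the models in this notebook.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta formula)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both metrics are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical model A: catches every failure but raises many false alarms.
print(round(f_beta(precision=0.5, recall=1.0, beta=1), 3))   # 0.667
print(round(f_beta(precision=0.5, recall=1.0, beta=10), 3))  # 0.99

# Hypothetical model B: never raises a false alarm but misses half the failures.
print(round(f_beta(precision=1.0, recall=0.5, beta=10), 3))  # 0.502
```

With β = 10 the score tracks recall almost exactly, so the model that misses failures is penalised heavily, which matches the reasoning above.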

#### b) Load the CSV file into an SFrame named data. Print the SFrame. Split the data into training/validation/testing sets using 80%/10%/10% respectively.

In [14]:
# Importing packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import turicreate as tc
import seaborn as sns
sns.set_theme(style="darkgrid")

In [15]:
data = tc.SFrame.read_csv("0380915_data.csv")

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,float,float,float]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


In [16]:
data

Condition,Voltage,Current,Temperature
1,25.666912748577367,443.1776787138455,39.04298008669625
1,24.93619269601364,443.7752058654272,36.615609013862766
1,26.431378686661077,440.7051966113706,59.53153475566279
1,25.75551597941211,444.45465600218176,42.75831261815376
1,25.97123695471139,442.8046424128778,37.65621788818368
1,25.88350340890212,444.5814752212545,39.4083109258523
0,26.94535437487196,442.3878725340347,45.120200601609525
1,26.490910090226947,444.4899725301915,40.08125414679967
1,25.279176510841378,442.3930666295656,38.47134221045628
1,26.083107828134683,441.3197317953561,52.22772264719654


In [17]:
data.show()

In [18]:
train_data, test_validate_data = data.random_split(.8, seed=0, exact=True)

In [19]:
test_data, validate_data = test_validate_data.random_split(.5, seed=0, exact=True)

In [20]:
# check the length of training set
print("Length of training set:", len(train_data))
print("Length of validation set:", len(validate_data))
print("Length of testing set:", len(test_data))

Length of training set: 800
Length of validation set: 100
Length of testing set: 100
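The two-stage split above (80% first, then half of the remaining 20%) can be illustrated without Turi Create. This is only a sketch of the arithmetic on row indices; `split_indices` is a hypothetical helper and is not how `random_split` is implemented internally.

```python
import random


def split_indices(n_rows, train_frac=0.8, seed=0):
    """Shuffle row indices, then cut them into train/validation/test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_train = round(n_rows * train_frac)       # first stage: 80% for training
    n_val = (n_rows - n_train) // 2            # second stage: half of the rest
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, validation, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```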


#### c) Is feature rescaling turned on by default for the function turicreate.logistic_classifier.create? What scale are the coefficients given in?

According to the Turi Create User Guide, feature rescaling is turned on by default (`feature_rescaling=True`).

The coefficients are given in the original scale of the problem.

https://apple.github.io/turicreate/docs/userguide/supervised-learning/linear-regression.html#feature-rescaling
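To see what "original scale" means, the sketch below maps coefficients learned on rescaled features back to the raw feature units. It assumes a plain z-score rescaling (subtract the mean, divide by the standard deviation), which may differ from the scheme Turi Create uses internally; all numbers and the helper name `to_original_scale` are made up for illustration.

```python
def to_original_scale(scaled_weights, scaled_intercept, means, stds):
    """Convert a linear model fitted on z-scored features to raw-feature units."""
    weights = [w / s for w, s in zip(scaled_weights, stds)]
    shift = sum(w * m / s for w, m, s in zip(scaled_weights, means, stds))
    return weights, scaled_intercept - shift


# Hypothetical statistics for Voltage, Current, Temperature
means = [25.0, 440.0, 45.0]
stds = [2.0, 5.0, 10.0]
scaled_w = [0.4, -1.5, 2.0]   # hypothetical coefficients on rescaled features
scaled_b = 0.3

x = [26.0, 443.0, 40.0]       # one raw reading
z = [(xi - m) / s for xi, m, s in zip(x, means, stds)]

score_scaled = scaled_b + sum(w * zi for w, zi in zip(scaled_w, z))
w, b = to_original_scale(scaled_w, scaled_b, means, stds)
score_original = b + sum(wi * xi for wi, xi in zip(w, x))

# Both parameterisations give the same linear score for the same reading.
print(abs(score_scaled - score_original) < 1e-9)  # True
```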

#### d) Create perceptrons using Turicreate to classify data with ‘Condition’ as the target. Be sure to use your validation set in the training. Experiment with different values of hyperparameters to develop two different models.

In [31]:
def perceptrons(l2_penalty=0.01, 
                l1_penalty=0, 
                solver="auto", 
                feature_rescaling=True, 
                convergence_threshold=0.01, 
                max_iterations=20,
                class_weights="auto"):
    perceptron = tc.logistic_classifier.create(
        train_data, target="Condition", 
        l2_penalty=l2_penalty, 
        l1_penalty=l1_penalty, 
        solver=solver, 
        feature_rescaling=feature_rescaling,
        convergence_threshold=convergence_threshold,
        max_iterations=max_iterations,
        class_weights=class_weights,
        validation_set=validate_data, verbose=True, seed=0
        )
    # Return the trained model so the caller can evaluate it
    return perceptron

In [46]:
model_dict = {
    1: {
        "l2_penalty" : 0.01,
        "l1_penalty" : 0,
        "solver" : "auto",
        "feature_rescaling" : True,
        "convergence_threshold" : 0.01, 
        "max_iterations" : 20,
        "class_weights" : "auto"
    },
    2: {
        "l2_penalty" : 0.1,
        "l1_penalty" : 0.1,
        "solver" : "auto",
        "feature_rescaling" : True,
        "convergence_threshold" : 0.01, 
        "max_iterations" : 200,
        "class_weights" : "auto"
    }
    
}

In [47]:
for params in model_dict.values():
    print(params)
    print(params["l2_penalty"])
    # The dictionary keys match the keyword arguments of perceptrons()
    perceptrons(**params)

{'l2_penalty': 0.01, 'l1_penalty': 0, 'solver': 'auto', 'feature_rescaling': True, 'convergence_threshold': 0.01, 'max_iterations': 20, 'class_weights': 'auto'}
0.01


{'l2_penalty': 0.1, 'l1_penalty': 0.1, 'solver': 'auto', 'feature_rescaling': True, 'convergence_threshold': 0.01, 'max_iterations': 200, 'class_weights': 'auto'}
0.1


In [29]:

    
perceptrons(l2_penalty=0.01, l1_penalty=0, 
            solver="auto", feature_rescaling=True, 
            convergence_threshold=0.01, 
            max_iterations=20)

In [21]:
def perceptrons():
    perceptron = tc.logistic_classifier.create(
        train_data, target="Condition", validation_set=validate_data, verbose=True, seed=0,
        l2_penalty=0.01, 
        l1_penalty=0, 
        solver="auto", 
        feature_rescaling=True,
        convergence_threshold=0.01,
        max_iterations=20,
        )
    return perceptron


perceptron_01 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=0.01,
    max_iterations=20,
    validation_set=validate_data, verbose=True, seed=0)
result_01 = perceptron_01.evaluate(test_data)
recall_prediction = result_01.get('recall')
precision_prediction = result_01.get('precision')
# The row indices below assume the row order shown when printing result_01['confusion_matrix']
TP = result_01['confusion_matrix'][1]
TP = int(TP['count'])
TN = result_01['confusion_matrix'][2]
TN = int(TN['count'])
FN = result_01['confusion_matrix'][3]
FN = int(FN['count'])
FP = result_01['confusion_matrix'][0]
FP = int(FP['count'])
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
print('Recall of model 01:', recall_prediction,
      '\nPrecision of model 01:', precision_prediction,
      '\nSensitivity of model 01:', sensitivity,
      '\nSpecificity of model 01:', specificity)

In [None]:
from prettytable import PrettyTable as pp
# accuracy, auc and the *_02 variables must hold each model's metrics before this cell runs
t = pp(['Name', 'Accuracy', 'AUC', 'Recall', 'Precision', 'Sensitivity', 'Specificity'])
t.add_row(['Perceptron 01', accuracy, auc, recall_prediction, precision_prediction, sensitivity, specificity])
t.add_row(['Perceptron 02', accuracy_02, auc_02, recall_prediction_02, precision_prediction_02, sensitivity_02, specificity_02])
print(t)

In [None]:
sns.relplot(data=data.to_dataframe(), x="Temperature", y="Current", hue="Condition")

In [None]:
tc.visualization.scatter(tc.SArray(data["Temperature"]), tc.SArray(data["Current"]))

In [None]:
tc.visualization.scatter(tc.SArray(data["Temperature"]), tc.SArray(data["Voltage"]))

In [None]:
tc.visualization.scatter(tc.SArray(data["Current"]), tc.SArray(data["Voltage"]))

In [None]:
tc.visualization.box_plot(tc.SArray(data["Condition"], str), tc.SArray(data["Temperature"]))

In [None]:
perceptron_02 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=0.01,
    max_iterations=20, class_weights="auto",
    validation_set=validate_data, verbose=True, seed=0)
perceptron_02.evaluate(test_data)

In [None]:
perceptron_03 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=0.01,
    max_iterations=20, class_weights=None,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_03.evaluate(test_data)

In [None]:
perceptron_04 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=False,
    convergence_threshold=0.01,
    max_iterations=20,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_04.evaluate(test_data)

In [None]:
perceptron_05 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=0.02,
    max_iterations=20,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_05.evaluate(test_data)

In [None]:
perceptron_06 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0.01, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=0.01,
    max_iterations=1000,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_06.evaluate(test_data)

In [None]:
perceptron_07 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="auto", feature_rescaling=True,
    convergence_threshold=1,
    max_iterations=20,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_07.evaluate(test_data)

In [None]:
perceptron_08 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.01, l1_penalty=0, 
    solver="fista", feature_rescaling=True,
    convergence_threshold=0.01,
    step_size=0.1,
    max_iterations=200,
    validation_set=validate_data, verbose=True, seed=0)
perceptron_08.evaluate(test_data)

In [None]:
perceptron_01.coefficients

In [None]:
results = perceptron_01.evaluate(test_data)

In [None]:
results

In [None]:
print("Accuracy         : %s" % results['accuracy'])
print("Confusion Matrix : \n%s" % results['confusion_matrix'])


In [None]:
# calculate recall, precision, sensitivity and specificity on the testing set
predictions_02 = perceptron_02.predict(test_data)
result = perceptron_01.evaluate(test_data)

In [None]:
recall_prediction = result.get('recall')

In [None]:
precision_prediction = result.get('precision')

In [None]:
TP = result['confusion_matrix'][1]

In [None]:
TP = int(TP['count'])

In [None]:
TN = results['confusion_matrix'][2]

In [None]:
TN = int(TN['count'])

In [None]:
FN = results['confusion_matrix'][3]

In [None]:
FN = int(FN['count'])

In [None]:
FP = results['confusion_matrix'][0]

In [None]:
FP = int(FP ['count'])

In [None]:
sensitivity=TP/(TP+FN)

In [None]:
specificity=TN/(TN+FP)

In [None]:
# These metrics come from perceptron_01.evaluate(test_data), so they are labelled model 01
print('Recall of model 01:', recall_prediction,
      '\nPrecision of model 01:', precision_prediction,
      '\nSensitivity of model 01:', sensitivity,
      '\nSpecificity of model 01:', specificity)
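Reading TP, TN, FN and FP by row position depends on the order in which the confusion matrix rows happen to be listed. A more defensive sketch looks the counts up by label instead. It assumes each row has `target_label`, `predicted_label` and `count` fields, which is how we understand Turi Create's confusion matrix to be laid out; the helper names and the example counts are hypothetical.

```python
def confusion_counts(rows, positive=1):
    """Sum confusion matrix rows into TP/TN/FP/FN, independent of row order."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for row in rows:
        actual_pos = row["target_label"] == positive
        predicted_pos = row["predicted_label"] == positive
        if actual_pos and predicted_pos:
            key = "TP"
        elif actual_pos:
            key = "FN"   # a failure the model missed
        elif predicted_pos:
            key = "FP"   # a false alarm
        else:
            key = "TN"
        counts[key] += int(row["count"])
    return counts


def rates(counts):
    """Recall (= sensitivity), precision and specificity from the four counts."""
    def ratio(num, den):
        return num / den if den else 0.0
    recall = ratio(counts["TP"], counts["TP"] + counts["FN"])
    return {
        "recall": recall,
        "sensitivity": recall,
        "precision": ratio(counts["TP"], counts["TP"] + counts["FP"]),
        "specificity": ratio(counts["TN"], counts["TN"] + counts["FP"]),
    }


# Hypothetical rows, deliberately listed in an arbitrary order
example_rows = [
    {"target_label": 0, "predicted_label": 1, "count": 10},
    {"target_label": 1, "predicted_label": 1, "count": 60},
    {"target_label": 0, "predicted_label": 0, "count": 25},
    {"target_label": 1, "predicted_label": 0, "count": 5},
]
example_counts = confusion_counts(example_rows)
print(example_counts)
print(rates(example_counts))
```

A combination that never occurs is simply absent from the rows, and its count stays at zero instead of shifting the positions of the other rows.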

In [None]:
predictions = perceptron_01.predict(test_data)

In [None]:
# Compute boolean filters
false_positive_filter = (predictions == 1) & (test_data[perceptron_01.target] == 0)
false_negative_filter = (predictions == 0) & (test_data[perceptron_01.target] == 1)

false_negatives = test_data[false_negative_filter]
false_positives = test_data[false_positive_filter]

In [None]:
perceptron_02 = tc.logistic_classifier.create(
    train_data, target="Condition", 
    l2_penalty=0.1, l1_penalty=0.1, 
    validation_set=validate_data, seed=0)

In [None]:
perceptron_02

In [None]:
perceptron_02.evaluate(test_data)

#### e) For each model:
##### i) find predictions using the testing set;
##### ii) display the training/validation/testing accuracy;
##### iii) display the confusion matrix on the testing set;
##### iv) calculate recall, precision, sensitivity and specificity on the testing set;
##### v) calculate the Fβ score on the testing set using the value of β you chose above.
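For item ii), one possible approach is to evaluate the same model on each of the three sets and collect the accuracies. The sketch below assumes the model exposes `evaluate(dataset, metric="accuracy")` returning a dictionary with an `"accuracy"` key, as we understand Turi Create classifiers to do; `accuracy_report` is a hypothetical helper.

```python
def accuracy_report(model, datasets):
    """Return {set name: accuracy} for every dataset in the mapping."""
    report = {}
    for name, dataset in datasets.items():
        result = model.evaluate(dataset, metric="accuracy")
        report[name] = result["accuracy"]
    return report


# Intended usage with the objects defined earlier in this notebook:
# accuracy_report(perceptron_01, {"training": train_data,
#                                 "validation": validate_data,
#                                 "testing": test_data})
```

A large gap between the training accuracy and the validation or testing accuracy would suggest overfitting.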

#### f) Select which of your two models is the best (or declare a tie) and justify your choice.

[draft] An excellent model has an AUC close to 1, which means it separates the two classes well.
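One way to read AUC as a measure of separability: it equals the probability that a randomly chosen failing device receives a higher predicted score than a randomly chosen healthy one. The sketch below computes it by comparing every positive/negative pair; the function name, labels and scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Perfectly separated scores
print(auc_from_scores([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
# One healthy device scored above one failing device
print(auc_from_scores([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```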