# Normalising Data
We want to feature scale our data so that every value in each column lies between 0 and 1.
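For one column $x$ with values $x_1, \dots, x_n$, the scaling applied below is

$$
x_i' = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j}
$$

so the smallest value in the column maps to 0 and the largest maps to 1. For example, a column holding 5, 6 and 7 becomes 0, 0.5 and 1. The formula is undefined when every value in the column is the same, because the denominator is then 0.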

To go into util.py:

In [1]:
'''from typing import List


# assume all rows are of the same length
# and feature scale each column to be in the range 0 to 1 (modifies dataset in place)
def normalize_by_feature_scaling(dataset: List[List[float]]) -> None:
    for col_num in range(len(dataset[0])):
        column: List[float] = [row[col_num] for row in dataset]
        maximum = max(column)
        minimum = min(column)
        value_range = maximum - minimum
        for row_num in range(len(dataset)):
            if value_range == 0:
                # a constant column would divide by zero, so map it to 0.0
                dataset[row_num][col_num] = 0.0
            else:
                dataset[row_num][col_num] = (dataset[row_num][col_num] - minimum) / value_range
'''
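The util.py function overwrites the dataset in place. As a sketch of an alternative design (not the code used in this notebook), a hypothetical `feature_scaled` helper can return a new, scaled list instead, which leaves the raw measurements available for inspection. It transposes the rows with `zip(*dataset)` to get one tuple per feature:

```python
from typing import List


def feature_scaled(dataset: List[List[float]]) -> List[List[float]]:
    """Return a min-max scaled copy of dataset, leaving the input untouched."""
    columns = list(zip(*dataset))  # transpose: one tuple per feature
    minimums = [min(col) for col in columns]
    ranges = [max(col) - lo for col, lo in zip(columns, minimums)]
    return [
        # constant columns (range 0) map to 0.0 rather than dividing by zero
        [(x - lo) / rng if rng else 0.0 for x, lo, rng in zip(row, minimums, ranges)]
        for row in dataset
    ]


toy_data = [[5.0, 2.0, 7.0], [7.0, 4.0, 7.0], [6.0, 3.0, 7.0]]
print(feature_scaled(toy_data))  # [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
print(toy_data)                  # [[5.0, 2.0, 7.0], [7.0, 4.0, 7.0], [6.0, 3.0, 7.0]]
```

The in-place version avoids allocating a second copy of the data; the copying version is easier to reason about when the same raw data is reused.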

# The classic iris data set

- Our neural network will have 3 output neurons, each representing one possible species. For example, the output [0.9, 0.3, 0.1] is classified as Iris-setosa, because the first neuron represents that species and it has the largest value.

- The values in iris_classifications are the target outputs (one-hot vectors such as [1.0, 0.0, 0.0]) used to calculate the errors of the network's outputs after each training step.
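A minimal sketch of the target encoding and the per-neuron error described above, assuming the neuron order used when reading iris.csv (setosa, versicolor, virginica). `SPECIES_ORDER`, `one_hot` and `output_errors` are hypothetical names; how network.py turns these differences into weight updates is not shown here:

```python
from typing import List

# Assumed neuron order, matching the if/elif chain used when reading iris.csv:
# index 0 = setosa, 1 = versicolor, 2 = virginica
SPECIES_ORDER: List[str] = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']


def one_hot(species: str) -> List[float]:
    """Target vector: 1.0 for this species' neuron, 0.0 for the others.

    Note: an unrecognised name gives all zeros here, whereas the loading
    code's else branch would label it Iris-virginica.
    """
    return [1.0 if name == species else 0.0 for name in SPECIES_ORDER]


def output_errors(expected: List[float], output: List[float]) -> List[float]:
    """Per-neuron difference between the target and what the network produced."""
    return [e - o for e, o in zip(expected, output)]


target = one_hot('Iris-setosa')
print(target)                                    # [1.0, 0.0, 0.0]
print(output_errors(target, [0.75, 0.25, 0.5]))  # [0.25, -0.25, -0.5]
```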

__"Warning: The lack of error-checking code makes this code fairly dangerous. It is not suitable for production, but it is fine for testing."__
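As a sketch of the kind of error checking the warning refers to, assuming the five-column layout used below (four measurements, then the species name), a hypothetical `parse_iris_row` helper could validate each row returned by `csv.reader`. Note that `csv.reader` yields an empty list for a blank line, which the unchecked loop would turn into an `IndexError`:

```python
from typing import List, Optional, Tuple

KNOWN_SPECIES = {'Iris-setosa', 'Iris-versicolor', 'Iris-virginica'}


def parse_iris_row(row: List[str]) -> Optional[Tuple[List[float], str]]:
    """Return (four measurements, species), or None for a blank row.

    Raises ValueError on a malformed row instead of silently mislabelling it.
    """
    if not row or all(not field.strip() for field in row):
        return None  # e.g. a trailing empty line at the end of the file
    if len(row) != 5:
        raise ValueError(f'expected 5 fields, got {len(row)}: {row!r}')
    measurements = [float(field) for field in row[:4]]  # ValueError on non-numbers
    species = row[4].strip()
    if species not in KNOWN_SPECIES:
        raise ValueError(f'unknown species {species!r}')
    return measurements, species


print(parse_iris_row(['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']))
# ([5.1, 3.5, 1.4, 0.2], 'Iris-setosa')
print(parse_iris_row([]))  # None
```

Inside the reading loop, rows that return None would be skipped, and a ValueError would stop the load with a message pointing at the offending row.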

Reading the data:

In [2]:
import csv
from typing import List
from random import shuffle

# from .py files in the same folder
from util import normalize_by_feature_scaling
from network import Network

# initialise the lists to be filled
iris_parameters: List[List[float]] = []
iris_classifications: List[List[float]] = []
iris_species: List[str] = []

with open('iris.csv', mode='r', newline='') as iris_file:
    irises: List = list(csv.reader(iris_file))
    shuffle(irises) # get our lines of data in random order
    for iris in irises:
        # the first columns are the sepal length, sepal width, petal length, and petal width
        parameters: List[float] = [float(n) for n in iris[0:4]]
        iris_parameters.append(parameters)
        # record the species classification
        species: str = iris[4]
        if species == 'Iris-setosa':
            iris_classifications.append([1.0, 0.0, 0.0])
        elif species == 'Iris-versicolor':
            iris_classifications.append([0.0, 1.0, 0.0])
        else:
            iris_classifications.append([0.0, 0.0, 1.0])
        iris_species.append(species)

normalize_by_feature_scaling(iris_parameters)
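Because `shuffle` draws from the global random state, the train/test split will almost certainly differ on every run, so the accuracy printed at the end can vary between runs. If repeatable runs are wanted, one option (a sketch, not what this notebook does) is to shuffle with a seeded `random.Random` instance; the seed 42 and the stand-in `rows` list are arbitrary:

```python
import random
from typing import List

rows: List[List[str]] = [[str(i)] for i in range(10)]  # stand-in for the CSV rows

# A dedicated Random instance with a fixed seed gives the same order on every run,
# so the rows that end up in the training and test slices are the same each time.
rng = random.Random(42)  # the seed value is arbitrary
first = rows[:]
rng.shuffle(first)

second = rows[:]
random.Random(42).shuffle(second)

print(first == second)                # True
print(sorted(first) == sorted(rows))  # True: shuffling only reorders the rows
```

Shuffling whole rows before splitting them into parameters and species, as the loading code does, keeps each flower's measurements and label together.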

Defining the actual network:
- 4 neurons in the input layer, one per attribute: sepal length, sepal width, petal length, and petal width
- 6 neurons in the hidden layer
- 3 output neurons, one per species
- a learning rate of 0.3
- The number of neurons in the hidden layer and the learning rate can be tuned to improve accuracy
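Assuming network.py builds fully connected layers (every neuron feeds every neuron in the next layer), the list of layer sizes passed to `Network` fixes how many weights training has to learn. A hypothetical `count_weights` helper makes the arithmetic explicit; it ignores any bias terms network.py may or may not use:

```python
from typing import List


def count_weights(layer_sizes: List[int]) -> int:
    """Connections in a fully connected feed-forward net (bias terms not counted)."""
    # every neuron in one layer connects to every neuron in the next layer
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


print(count_weights([4, 6, 3]))   # 4*6 + 6*3 = 42
print(count_weights([13, 7, 3]))  # 13*7 + 7*3 = 112
```

Adding hidden neurons increases this count linearly, which is one reason to start small when experimenting with the hidden layer size.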

In [3]:
iris_network: Network = Network([4, 6, 3], 0.3)

Output interpreting function (used by the validate function in network.py):

In [4]:
def iris_interpret_output(output: List[float]) -> str:
    if max(output) == output[0]:
        return 'Iris-setosa'
    elif max(output) == output[1]:
        return 'Iris-versicolor'
    else:
        return 'Iris-virginica'
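The if/elif chain picks the species whose neuron has the largest output. The same rule can be written once for any label list by taking the index of the maximum; since Python's `max` returns the first maximal item, ties resolve to the earlier label, just as in the chain. `interpret_by_argmax` is a hypothetical helper, not part of network.py:

```python
from typing import List, Sequence, TypeVar

Label = TypeVar('Label')


def interpret_by_argmax(output: List[float], labels: Sequence[Label]) -> Label:
    """Return the label of the strongest output neuron (the first one wins a tie)."""
    best_index = max(range(len(output)), key=lambda i: output[i])
    return labels[best_index]


IRIS_LABELS = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(interpret_by_argmax([0.9, 0.3, 0.1], IRIS_LABELS))  # Iris-setosa
print(interpret_by_argmax([0.2, 0.2, 0.7], [1, 2, 3]))    # 3
```

Assuming `validate` calls the interpreting function with the output list as its only argument, `functools.partial(interpret_by_argmax, labels=IRIS_LABELS)` would produce a compatible one-argument function.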

Train on the first 140 of the 150 irises, 50 times over:

In [5]:
# train over the first 140 irises in the data set 50 times
iris_trainers: List[List[float]] = iris_parameters[0:140]
iris_trainers_corrects: List[List[float]] = iris_classifications[0:140]
for _ in range(50):
    iris_network.train(iris_trainers, iris_trainers_corrects)
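The slices `[0:140]` and `[140:150]` hard-code the size of the iris file. A sketch of a hypothetical helper that splits at a fraction instead, so the same code would also fit the 178-row wine file; it assumes the rows are already shuffled, as they are when the CSV is read:

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar('T')


def split_at_fraction(items: Sequence[T], train_fraction: float) -> Tuple[List[T], List[T]]:
    """Split already-shuffled items into (train, test) at round(len * train_fraction)."""
    cut = round(len(items) * train_fraction)
    return list(items[:cut]), list(items[cut:])


train, test = split_at_fraction(list(range(150)), 140 / 150)
print(len(train), len(test))  # 140 10
```

The parameters, classifications and species lists would all need to be split with the same fraction so that they stay aligned row for row.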

Testing the network over the last 10 irises:

In [6]:
# test over the last 10 of the irises in the data set:
iris_testers: List[List[float]] = iris_parameters[140:150]
iris_testers_corrects: List[str] = iris_species[140:150]
iris_results = iris_network.validate(iris_testers, iris_testers_corrects,
                                    iris_interpret_output)

# print result
print(f'{iris_results[0]} correct of {iris_results[1]} = {iris_results[2] * 100}%')

8 correct of 10 = 80.0%
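Judging by how `iris_results` is indexed, `validate` returns the number of correct classifications, the number of test cases, and the fraction correct. The following sketches only that bookkeeping and is not network.py's code: it assumes the raw network outputs are already available, whereas `validate` presumably runs the network on each input itself. `score_predictions` and `first_is_largest` are hypothetical:

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Label = TypeVar('Label')


def score_predictions(
    outputs: Sequence[List[float]],
    expected: Sequence[Label],
    interpret: Callable[[List[float]], Label],
) -> Tuple[int, int, float]:
    """Return (number correct, total, fraction correct)."""
    correct = sum(1 for out, exp in zip(outputs, expected) if interpret(out) == exp)
    total = len(expected)
    return correct, total, correct / total


def first_is_largest(output: List[float]) -> str:
    # toy two-class interpreter used only for this example
    return 'A' if output[0] >= output[1] else 'B'


results = score_predictions([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]],
                            ['A', 'B', 'B', 'B'], first_is_largest)
print(results)  # (3, 4, 0.75)
```

With only 10 test irises, each extra mistake moves the score by 10 percentage points, so a single run's percentage is a rough estimate.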


# The Wine Dataset

- In this dataset, there are 13 inputs/attributes and 3 outputs (cultivars of wine).

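- Interestingly, the network works well with fewer neurons in the hidden layer (7) than in the input layer (13). One possible explanation is that some of the inputs are not very useful for predicting the cultivar. Note, though, that a smaller hidden layer does not simply drop inputs: in a fully connected network every hidden neuron still receives all 13 inputs, so the hidden layer learns a compressed combination of them rather than a subset.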
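To illustrate that point, here is a sketch of one fully connected layer with 7 neurons fed by 13 inputs. It assumes a sigmoid activation and no bias terms, and the inputs and weights are made-up random numbers rather than wine data or network.py's weights; `layer_forward` is a hypothetical name:

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """One fully connected layer: each row of weights belongs to one neuron."""
    # every neuron takes a weighted sum over *all* inputs, so no input is dropped
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)))
            for neuron_weights in weights]


rng = random.Random(0)
wine_sample = [rng.random() for _ in range(13)]  # 13 scaled features (made-up values)
hidden_weights = [[rng.uniform(-1, 1) for _ in range(13)] for _ in range(7)]

hidden = layer_forward(wine_sample, hidden_weights)
print(len(hidden))  # 7
```

Each of the 7 activations depends on all 13 inputs; training can make an unhelpful input's weights small, but the layer itself never discards it.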

In [7]:
wine_parameters: List[List[float]] = []
wine_classifications: List[List[float]] = []
wine_species: List[int] = []
    
with open('wine.csv', mode='r', newline='') as wine_file:
    wines: List = list(csv.reader(wine_file, quoting=csv.QUOTE_NONNUMERIC))
    shuffle(wines) # get our lines of data in random order
    for wine in wines:
        # classification is in first column this time
        parameters: List[float] = [float(n) for n in wine[1:14]]
        wine_parameters.append(parameters)
        species: int = int(wine[0])
        if species == 1:
            wine_classifications.append([1.0, 0.0, 0.0])
        elif species == 2:
            wine_classifications.append([0.0, 1.0, 0.0])
        else:
            wine_classifications.append([0.0, 0.0, 1.0])
        wine_species.append(species)

normalize_by_feature_scaling(wine_parameters)
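`csv.QUOTE_NONNUMERIC` tells the reader to convert every unquoted field to `float`, which is why the cultivar in the first column arrives as a float and is passed through `int()`; the `float(n)` conversion of the attributes is therefore redundant but harmless. A small in-memory demonstration with made-up numbers (not rows from wine.csv):

```python
import csv
import io

# Two made-up rows in the wine.csv layout: cultivar first, then attributes (only 3 here)
sample = io.StringIO('1,13.5,2.0,2.25\n3,12.75,4.5,2.5\n')

rows = list(csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC))
print(rows)              # [[1.0, 13.5, 2.0, 2.25], [3.0, 12.75, 4.5, 2.5]]
print(type(rows[0][0]))  # <class 'float'>
```

One consequence: an unquoted non-numeric field, such as a header row, makes the reader raise `ValueError`, so this approach relies on the file containing numbers only.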

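The network: 13 input neurons (one per attribute), 7 hidden neurons, 3 output neurons (one per cultivar) and a learning rate of 0.9: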

In [8]:
wine_network: Network = Network([13, 7, 3], 0.9)

Interpreting function:

In [9]:
def wine_interpret_output(output: List[float]) -> int:
    if max(output) == output[0]:
        return 1
    elif max(output) == output[1]:
        return 2
    else:
        return 3

Training over the first 150 wines 10 times:

In [10]:
# train over the first 150 wines 10 times
wine_trainers: List[List[float]] = wine_parameters[0:150]
wine_trainers_corrects: List[List[float]] = wine_classifications[0:150]

for _ in range(10):
    wine_network.train(wine_trainers, wine_trainers_corrects)

Testing over the last 28 of the 178 wines in the data set:

In [11]:
# test over the last 28 of the wines in the data set
wine_testers: List[List[float]] = wine_parameters[150:178]
wine_testers_corrects: List[int] = wine_species[150:178]

wine_results = wine_network.validate(wine_testers, wine_testers_corrects, wine_interpret_output)

print(f'{wine_results[0]} correct of {wine_results[1]} = {wine_results[2] * 100}%')

26 correct of 28 = 92.85714285714286%
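Printing `fraction * 100` directly shows every digit of the float. Python's `%` format specifier multiplies by 100 and appends a percent sign, so a sketch of a rounded display looks like this:

```python
correct, total = 26, 28
fraction = correct / total

print(f'{correct} correct of {total} = {fraction * 100}%')  # full float repr
print(f'{correct} correct of {total} = {fraction:.1%}')     # 26 correct of 28 = 92.9%
```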
