# 03/25/22 Artificial Neural Network (ANN) in Machine Learning

An artificial neural network (ANN) is a machine learning and pattern recognition model loosely based on biological neurons. The human brain consists of billions of neurons.
A neural network is a directed graph. It consists of nodes, which in the biological analogy represent neurons, connected by arcs.

ANNs can be used for both classification and regression.

**Structure**

A neural network may contain the following three layers:

1. Input layer – The input units hold the raw information that is fed into the network. The inputs can come from text, numbers, audio files, image pixels, etc., encoded as numbers.

2. Hidden layer – The hidden layers perform mathematical computations on the input data, using the weights on the connections between the input and the hidden units. There may be one or more hidden layers.

3. Output layer – The output values are the predictions of the response variable.
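
The three layers above can be sketched as a single forward pass in NumPy. This is a minimal illustration, not how scikit-learn or Keras implement it internally: the layer sizes (4 inputs, 5 hidden units, 3 outputs), the random weights, and the helper names `relu` and `softmax` are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: element-wise max(0, z)
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 5 hidden units, 3 output classes
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)

X = rng.normal(size=(2, 4))          # input layer: 2 samples, 4 features each
hidden = relu(X @ W1 + b1)           # hidden layer: weighted sum, then nonlinearity
output = softmax(hidden @ W2 + b2)   # output layer: one probability per class

print(output.shape)        # (2, 3)
print(output.sum(axis=1))  # each row sums to 1
```

Each `@` is one set of connection weights between two layers; training consists of adjusting `W1`, `b1`, `W2`, and `b2`.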

**Characteristics**

Nonlinearity: Each neuron applies a nonlinear activation function to its weighted input, so the mapping from the input signal to the output signal is nonlinear.

Supervised Learning: Each input is paired with a known output, and the ANN is trained on this labeled training dataset.

Unsupervised Learning: The target output is not given, so the ANN will learn on its own by discovering the features in the input patterns.

Adaptive Nature: The connection weights between the nodes of an ANN are adjusted during training to give the desired output.
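
To make the idea of adjustable weights concrete, here is a small sketch of gradient descent on a single sigmoid neuron. The toy data, the learning rate `lr`, and the number of steps are assumptions made for illustration. Real libraries use more elaborate optimizers such as Adam.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy target, linearly separable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y, eps=1e-12):
    # Mean binary cross-entropy between probabilities p and labels y
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

w, b, lr = np.zeros(2), 0.0, 0.5   # start from zero weights
loss_before = log_loss(sigmoid(X @ w + b), y)

for _ in range(100):
    p = sigmoid(X @ w + b)
    # Gradient of the mean log-loss with respect to the weights and the bias
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w               # move the weights against the gradient
    b -= lr * grad_b

loss_after = log_loss(sigmoid(X @ w + b), y)
print(loss_before, loss_after)
```

The loss after the updates is lower than before, which is what "the weights adjust themselves" means in practice.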

**Applications:**

Handwritten Character Recognition, Speech Recognition, Signature Classification, Facial Recognition,...


For supervised learning tasks such as classification and image recognition, information flows from the input layer through the hidden layers to the output layer. Such a network is called a feedforward artificial neural network.

Adding one or more hidden layers between the input and output layers, and more units in each layer, increases the predictive power of the neural network. However, the number of hidden layers should be kept as small as possible. This helps ensure that the network does not memorize the training set but generalizes from it, which avoids overfitting.


## Single-Layer Feed-Forward Network, Multi-Layer Feed-Forward Network


# **Library**
class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10)

a) hidden_layer_sizes : tuple, length = n_layers - 2, default (100,). The ith element represents the number of neurons in the ith hidden layer.

b) activation : {'identity', 'logistic', 'tanh', 'relu'}, default 'relu'

c) solver : {'lbfgs', 'sgd', 'adam'}, default 'adam'. The default solver 'adam' works pretty well on relatively large datasets
(with thousands of training samples or more) in terms of both training time and
validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.

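d) alpha : float, default 0.0001. L2 penalty (regularization term) parameter.

e) batch_size : int, default 'auto'. Size of the minibatches for the stochastic optimizers. When set to 'auto', batch_size=min(200, n_samples).

f) learning_rate : {'constant', 'invscaling', 'adaptive'}, default 'constant'. Only used when solver='sgd'.

g) verbose : bool, optional, default False. Whether to print progress messages to stdout.

h) validation_fraction : float, default 0.1. The proportion of training data set aside as a validation set for early stopping. Only used if early_stopping is True.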

.fit(X,y)

X : array-like or sparse matrix, shape (n_samples, n_features). The input data.

y : array-like, shape (n_samples,) or (n_samples, n_outputs). The target values (class labels in classification, real numbers in regression).
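
A minimal usage sketch of the parameters and `.fit(X, y)` described above. It uses the copy of Iris bundled with scikit-learn (`load_iris`) so that it runs offline. The specific settings (`hidden_layer_sizes=(10,)`, `solver='lbfgs'`, `max_iter=1000`) are assumptions chosen for a small dataset, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# One hidden layer with 10 neurons: n_layers = 3, so the tuple has length 1
clf = MLPClassifier(hidden_layer_sizes=(10,), solver='lbfgs', alpha=1e-4,
                    max_iter=1000, random_state=0)
clf.fit(X_train_s, y_train)

proba = clf.predict_proba(X_test_s)      # one probability per class and sample
print(clf.n_layers_)                     # input + hidden + output layers
print([w.shape for w in clf.coefs_])     # weight matrices between layers
print(clf.score(X_test_s, y_test))       # mean accuracy on the test split
```

`coefs_` holds one weight matrix per pair of adjacent layers, which is why a tuple of length `n_layers - 2` fully describes the hidden part of the network.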


In [None]:
import pandas as pd # for data manipulation
import numpy as np # for data manipulation
#%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
from sklearn.model_selection import train_test_split # for splitting the data into train and test samples
from sklearn.model_selection import GridSearchCV # for cross-validated hyperparameter search
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix # for model evaluation metrics
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OrdinalEncoder # for encoding categorical features from strings to number arrays
from sklearn.preprocessing import LabelEncoder # for encoding categorical target labels as integers
from sklearn.preprocessing import StandardScaler # for standardizing features to zero mean and unit variance
from sklearn.neural_network import MLPClassifier # multi-layer perceptron classifier
scaler = StandardScaler()

# **Keras ANN for deep learning**

In [None]:

from keras.models import Sequential
# All layers are importable from keras.layers; the older keras.layers.embeddings
# and keras.layers.convolutional module paths are not available in recent Keras releases.
from keras.layers import Dense, LSTM, SimpleRNN, Flatten, Embedding, Conv1D, MaxPooling1D
from numpy import loadtxt

## **Iris data application using sklearn.neural_network.MLPClassifier**

In [None]:
df = sns.load_dataset('iris')
df.head()

In [None]:
sns.pairplot(data=df, hue = 'species')

In [None]:
#define the target y and predictor X
target = df['species'] #target/response variable y
df1 = df.copy()
df1 = df1.drop('species', axis =1)

## Defining the attributes/predictors
X = df1

In [None]:
#label encoding
#
le = LabelEncoder()
target = le.fit_transform(target)
target
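
As a side note on what `LabelEncoder` does with the species column: it sorts the distinct labels and maps each to an integer, and it can map predictions back to the original names. The short list below is a hypothetical stand-in for `df['species']`.

```python
from sklearn.preprocessing import LabelEncoder

species = ['setosa', 'versicolor', 'virginica', 'setosa']  # hypothetical sample

le = LabelEncoder()
codes = le.fit_transform(species)

print(list(le.classes_))                       # ['setosa', 'versicolor', 'virginica']
print(codes.tolist())                          # [0, 1, 2, 0]
print(le.inverse_transform([2, 0]).tolist())   # ['virginica', 'setosa']
```

Keeping the fitted `le` around makes it possible to report predictions as species names with `le.inverse_transform(y_pred)`.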

In [None]:
y=target
# Splitting the data - 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X , y, test_size = 0.2, random_state = 42)
print("Training split input- ", X_train.shape)
print("Testing split input- ", X_test.shape)

Training split input-  (120, 4)
Testing split input-  (30, 4)


In [None]:

def MLP(X,y):
  # Hold out 20% of the data for testing
  X_train, X_test, y_train, y_test = train_test_split(X , y, test_size = 0.2, random_state = 42)
  # Candidate values for the L2 penalty alpha
  alphas = np.linspace(0.01, 2, num=20)
  lvc_df = pd.DataFrame(alphas, columns=['alpha'])
  #lvc_df['model'] = lvc_df['alpha'].apply(lambda a: MLPClassifier(solver='lbfgs', alpha=a, hidden_layer_sizes=(5, 2)).fit(X_train, y_train))
  # Fit one model per alpha
  lvc_df['model'] = lvc_df['alpha'].apply(lambda a: MLPClassifier(solver='sgd', alpha=a, hidden_layer_sizes=(100, )).fit(X_train, y_train))

  # Score each model on the test split and keep the first one with the highest accuracy
  lvc_df['score']=lvc_df['model'].apply(lambda model: model.score(X_test, y_test))
  bestmodel=lvc_df.loc[lvc_df['score'].idxmax(), 'model']

  train_score=bestmodel.score(X_train,y_train)
  test_score=bestmodel.score(X_test,y_test)
  return bestmodel, train_score, test_score, X_test,y_test
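
The `MLP` function above picks the best `alpha` by scoring on the test split, so the reported test score is somewhat optimistic. One alternative, sketched here under assumptions, is to select `alpha` with cross-validation on the training split only, using the `GridSearchCV` class imported earlier. The pipeline, the `alpha` grid, and the bundled `load_iris` data are choices made for this example, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so it is refit on each training fold
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', hidden_layer_sizes=(10,),
                  max_iter=1000, random_state=0))

# make_pipeline names each step after its lowercased class name
param_grid = {'mlpclassifier__alpha': np.logspace(-4, 0, 5)}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)             # the test split is never seen here

print(search.best_params_)
print(search.score(X_test, y_test))      # accuracy of the refit best model
```

Because the test split plays no part in choosing `alpha`, the final score is a cleaner estimate of generalization.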

In [None]:
y=target
bestmodel, train_score, test_score, X_test,y_test=MLP(X,y)
y_pred=bestmodel.predict(X_test)


In [None]:
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(5,5))
sns.heatmap(data=cm,linewidths=.5, annot=True,square = True,  cmap = 'Blues')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(bestmodel.score(X_test, y_test))
plt.title(all_sample_title, size = 15)

# **More datasets at**
https://github.com/mwaskom/seaborn-data

df=pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/raw/planets.csv")

https://soumenatta.medium.com/analyzing-pima-indians-diabetes-data-using-python-89a021b5f4eb

## Keras for Pima Indians diabetes data
Keras is a neural network Application Programming Interface (API) for Python.
It is well suited to those who do not have a strong background in deep learning but still want to work with neural networks.

In a neural network, a dense layer is a layer that is fully connected to its preceding layer: every neuron in the layer receives input from every neuron of the preceding layer.
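
Because every neuron connects to every neuron of the preceding layer, the number of trainable parameters of a dense layer follows directly from the two layer widths. The sketch below works this out for the layer sizes used in the Keras model later in this notebook (8 inputs, then 12, 8, 8, and 1 units). The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

# 8 input features, then Dense(12), Dense(8), Dense(8), Dense(1)
layer_sizes = [8, 12, 8, 8, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [108, 104, 72, 9]
print(sum(counts))  # 293
```

These per-layer counts are the figures to compare against the "Param #" column printed by `model.summary()`.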

In [None]:
# load the dataset
#dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
filein="https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"

df=pd.read_csv(filein,header=None)
df.columns=['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']

target = df['Outcome'] #target/response variable y
df1 = df.copy()
df1 = df1.drop('Outcome', axis =1)

## Defining the attributes/predictors
X = df1
y=target

In [None]:
temp=pd.DataFrame(X)
temp.head()

In [None]:
# Compute correlation matrix 
correlations = df.corr(method = 'pearson') 
print(correlations)

In [None]:
# Import required package 
from matplotlib import pyplot
# set the figure size
pyplot.rcParams['figure.figsize'] = [20, 10]
# Draw histograms for all attributes 
df.hist()
pyplot.show()

In [None]:
df.describe()

In [None]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(df1)
X=scaler.transform(df1)

#get_feature_names_out(input_features=df1.columns.values)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X , y, test_size = 0.2, random_state = 42)
 

In [None]:
temp=pd.DataFrame(X_train)
temp.head()

In [None]:
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# split into input (X) and output (y) variables

# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
#model.add(Dense(2, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) # binary classification: one sigmoid output
# fit the keras model on the dataset
#model.fit(X, y, epochs=20, batch_size=100)
# evaluate the keras model
#_, accuracy = model.evaluate(X, y)
#print('Accuracy: %.2f' % (accuracy*100))


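# the test split doubles as validation data here, so val_loss is not an independent estimate
his=model.fit(X_train, y_train, epochs=20, batch_size=32,validation_data=(X_test, y_test))
# evaluate the keras model on the full dataset (training and test rows together)
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))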
plt.plot(his.history['loss'])
plt.plot(his.history['val_loss'])
plt.title('Curve')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
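
The `loss` values plotted above are binary cross-entropy. A NumPy sketch of that quantity, with hypothetical labels and predicted probabilities, shows why confident wrong predictions are penalized heavily. The clipping constant `eps` is an assumption here, and Keras's exact numerical handling may differ.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    # Clip probabilities away from 0 and 1 so the logarithm stays finite
    p = np.clip(p_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])   # hypothetical sigmoid outputs
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The second loss is much larger than the first, which is the signal the optimizer uses to push the sigmoid output toward the correct class.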

In [None]:
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int).ravel() # flatten (n_samples, 1) to (n_samples,)
# summarize the first 5 cases
for i in range(5):
  print('%s => %d (expected %d)' % (df1.loc[i].tolist(), predictions[i], y[i]))

In [None]:
cnf_matrix = confusion_matrix(y, predictions)
cnf_matrix

In [None]:
class_names=[0,1] # names of classes

fig, ax = plt.subplots(figsize=(10, 5))
# create heatmap; the tick labels come from the DataFrame index and columns
sns.heatmap(pd.DataFrame(cnf_matrix, index=class_names, columns=class_names), annot=True, cmap="YlGnBu" ,fmt='g', ax=ax)
ax.xaxis.set_label_position("top")
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()

In [None]:

plt.figure(figsize=(10,5))
# note: cnf_matrix was computed on the full dataset, while the accuracy in the title is for the test split
sns.heatmap(data=cnf_matrix,linewidths=.5, annot=True,square = True,  cmap = 'Blues', fmt='g')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.tight_layout()
_, accuracy=model.evaluate(X_test,y_test)
all_sample_title = 'Test Accuracy Score: {0}'.format(accuracy)
plt.title(all_sample_title, size = 15)

## **MLPClassifier for Pima Indians diabetes data**

In [None]:

def MLP(X,y):
  # Hold out 20% of the data for testing
  X_train, X_test, y_train, y_test = train_test_split(X , y, test_size = 0.2, random_state = 42)
  # Candidate values for the L2 penalty alpha
  alphas = np.linspace(0.01, 2, num=20)
  lvc_df = pd.DataFrame(alphas, columns=['alpha'])
  #lvc_df['model'] = lvc_df['alpha'].apply(lambda a: MLPClassifier(solver='lbfgs', alpha=a, hidden_layer_sizes=(5, 2)).fit(X_train, y_train))
  # Fit one model per alpha
  lvc_df['model'] = lvc_df['alpha'].apply(lambda a: MLPClassifier(solver='sgd', alpha=a, hidden_layer_sizes=(100, )).fit(X_train, y_train))

  # Score each model on the test split and keep the first one with the highest accuracy
  lvc_df['score']=lvc_df['model'].apply(lambda model: model.score(X_test, y_test))
  bestmodel=lvc_df.loc[lvc_df['score'].idxmax(), 'model']

  train_score=bestmodel.score(X_train,y_train)
  test_score=bestmodel.score(X_test,y_test)
  return bestmodel, train_score, test_score, X_test,y_test

In [None]:

bestmodel, train_score, test_score, X_test,y_test=MLP(X,y)


In [None]:
y_pred=bestmodel.predict(X)
cnf_matrix = confusion_matrix(y, y_pred)
cnf_matrix
mse_krm=mean_squared_error(y, y_pred)
print(mse_krm)
plt.figure(figsize=(5,5))
sns.heatmap(data=cnf_matrix,linewidths=.5, annot=True,square = True,  cmap = 'Blues')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(bestmodel.score(X, y))
plt.title(all_sample_title, size = 15)

In [None]:
#specify size of heatmap
fig, ax = plt.subplots(figsize=(10, 5))

#create seaborn heatmap
ax = sns.heatmap(cnf_matrix/np.sum(cnf_matrix), annot=True, 
            fmt='.2%', cmap='Blues')

ax.set_title('Seaborn Confusion Matrix with labels\n\n');
ax.set_xlabel('\nPredicted ')
ax.set_ylabel('Actual ');

## Display the visualization of the Confusion Matrix.
plt.show()

In [None]:
#specify size of heatmap
fig, ax = plt.subplots(figsize=(10, 5))

labels = ['True Neg','False Pos','False Neg','True Pos']
labels = np.asarray(labels).reshape(2,2)
sns.heatmap(cnf_matrix, annot=labels, fmt='', cmap='Blues')
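
The four cells labeled above map directly onto the usual summary metrics. The sketch below uses a hypothetical 2x2 matrix in scikit-learn's layout (rows are actual classes, columns are predicted classes), not the matrix computed in this notebook.

```python
import numpy as np

# Hypothetical counts laid out as [[TN, FP], [FN, TP]]
cm = np.array([[50, 10],
               [ 5, 35]])

tn, fp, fn, tp = cm.ravel()   # row-major order matches the four labels

accuracy = (tp + tn) / cm.sum()    # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are real
recall = tp / (tp + fn)            # share of real positives that are found

print(accuracy, precision, recall)
```

With the notebook's own matrix, `cnf_matrix.ravel()` can be unpacked the same way.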

In [None]:
y_pred=bestmodel.predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
cnf_matrix
mse_krm=mean_squared_error(y_test, y_pred)
print(mse_krm)

plt.figure(figsize=(5,5))
sns.heatmap(data=cnf_matrix,linewidths=.5, annot=True,square = True,  cmap = 'Blues')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(bestmodel.score(X_test, y_test))
plt.title(all_sample_title, size = 15)
plt.show()

In [None]:
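# plot_confusion_matrix was removed in scikit-learn 1.2; ConfusionMatrixDisplay.from_estimator replaces it
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(bestmodel, X_test, y_test)
plt.show()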

# **HW Diamonds data application, encode categorical data**

In [None]:
df = sns.load_dataset('diamonds')
df.head()

In [None]:
#define the target y and predictor X
target = df['price'] #target/response variable y
df1 = df.copy()
df1 = df1.drop('price', axis =1)

## Defining the attributes/predictors
X = df1
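# cast only the categorical columns to str for encoding; numeric columns stay numeric
cat_cols = ['cut', 'color', 'clarity']
X[cat_cols] = X[cat_cols].astype(str)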

In [None]:
# The best practice when encoding variables is to fit the encoding on the training dataset,
# then apply it to the train and test datasets.
# prepare input data
def prepare_inputs(X_train, X_test,enccolname,colname):
  # encode the categorical columns as ordinal numbers, fitting on the training split only
  oe = OrdinalEncoder()
  oe.fit(X_train[enccolname])
  X_train_cat = oe.transform(X_train[enccolname])
  X_test_cat = oe.transform(X_test[enccolname])
  # place the numeric columns next to the encoded categorical columns
  X_train_enc = np.hstack([X_train[colname].astype(float).to_numpy(), X_train_cat])
  X_test_enc = np.hstack([X_test[colname].astype(float).to_numpy(), X_test_cat])
  # standardize with statistics from the training split only
  scaler = StandardScaler()
  X_train_enc=scaler.fit_transform(X_train_enc)
  X_test_enc=scaler.transform(X_test_enc)
  return X_train_enc, X_test_enc


# prepare target
def prepare_targets(y_train, y_test):
	le = LabelEncoder()
	le.fit(y_train)
	y_train_enc = le.transform(y_train)
	y_test_enc = le.transform(y_test)
	return y_train_enc, y_test_enc
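
By default `OrdinalEncoder` numbers categories in alphabetical order, which for `cut` gives Fair, Good, Ideal, Premium, Very Good rather than the quality order. If the quality order should be preserved, the categories can be passed explicitly. The sketch below assumes the usual diamonds ordering Fair < Good < Very Good < Premium < Ideal and uses tiny hypothetical frames in place of the real splits.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed quality order for the 'cut' column, worst to best
cut_order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']

train = pd.DataFrame({'cut': ['Ideal', 'Fair', 'Premium', 'Good']})  # hypothetical
test = pd.DataFrame({'cut': ['Very Good', 'Ideal']})                 # hypothetical

oe = OrdinalEncoder(categories=[cut_order])   # one list per encoded column
oe.fit(train)

print(oe.transform(test).ravel().tolist())    # [2.0, 4.0]
```

Listing the categories up front also means a level that appears only in the test split, such as 'Very Good' here, is still encoded rather than raising an error.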

In [None]:
def MLP(X,y):
  # Hold out 20% of the data for testing
  X_train, X_test, y_train, y_test = train_test_split(X , y, test_size = 0.2, random_state = 42)
  # Encode the categorical columns and scale all features, fitting on the training split only
  X_train_enc, X_test_enc = prepare_inputs(X_train, X_test,enccolname=['cut','color','clarity'],colname=['carat','depth','table','x','y','z'])
  # Candidate values for the L2 penalty alpha
  alphas = np.linspace(1, 2, num=2)
  lvc_df = pd.DataFrame(alphas, columns=['alpha'])
  # Fit one model per alpha
  lvc_df['model'] = lvc_df['alpha'].apply(lambda a: MLPClassifier(solver='sgd', alpha=a, hidden_layer_sizes=(100, )).fit(X_train_enc, y_train))

  # Score each model on the test split and keep the first one with the highest accuracy
  lvc_df['score']=lvc_df['model'].apply(lambda model: model.score(X_test_enc, y_test))
  bestmodel=lvc_df.loc[lvc_df['score'].idxmax(), 'model']

  train_score=bestmodel.score(X_train_enc,y_train)
  test_score=bestmodel.score(X_test_enc,y_test)
  return bestmodel, train_score, test_score, X_train_enc, X_test_enc,y_train, y_test

In [None]:
y=target
bestmodel, train_score, test_score, X_train_enc, X_test_enc,y_train, y_test=MLP(X,y)





In [None]:
y_pred=bestmodel.predict(X_test_enc)
cnf_matrix = confusion_matrix(y_test, y_pred)
cnf_matrix
mse_krm=mean_squared_error(y_test, y_pred)
print(mse_krm)

plt.figure(figsize=(5,5))
sns.heatmap(data=cnf_matrix,linewidths=.5, annot=True,square = True,  cmap = 'Blues')
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
all_sample_title = 'Accuracy Score: {0}'.format(bestmodel.score(X_test_enc, y_test))
plt.title(all_sample_title, size = 15)
plt.show()

10804429.922784576
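
Note that `price` is a continuous variable, so `MLPClassifier` treats every distinct price as its own class. Since ANNs handle regression as well as classification, one option is `MLPRegressor`, which predicts a real number directly. The sketch below is an illustration under assumptions: it uses synthetic data (a single carat-like feature and a price in thousands) in place of the diamonds set, and the network settings are arbitrary.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0.2, 3.0, size=(500, 1))             # synthetic stand-in for carat
y = 4.0 * X[:, 0] ** 2 + rng.normal(0, 0.3, 500)     # synthetic price, in thousands

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale the features, then fit a small regressor with one hidden layer
reg = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(50,), solver='lbfgs',
                 max_iter=2000, random_state=0))
reg.fit(X_train, y_train)

y_pred = reg.predict(X_test)
print(mean_squared_error(y_test, y_pred))   # error in squared target units
print(reg.score(X_test, y_test))            # R^2 on the test split
```

For a regressor, mean squared error and R^2 are the natural metrics. A confusion matrix only makes sense for class labels.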
