<a href="https://colab.research.google.com/github/ucfilho/ANN_capstone_projects/blob/master/ANN_Alexandre_dez_04_2018.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:

# 1. Import libraries and modules, and define the needed functions

from sklearn.neural_network import MLPRegressor   # scikit-learn is a widely used machine learning library for classification, regression and other problems; MLPRegressor is its multi-layer perceptron (ANN) regressor.
import random     # FOR MORE INFO: -> https://docs.python.org/3/library/random.html
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
#import pandas as pd

In [2]:
# ANN reproduction of the article: "Biohydrogen production by batch indoor and outdoor photo-fermentation with
# an immobilized consortium: A process model with Neural Networks", Monroy et al., 2018.

#Mechanistic models can be very useful since they provide simulated kinetics of substrate,
#biomass and product concentration through the process time, and allow creating databases
#for further applications. Nonetheless, they present some disadvantages such as the inability
#to reproduce lag phases, metabolic shifts or continuous changes in input variables, for
#instance, solar radiation.

#In comparison to these mathematical models, there are data-based models (considered as
#black-box models) that can be constructed by applying Artificial Intelligence (AI)
#techniques or machine learning (ML) methods, which improve process
#performance through experience recorded as data. The Artificial Neural Network (ANN) is in
#fact the most popular ML algorithm in process engineering, and is also considered a pattern
#recognition method [24].

#The ANN algorithm has been documented as a bioprocess modeling technique that requires no
#prior knowledge of the metabolic kinetics within the biological system [25]. In this sense,
#some authors have reported the use of ANN-based models for anaerobic digestion [26,27]
#and dark fermentation processes [28-34] with high correlations between observed and
#predicted data; however, its application has not been reported for photo-fermentation
#modeling.


#This work addresses the ANN application as a modeling technique to a group of several
#experimental batches of photo-fermentation, carried out by an immobilized consortium of
#photo-bacteria under different conditions of light intensity (I) using tungsten light, initial
#pH, and concentrations of Fe, Mo and V.

#The ANN model was cross-validated, and the predicted kinetics were compared to the
#experimental ones, as well as against the simulated kinetics produced by a mechanistic
#model. The potential of the ANN algorithm was evaluated regarding its capacity to deliver
#reliable predictions of hydrogen production by feeding the neural network information about
#sampling time, light intensity, pH and the concentrations of the metals added to the medium
#(iron - Fe, molybdenum - Mo and vanadium - V).

# For more information about the experiment, read the article.

# The article compares a mechanistic model (mathematical equations) with the ANN model, to increase reliability.
# The objective of this script is only to reproduce the ANN model and compare it with the experimental results,
# to see how well the ANN model predicts them.


# The ANN model is used here to predict the total accumulated hydrogen produced at the end of substrate consumption (named H2), so time is not an input.
# This is why the authors ran many batches. Alternatively, a single batch could be used to build an ANN model of the dynamics of hydrogen production over time, and in
# that case time would be an input to the ANN.


# An ANN can be recommended for predicting phenomena and estimating the value of a variable when running the experiments is feasible and cheap, or otherwise worthwhile.







def Normalize(values):     # normalizes the data in "values" to the range [-1, 1] and returns it as a new list. The input must be a list with at least two distinct values.

    list_max_orig = max(values)   # maximum value before normalization
    list_min_orig = min(values)   # minimum value before normalization

    a = (list_max_orig + list_min_orig)/2   # midpoint of the original range

    b = (list_max_orig - list_min_orig)/2   # half-width of the original range

    list_norm = []

    for i in range(0,len(values)):   # appending elements to the new normalized list

        x_norm = (values[i] - a)/b      # normalized value, between -1 and 1, of the element at index "i"

        list_norm.append(float(x_norm))

    return list_norm
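
# The loop above maps the minimum of the list to -1 and the maximum to +1 through
# x_norm = (x - a)/b. A minimal sketch of the same mapping as a list comprehension,
# with a guard for a constant list (where b is zero and the division would fail).
# The name normalize_safe and the choice of returning zeros in that case are
# assumptions made for illustration; they are not part of the original script.

```python
def normalize_safe(values):
    """Scale values linearly so that min -> -1 and max -> +1."""
    v_max = max(values)
    v_min = min(values)
    a = (v_max + v_min) / 2   # midpoint of the original range
    b = (v_max - v_min) / 2   # half-width of the original range
    if b == 0:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [float((x - a) / b) for x in values]


print(normalize_safe([0, 5, 10]))   # [-1.0, 0.0, 1.0]
print(normalize_safe([7, 7, 7]))    # [0.0, 0.0, 0.0]
```

# None of the columns in Table 1 is constant, so the guard is never triggered for this data set.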




def Original_value(before_normalized_list,normalized_list):     # converts normalized values from a single-column list (one dependent variable, e.g. y_train) back to their original scale
    # before_normalized_list is the original list of experimental results, before normalization. It will generally have more elements than "normalized_list".
    # The two lists do not need to have the same number of elements.
    # This function is meant only for the response (y), because converting x_train back is not needed.


    # normalized_list is the list of normalized y values, for example the predictions of the trained network.
    # before_normalized_list supplies the minimum and maximum used in the original normalization.


    list_max_orig = max(before_normalized_list)   # maximum value before normalization
    list_min_orig = min(before_normalized_list)   # minimum value before normalization

    a = (list_max_orig + list_min_orig)/2

    b =  (list_max_orig - list_min_orig)/2

    list_original = []

    for i in range(0,len(normalized_list)):   # appending elements to the new list in the original scale

        x_norm = normalized_list[i]
        x_original  =  a + b*x_norm   # value of x in the original scale, inverting x_norm = (x - a)/b
        list_original.append(float(x_original))

    return list_original
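
# Since normalization is x_norm = (x - a)/b and its inverse is x = a + b*x_norm,
# normalizing and then converting back should recover the original values up to
# floating-point rounding. A minimal round-trip check, written with hypothetical
# helper names (to_unit_range, from_unit_range) so that it runs on its own;
# h2_sample is a small subset of the H2 values from Table 1, used only for illustration.

```python
def to_unit_range(values):
    """Scale values linearly so that min -> -1 and max -> +1."""
    a = (max(values) + min(values)) / 2
    b = (max(values) - min(values)) / 2
    return [(x - a) / b for x in values]


def from_unit_range(reference, normalized):
    """Invert to_unit_range using the min and max of the reference list."""
    a = (max(reference) + min(reference)) / 2
    b = (max(reference) - min(reference)) / 2
    return [a + b * x for x in normalized]


h2_sample = [4194, 4433, 2229, 4626, 3105]   # illustrative subset of H2 (mL/L)

restored = from_unit_range(h2_sample, to_unit_range(h2_sample))
print(all(abs(r - o) < 1e-9 for r, o in zip(restored, h2_sample)))   # True

# A normalized value of 0 corresponds to the midpoint of the reference range.
print(from_unit_range(h2_sample, [0.0]))   # [3427.5]
```

# The reference list must be the one used for the original normalization; otherwise a and b differ and the values are mapped to the wrong scale.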




def create_list(main_list,indice_list): # creates a new list from an input list; the elements pulled into the new list are those of the old list at the requested indices,
    # which are given in the list of indices.
    # indice_list does not need to have the same length as main_list.
    new_list=[]
    for i in range(len(indice_list)):
        new_list.append(main_list[int(indice_list[i])])        # pick the element of main_list at the i-th requested index

    return new_list
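
# Selecting elements by a list of indices can also be written as a list comprehension.
# A minimal sketch with a hypothetical name (pick_by_index); it is meant to behave like
# the loop above, including allowing repeated indices.

```python
def pick_by_index(main_list, index_list):
    """Return the elements of main_list at the positions given in index_list."""
    return [main_list[int(i)] for i in index_list]


print(pick_by_index([10, 20, 30, 40], [3, 0, 0]))   # [40, 10, 10]
```

# Applying the same index list to every variable keeps the values of each batch aligned across the lists.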








# This is a regression script to predict a phenomenon.
## Table 1 data: experimental data




Fe = [2.8, 2.8, 2.8, 2.8, 11, 11, 11, 0, 13.8, 6.9, 6.9, 6.9, 6.9, 6.9,0 ,0, 0, 0, 0, 0, 0, 0, 0, 0,11, 6.9, 0, 0, 0,0]  # Medium's initial iron concentration in mg/L

V = [0.13, 0.13, 0.51, 0.51, 0.13, 0.51,0.51, 0.32, 0.32, 0, 0.64, 0.32, 0.32, 0.32, 0 , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.13, 0.32, 0, 0, 0, 0] # Medium's initial Vanadium concentration in mg/L

Mo = [0.32, 1.26, 0.32, 1.26, 1.26, 0.32, 1.26, 0.79, 0.79, 0.79, 0.79, 0, 1.58, 0.79, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.32, 0.79, 0, 0, 0, 0] # Medium's initial molybdenum concentration in mg/L

I = [221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 221, 125.3, 125.3, 366.7, 246, 246, 75.3, 416.7, 246, 366.7, 221, 221, 221, 366.7, 246, 125.3] # Light intensity of the tungsten lamp in W/m^2. Kept constant during each batch.

Initial_ph = [6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.5, 6.7, 7.7, 7.7, 6.5, 7.9, 7.2, 7.2, 7.2, 6.7, 6.5, 6.5, 6.5, 6.7, 7.2, 6.7] # Initial pH of the culture medium. Kept constant during each batch.

H2 = [4194, 4433, 4447, 4327, 3105, 4265, 3372, 3926, 3348, 4397, 4209, 3463, 4387, 4186, 4393, 3965, 2970, 2845, 3977, 2269, 2620, 2229, 3795, 2991, 4626, 4023, 4075, 3327, 3475, 4100]  # Total accumulated biohydrogen in mL/L. Each element of these lists corresponds to one batch.




## Normalizing the data and storing it in new lists to feed the ANN MLP (multi-layer perceptron), which trains more reliably on normalized data







Fe_norm = Normalize(Fe)  # Normalizing Fe

V_norm = Normalize(V)  # Normalizing V

Mo_norm = Normalize(Mo)  # Normalizing Mo

I_norm = Normalize(I)  # Normalizing I

Initial_ph_norm = Normalize(Initial_ph) # Normalizing pH

H2_norm = Normalize(H2)  # Normalizing H2


y_norm = H2_norm   # normalized values of the response




## Training the ANN using MLPRegressor, with .fit to train and .predict to evaluate




population = 25   # number of samples to place in the training set; must be less than or equal to the total number of samples.

LISTA = []   # building an index so that the values of every variable refer to the same experiment (batch).
for i in range(len(y_norm)):
    LISTA.append(i)
indice = random.sample(LISTA,population) # creating a random list of indices to train  # random.sample(population, k) -> returns a k-length list of unique elements chosen from the population sequence. Used for random sampling without replacement.






Fe_train = create_list(Fe_norm,indice) # creating the training list of each variable, already normalized
V_train = create_list(V_norm,indice)
Mo_train = create_list(Mo_norm,indice)
I_train = create_list(I_norm,indice)
Initial_ph_train = create_list(Initial_ph_norm,indice)
H2_train = create_list(H2_norm,indice)

Matrix_train = [Fe_train,V_train,Mo_train,I_train,Initial_ph_train]



x_train = np.array(Matrix_train)
y_train = np.array( H2_train )

x_train = x_train.T  # transposing the matrix (rows become columns and vice versa), because sklearn's .fit expects each row of x_train to be one sample (experiment).

y_train = y_train.T






# Defining neural network - for more info: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor

clf = MLPRegressor(solver='lbfgs',activation='tanh',alpha=1e-5,hidden_layer_sizes=(6,9),random_state=1)  # clf is short for "classifier", a common variable name in machine learning, although this model is a regressor


# training the neural network

clf.fit(x_train, y_train)   # the number of rows of x_train and y_train must equal the number of experiments, while each column of x_train corresponds to one variable

y_calc_rede_train = clf.predict(x_train)   # computes y for a given matrix x_train once the network is trained. The values come out normalized.

y_calc_rede_train = Original_value(H2,y_calc_rede_train)  # converting the normalized values computed by the neural network back to the original scale.
# returning this variable to the original scale lets R^2 and the error be computed in the units of the experimental data.



# building a matrix of randomly chosen experimental data to further test the trained network and check whether its R^2 is as good as desired.


population = 10
indice = random.sample(LISTA,population) # creating a random list of indices to test; it is drawn from all samples, so it overlaps the training set  # random.sample(population, k) -> returns a k-length list of unique elements chosen from the population sequence. Used for random sampling without replacement.


Fe_test = create_list(Fe_norm,indice) # creating the test list of each variable, already normalized
V_test = create_list(V_norm,indice)
Mo_test = create_list(Mo_norm,indice)
I_test = create_list(I_norm,indice)
Initial_ph_test = create_list(Initial_ph_norm,indice)
H2_test = create_list(H2_norm,indice)



x_test = ( np.array([Fe_test,V_test,Mo_test,I_test,Initial_ph_test]) ).T  # transposed so that each row is one sample
y_test = (np.array(H2_test) ).T



y_calc_rede_test = clf.predict(x_test)   # computing y with the neural network
# The network would be perfect (R^2 = 1) if every value of y_calc_rede_test equaled the corresponding element of H2_test, and if there were also no overfitting.

y_calc_rede_test = Original_value(H2,y_calc_rede_test)


# computing the MSE and the R^2 in the original units (mL/L)
H2_test = Original_value(H2,H2_test)

mse =  mean_squared_error(H2_test,y_calc_rede_test)

R2=r2_score(H2_test,y_calc_rede_test)
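
# For a single response, the mean squared error is the average of the squared residuals,
# and R^2 is 1 - SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot is
# the sum of squared deviations of the observed values from their mean. A minimal
# pure-Python sketch of both formulas; the function name mse_and_r2 and the toy numbers
# are assumptions for illustration, not data from the experiment.

```python
def mse_and_r2(y_true, y_pred):
    """Return (MSE, R^2) for two equally long lists of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    return ss_res / n, 1 - ss_res / ss_tot


mse_demo, r2_demo = mse_and_r2([0.0, 2.0, 0.0, 2.0], [0.0, 2.0, 0.0, 3.0])
print(mse_demo, r2_demo)   # 0.25 0.75
```

# R^2 equals 1 only when every prediction matches the observed value, and it can be negative when the model fits worse than the mean of the observations.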

#


print(mse)
print(R2)



9181.453488330444
0.9830545600926927
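
# The test indices above are drawn from all 30 batches, while 25 of them were used for
# training, so at least 5 of the 10 test batches were already seen by the network and the
# reported R^2 is likely optimistic. A minimal sketch of a split in which training and
# test indices do not overlap; the seed and the variable names (rng, train_idx, test_idx)
# are assumptions for illustration, not part of the original script.

```python
import random

n_samples = 30          # number of batches in Table 1
n_train = 25            # same training size as in the script above

rng = random.Random(1)  # assumption: fixed seed, only to make the sketch reproducible
shuffled = list(range(n_samples))
rng.shuffle(shuffled)   # random order of all batch indices

train_idx = shuffled[:n_train]   # first 25 indices for training
test_idx = shuffled[n_train:]    # remaining 5 indices, never used for training

print(len(train_idx), len(test_idx))    # 25 5
print(set(train_idx) & set(test_idx))   # set()
```

# These index lists could then be passed to create_list in place of indice, at the cost of a test set of only 5 batches.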
