In [2]:
# Neural networks (also known as artificial neural networks, or ANNs) are a set of algorithms loosely modeled after the human brain. They are an advanced form of machine learning that recognizes patterns and features in input data and produces a clear quantitative output. In its simplest form, a neural network contains layers of neurons, and each neuron performs an individual computation. Each layer's outputs are weighted and passed to the next layer until the data reaches the final layer, which returns a numerical result or an encoded categorical result.


# [Figure: a deep neural network with an input layer, hidden layers, and an output layer. In this example, the layers process an image of an animal to determine whether it is a cat or a dog.]

# Neural networks are particularly useful in data science because they serve multiple purposes.

# One way to use a neural network model is to create a classification algorithm that determines whether an input belongs in one category or another. Alternatively, neural network models can behave like a regression model, where a dependent output variable is predicted from independent input variables. Therefore, neural network models can be an alternative to many of the models we have learned throughout the course, such as random forest, logistic regression, or multiple linear regression.

# There are a number of advantages to using a neural network instead of a traditional statistical or machine learning model. For instance, neural networks are effective at detecting complex, nonlinear relationships. Additionally, neural networks have greater tolerance for messy data and can learn to ignore noisy characteristics in data. The two biggest disadvantages to using a neural network model are that the layers of neurons are often too complex to dissect and understand (creating a black box problem), and neural networks are prone to overfitting (characterizing the training data so well that it does not generalize to test data effectively). However, both of the disadvantages can be mitigated and accounted for.

# REWIND
# Overfitting occurs when a model gives undue importance to patterns within a particular dataset that are not found in other, similar datasets.

# Neural networks have many practical uses across multiple industries. In the finance industry, neural networks are used to detect fraud and to spot trends in the stock market. Retailers like Amazon and Apple use neural networks to classify their consumers for targeted marketing, as well as to train behavior in robotic control systems. Because they are easy to implement, neural networks can also be used by small businesses and individuals to make more cost-effective decisions on investing and purchasing business materials. Neural networks are scalable and effective, so it is no wonder they are so popular.

In [3]:
# PERCEPTRON

# Brain metaphors, easy to use, and cost-effective? Excellent at detecting complex, nonlinear relationships? Neural networks are starting to sound like a great fit for the model. Beks sends Andy a quick Slack message to let him know the research phase of the project is well under way. Her next step will be to dig into the math a bit: What exactly goes into a neural network? To explore this, she'll start with the perceptron model.
# Although artificial neural networks have become popular in recent years, the original design for computational neurons (and, subsequently, the neural network) dates as far back as the late 1950s, when Frank Rosenblatt, a pioneer in the field of artificial intelligence, created the perceptron, a machine for training the first neural network. The perceptron model is a single neural network unit, and it mimics a biological neuron by receiving input data, weighing the information, and producing a clear output.

# The perceptron model has four major components:

# Input values, typically labeled as x or 𝝌 (chi, pronounced "kai," rhyming with "eye")
# A weight coefficient for each input value, typically labeled as w or ⍵ (omega)
# A bias, which is a constant value added to the weighted inputs to influence the final decision, typically labeled as w0. In other words, no matter how many inputs we have, there will always be an additional value to "stir the pot."
# A net summary function that aggregates all weighted inputs, in this case a weighted summation:
#   ⍵0 + 𝜒1⍵1 + 𝜒2⍵2 + ⋯ + 𝜒n⍵n
# [Figure: the perceptron model, showing the net summary function with its input values, weight coefficients, and bias as a constant value.]

# Perceptrons are capable of classifying datasets with many dimensions; however, the perceptron model is most commonly used to separate data into two groups (also known as a linear binary classifier). In other words, the perceptron algorithm works to classify two groups that can be separated using a linear equation (also known as linearly separable). For example, the image below shows a purple group and a blue group in each of two figures. In the left figure, we can draw a straight line down the middle to separate the groups entirely, while in the right figure, no single straight line can separate the two groups entirely. Therefore, the left figure is considered linearly separable, while the right is not:

# [Figure: "Linearly separable" means that we can draw a straight line that perfectly separates the two groups of data. The left figure shows data that is linearly separable; the right figure shows data that is not.]
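# As a quick illustration, here is a sketch assuming scikit-learn is installed; the AND and XOR toy datasets are our own choice, not the data in the figure. A single perceptron can fit the linearly separable AND pattern perfectly, but it cannot fit XOR, which no straight line separates:

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Four 2D points: the corners of the unit square
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# AND labels are linearly separable; XOR labels are not
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

# tol=None disables early stopping, so training runs for the full max_iter epochs
and_model = Perceptron(max_iter=1000, tol=None, random_state=0).fit(X, y_and)
xor_model = Perceptron(max_iter=1000, tol=None, random_state=0).fit(X, y_xor)

and_accuracy = and_model.score(X, y_and)
xor_accuracy = xor_model.score(X, y_xor)
print(f"AND accuracy: {and_accuracy:.2f}")  # a straight line can separate these points
print(f"XOR accuracy: {xor_accuracy:.2f}")  # no straight line can, so this stays below 1.00
```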



# The perceptron model is designed to produce a discrete classification model and to learn from the input data to improve classifications as more data is analyzed. To better understand how the perceptron model and algorithm work, let's consider the following dataset:

# [Figure: a 2D graph of XY data points belonging to two groups of four points each.]

# In this example, we want to generate a perceptron classification model that can distinguish between values that are purple squares versus values that are blue circles. Since this perceptron model will try to classify values in a two-dimensional space, our input values would be:

# 𝜒1 - the x value
# 𝜒2 - the y value
# ⍵0 - the constant variable (which becomes the bias constant)
# IMPORTANT
# There will always be one more input variable than the number of dimensions to ensure there is a bias constant within the model.

# When the perceptron model first looks at the data, the weight and bias coefficients are arbitrary values. The two-dimensional perceptron's net sum function is:

# ⍵0 + 𝜒1⍵1 + 𝜒2⍵2

# Here, ⍵0 is the bias term, and 𝜒1⍵1 and 𝜒2⍵2 are the weighted x and y values for each data point. If a data point's net sum is greater than zero, the model classifies it as a purple square; otherwise, the model classifies it as a blue circle.
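# The net sum and decision rule above can be sketched in NumPy. The names w0, w1, w2, x1, and x2 stand for ⍵0, ⍵1, ⍵2, 𝜒1, and 𝜒2, and the coefficient values are hypothetical placeholders, as they would be before any training:

```python
import numpy as np

def net_sum(x, weights, bias):
    """Return w0 + x1*w1 + x2*w2 for one data point x = [x1, x2]."""
    return bias + np.dot(x, weights)

def classify(x, weights, bias):
    """Perceptron decision rule: a net sum greater than 0 means purple square."""
    return "purple square" if net_sum(x, weights, bias) > 0 else "blue circle"

# Arbitrary starting coefficients (hypothetical values, nothing learned yet)
weights = np.array([0.5, -1.0])  # w1, w2
bias = 0.25                      # w0

point = np.array([2.0, 1.0])     # x1 = x value, x2 = y value
print(net_sum(point, weights, bias))   # 0.25 + 2.0*0.5 + 1.0*(-1.0) = 0.25
print(classify(point, weights, bias))  # purple square
```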

# Because the initial weight coefficients are arbitrary, the first iteration of the perceptron model will very likely classify some values incorrectly. Let's say that the first iteration of our perceptron model looks like the following image:

# [Figure: the same 2D graph, now including the perceptron's linear classifier as a dashed line.]

# In this image, the perceptron's linear classifier is represented with the dashed line. According to the perceptron model, values above the dashed line would be considered to be purple squares, and values below the dashed line would be considered to be blue circles. Although the perceptron model correctly classified all of the purple square data points, it misclassified one of the blue circle data points:

# [Figure: the same 2D graph, highlighting the single misclassified data point.]

# The next step in the perceptron algorithm is to check each data point and determine whether the weight coefficients need to be updated to better classify all data points. Correctly classified data points do not change the weight coefficients; each incorrectly classified data point, however, adjusts the weight coefficients so that the linear classifier moves toward classifying that point correctly.
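# One common way to let a misclassified point adjust the weights is the classic perceptron learning rule: ⍵ ← ⍵ + η(y − ŷ)𝜒, where y is the true label, ŷ is the predicted label (both coded 0 or 1 here), and η is a learning rate. The sketch below assumes this rule; the learning rate of 0.5 and the starting values are hypothetical:

```python
import numpy as np

def perceptron_update(x, y_true, weights, bias, learning_rate=0.5):
    """Apply one step of the perceptron learning rule to a single data point.

    Labels are coded 1 (purple square) and 0 (blue circle). When the point is
    already classified correctly, the error is 0 and nothing changes.
    """
    y_pred = 1 if bias + np.dot(x, weights) > 0 else 0
    error = y_true - y_pred                      # -1, 0, or +1
    new_weights = weights + learning_rate * error * x
    new_bias = bias + learning_rate * error
    return new_weights, new_bias

weights = np.array([0.5, -1.0])  # hypothetical current w1, w2
bias = 0.25                      # hypothetical current w0
point = np.array([2.0, 1.0])     # net sum is 0.25, so it is predicted as 1 ...
label = 0                        # ... but it is really a blue circle (0)

weights, bias = perceptron_update(point, label, weights, bias)
print(weights, bias)  # [-0.5 -1.5] -0.25
```

# After this single update, the point's net sum drops to -2.75, so it is now classified as a blue circle. Running the same function on a correctly classified point returns the weights unchanged.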

# After adjusting the weights, the perceptron algorithm will reevaluate each data point using the new model:

# [Figure: after the weights are adjusted, the 2D graph classifies all data points correctly.]

# As with other machine learning algorithms, this process of perceptron model training repeats until one of three conditions is met:

# The perceptron model exceeds a predetermined performance threshold, set by the designer before training. In machine learning, performance is typically measured with a loss metric that training works to minimize.
# The perceptron model training performs a set number of iterations, determined by the designer before training.
# The perceptron model is stopped or encounters an error during training.
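# The training process above can be sketched as a loop. This minimal NumPy version assumes 0/1 labels and the classic perceptron learning rule. It implements the first two stopping conditions: a performance threshold (measured here as training accuracy rather than a loss metric, for simplicity) and a fixed number of passes over the data (epochs). The third condition, a manual stop or error, is not modeled. The data points are hypothetical:

```python
import numpy as np

def train_perceptron(X, y, learning_rate=0.1, max_epochs=100, target_accuracy=1.0, seed=0):
    """Train a perceptron with the classic learning rule (labels coded 0 and 1).

    Stops when training accuracy reaches target_accuracy (performance threshold)
    or after max_epochs passes over the data (iteration limit).
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=X.shape[1])  # arbitrary starting weights
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        for x_i, y_i in zip(X, y):
            y_pred = 1 if bias + x_i @ weights > 0 else 0
            error = y_i - y_pred              # 0 for correctly classified points
            weights += learning_rate * error * x_i
            bias += learning_rate * error
        # Re-evaluate every data point with the updated model
        predictions = (X @ weights + bias > 0).astype(int)
        accuracy = np.mean(predictions == y)
        if accuracy >= target_accuracy:
            break
    return weights, bias, epoch, accuracy

X = np.array([
    [1.0, 3.0], [2.0, 4.0], [1.5, 5.0], [3.0, 4.5],   # label 1 (purple squares)
    [3.0, 1.0], [4.0, 2.0], [5.0, 1.5], [4.5, 0.5],   # label 0 (blue circles)
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

weights, bias, epoch, accuracy = train_perceptron(X, y)
print(f"Stopped after {epoch} epoch(s) with training accuracy {accuracy:.2f}")
```

# Because these hypothetical points are linearly separable, the loop is guaranteed to reach the accuracy threshold before hitting the epoch limit; on data that is not linearly separable, the epoch limit is what ends training.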
# At first glance, the perceptron model is very similar to other classification and regression models; however, the power of the perceptron model comes from its ability to handle multidimensional data and to be combined with other perceptron models. As more multidimensional perceptrons are meshed together and layered, a new, more powerful classification and regression algorithm emerges: the neural network.

In [4]:
# A basic neural network has three layers:

# An input layer of input values transformed by weight coefficients
# A single "hidden" layer of neurons (single neuron or multiple neurons)
# An output layer that reports the classification or regression model value
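# To make the three layers concrete, here is a minimal NumPy sketch of one forward pass through a basic network. The layer sizes (2 inputs, 3 hidden neurons, 1 output), the random weights, and the use of ReLU in the hidden layer and sigmoid in the output layer are illustrative assumptions; both activation functions appear in the list of activation functions in this section:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 2 input values, 3 hidden neurons, 1 output neuron
W_hidden = rng.normal(size=(2, 3))   # one weight per (input, hidden neuron) pair
b_hidden = np.zeros(3)               # one bias per hidden neuron
W_output = rng.normal(size=(3, 1))   # one weight per (hidden neuron, output) pair
b_output = np.zeros(1)

def forward(X):
    """Pass a batch of inputs (shape [n_samples, 2]) through the network."""
    hidden = np.maximum(0, X @ W_hidden + b_hidden)              # weighted sums, then ReLU
    output = 1 / (1 + np.exp(-(hidden @ W_output + b_output)))   # sigmoid squashes to (0, 1)
    return output

X = np.array([[0.5, 1.0], [-1.0, 2.0], [3.0, -0.5]])
probabilities = forward(X)
print(probabilities.shape)  # (3, 1)
```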
# As mentioned previously, neural networks work by linking together neurons and producing a clear quantitative output. But if each neuron has its own output, how does the neural network combine each output into a single classifier or regression model? The answer is an activation function.

# The activation function is a mathematical function applied to the output of each "neuron" (or each individual perceptron model) that transforms the neuron's net sum into the quantitative value it passes on. This output is used as an input value for subsequent layers in the neural network model. There is a wide variety of activation functions for specific purposes; however, most neural networks use one of the following activation functions:

# The linear function returns the sum of our weighted inputs without transformation.
# The sigmoid function is identified by a characteristic S curve. It transforms the output to a range between 0 and 1.
# The tanh function is also identified by a characteristic S curve; however, it transforms the output to a range between -1 and 1.
# The Rectified Linear Unit (ReLU) function returns 0 for any negative input and returns positive inputs unchanged, so its output ranges from 0 to infinity. It is the most commonly used activation function in neural networks because it is simple and fast to compute, but it might not be appropriate for simpler models.
# The Leaky ReLU function is a "leaky" alternative to the ReLU function, whereby negative input values return small negative values (the input multiplied by a small slope) instead of 0.
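# The activation functions above can be sketched in NumPy to compare their outputs on the same inputs. These are simplified versions for illustration, not the Keras implementations, and the Leaky ReLU slope of 0.01 is an assumed value:

```python
import numpy as np

def linear(z):
    return z                              # weighted sum passed through unchanged

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # S curve with outputs between 0 and 1

def tanh(z):
    return np.tanh(z)                     # S curve with outputs between -1 and 1

def relu(z):
    return np.maximum(0, z)               # negative inputs become 0

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)  # negative inputs shrink to small negatives

z = np.array([-2.0, 0.0, 3.0])
for fn in (linear, sigmoid, tanh, relu, leaky_relu):
    print(f"{fn.__name__:>10}: {np.round(fn(z), 3)}")
```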


# To better understand how multiple neurons connect together with activation functions to make a robust neural network, we'll explore a teaching application known as the TensorFlow Playground.

# TensorFlow is a neural network and machine learning library for Python that has become an industry standard for developing robust neural network models. The TensorFlow Playground was developed as a teaching tool to demystify the black box of neural networks and to provide a working simulation of a neural network as it trains on a variety of datasets and conditions.



# In the TensorFlow Playground, you can use the simulations to better understand how altering the neurons and activation functions of a neural network changes its performance.


# Now that we have spent some time understanding the structure of a basic neural network and how each component impacts the final model, it is time to learn how to build our own functioning models. Don't worry if there seem to be too many components, options, and parameters to keep track of: neural networks start off simple and grow to match the complexity of the input data.

In [5]:
# Simulations in the TensorFlow Playground

In [6]:
# Uncomment the next line to install (or upgrade to) the latest version of TensorFlow 2.x
# %pip install --upgrade tensorflow

In [7]:
# There are a number of smaller modules within the TensorFlow library that make it even easier to build machine learning models. For our purposes, we'll use the Keras module to help build our basic neural networks. Keras contains multiple classes and objects that can be combined to design a variety of neural network types. These classes and objects are order-dependent, which means that depending on what Keras objects are used (and in what order), the behavior of the neural network model will change accordingly. For our basic neural network, we'll use two Keras classes:

# The Sequential class is a linear stack of neural network layers, where data flows from one layer to the next. This model is what we simulated in the TensorFlow Playground.
# The Dense class creates a fully connected layer (every neuron receives input from every neuron in the previous layer), which allows us to add layers within the neural network.
# With the Sequential model, we'll add multiple Dense layers that can act as our input, hidden, and output layers. For each Dense layer, we'll define the number of neurons, as well as the activation function. Once we have completed our Sequential model design, we can apply the same Scikit-learn model -> fit -> predict/transform workflow as we used for other machine learning algorithms.
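# A minimal sketch of this design in Keras, assuming TensorFlow 2.x is installed. The toy data, the layer sizes, and the choice of loss function and optimizer are illustrative assumptions. Note that Keras adds a compile step (choosing the loss function and optimizer) before the familiar fit and predict steps:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 100 samples with 2 features and a 0/1 label
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# Sequential stacks the layers in order; each Dense layer sets its neuron count and activation
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # 2 input features
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer for a binary label
])

# Compile, then follow the model -> fit -> predict workflow
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=20, verbose=0)
probabilities = model.predict(X[:5], verbose=0)
print(probabilities.shape)  # (5, 1)
```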

# REWIND
# The process of model -> fit -> predict/transform follows the same general steps across all of data science:

# Decide on a model, and create a model instance.
# Split into training and testing sets, and preprocess the data.
# Train/fit the training data to the model. (Note that "train" and "fit" are used interchangeably in Python libraries as well as in the data science field.)
# Use the model for predictions and transformations.
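# The four steps can be sketched with scikit-learn's MLPClassifier, a small neural network. The dataset, the single hidden layer of 8 neurons, and the other parameter values are assumptions chosen for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset: two well-separated clusters of 2D points
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]], cluster_std=1.0, random_state=42)

# 1. Decide on a model and create a model instance
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=1000, random_state=42)

# 2. Split into training and testing sets, then preprocess (scale) the features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)  # fit the scaler on training data only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 3. Train (fit) the model on the training data
model.fit(X_train_scaled, y_train)

# 4. Use the model to make predictions on unseen data
predictions = model.predict(X_test_scaled)
print(f"Test accuracy: {model.score(X_test_scaled, y_test):.2f}")
```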

In [8]:
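# Optional: the nomkl package tells conda to use non-MKL (OpenBLAS) builds of NumPy and related packages
# conda install nomkl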