## Multi-Dimensional Decision Inputs

In the [Introduction to Activation Functions](Introduction_ActivationFunctions.ipynb) notebook, we
introduced the idea that very simple decisions may be encoded into mathematical functions known as
_activation functions_. In general, these functions take a numerical value as input and output
a number between 0 and 1, which we interpret as a measure of confidence that the function's decision
is a "yes" rather than a "no". Our examples were quite
limited in nature - we deliberately chose examples which embodied a very simple input/output
relationship consisting of a single input variable and a single output. Such models are
extremely limited in their applicability to real-world problems, as very few
phenomena in the world around us are simple enough to be modeled by a single-input activation function. In
general, the vast majority of observed effects have multiple contributing causes, each of which contributes
to the observed effect to varying degrees. For instance, one does not consider only the ambient
temperature outside when deciding what to wear into a rainstorm - surely precipitation should
influence this decision too, and to a disproportionate degree at that! It is intuitively apparent then
that any model which we wish to use in order to replicate, classify, or predict observed phenomena ought to
account for multiple causes.

In this notebook, we expand our rudimentary decision model in a simple yet profound way by allowing multiple
inputs to help determine what decision a model makes. In doing so, we take a small but significant step
toward assembling a mathematical model which is capable of learning how to accurately make mathematically
nuanced decisions.


### Software Prerequisites

The following Python libraries are prerequisites to run this notebook; simply run the following code block to
install them. They are also listed in the `requirements.txt` file in the root of this notebook's
[GitHub repository](https://github.com/uccs-math-clinic/mc-workshops).

```python
%pip install matplotlib==3.5.1 \
             numpy==1.21.5
```

Note: you may need to restart the kernel to use updated packages.


The Python kernel must be restarted after running the above code block for the first time within a particular virtual environment. This may be accomplished by navigating to `Kernel -> Restart` in the menu bar.

With our package dependencies installed, we can run the following [boilerplate code](https://en.wikipedia.org/wiki/Boilerplate_code) in order to import the packages needed for this notebook:

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import time

# Render interactive figures inline and enable interactive plotting mode.
%matplotlib notebook
plt.ion()
```


## Input Features

As we explore a particular phenomenon which we wish to model or predict, it is likely that we can make some pretty
good guesses as to what sorts of things contribute to the behavior of the phenomenon we observe. For example, if we
consider a model which predicts appropriate outerwear given a set of weather conditions, we might conjecture that
temperature, humidity, precipitation, and wind speed are all relevant sources of information for making an informed
choice. If instead we wish to predict a student's performance on a test, we might consider the number of hours studied
beforehand, the length of time over which the test material was presented in class, and the student's sleep schedule to
all be contributing factors toward a pass or fail result. In both of these scenarios, there exist several contributing
factors toward a particular result, and each factor carries with it its own level of influence toward that final result.
These distinct factors are referred to as _features_ (or variables or attributes) for a mathematical model and - as we've
seen with our simple examples already - may vary quite a bit between various models and contexts. The degree to which a
particular feature contributes to a final decision made is referred to as that feature's _weight_. We often also wish to
maintain some control over the threshold at which a decision is made; for instance, we'd probably like for a model which
predicts the likelihood of a safe lane change while driving to output a very high probability of success before we act on
that prediction. This activation threshold is referred to as the feature's _bias_. By adjusting these weights and biases,
our model encodes the degree to which distinct input features are prioritized and in doing so allows us to more closely align
the model's behavior with our own intuition and observations.

Recall that we chose the logistic function as our sigmoid activation function, whose definition and derivative are as follows:

$$
\begin{aligned}
           f(z) &= \frac{1}{1 + e^{-z}} \\
  \frac{df}{dz} &= f(z) \cdot (1 - f(z))
\end{aligned}
$$
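As a quick sanity check on the two formulas above, here is a minimal sketch that implements the logistic function and its derivative with NumPy and compares the closed-form derivative against a central finite difference. The names `sigmoid` and `sigmoid_prime`, the sample grid, and the step size `h` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Closed-form derivative df/dz = f(z) * (1 - f(z))."""
    f = sigmoid(z)
    return f * (1.0 - f)

# Sample points and step size are arbitrary illustrative choices.
z = np.linspace(-4.0, 4.0, 9)
h = 1e-6

# Central finite-difference approximation of df/dz.
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

# Largest disagreement between the numerical and closed-form derivatives.
max_error = np.max(np.abs(numeric - sigmoid_prime(z)))
print(max_error)
```

The printed discrepancy should be tiny, which suggests the closed-form derivative is consistent with the definition of $f(z)$.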

Moreover, if $z = z(\boldsymbol{\theta}, x)$ is also a function of some set of parameters $\boldsymbol{\theta} = \{\theta_{i}\}_{i=1}^{n}$ in
addition to the input value $x$, then the partial derivative with respect to any single parameter $\theta_i$ may be calculated
via the Chain Rule:

$$
\begin{equation*}
  \frac{\partial f}{\partial \theta_i} = f(z) \cdot (1 - f(z)) \cdot \frac{\partial z}{\partial \theta_i}
\end{equation*}
$$
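To make the Chain Rule concrete, the sketch below applies it to the linear form $z = wx + b$ (the form adopted in the text that follows), for which $\partial z / \partial w = x$ and $\partial z / \partial b = 1$. The function names `f` and `grad_f` and the numeric values are hypothetical and chosen only for illustration; the finite-difference comparison is just a check, not how gradients would normally be computed.

```python
import numpy as np

def f(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def grad_f(w, b, x):
    """Chain-rule partials of f(w*x + b) with respect to w and b."""
    fz = f(w * x + b)
    common = fz * (1.0 - fz)   # df/dz
    return common * x, common  # dz/dw = x, dz/db = 1

# Arbitrary illustrative values (not taken from any data set).
w, b, x = 0.8, -0.3, 1.5
h = 1e-6

dw, db = grad_f(w, b, x)

# Central finite differences in each parameter, for comparison.
dw_numeric = (f((w + h) * x + b) - f((w - h) * x + b)) / (2.0 * h)
db_numeric = (f(w * x + (b + h)) - f(w * x + (b - h))) / (2.0 * h)

print(abs(dw - dw_numeric), abs(db - db_numeric))
```

Both printed differences should be close to zero, indicating that the chain-rule expressions agree with a direct numerical estimate.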

In particular, we let $z = wx + b$, where $w$ and $b$ are the _weight_ and _bias_ parameters for the activation function $f(z)$,
and $x$ is the actual feature data point. With this result in hand, we now turn our attention toward a decision function which maps two
feature inputs $x_1$ and $x_2$ (with corresponding weights $w_1$ and $w_2$ and biases $b_1$ and $b_2$) to a single output decision. As
before, we wish to learn the weights and biases which allow our model to most closely represent the available data set; the primary
difference at this point is, of course, that we are learning two sets of weights and biases simultaneously instead of just one.
In this case, our decision function looks quite similar to before, with a slight modification to our definition of $z$:

$$
\begin{aligned}
           f(z) &= \frac{1}{1 + e^{-z}} \\
                &= \frac{1}{1 + e^{-(w_{1}x_{1} + b_1 + w_{2}x_{2} + b_2)}}
\end{aligned}
$$

More succinctly:


$$
\begin{equation*}
           f(z) = \frac{1}{1 + e^{-\sum_{i=1}^{2}{(w_{i}x_{i} + b_i)}}}
\end{equation*}
$$
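The two-input decision function can be sketched in a few lines of NumPy. This is only an illustration under assumed values: the function name `decide` and the feature, weight, and bias arrays are hypothetical and do not come from the notebook's data.

```python
import numpy as np

def decide(x, w, b):
    """Evaluate f(z) with z = sum_i (w_i * x_i + b_i)."""
    z = np.sum(w * x + b)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values chosen only to illustrate the formula.
x = np.array([0.5, 2.0])    # feature inputs x_1, x_2
w = np.array([1.0, -0.5])   # weights w_1, w_2
b = np.array([0.2, 0.3])    # biases b_1, b_2

confidence = decide(x, w, b)
print(confidence)
```

With these particular values the two terms cancel, so $z = 0$ and the model's confidence is approximately 0.5, i.e. it is undecided between "yes" and "no". Increasing either weighted input or bias pushes the output toward 1.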
