![DLI Header](../images/DLI_Header.png)

# Introduction to Autoencoders

In this notebook we give a high-level introduction to autoencoders, a kind of deep neural network that Morpheus can use to create unique "digital fingerprints" for users and services. In subsequent notebooks you will use autoencoders in Morpheus pipelines.

## Objectives

By the time you complete this notebook you will:

- Have a high-level understanding of autoencoders.
- Understand how autoencoders can be used in a cybersecurity setting to identify anomalous user or service behavior.

---

## Autoencoders

**Autoencoders** are a class of neural network architectures in which the output dimension is the same as the input dimension. An autoencoder has two networks, an **encoder** and a **decoder**. The encoder encodes its input data into a lower-dimensional space, called the **latent space**. The decoder tries to **reconstruct** the original data from the latent encoding.

Typically, the encoder and decoder are symmetric, and the latent space is a bottleneck. Because the bottleneck is too small to hold a full copy of the input, the autoencoder has to learn the essential characteristics of the data in order to produce a high-quality reconstruction during decoding.

![autoencoder network](images/ae.png)
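To make the shapes concrete, here is a minimal sketch of the encoder/decoder structure using only the Python standard library. The dimensions (12 inputs, for four quarters with three color channels each, and a latent size of 3), the helper names, and the use of single untrained linear layers are all assumptions chosen for illustration. A real autoencoder would have trained weights, more layers, and nonlinear activations.

```python
import random

INPUT_DIM = 12   # assumption: 4 quarters x 3 color channels (R, G, B)
LATENT_DIM = 3   # assumption: size of the bottleneck

def make_layer(n_in, n_out, rng):
    """Return an untrained weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, vector):
    """Multiply a weight matrix by a vector (a linear layer without bias)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

rng = random.Random(0)
encoder_weights = make_layer(INPUT_DIM, LATENT_DIM, rng)   # 12 -> 3
decoder_weights = make_layer(LATENT_DIM, INPUT_DIM, rng)   # 3 -> 12

def encode(x):
    """Map an input vector into the smaller latent space."""
    return apply_layer(encoder_weights, x)

def decode(z):
    """Map a latent vector back to the input dimension."""
    return apply_layer(decoder_weights, z)

square = [rng.random() for _ in range(INPUT_DIM)]  # a made-up input
latent = encode(square)
reconstruction = decode(latent)

# The output has the same dimension as the input; the latent code is smaller.
print(len(square), len(latent), len(reconstruction))  # 12 3 12
```

Because these weights are random, the reconstruction here is meaningless. Training is what adjusts the weights so that the output comes to resemble the input.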

## Evaluating Autoencoder Performance

Because the goal of an autoencoder is to reconstruct its input data, we can evaluate how well an autoencoder is performing by comparing its input to its output. The greater the difference between the input and the output (commonly called the reconstruction error), the worse the autoencoder performed.

As visual examples, we will use squares divided into four quarters, each with its own color. The autoencoder should try to produce an output square that exactly matches the input square.

As an example, we might consider the following a "perfect" reconstruction:

![perfect](images/perfect.png)

We might consider the following a "decent" reconstruction:

![decent](images/decent.png)

And we might consider the following a "terrible" reconstruction:

![terrible](images/terrible.png)
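One way to put a number on "perfect", "decent", and "terrible" is to compute a reconstruction error between the input and the output. The sketch below uses mean squared error, which is a common choice but an assumption here, since this notebook does not prescribe a particular metric. The color values and the three reconstructions are made up for illustration.

```python
def mean_squared_error(original, reconstruction):
    """Average of the squared differences between two equal-length vectors."""
    if len(original) != len(reconstruction):
        raise ValueError("input and output must have the same dimension")
    total = sum((o - r) ** 2 for o, r in zip(original, reconstruction))
    return total / len(original)

# Hypothetical square: four quarters, each an (R, G, B) triple in [0, 1],
# flattened into a single list of 12 numbers.
original = [0.0, 0.5, 0.0,
            0.0, 1.0, 0.0,
            0.2, 0.8, 0.2,
            0.0, 0.3, 0.0]

perfect = list(original)                          # identical to the input
decent = [min(1.0, v + 0.05) for v in original]   # every value slightly off
terrible = [1.0 - v for v in original]            # every color inverted

perfect_error = mean_squared_error(original, perfect)
decent_error = mean_squared_error(original, decent)
terrible_error = mean_squared_error(original, terrible)

print("perfect: ", perfect_error)
print("decent:  ", decent_error)
print("terrible:", terrible_error)
```

A lower value means the output is closer to the input, so the three cases are ordered perfect, decent, terrible from lowest to highest error.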

## Expected Autoencoder Performance

We should expect that an autoencoder can provide a consistent quality of reconstruction on the kinds of inputs that it was trained on. For our example, let's assume that we have trained an autoencoder to reconstruct the same kinds of square images as above, training it on squares that contain only shades of green.

Here we show that when the trained autoencoder is given several never-before-seen squares with shades of green, it consistently does a "decent" reconstruction:

![consistent decent](images/consistent_decent.png)

However, if we give the autoencoder, which was trained only on squares with shades of green, squares containing other colors, it does not consistently perform at a "decent" level:

![consistent terrible](images/consistent_terrible.png)
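The same effect can be reproduced in a toy setting. The sketch below is a heavily simplified stand-in for the example above, not the network used to produce the images: it assumes each input is a single (R, G, B) color rather than a four-quarter square, uses a linear autoencoder with one latent value and tied encoder/decoder weights, and trains it with plain gradient descent on squared error. All names, data, and hyperparameters are hypothetical.

```python
def reconstruct(w, x):
    """Encode x into one latent number, then decode it (tied weights w)."""
    latent = sum(wi * xi for wi, xi in zip(w, x))
    return [wi * latent for wi in w]

def squared_error(x, x_hat):
    """Sum of squared differences between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def train(samples, epochs=500, lr=0.1):
    """Fit w by gradient descent on the squared reconstruction error."""
    w = [0.1, 0.1, 0.1]  # small, arbitrary starting weights
    for _ in range(epochs):
        for x in samples:
            latent = sum(wi * xi for wi, xi in zip(w, x))
            residual = [xi - wi * latent for xi, wi in zip(x, w)]
            w_dot_r = sum(wi * ri for wi, ri in zip(w, residual))
            # Gradient of ||x - w (w . x)||^2 with respect to w.
            grad = [-2 * (latent * ri + w_dot_r * xi)
                    for ri, xi in zip(residual, x)]
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Train only on shades of green: (0, g, 0) for several brightness levels.
green_train = [(0.0, g, 0.0) for g in (0.2, 0.4, 0.6, 0.8, 1.0)]
w = train(green_train)

unseen_green = (0.0, 0.5, 0.0)  # a shade of green not in the training set
red = (0.9, 0.0, 0.0)           # a color unlike anything seen in training

green_error = squared_error(unseen_green, reconstruct(w, unseen_green))
red_error = squared_error(red, reconstruct(w, red))

print("error on unseen green:", green_error)
print("error on red:         ", red_error)
```

The single latent value can only describe "how green" an input is, so an unseen shade of green is reconstructed almost exactly while the red input is reconstructed poorly.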

---

## Key Takeaways

1. We can expect that an autoencoder will perform consistently well reconstructing the same kind of data it was trained on.
2. Knowing how consistently an autoencoder performs on the kind of data it was trained on, we can infer that if its performance changes significantly, it was probably given data significantly different from the kind it was trained on.

In our visual example, we trained an autoencoder on squares containing shades of green. After training, it consistently did a "decent" job reconstructing new squares containing shades of green. When given squares that did not contain shades of green, the autoencoder no longer did a consistently "decent" job reconstructing them.

---

## Relevance to Cybersecurity

In the context of cybersecurity we will use autoencoders to discover when a user or service has been taken over by a malicious agent.

We will train autoencoders on typical, non-malicious user and service data, observing their typical reconstruction performance.

We will then pass new user or service data into the trained autoencoders. When their reconstruction performance deviates significantly from our pre-established norm, we will treat the new data as atypical, or anomalous, which may indicate that the user or service is being controlled by a malicious agent.
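The idea of "deviating significantly from a pre-established norm" can be sketched as a simple threshold on reconstruction error. The example below is an illustration under stated assumptions, not how Morpheus implements it: the error values are invented, the function names are hypothetical, and the rule of "mean plus three standard deviations" is just one common heuristic for setting a threshold.

```python
import statistics

def fit_threshold(baseline_errors, num_std=3.0):
    """Set the threshold at mean + num_std standard deviations of the
    reconstruction errors observed on typical, non-malicious data."""
    mean = statistics.mean(baseline_errors)
    std = statistics.pstdev(baseline_errors)
    return mean + num_std * std

def is_anomalous(error, threshold):
    """Flag a reconstruction error that exceeds the established norm."""
    return error > threshold

# Hypothetical reconstruction errors measured on typical activity.
baseline_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(baseline_errors)

# Hypothetical reconstruction errors for newly observed activity.
new_errors = {"typical_event": 0.011, "unusual_event": 0.250}
flags = {name: is_anomalous(err, threshold) for name, err in new_errors.items()}

for name, flagged in flags.items():
    print(name, "-> anomalous" if flagged else "-> typical")
```

A flagged event is only a signal that the data looks unlike the training data. Whether it reflects malicious activity still needs to be investigated.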

---

## Next

In the next section you will begin to build Morpheus pipelines that utilize autoencoders for identifying compromised user or service activity.

Please continue to the next notebook.