![DLI Header](../images/DLI_Header.png)

# Exercise: Build an Autoencoder Pipeline

Your task in this exercise is to create a digital fingerprint for `user123`, then use that fingerprint to determine whether the user's new activity is anomalous.

---

## Your Pipeline Here

In [1]:
!morpheus run \
  --num_threads=1 \
  pipeline-ae \
    --userid_filter="user123" \
    --userid_column_name="userIdentitysessionContextsessionIssueruserName" \
  from-cloudtrail \
    --input_glob="data/input-data/*.csv" \
  train-ae \
    --train_data_glob="data/training-data/*.csv" \
    --seed 42 \
  preprocess \
  inf-pytorch \
  add-scores \
  serialize \
  to-file \
    --filename="data/output/output.csv" \
    --overwrite

Configuring Pipeline via CLI
C++ is disabled for AutoEncoder pipelines at this time.
Starting pipeline via CLI... Ctrl+C to Quit
512


---

## Examine the Output

Once the pipeline has run successfully and written its output to `data/output/output.csv`, use the following cells to examine the output data.

In [2]:
import pandas as pd

In [3]:
output = pd.read_csv('data/output/output.csv')

In [4]:
output.dtypes

Unnamed: 0                                           int64
_index_                                              int64
eventID                                              int64
eventTime                                           object
userIdentityaccountId                               object
eventSource                                         object
eventName                                           object
sourceIPAddress                                     object
userAgent                                           object
userIdentitytype                                    object
apiVersion                                          object
userIdentityprincipalId                             object
userIdentityarn                                     object
userIdentityaccessKeyId                             object
userIdentitysessionContextsessionIssueruserName     object
errorCode                                           object
errorMessage                                        obje

In [5]:
output['ae_anomaly_score'].describe()

count    847.000000
mean       0.425394
std        0.244687
min        0.154966
25%        0.248559
50%        0.376444
75%        0.502863
max        1.868160
Name: ae_anomaly_score, dtype: float64

In [6]:
output.sort_values(by='ae_anomaly_score', ascending=False)['ae_anomaly_score']

315    1.868160
318    1.646329
314    1.640971
321    1.592058
317    1.592058
         ...   
312    0.154966
306    0.154966
556    0.154966
559    0.154966
767    0.154966
Name: ae_anomaly_score, Length: 847, dtype: float64
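
Sorting a single column shows which row indices score highest, but not what those events were. As a minimal sketch, `DataFrame.nlargest` can pull the top rows together with a few descriptive columns. The rows below are made up for illustration and are not pipeline output; only the column names are taken from the `output.dtypes` listing above.

```python
import pandas as pd

# Made-up rows standing in for the pipeline output (illustration only)
demo = pd.DataFrame({
    "eventName": ["ListBuckets", "GetObject", "DeleteBucket", "PutObject"],
    "sourceIPAddress": ["10.0.0.1", "10.0.0.1", "203.0.113.7", "10.0.0.2"],
    "ae_anomaly_score": [0.21, 0.18, 1.74, 0.33],
})

# Keep the two highest-scoring rows, along with the columns that describe them
columns_of_interest = ["eventName", "sourceIPAddress", "ae_anomaly_score"]
top = demo.nlargest(2, "ae_anomaly_score")[columns_of_interest]
print(top)
```

On the exercise data, the same pattern would apply to `output` in place of `demo`.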

---

## Calculate the Z-Scores

Use the following cells to calculate z-scores from `user123`'s autoencoder anomaly scores.

In [7]:
output['zscore'] = ( output['ae_anomaly_score'] - output['ae_anomaly_score'].mean() ) / output['ae_anomaly_score'].std()

In [8]:
output['zscore'].describe()

count    8.470000e+02
mean    -1.090561e-16
std      1.000000e+00
min     -1.105201e+00
25%     -7.226960e-01
50%     -2.000484e-01
75%      3.166043e-01
max      5.896379e+00
Name: zscore, dtype: float64

In [9]:
output.sort_values(by='zscore', ascending=False)['zscore']

315    5.896379
318    4.989788
314    4.967892
321    4.767992
317    4.767992
         ...   
312   -1.105201
306   -1.105201
556   -1.105201
559   -1.105201
767   -1.105201
Name: zscore, Length: 847, dtype: float64
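
The calculation in cell 7 can be wrapped in a small helper, sketched here on made-up scores rather than the pipeline's output. The function name `add_zscore` and the sample values are hypothetical. Note that pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), which is why the z-scores above have a standard deviation of exactly 1 and a mean that is zero up to floating-point error.

```python
import pandas as pd

def add_zscore(df: pd.DataFrame, score_col: str = "ae_anomaly_score") -> pd.DataFrame:
    """Return a copy of df with a 'zscore' column computed from score_col."""
    scores = df[score_col]
    result = df.copy()
    # Subtract the mean and divide by the sample standard deviation (ddof=1)
    result["zscore"] = (scores - scores.mean()) / scores.std()
    return result

# Made-up scores for illustration only
demo = pd.DataFrame({"ae_anomaly_score": [0.15, 0.25, 0.38, 0.50, 1.87]})
demo_z = add_zscore(demo)
print(demo_z)
```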

---

## Analysis

Assuming that a z-score greater than 4 indicates activity that is anomalous relative to a user's digital fingerprint, what is your analysis of the pipeline's output?

## Solution

If you get stuck, click on the `...` directly below to expand the solution pipeline.

### Solution Pipeline

```sh
morpheus run \
  --num_threads=1 \
  pipeline-ae \
    --userid_filter="user123" \
    --userid_column_name="userIdentitysessionContextsessionIssueruserName" \
  from-cloudtrail \
    --input_glob="data/input-data/*.csv" \
  train-ae \
    --train_data_glob="data/training-data/*.csv" \
    --seed 42 \
  preprocess \
  inf-pytorch \
  add-scores \
  serialize \
  to-file \
    --filename="data/output/output.csv" \
    --overwrite
```

### Solution Analysis

Incoming data for `user123` includes events with z-scores above 4: the five highest-scoring rows (315, 318, 314, 321, and 317) range from about 4.77 to 5.90. This user's account is exhibiting highly anomalous behavior compared to its digital fingerprint, and further investigation or action is warranted.
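
One way to make this check explicit is to filter on the threshold directly. The sketch below is an assumption about how you might do that, not part of the course solution: the helper name `flag_anomalies` is hypothetical, and the data is synthetic (49 identical scores plus one outlier) so that the block runs on its own.

```python
import pandas as pd

def flag_anomalies(df: pd.DataFrame, threshold: float = 4.0) -> pd.DataFrame:
    """Return the rows whose z-score exceeds threshold, highest first."""
    flagged = df[df["zscore"] > threshold]
    return flagged.sort_values(by="zscore", ascending=False)

# Synthetic scores for illustration only: 49 typical events and one outlier
demo = pd.DataFrame({"ae_anomaly_score": [0.2] * 49 + [2.0]})
demo["zscore"] = (
    demo["ae_anomaly_score"] - demo["ae_anomaly_score"].mean()
) / demo["ae_anomaly_score"].std()

flagged = flag_anomalies(demo)
print(flagged)
```

On the exercise data, the equivalent call would be `flag_anomalies(output)` once cell 7 has added the `zscore` column.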

---

## Next

In addition to the autoencoder tools you've used so far, the Morpheus autoencoder pipeline can also perform time series analysis to identify anomalous user behavior over time. In the next section you will learn how to use this capability.

Please continue to the next notebook.