

    ██████╗ ██╗██╗     ███████╗████████╗███╗   ███╗     ██████╗██████╗  █████╗  ██████╗██╗  ██╗███████╗██████╗ 
    ██╔══██╗██║██║     ██╔════╝╚══██╔══╝████╗ ████║    ██╔════╝██╔══██╗██╔══██╗██╔════╝██║ ██╔╝██╔════╝██╔══██╗
    ██████╔╝██║██║     ███████╗   ██║   ██╔████╔██║    ██║     ██████╔╝███████║██║     █████╔╝ █████╗  ██████╔╝
    ██╔══██╗██║██║     ╚════██║   ██║   ██║╚██╔╝██║    ██║     ██╔══██╗██╔══██║██║     ██╔═██╗ ██╔══╝  ██╔══██╗
    ██████╔╝██║███████╗███████║   ██║   ██║ ╚═╝ ██║    ╚██████╗██║  ██║██║  ██║╚██████╗██║  ██╗███████╗██║  ██║
    ╚═════╝ ╚═╝╚══════╝╚══════╝   ╚═╝   ╚═╝     ╚═╝     ╚═════╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝
                                                                                                           

---

![alt text](http://www.treasurenet.com/forums/attachment.php?attachmentid=173574&amp;d=1332348453)

---

This module trains a bidirectional long short-term memory (LSTM) network on a dataset consisting of cleartext passwords. The trained network is then used to predict the most likely alterations and/or additions to a given sequence.
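
As a rough illustration of the prediction step, the sketch below ranks single-character alterations of a sequence. It assumes the network yields one probability distribution per position. The `position_probs` table and both helper functions are hypothetical, and the numbers are invented for illustration only; the module's own scoring may differ.

```python
import math

# Hypothetical per-position output of the trained network for the input
# "pass": for each position, a probability for each candidate character.
# These numbers are invented for illustration only.
position_probs = [
    {"p": 0.90, "P": 0.08, "b": 0.02},
    {"a": 0.70, "@": 0.25, "4": 0.05},
    {"s": 0.60, "$": 0.30, "5": 0.10},
    {"s": 0.55, "$": 0.35, "5": 0.10},
]


def top_candidates(probs, k):
    """Return the k most probable (character, probability) pairs."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def rank_single_substitutions(sequence, position_probs, k=2):
    """Score every sequence that differs from `sequence` in one position.

    The score is the sum of log-probabilities over all positions, which is
    the log of the product of the per-position probabilities.
    """
    results = []
    for i in range(len(sequence)):
        for char, _ in top_candidates(position_probs[i], k):
            if char == sequence[i]:
                continue  # the original character is not an alteration
            altered = sequence[:i] + char + sequence[i + 1:]
            log_prob = sum(
                math.log(position_probs[j][c]) for j, c in enumerate(altered)
            )
            results.append((altered, log_prob))
    return sorted(results, key=lambda item: item[1], reverse=True)


ranked = rank_single_substitutions("pass", position_probs)
best_sequence, best_log_prob = ranked[0]
```

Summing log-probabilities instead of multiplying raw probabilities keeps the score numerically stable for longer sequences.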

---


### Assumptions
The dataset is assumed to contain no information other than the cleartext passwords.

The network parameters (*e.g.*, number of hidden units, embedding layer) are defined in the configuration file (`program/config.yml`).


### Code steps
This is the basic flow of the code:

1. read in data
   * clean up data (duplicates, NaN, etc.)  
2. get data characteristics
   * determine number of characters  
   * determine/define longest sequence length  
3. generator
   * tokenization  
   * sliding windows  
4. training
5. sequence
   * for each position in the sequence, predict the most likely candidates  
   * calculate most likely shared candidates  
   * calculate probabilities of overall adjusted sequences  
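
Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal illustration, not the code in `program.py`: the function names are hypothetical, and the windowing scheme (a fixed-size, left-padded context that predicts the next character) is an assumption.

```python
def build_vocabulary(passwords):
    """Map each distinct character to an integer id; 0 is reserved for padding."""
    chars = sorted(set("".join(passwords)))
    return {ch: idx for idx, ch in enumerate(chars, start=1)}


def tokenize(password, vocab):
    """Convert a password into a list of integer ids."""
    return [vocab[ch] for ch in password]


def sliding_windows(tokens, window):
    """Yield (context, target) pairs.

    The context holds up to `window` preceding tokens, left-padded with 0,
    and the target is the token that follows the context.
    """
    for i in range(len(tokens)):
        context = tokens[max(0, i - window):i]
        padded = [0] * (window - len(context)) + context
        yield padded, tokens[i]


passwords = ["abc", "abd", "abc"]        # step 1: raw data, with a duplicate
unique = sorted(set(passwords))          # step 1: drop duplicates
vocab = build_vocabulary(unique)         # step 2: number of characters
longest = max(len(p) for p in unique)    # step 2: longest sequence length
pairs = list(sliding_windows(tokenize("abd", vocab), window=2))  # step 3
```

Reserving id 0 for padding lets a masking or embedding layer ignore padded positions, which is a common convention in Keras models.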


---
# Initial Definitions
---

Import the libraries used in this notebook:

In [17]:
# import libraries
import boto3
import time
import os
import keras

# sagemaker libraries
import sagemaker
from sagemaker.tuner              import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from sagemaker.tensorflow         import TensorFlow
from sagemaker.tensorflow.serving import Model

## Variable definitions

Define all of the variables used in the notebook here:

In [85]:
# specify the S3 bucket parameters
bucket = 'production-enrichment-repo'
prefix = 'blstm'

# get the session and IAM role information
sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# location and name of the program containing all of the code
program_name = 'program.py'
program_path = 'program'
job_name     = 'AJT'

These are the variables related to the model artifacts:

In [86]:
# locations in which to store model artifacts
intermediate_location = 's3://{}/{}/intermediate'.format(bucket, prefix)
output_location       = 's3://{}/{}/output/'.format(bucket, prefix)

# specify the location in S3 containing the dataset
data_name     = 'train.csv'
key           = os.path.join(prefix, 'train', data_name)
s3_train_data = 's3://{}/{}'.format(bucket, key)

# define the intermediate path where the model artifacts will be stored
inter      = os.path.join(prefix, 'intermediate')
inter_path = 's3://{}/{}'.format(bucket, inter)

# define the output path
out         = os.path.join(prefix, 'output')
output_path = 's3://{}/{}'.format(bucket, out)
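
The paths above use `os.path.join` to build S3 keys, which works on Linux-based notebook instances but would insert backslashes on Windows. A hedged alternative is to join key components with `posixpath`, as sketched below. The `s3_uri` helper is hypothetical and not part of the SageMaker SDK.

```python
import posixpath


def s3_uri(bucket, *parts):
    """Build an S3 URI, always joining key components with '/'."""
    key = posixpath.join(*parts)
    return 's3://{}/{}'.format(bucket, key)


bucket = 'production-enrichment-repo'
prefix = 'blstm'

s3_train_data = s3_uri(bucket, prefix, 'train', 'train.csv')
inter_path = s3_uri(bucket, prefix, 'intermediate')
output_path = s3_uri(bucket, prefix, 'output')
```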

These are the variables related to the endpoint:

In [87]:
# path to model artifacts
# (output_location ends with '/', so strip it to avoid '//' in the key)
model_artifacts = '{}/{}/model.tar.gz'.format(output_location.rstrip('/'), job_name)

# include the date in the endpoint name
endpoint_name = 'blstm-'+time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

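The SageMaker session and IAM role are obtained in the variable definitions above. The links below are background references on running custom models with SageMaker: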

In [None]:
# https://github.com/aws/sagemaker-python-sdk/issues/911
# https://towardsdatascience.com/building-fully-custom-machine-learning-models-on-aws-sagemaker-a-practical-guide-c30df3895ef7

---
# Data
---

## Processing

The dataset is expected to already be in the S3 bucket at `s3_train_data`. The next cell configures the input channel that points the training job at it:

In [88]:
# configure SageMaker input channel
input_data = {
    'training': sagemaker.session.s3_input(s3_train_data, distribution='FullyReplicated', content_type='text/csv')
}

---
# Model
---

## Training

Define the hyperparameters for the training job:

In [89]:
hyperparameters={'epochs':       1, 
                 'batch_size':   128,
                 'hidden_units': 100,
                 'training':     s3_train_data}
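
In script mode, SageMaker passes each hyperparameter to the entry point as a command-line argument and exposes the local directory of each input channel through an environment variable named `SM_CHANNEL_<NAME>`. The sketch below shows how an entry point might read them. It is not the actual `program.py`; the `parse_args` function, the defaults, and the example bucket name are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters that SageMaker passes on the command line."""
    parser = argparse.ArgumentParser()
    # hyperparameters, named as in the `hyperparameters` dict above
    parser.add_argument('--epochs', type=int, default=1)
    parser.add_argument('--batch_size', type=int, default=128)
    parser.add_argument('--hidden_units', type=int, default=100)
    # local directory of the 'training' channel, set by SageMaker at run time
    parser.add_argument('--training', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING'))
    # ignore any extra arguments (for example --model_dir) instead of failing
    args, _unknown = parser.parse_known_args(argv)
    return args


# simulate the arguments of a training run; the bucket name is made up
args = parse_args(['--epochs', '1',
                   '--batch_size', '128',
                   '--hidden_units', '100',
                   '--model_dir', 's3://example-bucket/model'])
```

Using `parse_known_args` keeps the script tolerant of arguments that the SDK adds on its own.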

Define the TensorFlow estimator:

In [94]:
estimator = TensorFlow(entry_point          = program_name, 
                       role                 = role,
                       source_dir           = program_path,
                       model_dir            = intermediate_location,
                       output_path          = output_location,
                       code_location        = intermediate_location,
                       train_instance_count = 1, 
                       train_instance_type  = 'local',
                       framework_version    = '1.12', 
                       py_version           = 'py3',
                       script_mode          = True,
                       hyperparameters      = hyperparameters
                       )

Check the IAM role that the training job will assume, then fit the model using the estimator defined above:

In [92]:
role

'arn:aws:iam::752281881774:role/service-role/AmazonSageMaker-ExecutionRole-20191022T142571'

In [95]:
estimator.fit(inputs=input_data, job_name=job_name)

Creating tmp7dvg41bc_algo-1-w4x6n_1 ... done
Attaching to tmp7dvg41bc_algo-1-w4x6n_1
algo-1-w4x6n_1  | 2019-12-18 16:23:39,288 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-w4x6n_1  | 2019-12-18 16:23:39,296 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
algo-1-w4x6n_1  | 2019-12-18 16:23:39,687 sagemaker-containers INFO     Installing module with the following command:
algo-1-w4x6n_1  | /usr/bin/python -m pip install -U . -r requirements.txt
algo-1-w4x6n_1  | Processing /opt/ml/code
algo-1-w4x6n_1  | Requirement already up-to-date: absl-py==0.7.1 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 1)) (0.7.1)
algo-1-w4x6n_1  | Collecting altair==3.2.0 (from -r requirements.txt (line 2))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/34/24/3e50e226a79db1bb1427bf8c58cc4dc7c2f

algo-1-w4x6n_1  | Collecting docker-compose==1.23.2 (from -r requirements.txt (line 27))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/1e/6c/bf9879305530c4b765ef4eb3be76202788ca1037aec74d2c0ec73191d467/docker_compose-1.23.2-py2.py3-none-any.whl (131kB)
algo-1-w4x6n_1  | Collecting docker-pycreds==0.4.0 (from -r requirements.txt (line 28))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/f5/e8/f6bd1eee09314e7e6dee49cbe2c5e22314ccdb38db16c9fc72d2fa80d054/docker_pycreds-0.4.0-py2.py3-none-any.whl
algo-1-w4x6n_1  | Collecting dockerpty==0.4.1 (from -r requirements.txt (line 29))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/8d/ee/e9ecce4c32204a6738e0a5d5883d3413794d7498fe8b06f44becc028d3ba/dockerpty-0.4.1.tar.gz
algo-1-w4x6n_1  | Collecting docopt==0.6.2 (from -r r

algo-1-w4x6n_1  | Requirement already up-to-date: Keras==2.2.4 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 55)) (2.2.4)
algo-1-w4x6n_1  | Requirement already up-to-date: Keras-Applications==1.0.7 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 56)) (1.0.7)
algo-1-w4x6n_1  | Requirement already up-to-date: Keras-Preprocessing==1.0.9 in /usr/local/lib/python3.6/dist-packages (from -r requirements.txt (line 57)) (1.0.9)
algo-1-w4x6n_1  | Collecting kiwisolver==1.0.1 (from -r requirements.txt (line 58))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/69/a7/88719d132b18300b4369fbffa741841cfd36d1e637e1990f27929945b538/kiwisolver-1.0.1-cp36-cp36m-manylinux1_x86_64.whl (949kB)
algo-1-w4x6n_1  | Collecting langdetect==1.0.7 (from -r requirements.txt (line 59))
algo-1-w4x6n_1

algo-1-w4x6n_1  | Collecting pycodestyle==2.4.0 (from -r requirements.txt (line 83))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/e5/c6/ce130213489969aa58610042dff1d908c25c731c9575af6935c2dfad03aa/pycodestyle-2.4.0-py2.py3-none-any.whl (62kB)
algo-1-w4x6n_1  | Collecting pydot==1.4.1 (from -r requirements.txt (line 84))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/33/d1/b1479a770f66d962f545c2101630ce1d5592d90cb4f083d38862e93d16d2/pydot-1.4.1-py2.py3-none-any.whl
algo-1-w4x6n_1  | Collecting pydub==0.23.1 (from -r requirements.txt (line 85))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/79/db/eaf620b73a1eec3c8c6f8f5b0b236a50f9da88ad57802154b7ba7664d0b8/pydub-0.23.1-py2.py3-none-any.whl
algo-1-w4x6n_1  | Collecting pyflakes==2.0.0 (from -r requirements.txt (l

algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/ef/06/53edcae4edea76b38a325980dd35aed3b39f9bd0ef27b9d33f2e6dc4c7f6/soupsieve-1.6.2-py2.py3-none-any.whl
algo-1-w4x6n_1  | Collecting speech==0.5.2 (from -r requirements.txt (line 113))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/0f/ab/12dbcc8ad860546b7aaef6c367ffa639cc81007540e488eb92cf22639f86/speech-0.5.2.tar.gz
algo-1-w4x6n_1  | Collecting streamlit==0.47.4 (from -r requirements.txt (line 114))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/d0/99/f8166f0baac96f2ea4d634749331bf78d8830db9991fc8086e828dca8959/streamlit-0.47.4-py2.py3-none-any.whl (4.9MB)
algo-1-w4x6n_1  | Collecting tensorboard==1.13.1 (from -r requirements.txt (line 115))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/0

algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/16/1c/d9e4d1e4eb9777ae675c5ac01290e70012498944d5e743bd2777d1096ad7/zope.interface-4.7.1-cp36-cp36m-manylinux1_x86_64.whl (168kB)
algo-1-w4x6n_1  | Collecting python3-Xlib (from MouseInfo==0.0.4->-r requirements.txt (line 67))
algo-1-w4x6n_1  |   Downloading https://files.pythonhosted.org/packages/ef/c6/2c5999de3bb1533521f1101e8fe56fd9c266732f4d48011c7c69b29d12ae/python3-xlib-0.15.tar.gz (132kB)
algo-1-w4x6n_1  | Building wheels for collected packages: aws-kinesis-agg, awsebcli, backcall, blinker, cement, dockerpty, docopt, enum-compat, fsspec, future, geographiclib, googletrans, gTTS-token, langdetect, MouseInfo, pathspec, pathtools, psutil, PyAutoGUI, PyGetWindow, Pympler, PyMsgBox, pyperclip, PyRect, PyScreeze, pytho

algo-1-w4x6n_1  |   Running setup.py bdist_wheel for Theano ... done
algo-1-w4x6n_1  |   Stored in directory: /root/.cache/pip/wheels/88/fb/be/483910ff7e9f703f30a10605ad7605f3316493875c86637014
algo-1-w4x6n_1  |   Running setup.py bdist_wheel for timeloop ... done
algo-1-w4x6n_1  |   Stored in directory: /root/.cache/pip/wheels/70/e7/0e/d125f034638a6f46b47b419e51b9c5885f23c054072579d791
algo-1-w4x6n_1  |   Running setup.py bdist_wheel for toolz ... done
algo-1-w4x6n_1  |   Stored in directory: /root/.cache/pip/wheels/e1/8b/65/3294e5b727440250bda09e8c0153b7ba19d328f661605cb151
algo-1-w4x6n_1  |   Running setup.py bdist_wheel for tornado ... done
algo-1-w4x6n_1  |   Stored in directory: /root/.cache/pip/wheels/6d/e1/ce/f4ee2fa420cc6b940123c64992b81047816d0a9fad6b879325
algo-1-w4x6n_1  |   Running setup.py bdist_wheel for validators ... done
algo-1-w4

algo-1-w4x6n_1  |       Successfully uninstalled Werkzeug-0.15.4
algo-1-w4x6n_1  |   Found existing installation: tensorflow 1.12.0
algo-1-w4x6n_1  |     Uninstalling tensorflow-1.12.0:
algo-1-w4x6n_1  |       Successfully uninstalled tensorflow-1.12.0
algo-1-w4x6n_1  | Successfully installed DateTime-4.3 Jinja2-2.10.3 Markdown-3.1 MouseInfo-0.0.4 Pillow-6.1.0 PyAutoGUI-0.9.47 PyGetWindow-0.0.7 PyMsgBox-1.0.7 PyRect-0.1.4 PyScreeze-0.1.22 PyTweening-1.0.3 PyYAML-5.1.2 Pygments-2.3.0 Pympler-0.7 Theano-1.0.4 Werkzeug-0.15.2 altair-3.2.0 appnope-0.1.0 argh-0.26.2 astor-0.7.1 aws-kinesis-agg-1.1.0 awsebcli-3.15.2 backcall-0.1.0 base58-1.0.3 beautifulsoup4-4.7.1 blessed-1.15.0 blinker-1.4 boto3-1.9.252 botocore-1.12.252 cached-property-1.5.1 cement-2.8.2 certifi-2019.6.16 chardet-3.0.4 cloudpickle-1.2.2 colorama-0.3.9 cycler-0.10.0 dask-1.0.0 decorator-4.3.0 docker-3.7.3 docker-compose-1.23.2 docker-pycreds-0.4.0 dockerpty-0.4.1 docopt-0.6.2 ela

algo-1-w4x6n_1  | Using TensorFlow backend.
algo-1-w4x6n_1  | here: 1
algo-1-w4x6n_1  | here: 2
algo-1-w4x6n_1  | here: 3
algo-1-w4x6n_1  | Traceback (most recent call last):
algo-1-w4x6n_1  |   File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
algo-1-w4x6n_1  |     "__main__", mod_spec)
algo-1-w4x6n_1  |   File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
algo-1-w4x6n_1  |     exec(code, run_globals)
algo-1-w4x6n_1  |   File "/opt/ml/code/program.py", line 553, in <module>
algo-1-w4x6n_1  |     main()
algo-1-w4x6n_1  |   File "/opt/ml/code/program.py", line 542, in main
algo-1-w4x6n_1  |     l.tokenization()
algo-1-w4x6n_1  |   File "/opt/ml/code/program.py", line 300, in tokenization
algo-1-w4x6n_1  |     f.write(json.dumps(self.tokenizer.to_json(), ensure_ascii=False))
algo-1-w4x6n_1  |   File "/usr/local/lib/pytho

RuntimeError: Failed to run: ['docker-compose', '-f', '/tmp/tmp7dvg41bc/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit'], Process exited with code: 1
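
The traceback above is cut off, so the exception that stopped the job is not visible. One failure that can occur at a `json.dumps(..., ensure_ascii=False)` write is non-ASCII characters reaching a file opened with a non-UTF-8 default encoding, which is plausible for password data inside a container. The sketch below assumes that cause and opens the file with an explicit encoding. The `word_index` data and the file name are made up for illustration.

```python
import json
import os
import tempfile

# stand-in for the tokenizer's character index; the non-ASCII keys are deliberate
word_index = {'a': 1, 'ñ': 2, '密': 3}

path = os.path.join(tempfile.mkdtemp(), 'tokenizer.json')

# open with an explicit encoding so the write does not depend on the locale
with open(path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(word_index, ensure_ascii=False))

# read the file back with the same encoding to confirm the round trip
with open(path, 'r', encoding='utf-8') as f:
    restored = json.load(f)
```

Separately, if `self.tokenizer` is a Keras `Tokenizer`, its `to_json()` method already returns a JSON string, so wrapping it in `json.dumps` would encode the data twice.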

## Hyperparameter Tuning

In [None]:
# hyperparameter names must match the arguments that program.py parses
hyperparameter_ranges = {
    'epochs':        IntegerParameter(20, 100),
    'learning-rate': ContinuousParameter(0.001, 0.1, scaling_type='Logarithmic'), 
    'batch_size':    IntegerParameter(32, 1024),  # same name as in the training hyperparameters
    'dense-layer':   IntegerParameter(128, 1024),
    'dropout':       ContinuousParameter(0.2, 0.6)
}

# metric to optimize, extracted from the training logs with a regular expression
objective_metric_name = 'val_acc'
objective_type        = 'Maximize'
metric_definitions    = [{'Name': 'val_acc', 'Regex': 'val_acc: ([0-9\\.]+)'}]

# tune the estimator defined in the Training section
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions,
                            max_jobs=10,
                            max_parallel_jobs=2,
                            objective_type=objective_type)
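
The tuner reads the objective metric by applying the regular expression in `metric_definitions` to the training logs. The sketch below checks that expression against a made-up log line in the style of a Keras epoch summary. Depending on the Keras version and on how the metric is declared, the reported name may be `val_accuracy` instead of `val_acc`, in which case the expression would need to change.

```python
import re

# the same expression as in metric_definitions
metric_regex = 'val_acc: ([0-9\\.]+)'

# a made-up line in the style of a Keras epoch summary
log_line = ('100/100 [==============================] - 12s 120ms/step'
            ' - loss: 0.9131 - acc: 0.7012 - val_loss: 0.8544 - val_acc: 0.7260')

match = re.search(metric_regex, log_line)
val_acc = float(match.group(1)) if match else None
```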

---
# Endpoint
---

Load the model from its artifacts stored on S3 and use it to deploy an endpoint:

In [None]:
# load the model from its artifacts on S3
model = Model(model_data=model_artifacts, role=role)

# deploy an endpoint
predictor = model.deploy(initial_instance_count=1, 
                         instance_type='ml.t2.medium',
                         endpoint_name=endpoint_name)
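
Once the endpoint is running, a request can be built in the `instances` format accepted by the TensorFlow Serving REST API. The sketch below only constructs the payload. The `vocab` mapping, `max_length`, the right-padding, and the `encode` helper are assumptions; the real values depend on the tokenizer and input shape used during training.

```python
import json

# hypothetical character-to-id mapping; the real one comes from the
# tokenizer fitted during training
vocab = {'p': 1, 'a': 2, 's': 3, 'w': 4}
max_length = 8  # assumed fixed input length of the model


def encode(password, vocab, max_length):
    """Convert a password to a zero-padded list of token ids."""
    tokens = [vocab[ch] for ch in password][:max_length]
    return tokens + [0] * (max_length - len(tokens))


payload = {'instances': [encode('pass', vocab, max_length)]}
body = json.dumps(payload)

# with a live endpoint, the request would be sent along the lines of:
# result = predictor.predict(payload)
```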

Delete the endpoint after it is no longer needed:

In [None]:
predictor.delete_endpoint()