# Using TensorFlow Scripts in SageMaker - Quickstart

Starting with TensorFlow version 1.11, you can use SageMaker's TensorFlow containers to train TensorFlow scripts the same way you would train outside SageMaker. This feature is named **Script Mode**. 

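The original version of this example uses 
[Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow](https://github.com/sherjilozair/char-rnn-tensorflow). 
This notebook instead runs a DeepCTR training script (`train.py` from the [deepctr_sagemaker](https://github.com/whn09/deepctr_sagemaker) repository). 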
You can use the same technique for other scripts or repositories, including 
[TensorFlow Model Zoo](https://github.com/tensorflow/models) and 
[TensorFlow benchmark scripts](https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks).

## Test locally using SageMaker Python SDK TensorFlow Estimator

You can use the SageMaker Python SDK [`TensorFlow`](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/README.rst#training-with-tensorflow) estimator to easily train locally and in SageMaker. 

Let's start by setting the training script's arguments as hyperparameters; this notebook sets only `train_steps`. We don't need to provide `--model_dir` as a hyperparameter, because the estimator passes it to the script:

In [1]:
from sagemaker import get_execution_role

role = get_execution_role()

In [31]:
hyperparameters = {'train_steps': 10}

This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. Just change your estimator's `train_instance_type` to `local` or `local_gpu`. For more information, see: https://github.com/aws/sagemaker-python-sdk#local-mode.

To use this feature, you need to install docker-compose (and nvidia-docker if training with a GPU). Running the following script installs docker-compose or nvidia-docker-compose and configures the notebook environment for you.

Note that you can only run one local-mode notebook at a time.

In [25]:
!/bin/bash ./utils/setup.sh

The user has root access.
nvidia-docker2 already installed. We are good to go!
SageMaker instance route table setup is ok. We are good to go.
SageMaker instance routing for Docker is ok. We are good to go!


To train locally, you set `train_instance_type` to [local](https://github.com/aws/sagemaker-python-sdk#local-mode):

In [26]:
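import shutil
import subprocess

train_instance_type = 'local'

# Switch to the GPU variant only if nvidia-smi is installed and reports a working GPU.
# Checking shutil.which first avoids a FileNotFoundError on CPU-only machines.
if shutil.which('nvidia-smi') and subprocess.call('nvidia-smi') == 0:
    train_instance_type = 'local_gpu'

print("Train instance type = " + train_instance_type)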

Train instance type = local_gpu


We create the `TensorFlow` estimator with the flag `script_mode=True`. Because we are not using Git integration here, `source_dir` is a relative or absolute local path; when the `git_config` argument is passed, `source_dir` should instead be a relative path inside the Git repo. The `TensorFlow` estimator is created as follows:


In [35]:
import os

import sagemaker
from sagemaker.tensorflow import TensorFlow


estimator = TensorFlow(entry_point='train.py',
                       source_dir='.',
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role,
                       framework_version='2.2.0',
                       py_version='py37',
                       script_mode=True,
                       model_dir='/opt/ml/model')

train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_count has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
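
The warnings above come from version 2 of the SageMaker Python SDK, which renamed `train_instance_type` to `instance_type` and `train_instance_count` to `instance_count`. The following is a minimal sketch of migrating a keyword-argument dictionary to the new names. The helper `migrate_estimator_kwargs` is hypothetical (it is not part of the SDK), and it covers only the two renames named in the warnings.

```python
# Hypothetical helper, not part of the SageMaker SDK: rename the v1 estimator
# arguments flagged in the warnings to their sagemaker>=2 names.
V2_RENAMES = {
    'train_instance_type': 'instance_type',
    'train_instance_count': 'instance_count',
}


def migrate_estimator_kwargs(kwargs):
    """Return a copy of kwargs with v1 argument names replaced by v2 names."""
    return {V2_RENAMES.get(name, name): value for name, value in kwargs.items()}


v1_kwargs = {
    'entry_point': 'train.py',
    'train_instance_type': 'local',
    'train_instance_count': 1,
}
v2_kwargs = migrate_estimator_kwargs(v1_kwargs)
print(v2_kwargs)
```

Arguments that were not renamed, such as `entry_point`, pass through unchanged.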


To start a training job, we call `estimator.fit(inputs)`, where `inputs` is a dictionary whose keys, called **channels**, 
map to data locations. In local mode, `estimator.fit(inputs)` pulls the TensorFlow container image (Python 3, CPU or GPU build depending on `train_instance_type`) and simulates a SageMaker training job on the local machine. 
When training starts, the TensorFlow container executes **train.py**, passing `hyperparameters` and `model_dir` as script arguments. For a script whose hyperparameters are `num_epochs` and `data_dir`, the invocation looks like this:
```bash
python -m train --num_epochs 1 --data_dir /opt/ml/input/data/training --model_dir /opt/ml/model
```
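
To illustrate how a hyperparameter dictionary turns into script arguments like the ones in the command above, here is a rough sketch applied to this notebook's `hyperparameters`. It is not the container's actual implementation, and the function name `to_script_args` is hypothetical.

```python
def to_script_args(hyperparameters, model_dir):
    """Flatten a hyperparameter dict into a list of --key value arguments."""
    args = []
    for name, value in hyperparameters.items():
        args.extend(['--' + name, str(value)])
    # model_dir is appended separately because it is not a hyperparameter
    args.extend(['--model_dir', model_dir])
    return args


args = to_script_args({'train_steps': 10}, '/opt/ml/model')
print(' '.join(['python', '-m', 'train'] + args))
```

Every value reaches the script as a string, so the script's argument parser is responsible for converting it back to the right type.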


In [37]:
inputs = {'training': 'file:///home/ec2-user/SageMaker/deepctr_sagemaker/data/'}

estimator.fit(inputs)

Creating tmpzojpgd0s_algo-1-m28iz_1 ... done
Attaching to tmpzojpgd0s_algo-1-m28iz_1
algo-1-m28iz_1  | 2021-01-09 18:10:24,352 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
algo-1-m28iz_1  | 2021-01-09 18:10:24,563 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
algo-1-m28iz_1  | /usr/local/bin/python3.7 -m pip install -r requirements.txt
algo-1-m28iz_1  | Collecting deepctr[gpu]
algo-1-m28iz_1  |   Downloading deepctr-0.8.3-py3-none-any.whl (114 kB)
     |████████████████████████████████| 114 kB 14.9 MB/s eta 0:00:01
algo-1-m28iz_1  | Installing collected packages: deepctr
algo-1-m28iz_1  | Successfully installed deepctr-0.8.3
algo-1-m28iz_1  | 2021-01-09 18:10:25,935 sagemaker-training-toolkit INFO     Invoking user script
algo-1-m28iz_1  | 
algo-1-m28iz_1  | Training Env:
algo-1-m28iz_1  | 

Let's look at the values of `--data_dir` and `--model_dir` in more detail:

- **/opt/ml/input/data/training** is the directory inside the container where the training data is downloaded. The data is placed in this folder because `training` is the channel name defined in `estimator.fit({'training': inputs})`. See [training data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-running-container-trainingdata) for more information. 

- **/opt/ml/model** is the directory where the script should save models, checkpoints, or any other data. Anything saved in this folder is uploaded to the S3 bucket defined for training. See [model data](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html#your-algorithms-training-algo-envvariables) for more information.
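
A training script can consume these two directories with ordinary command-line parsing. The sketch below assumes the script needs only `--data_dir`, `--model_dir`, and `--train_steps`. The function names and the placeholder artifact `model_info.txt` are hypothetical, and real training code would go where the comment indicates.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments the container passes to the script."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='/opt/ml/input/data/training')
    parser.add_argument('--model_dir', type=str, default='/opt/ml/model')
    parser.add_argument('--train_steps', type=int, default=10)
    return parser.parse_args(argv)


def run(args):
    """Read the channel directory and write a placeholder artifact to model_dir."""
    # Files that SageMaker placed in the channel directory
    data_files = sorted(os.listdir(args.data_dir))
    # ... real training code would go here ...
    os.makedirs(args.model_dir, exist_ok=True)
    artifact = os.path.join(args.model_dir, 'model_info.txt')
    with open(artifact, 'w') as f:
        f.write('trained for {} steps on {} files\n'.format(args.train_steps, len(data_files)))
    return artifact
```

Calling `run(parse_args())` at the end of the script would make it usable both inside the container and from a local shell, as long as the two directories are passed explicitly when running locally.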

### Reading additional information from the container

Often, a user script needs additional information from the container that is not available in `hyperparameters`.
SageMaker containers expose this information as **environment variables** that are available inside the script.

For example, the example above can read information about the `training` channel provided in the training job request by adding the environment variable `SM_CHANNEL_TRAINING` as the default value for the `--data_dir` argument:

```python
import argparse
import os

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # default --data_dir to the location of the `training` channel, read from the environment
  parser.add_argument('--data_dir', type=str, default=os.environ['SM_CHANNEL_TRAINING'])
```

Script mode displays the list of available environment variables in the training logs. You can find the [entire list here](https://github.com/aws/sagemaker-containers/blob/master/README.rst#list-of-provided-environment-variables-by-sagemaker-containers).
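
As a sketch of reading several of these variables at once, the helper below collects a few that the SageMaker containers provide (`SM_CHANNEL_TRAINING`, `SM_MODEL_DIR`, `SM_NUM_GPUS`, `SM_HOSTS`). The function name `read_sagemaker_env` and the fallback defaults are assumptions, added so that the same script can also run outside a container.

```python
import json
import os


def read_sagemaker_env(environ=os.environ):
    """Collect a few SageMaker-provided settings, with fallbacks for local runs."""
    return {
        'data_dir': environ.get('SM_CHANNEL_TRAINING', './data'),
        'model_dir': environ.get('SM_MODEL_DIR', './model'),
        'num_gpus': int(environ.get('SM_NUM_GPUS', '0')),
        # SM_HOSTS is a JSON-encoded list of host names
        'hosts': json.loads(environ.get('SM_HOSTS', '["localhost"]')),
    }


# Example with values like those a single-instance training job would see
settings = read_sagemaker_env({
    'SM_CHANNEL_TRAINING': '/opt/ml/input/data/training',
    'SM_MODEL_DIR': '/opt/ml/model',
    'SM_NUM_GPUS': '1',
    'SM_HOSTS': '["algo-1"]',
})
print(settings)
```

Using `environ.get` with a default, rather than indexing `os.environ` directly, avoids a `KeyError` when the script is started outside SageMaker.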

## Training in SageMaker

After you test the training job locally, upload the dataset to an S3 bucket so SageMaker can access the data during training:

In [39]:
import sagemaker

inputs = sagemaker.Session().upload_data(path='/home/ec2-user/SageMaker/deepctr_sagemaker/data', key_prefix='DEMO-tensorflow-deepctr')
print(inputs)

s3://sagemaker-us-east-1-579019700964/DEMO-tensorflow-deepctr


The returned variable `inputs` is a string containing an S3 location that SageMaker Training has permission
to read from.
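
If you need the bucket and key prefix separately, for example to inspect the uploaded files, the URI can be split with the standard library. This is a small sketch; the function name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3':
        raise ValueError('not an S3 URI: ' + uri)
    return parsed.netloc, parsed.path.lstrip('/')


bucket, prefix = split_s3_uri('s3://sagemaker-us-east-1-579019700964/DEMO-tensorflow-deepctr')
print(bucket, prefix)
```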

To train in SageMaker:
- change the estimator argument `train_instance_type` to any SageMaker ML instance type available for training.
- set the `training` channel to an S3 location.

In [42]:
estimator = TensorFlow(entry_point='train.py',
                       source_dir='.',
                       train_instance_type='ml.p3.8xlarge', # Any training instance type works, e.g. ml.p2.xlarge, ml.p3.2xlarge, or ml.p3.8xlarge
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role,
                       framework_version='2.2.0',
                       py_version='py37',
                       script_mode=True,
                       model_dir='/opt/ml/model')

estimator.fit({'training': inputs})

train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_count has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


2021-01-09 18:29:56 Starting - Starting the training job...
2021-01-09 18:30:19 Starting - Launching requested ML instances
ProfilerReport-1610216996: InProgress
.........
2021-01-09 18:31:40 Starting - Preparing the instances for training......
2021-01-09 18:32:52 Downloading - Downloading input data
2021-01-09 18:32:52 Training - Downloading the training image.........
2021-01-09 18:34:22 Training - Training image download completed. Training in progress..
2021-01-09 18:34:21,660 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2021-01-09 18:34:22,084 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/usr/local/bin/python3.7 -m pip install -r requirements.txt
Collecting deepctr[gpu]
  Downloading deepctr-0.8.3-py3-none-any.whl (114 kB)
Installing collected packages: deepctr
Successfully installed deepctr-0.8.3
2021-01-09 18:34:23,775 sagemaker-traini

## Git Support

In [44]:
git_config = {'repo': 'https://github.com/whn09/deepctr_sagemaker.git', 'branch': 'main'}

estimator = TensorFlow(entry_point='train.py',
                       source_dir='.',
                       git_config=git_config,
                       train_instance_type='ml.p3.2xlarge', # Executes training in an ml.p3.2xlarge instance
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=role,
                       framework_version='2.2.0',
                       py_version='py37',
                       script_mode=True,
                       model_dir='/opt/ml/model')

estimator.fit({'training': inputs})

train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_count has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


2021-01-09 18:38:55 Starting - Starting the training job...
2021-01-09 18:39:19 Starting - Launching requested ML instances
ProfilerReport-1610217534: InProgress
......
2021-01-09 18:40:25 Starting - Preparing the instances for training.........
2021-01-09 18:41:49 Downloading - Downloading input data
2021-01-09 18:41:49 Training - Downloading the training image.........
2021-01-09 18:43:22 Training - Training image download completed. Training in progress..
2021-01-09 18:43:20,787 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2021-01-09 18:43:21,167 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/usr/local/bin/python3.7 -m pip install -r requirements.txt
Collecting deepctr[gpu]
  Downloading deepctr-0.8.3-py3-none-any.whl (114 kB)
Installing collected packages: deepctr
Successfully installed deepctr-0.8.3
2021-01-09 18:43:22,774 sagemaker-traini