# The Stanford Sentiment Treebank 
The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. We use the two-way (positive/negative) class split, and use only sentence-level labels.

In [1]:
from IPython.display import display, Markdown
with open('../../doc/env_variables_setup.md', 'r') as fh:
    content = fh.read()
display(Markdown(content))

Environment variables that need to be defined:   
`export DIR_PROJ=your_path_git_repository`  
`export PYTHONPATH=$DIR_PROJ/src`  
`export PATH_TENSORBOARD=your_path_tensorboard`  
`export PATH_DATASETS=your_path_datasets`  
`export PROJECT_ID=your_gcp_project_id`  
`export BUCKET_NAME=your_gcp_gs_bucket_name`  
`export BUCKET_TRANSLATION_NAME=your_gcp_gs_bucket_translation_name`  
`export REGION=your_region`  
`export PATH_SAVE_MODEL=your_path_to_save_model`  
`export CLOUDSDK_PYTHON=your_path/conda-env/env_gcp_sdk/bin/python`  
`export CLOUDSDK_GSUTIL_PYTHON=your_path/conda-env/env_gcp_sdk/bin/python`  

- Use local Jupyter Lab 
    - you need to have the `jupyter-notebook` Anaconda python environment created [link](local_jupyter_lab_installation.md) 
    - you need to have the `jupyter-notebook` Anaconda python environment activated [link](local_jupyter_lab_installation.md) 
    - then define the environment variables above (copy and paste) 
    - you need to have the `env_multilingual_class` Anaconda python environment created [link](local_jupyter_lab_installation.md)  
    - start Jupyter Lab:  `jupyter lab` 
    - open a Jupyter Lab notebook from `notebook/` 
    - choose the proper Anaconda python environment:  `Python [conda env:env_multilingual_class]` [link](conda_env.md) 
    - clone this repository: `git clone https://github.com/tarrade/proj_multilingual_text_classification.git`


- Use GCP Jupyter Lab 
    - Go to GCP
    - open a Cloud Shell
    - `ssh-keygen -t rsa -b 4096 -C firstName_lastName`
    - `cp .ssh/id_rsa.pub .`
    - use Cloud Editor to edit this file `id_rsa.pub` and copy the full content
    - Go to Compute Engine -> Metadata
    - Click SSH Keys
    - Click Edit
    - Click + Add item, copy the content of `id_rsa.pub`
    - You should see firstName_lastName on the left
    - Click Save
    - you need to start an AI Platform instance 
    - open a Jupyter Lab terminal and go to `/home/gcp_user_name/`
    - clone this repository: `git clone https://github.com/tarrade/proj_multilingual_text_classification.git`
    - then `cd proj_multilingual_text_classification/`
    - create the Anaconda Python environment: `conda env create -f env/environment.yml`
    - create a file `config.sh` in `/home` with the following information: 
    ```
    #!/bin/bash
    
    echo "applying some configuration ..."
    git config --global user.email user_email
    git config --global user.name user_name
    git config --global credential.helper store
        
    # Add the environment variables listed above here
    # [EDIT ME]
    export DIR_PROJ=your_path_git_repository
    export PYTHONPATH=$DIR_PROJ/src
  
    cd /home/gcp_user_name/
    
    conda activate env_multilingual_class

    export PS1='\[\e[91m\]\u@:\[\e[32m\]\w\[\e[0m\]$'
    ```
    - Go to AI Platform Notebook, select your instance and click "Reset".
    - Wait, then refresh your web browser with the Notebook


## Import Packages

In [11]:
import tensorflow as tf
from transformers import (
    BertConfig,
    BertTokenizer,
    XLMRobertaTokenizer,
    TFBertModel,
    TFXLMRobertaModel,
)
import os
from datetime import datetime
import tensorflow_datasets
from tensorboard import notebook
import math

from googleapiclient import discovery
from googleapiclient import errors

## Check configuration

In [2]:
print(tf.version.GIT_VERSION, tf.version.VERSION)

v2.2.0-rc4-8-g2b96f3662b 2.2.0


In [3]:
print(tf.keras.__version__)

2.3.0-tf


In [4]:
gpus = tf.config.list_physical_devices('GPU')
if len(gpus)>0:
    for gpu in gpus:
        print('Name:', gpu.name, '  Type:', gpu.device_type)
else:
    print('No GPU available !!!!')

No GPU available !!!!


## Define Paths

In [5]:
try:
    data_dir=os.environ['PATH_DATASETS']
except KeyError:
    print('missing PATH_DATASETS')
try:   
    tensorboard_dir=os.environ['PATH_TENSORBOARD']
except KeyError:
    print('missing PATH_TENSORBOARD')
try:   
    savemodel_dir=os.environ['PATH_SAVE_MODEL']
except KeyError:
    print('missing PATH_SAVE_MODEL')
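
The cell above only prints a message when a variable is missing, so a later cell that uses `data_dir` may fail with a `NameError`. A minimal sketch of a stricter alternative, assuming a hypothetical helper named `require_env` (not part of the project), collects every missing name and raises once:

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for every name, or raise listing all missing ones.

    `require_env` is a hypothetical helper, not part of the project.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in names}


# Usage with a stand-in mapping; omit `environ` to read the real os.environ.
example_env = {'PATH_DATASETS': '/tmp/datasets', 'PATH_TENSORBOARD': '/tmp/tb'}
paths = require_env(['PATH_DATASETS', 'PATH_TENSORBOARD'], environ=example_env)
print(paths['PATH_DATASETS'])
```

Failing early keeps the error close to its cause instead of surfacing several cells later.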

## Train the model on AI Platform Training (for production)

In [7]:
project_name = os.environ['PROJECT_ID']
project_id = 'projects/{}'.format(project_name)

In [10]:
ai_platform_training = discovery.build('ml', 'v1')




In [None]:
training_inputs = {
    'scaleTier': 'CUSTOM',
    'masterType': 'complex_model_m',
    'workerType': 'complex_model_m',
    'parameterServerType': 'large_model',
    'workerCount': 9,
    'parameterServerCount': 3,
    'packageUris': ['gs://my/trainer/path/package-0.0.0.tar.gz'],
    'pythonModule': 'trainer.task',
    'args': ['--arg1', 'value1', '--arg2', 'value2'],
    'region': 'us-central1',
    'jobDir': 'gs://my/training/job/directory',
    'runtimeVersion': '2.1',
    'pythonVersion': '3.7',
}

job_spec = {'jobId': 'my_job_name', 'trainingInput': training_inputs}
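
The `job_spec` above is built but not sent anywhere in this notebook. As a hedged sketch, the request could be submitted through the client created with `discovery.build('ml', 'v1')`. The wrapper name `submit_training_job` is an assumption, and a mock replaces the real client here so that the example runs without credentials:

```python
from unittest import mock


def submit_training_job(client, parent, job_spec):
    """Send a training job request and return the API response.

    `client` is expected to behave like discovery.build('ml', 'v1');
    `parent` has the form 'projects/<project-id>'.
    """
    request = client.projects().jobs().create(parent=parent, body=job_spec)
    return request.execute()


# Stand-in client so the sketch runs offline; replace it with the real
# `ai_platform_training` object to submit an actual job.
fake_client = mock.MagicMock()
fake_client.projects().jobs().create().execute.return_value = {
    'jobId': 'my_job_name', 'state': 'QUEUED'}

response = submit_training_job(
    fake_client,
    'projects/my-project',  # placeholder project id
    {'jobId': 'my_job_name', 'trainingInput': {'scaleTier': 'BASIC'}},
)
print(response['state'])
```

With the real client, a rejected request raises `googleapiclient.errors.HttpError`, which is presumably why the notebook imports `errors`.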


In [108]:
# train on GCP
model_name='tf_bert_classification'
os.environ['JOB_NAME'] = model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['RUNTIME_VERSION'] = '2.1'
os.environ['PYTHON_VERSION'] = '3.7'
os.environ['TRAINER_PACKAGE_PATH'] = os.environ['PYTHONPATH']+'/model'
os.environ['MAIN_TRAINER_MODULE'] = 'model.'+model_name+'.task'
os.environ['REGION'] = 'europe-west1'

os.environ['EPOCHS'] = '1' 
os.environ['STEPS_PER_EPOCH_TRAIN'] = '5' 
os.environ['BATCH_SIZE_TRAIN'] = '32' 
os.environ['STEPS_PER_EPOCH_EVAL'] = '1' 
os.environ['BATCH_SIZE_EVAL'] = '64'
os.environ['PACKAGE_STAGING_PATH'] = 'gs://'+os.environ['BUCKET_STAGING_NAME']
os.environ['INPUT_EVAL_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/valid'
os.environ['INPUT_TRAIN_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/train'
os.environ['OUTPUT_DIR'] = 'gs://'+os.environ['BUCKET_NAME']+'/training_model_gcp/'+model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['PRETRAINED_MODEL_DIR']= 'gs://'+os.environ['BUCKET_NAME']+'/pretrained_model/bert-base-multilingual-uncased'
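
The same block of `os.environ` assignments is repeated for each job type in this notebook, and `datetime.now()` is called twice, so `JOB_NAME` and `OUTPUT_DIR` can end up with different timestamps if the clock ticks between the two calls. One possible refactoring, sketched with a hypothetical helper `build_job_env` and placeholder bucket names, computes the timestamp once and returns a plain dictionary:

```python
from datetime import datetime


def build_job_env(model_name, bucket_name, staging_bucket_name, now=None,
                  pretrained_name='bert-base-multilingual-uncased'):
    """Return the job-related variables as a dict of strings.

    Hypothetical helper: the keys mirror those set in the cell above.
    """
    stamp = (now or datetime.now()).strftime('%Y_%m_%d_%H%M%S')
    job_name = '{}_{}'.format(model_name, stamp)
    bucket = 'gs://' + bucket_name
    tfrecords = '{}/tfrecord/sst2/{}'.format(bucket, pretrained_name)
    return {
        'JOB_NAME': job_name,
        'MAIN_TRAINER_MODULE': 'model.{}.task'.format(model_name),
        'PACKAGE_STAGING_PATH': 'gs://' + staging_bucket_name,
        'INPUT_EVAL_TFRECORDS': tfrecords + '/valid',
        'INPUT_TRAIN_TFRECORDS': tfrecords + '/train',
        # Same timestamp as JOB_NAME, because it is computed only once.
        'OUTPUT_DIR': '{}/training_model_gcp/{}'.format(bucket, job_name),
        'PRETRAINED_MODEL_DIR': '{}/pretrained_model/{}'.format(bucket, pretrained_name),
    }


# 'my-bucket' and 'my-staging-bucket' are placeholders; `now` is fixed
# here only to make the example reproducible.
job_env = build_job_env('tf_bert_classification', 'my-bucket', 'my-staging-bucket',
                        now=datetime(2020, 5, 12, 10, 2, 4))
print(job_env['JOB_NAME'])
```

Calling `os.environ.update(job_env)` afterwards would expose the values to a `%%bash` cell in the same way as the individual assignments.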

In [109]:
%%bash
# Use Cloud Machine Learning Engine to train the model on GCP
gcloud ai-platform jobs submit training $JOB_NAME \
        --scale-tier basic \
        --python-version $PYTHON_VERSION \
        --runtime-version $RUNTIME_VERSION \
        --module-name=$MAIN_TRAINER_MODULE \
        --package-path=$TRAINER_PACKAGE_PATH \
        --staging-bucket=$PACKAGE_STAGING_PATH \
        --region=$REGION \
        -- \
        --epochs=$EPOCHS \
        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \
        --batch_size_train=$BATCH_SIZE_TRAIN \
        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \
        --batch_size_eval=$BATCH_SIZE_EVAL \
        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \
        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \
        --output_dir=$OUTPUT_DIR \
        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \
        --verbosity_level='DEBUG' \
#        --stream-logs

jobId: tf_bert_classification_2020_05_12_100204
state: QUEUED


Job [tf_bert_classification_2020_05_12_100204] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_bert_classification_2020_05_12_100204

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_bert_classification_2020_05_12_100204
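
Besides the `gcloud ai-platform jobs describe` command shown in the output, the job state can also be read from Python through the `ml` v1 client. The sketch below assumes a hypothetical wrapper `get_job_state` and a placeholder project id; a mock stands in for the real client so the example runs without credentials:

```python
from unittest import mock


def get_job_state(client, project_name, job_id):
    """Return the 'state' field of a training job, e.g. 'QUEUED'."""
    name = 'projects/{}/jobs/{}'.format(project_name, job_id)
    response = client.projects().jobs().get(name=name).execute()
    return response['state']


# Stand-in for discovery.build('ml', 'v1'); swap in the real client to
# query an actual job.
fake_client = mock.MagicMock()
fake_client.projects().jobs().get().execute.return_value = {
    'jobId': 'tf_bert_classification_2020_05_12_100204', 'state': 'QUEUED'}

state = get_job_state(fake_client, 'my-project',
                      'tf_bert_classification_2020_05_12_100204')
print(state)
```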


## Train the model on AI Platform Training with GPU (for production)

In [10]:
# reading metadata
_, info = tensorflow_datasets.load(name='glue/sst2',
                                   data_dir=data_dir,
                                   with_info=True)

INFO:absl:Overwrite dataset info from restored data version.
INFO:absl:Reusing dataset glue (/Users/tarrade/tensorflow_datasets/glue/sst2/1.0.0)
INFO:absl:Constructing tf.data.Dataset for split None, from /Users/tarrade/tensorflow_datasets/glue/sst2/1.0.0


In [11]:
# Maximum sequence length; be careful, BERT's maximum length is 512!
MAX_LENGTH = 128

# define parameters
BATCH_SIZE_TRAIN = 32
BATCH_SIZE_TEST = 32
BATCH_SIZE_VALID = 64
EPOCH = 2

# extract parameters
size_train_dataset=info.splits['train'].num_examples
size_test_dataset=info.splits['test'].num_examples
size_valid_dataset=info.splits['validation'].num_examples

# computed parameters
STEP_EPOCH_TRAIN = math.ceil(size_train_dataset/BATCH_SIZE_TRAIN)
STEP_EPOCH_TEST = math.ceil(size_test_dataset/BATCH_SIZE_TEST)
STEP_EPOCH_VALID = math.ceil(size_valid_dataset/BATCH_SIZE_VALID)


print('Dataset size:          {:6}/{:6}/{:6}'.format(size_train_dataset, size_test_dataset, size_valid_dataset))
print('Batch size:            {:6}/{:6}/{:6}'.format(BATCH_SIZE_TRAIN, BATCH_SIZE_TEST, BATCH_SIZE_VALID))
print('Step per epoch:        {:6}/{:6}/{:6}'.format(STEP_EPOCH_TRAIN, STEP_EPOCH_TEST, STEP_EPOCH_VALID))
print('Total number of batch: {:6}/{:6}/{:6}'.format(STEP_EPOCH_TRAIN*(EPOCH+1), STEP_EPOCH_TEST*(EPOCH+1), STEP_EPOCH_VALID*1))

Dataset size:           67349/  1821/   872
Batch size:                32/    32/    64
Step per epoch:          2105/    57/    14
Total number of batch:   6315/   171/    14
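
The step counts printed above come from dividing the number of examples by the batch size and rounding up, so the last, partially filled batch is still counted. The short check below reproduces the three values from the dataset sizes shown in the output; the function name `steps_per_epoch` is only illustrative:

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


# Sizes and batch sizes copied from the output above.
sizes = {'train': 67349, 'test': 1821, 'validation': 872}
batch_sizes = {'train': 32, 'test': 32, 'validation': 64}

steps = {split: steps_per_epoch(sizes[split], batch_sizes[split])
         for split in sizes}
print(steps)
```

For the training split, 67349 / 32 is about 2104.7, hence 2105 steps.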


In [12]:
# train on GCP
model_name='tf_bert_classification'
os.environ['JOB_NAME'] = model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['RUNTIME_VERSION'] = '2.1'
os.environ['PYTHON_VERSION'] = '3.7'
os.environ['TRAINER_PACKAGE_PATH'] = os.environ['PYTHONPATH']+'/model'
os.environ['MAIN_TRAINER_MODULE'] = 'model.'+model_name+'.task'
os.environ['REGION'] = 'europe-west1'
os.environ['CONFIG']= os.environ['DIR_PROJ'].split('src')[0]+'deployment/training/tf_bert_classification/custom.yaml'

os.environ['EPOCHS'] = str(EPOCH)
os.environ['STEPS_PER_EPOCH_TRAIN'] = str(STEP_EPOCH_TRAIN)
os.environ['BATCH_SIZE_TRAIN'] = str(BATCH_SIZE_TRAIN)
os.environ['STEPS_PER_EPOCH_EVAL'] = str(STEP_EPOCH_VALID) 
os.environ['BATCH_SIZE_EVAL'] = str(BATCH_SIZE_VALID)
os.environ['PACKAGE_STAGING_PATH'] = 'gs://'+os.environ['BUCKET_STAGING_NAME']
os.environ['INPUT_EVAL_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/valid'
os.environ['INPUT_TRAIN_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/train'
os.environ['OUTPUT_DIR'] = 'gs://'+os.environ['BUCKET_NAME']+'/training_model_gcp/'+model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['PRETRAINED_MODEL_DIR']= 'gs://'+os.environ['BUCKET_NAME']+'/pretrained_model/bert-base-multilingual-uncased'

In [13]:
print('Number of epoch:        {:6}'.format(os.environ['EPOCHS']))
print('Batch size:            {:6}/{:6}'.format(os.environ['BATCH_SIZE_TRAIN'], os.environ['BATCH_SIZE_EVAL']))
print('Step per epoch:        {:6}/{:6}'.format(os.environ['STEPS_PER_EPOCH_TRAIN'], os.environ['STEPS_PER_EPOCH_EVAL']))

Number of epoch:        2     
Batch size:            32    /64    
Step per epoch:        2105  /14    


In [49]:
%%bash
# Use Cloud Machine Learning Engine to train the model on GCP
gcloud ai-platform jobs submit training $JOB_NAME \
        --config $CONFIG \
        --module-name=$MAIN_TRAINER_MODULE \
        --package-path=$TRAINER_PACKAGE_PATH \
        --staging-bucket=$PACKAGE_STAGING_PATH \
        --region=$REGION \
        -- \
        --epochs=$EPOCHS \
        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \
        --batch_size_train=$BATCH_SIZE_TRAIN \
        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \
        --batch_size_eval=$BATCH_SIZE_EVAL \
        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \
        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \
        --output_dir=$OUTPUT_DIR \
        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \
        --verbosity_level='INFO'

jobId: tf_bert_classification_2020_05_06_115937
state: QUEUED


Job [tf_bert_classification_2020_05_06_115937] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_bert_classification_2020_05_06_115937

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_bert_classification_2020_05_06_115937


## Train the model on AI Platform Training using a config file (for production)

In [8]:
model_name='tf_bert_classification'
os.environ['JOB_NAME'] = model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['TRAINER_PACKAGE_PATH'] = os.environ['PYTHONPATH']+'/model'
os.environ['MAIN_TRAINER_MODULE'] = 'model.'+model_name+'.task'
os.environ['CONFIG']= os.environ['DIR_PROJ'].split('src')[0]+'deployment/training/tf_bert_classification/standard.yaml' # or custom.yaml
os.environ['EPOCHS'] = '2' 
os.environ['STEPS_PER_EPOCH_TRAIN'] = '5' 
os.environ['BATCH_SIZE_TRAIN'] = '32' 
os.environ['STEPS_PER_EPOCH_EVAL'] = '1' 
os.environ['BATCH_SIZE_EVAL'] = '64'
os.environ['PACKAGE_STAGING_PATH'] = 'gs://'+os.environ['BUCKET_STAGING_NAME']
os.environ['INPUT_EVAL_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/valid'
os.environ['INPUT_TRAIN_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/train'
os.environ['OUTPUT_DIR'] = 'gs://'+os.environ['BUCKET_NAME']+'/training_model_gcp/'+model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['PRETRAINED_MODEL_DIR']= 'gs://'+os.environ['BUCKET_NAME']+'/pretrained_model/bert-base-multilingual-uncased'

In [83]:
%%bash
# Use Cloud Machine Learning Engine to train the model on GCP
gcloud ai-platform jobs submit training $JOB_NAME \
        --module-name=$MAIN_TRAINER_MODULE \
        --package-path=$TRAINER_PACKAGE_PATH \
        --staging-bucket=$PACKAGE_STAGING_PATH \
        --config $CONFIG \
        -- \
        --epochs=$EPOCHS \
        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \
        --batch_size_train=$BATCH_SIZE_TRAIN \
        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \
        --batch_size_eval=$BATCH_SIZE_EVAL \
        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \
        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \
        --output_dir=$OUTPUT_DIR \
        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \
        --verbosity_level='DEBUG'

jobId: tf_bert_classification_2020_05_02_151451
state: QUEUED


Job [tf_bert_classification_2020_05_02_151451] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_bert_classification_2020_05_02_151451

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_bert_classification_2020_05_02_151451


## Train the model on AI Platform Training using TPU (for production)

In [61]:
# train on GCP
model_name='tf_bert_classification'
os.environ['JOB_NAME'] = model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['RUNTIME_VERSION'] = '2.1'
os.environ['PYTHON_VERSION'] = '3.7'
os.environ['TRAINER_PACKAGE_PATH'] = os.environ['PYTHONPATH']+'/model'
os.environ['MAIN_TRAINER_MODULE'] = 'model.'+model_name+'.task'
os.environ['REGION_TPU'] = 'europe-west4' #'us-central1' #'europe-west4'

os.environ['USE_TPU'] = 'True' 
os.environ['EPOCHS'] = '1' 
os.environ['STEPS_PER_EPOCH_TRAIN'] = '1' 
os.environ['BATCH_SIZE_TRAIN'] = '32' 
os.environ['STEPS_PER_EPOCH_EVAL'] = '1' 
os.environ['BATCH_SIZE_EVAL'] = '64'
os.environ['PACKAGE_STAGING_PATH'] = 'gs://'+os.environ['BUCKET_STAGING_NAME']
os.environ['INPUT_EVAL_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/valid'
os.environ['INPUT_TRAIN_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/train'
os.environ['OUTPUT_DIR'] = 'gs://'+os.environ['BUCKET_NAME']+'/training_model_gcp/'+model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['PRETRAINED_MODEL_DIR']= 'gs://'+os.environ['BUCKET_NAME']+'/pretrained_model/bert-base-multilingual-uncased'

In [62]:
%%bash
# Use Cloud Machine Learning Engine to train the model on GCP
gcloud ai-platform jobs submit training $JOB_NAME \
        --scale-tier basic_tpu \
        --python-version $PYTHON_VERSION \
        --runtime-version $RUNTIME_VERSION \
        --module-name=$MAIN_TRAINER_MODULE \
        --package-path=$TRAINER_PACKAGE_PATH \
        --staging-bucket=$PACKAGE_STAGING_PATH \
        --region=$REGION_TPU \
        -- \
        --use_tpu=$USE_TPU \
        --epochs=$EPOCHS \
        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \
        --batch_size_train=$BATCH_SIZE_TRAIN \
        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \
        --batch_size_eval=$BATCH_SIZE_EVAL \
        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \
        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \
        --output_dir=$OUTPUT_DIR \
        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \
        --verbosity_level='DEBUG' \
#        --stream-logs

ERROR: (gcloud.ai-platform.jobs.submit.training) RESOURCE_EXHAUSTED: Field: region Error: No zone in region europe-west4 has accelerators of all requested types.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: No zone in region europe-west4 has accelerators of all requested
      types.
    field: region


CalledProcessError: Command 'b"# Use Cloud Machine Learning Engine to train the model on GCP\ngcloud ai-platform jobs submit training $JOB_NAME \\\n        --scale-tier basic_tpu\\\n        --python-version $PYTHON_VERSION \\\n        --runtime-version $RUNTIME_VERSION \\\n        --module-name=$MAIN_TRAINER_MODULE \\\n        --package-path=$TRAINER_PACKAGE_PATH \\\n        --staging-bucket=$PACKAGE_STAGING_PATH \\\n        --region=$REGION_TPU \\\n        -- \\\n        --use_tpu=$USE_TPU \\\n        --epochs=$EPOCHS \\\n        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \\\n        --batch_size_train=$BATCH_SIZE_TRAIN \\\n        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \\\n        --batch_size_eval=$BATCH_SIZE_EVAL \\\n        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \\\n        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \\\n        --output_dir=$OUTPUT_DIR \\\n        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \\\n        --verbosity_level='DEBUG' \\\n#        --stream-logsgithub\n"' returned non-zero exit status 1.

## Hyperparameter tuning of the model on AI Platform Training using a config file (for production)

In [220]:
model_name='tf_bert_classification'
os.environ['JOB_NAME'] = model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['TRAINER_PACKAGE_PATH'] = os.environ['PYTHONPATH']+'/model'
os.environ['MAIN_TRAINER_MODULE'] = 'model.'+model_name+'.task'
os.environ['CONFIG']= os.environ['DIR_PROJ'].split('src')[0]+'deployment/hp-tuning/tf_bert_classification/hyperparam.yaml'
os.environ['EPOCHS'] = '1' 
os.environ['STEPS_PER_EPOCH_TRAIN'] = '5' 
os.environ['BATCH_SIZE_TRAIN'] = '32' 
os.environ['STEPS_PER_EPOCH_EVAL'] = '1' 
os.environ['BATCH_SIZE_EVAL'] = '64'
os.environ['PACKAGE_STAGING_PATH'] = 'gs://'+os.environ['BUCKET_STAGING_NAME']
os.environ['INPUT_EVAL_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/valid'
os.environ['INPUT_TRAIN_TFRECORDS'] = 'gs://'+os.environ['BUCKET_NAME']+'/tfrecord/sst2/bert-base-multilingual-uncased/train'
os.environ['OUTPUT_DIR'] = 'gs://'+os.environ['BUCKET_NAME']+'/training_model_gcp/'+model_name+'_'+datetime.now().strftime("%Y_%m_%d_%H%M%S")
os.environ['PRETRAINED_MODEL_DIR']= 'gs://'+os.environ['BUCKET_NAME']+'/pretrained_model/bert-base-multilingual-uncased'

In [221]:
%%bash
# Use Cloud Machine Learning Engine to train the model on GCP
gcloud ai-platform jobs submit training $JOB_NAME \
        --module-name=$MAIN_TRAINER_MODULE \
        --package-path=$TRAINER_PACKAGE_PATH \
        --staging-bucket=$PACKAGE_STAGING_PATH \
        --config $CONFIG \
        -- \
        --epochs=$EPOCHS \
        --steps_per_epoch_train=$STEPS_PER_EPOCH_TRAIN \
        --batch_size_train=$BATCH_SIZE_TRAIN \
        --steps_per_epoch_eval=$STEPS_PER_EPOCH_EVAL \
        --batch_size_eval=$BATCH_SIZE_EVAL \
        --input_eval_tfrecords=$INPUT_EVAL_TFRECORDS \
        --input_train_tfrecords=$INPUT_TRAIN_TFRECORDS \
        --output_dir=$OUTPUT_DIR \
        --pretrained_model_dir=$PRETRAINED_MODEL_DIR \
        --verbosity_level='INFO'

jobId: tf_bert_classification_2020_05_12_200340
state: QUEUED


Job [tf_bert_classification_2020_05_12_200340] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_bert_classification_2020_05_12_200340

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_bert_classification_2020_05_12_200340
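
The content of `hyperparam.yaml` is not shown in this notebook. As an illustration only, a tuning configuration for AI Platform Training nests a `hyperparameters` entry under `trainingInput`, and the Python dictionary below mirrors that structure. The metric tag `accuracy`, the parameter name `learning_rate`, the value range, the scale tier, and the trial counts are all assumptions, not the project's actual settings:

```python
import json

# Illustrative values only; the project's hyperparam.yaml may differ.
hyperparameter_spec = {
    'goal': 'MAXIMIZE',
    'hyperparameterMetricTag': 'accuracy',  # assumed metric name
    'maxTrials': 4,                         # assumed trial budget
    'maxParallelTrials': 2,
    'params': [
        {
            'parameterName': 'learning_rate',  # assumed trainer argument
            'type': 'DOUBLE',
            'minValue': 1e-5,
            'maxValue': 1e-4,
            'scaleType': 'UNIT_LOG_SCALE',
        },
    ],
}

training_input = {'scaleTier': 'BASIC', 'hyperparameters': hyperparameter_spec}
print(json.dumps({'trainingInput': training_input}, indent=2))
```

Each trial passes the sampled value to the trainer as a command-line argument, so under these assumptions the task module would need to accept a `--learning_rate` flag.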


## TensorBoard for jobs running on GCP

In [113]:
# View open TensorBoard instance
#notebook.list() 

In [224]:
# View pid
!ps -ef|grep tensorboard

  501 31125     1   0 Sun03PM ??        13:49.70 /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/python /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/tensorboard --logdir gs://multilingual_text_classification/training_model_gcp/tf_bert_classification_2020_05_10_150723/tensorboard
  501 62245 43999   0  8:27PM ??         3:36.47 /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/python /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/tensorboard --logdir gs://multilingual_text_classification/training_model_gcp/tf_bert_classification_vera_2020_05_08_063801/tensorboard/ --reload_multifile True
  501 62318 43999   0  8:29PM ??         3:55.12 /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/python /Users/tarrade/anaconda-release/conda-env/env_multilingual_class/bin/tensorboard --logdir gs://multilingual_text_classification/training_model_gcp/tf_bert_classification_vera_2020_05_08_063801/te

In [225]:
# Kill a TensorBoard process using its pid
#!kill -9 pid

In [222]:
%load_ext tensorboard
#%reload_ext tensorboard
%tensorboard  --logdir {os.environ['OUTPUT_DIR']+'/tensorboard'} \
              #--host 0.0.0.0 \
              #--port 6006 \
              #--debugger_port 6006

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [226]:
%load_ext tensorboard
#%reload_ext tensorboard
%tensorboard  --logdir {os.environ['OUTPUT_DIR']+'/hparams_tuning'} \
              #--host 0.0.0.0 \
              #--port 6006 \
              #--debugger_port 6006

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
