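# Convolutional neural network sentence classifier notebook
In this notebook we will attempt to build a sentence classifier based on the CNN architecture described in Kim (2014), "Convolutional Neural Networks for Sentence Classification": https://arxiv.org/pdf/1408.5882.pdf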

## 00. Packages setup

We start by installing the required packages. Some of them override existing package installations (e.g. fairing 0.5).

In [None]:
import os
import logging
import site
from pathlib import Path
import sys

In [None]:
home = str(Path.home())
local_py_path = os.path.join(home, ".local/lib/python3.6/site-packages")
if local_py_path not in sys.path:
    logging.info("Adding %s to python path", local_py_path)
    sys.path.insert(0, local_py_path)
site.getsitepackages()    

In [None]:
if not os.getenv("GOOGLE_APPLICATION_CREDENTIALS"):
    raise ValueError("Notebook is missing google application credentials")
else:
    print('GCP Credentials OK')

In [None]:
!pip install --user --upgrade pip
!pip install --user pandas
!pip install --user tensorflow
!pip install --user keras
!pip install --user numpy
!pip install --user gcsfs
!pip install --user google-cloud-storage
!pip install --user gensim
!pip install --user kubeflow

We install a specific recent fairing commit because we hit a couple of bugs with the released version: https://github.com/kubeflow/kubeflow/issues/3643

In [None]:
!pip install --user git+https://github.com/kubeflow/fairing.git@dc61c4c88f233edaf22b13bbfb184ded0ed877a4

## 01. Data preparation

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
import gensim.models.keyedvectors as word2vec
from gensim.models import Word2Vec

We start by loading the train and test data files from Google Cloud Storage. We use the Wikipedia Movie Plots dataset (https://www.kaggle.com/jrobischon/wikipedia-movie-plots), which contains plot summaries scraped from Wikipedia.

In [None]:
train_data_path = 'data/wiki_movie_plots_deduped.csv'
test_data_path = 'data/wiki_movie_plots_deduped_test.csv'
gcp_bucket = 'velascoluis-test'
column_target_value = 'Genre'
column_text_value = 'Plot'

We drop rows with missing data and remove rows whose plot text is duplicated, keeping the first occurrence.
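For readers less familiar with pandas, the cleaning done by the next cell (`dropna()` followed by `drop_duplicates(subset=..., keep='first')`) behaves roughly like this pure-Python sketch. It assumes rows are plain dicts with `None` standing in for missing values; `clean_rows` is a hypothetical helper, not part of pandas.

```python
def clean_rows(rows, text_column):
    """Drop rows with any missing value, then keep only the first row for each text."""
    seen_texts = set()
    cleaned = []
    for row in rows:
        # dropna(): discard the row if any field is missing
        if any(value is None for value in row.values()):
            continue
        # drop_duplicates(subset=text_column, keep='first'): later copies are discarded
        if row[text_column] in seen_texts:
            continue
        seen_texts.add(row[text_column])
        cleaned.append(row)
    return cleaned


rows = [
    {"Genre": "drama", "Plot": "A man returns home."},
    {"Genre": "comedy", "Plot": "A man returns home."},  # duplicate plot: dropped
    {"Genre": None, "Plot": "A ship sinks."},            # missing genre: dropped
    {"Genre": "western", "Plot": "A sheriff arrives."},
]
print(clean_rows(rows, "Plot"))
```

Because missing values are removed first, a duplicate plot only counts if its earlier copy survived the missing-data filter, which matches the order of the two pandas calls.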

In [None]:
train_data_load = pd.read_csv("gs://" + gcp_bucket + "/" + train_data_path, sep=',')
test_data_load = pd.read_csv("gs://" + gcp_bucket + "/" + test_data_path, sep=',')
train_data = train_data_load.dropna().drop_duplicates(subset=column_text_value, keep='first', inplace=False)
test_data = test_data_load.dropna().drop_duplicates(subset=column_text_value, keep='first', inplace=False)

Let's have a glimpse of the data:

In [None]:
train_data.head()

We will focus on two columns, Plot and Genre: the goal of the algorithm is to infer the movie genre from its plot.
The next step is to drop rows whose genre is unknown.

In [None]:
train_data = train_data[train_data.Genre != 'unknown']

In [None]:
train_data.head()

Next, we explore the distribution of genres with a histogram.

In [None]:
plt.hist(train_data[column_target_value], color = 'blue', edgecolor = 'black')
plt.title('Histogram of movies by genre')
plt.xlabel('Genre')
plt.ylabel('Movies')

The data is severely skewed, with a long tail of rare genres.

In [None]:
train_data[column_target_value].value_counts()

We keep only the genres with more than 900 observations.
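The `groupby(...).filter(lambda x: len(x) > 900)` call keeps every row whose genre occurs strictly more than 900 times. A minimal pure-Python sketch of the same rule, using `collections.Counter` (the helper name `keep_frequent_classes` is hypothetical):

```python
from collections import Counter


def keep_frequent_classes(labels, min_count_exclusive):
    """Return the indices of rows whose label occurs more than min_count_exclusive times.

    The comparison is strict, like len(x) > 900 in the groupby filter.
    """
    counts = Counter(labels)
    return [i for i, label in enumerate(labels) if counts[label] > min_count_exclusive]


genres = ["drama", "drama", "drama", "comedy", "comedy", "horror"]
print(keep_frequent_classes(genres, 1))  # → [0, 1, 2, 3, 4]
```

Note that a genre with exactly 900 rows is dropped by this rule.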

In [None]:
train_data = train_data.groupby(column_target_value).filter(lambda x : len(x)>900)

In [None]:
train_data[column_target_value].value_counts()

To balance the data, we randomly drop 80% of the drama rows and 75% of the comedy rows.
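The balancing step is plain random undersampling of the majority classes. The sketch below shows the idea on a list of labels. It assumes Python's `random` module with a fixed seed, so the rows it picks will not match the ones pandas selects with `random_state=200`; `downsample_label` is a hypothetical helper.

```python
import random


def downsample_label(labels, target_label, drop_frac, seed=200):
    """Randomly drop about drop_frac of the entries equal to target_label; keep everything else."""
    target_positions = [i for i, label in enumerate(labels) if label == target_label]
    n_drop = round(drop_frac * len(target_positions))
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    dropped = set(rng.sample(target_positions, n_drop))
    return [label for i, label in enumerate(labels) if i not in dropped]


genres = ["drama"] * 10 + ["comedy"] * 4 + ["horror"] * 3
balanced = downsample_label(genres, "drama", 0.8)
balanced = downsample_label(balanced, "comedy", 0.75)
print({g: balanced.count(g) for g in ("drama", "comedy", "horror")})
```

Undersampling throws data away; class weights passed to `model.fit` are a common alternative when the dataset is small.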

In [None]:
train_data = train_data.drop(train_data[train_data[column_target_value] == 'drama'].sample(frac=.8, random_state=200).index)
train_data = train_data.drop(train_data[train_data[column_target_value] == 'comedy'].sample(frac=.75, random_state=200).index)

In [None]:
train_data[column_target_value].value_counts()

In [None]:
plt.hist(train_data[column_target_value], color = 'blue', edgecolor = 'black')
plt.title('Histogram of movies by genre')
plt.xlabel('Genre')
plt.ylabel('Movies')

In [None]:
classifier_values = train_data[column_target_value].unique()
print(classifier_values)

As a next step, we will generate numerical labels for the genres
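`Series.unique()` returns the genres in order of first appearance, so the dictionary built in the next cell assigns ids in that order. The same label encoding in pure Python (`encode_labels` is a hypothetical helper):

```python
def encode_labels(labels):
    """Map each distinct label to an integer id in order of first appearance."""
    label_to_id = {}
    for label in labels:
        if label not in label_to_id:
            label_to_id[label] = len(label_to_id)
    return label_to_id, [label_to_id[label] for label in labels]


mapping, ids = encode_labels(["drama", "comedy", "drama", "western"])
print(mapping)  # {'drama': 0, 'comedy': 1, 'western': 2}
print(ids)      # [0, 1, 0, 2]
```

The mapping must be kept alongside the model: predicted class ids are only meaningful when translated back with the same dictionary.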

In [None]:
# Map each genre to an integer id in order of first appearance
dic = {class_value: i for i, class_value in enumerate(classifier_values)}
labels = train_data[column_target_value].apply(lambda x: dic[x])
num_classes = len(classifier_values)

We also hold out 20% of the data for validation.
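The split draws a random 20% sample for validation and keeps the remaining rows for training, so the two sets never overlap. A pure-Python sketch of the same idea on row indices, assuming Python's `random` module (so the chosen rows differ from pandas'); `train_val_split` is a hypothetical helper:

```python
import random


def train_val_split(n_rows, val_frac=0.2, seed=200):
    """Return (train_indices, val_indices) with about val_frac of the rows held out."""
    rng = random.Random(seed)
    n_val = round(val_frac * n_rows)
    val = set(rng.sample(range(n_rows), n_val))
    train = [i for i in range(n_rows) if i not in val]
    return train, sorted(val)


train_idx, val_idx = train_val_split(100)
print(len(train_idx), len(val_idx))
```

The tokenizer is later fit on the training texts only, so the validation set stays a fair proxy for unseen plots.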

In [None]:
val_data_pct = 0.2
val_data = train_data.sample(frac=val_data_pct, random_state=200)
train_data = train_data.drop(val_data.index)

Next, we generate numerical representations of the sentences to classify: we build a vocabulary index ranked by word frequency (capped by num_words) and then transform each text into a sequence of word indices.
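To make the tokenizer behaviour concrete, here is a simplified pure-Python sketch of what `fit_on_texts` and `texts_to_sequences` do: the most frequent word gets index 1 (0 is left free for padding), and words whose index is `num_words` or higher are dropped, so at most `num_words - 1` words survive. It skips the punctuation `filters` handling, and the helper names are hypothetical.

```python
from collections import Counter


def fit_word_index(texts):
    """Rank words by frequency: the most frequent word gets index 1 (0 is reserved for padding)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # sorted() is stable even with reverse=True, so ties keep their first-seen order
    ranked = sorted(counts, key=lambda w: counts[w], reverse=True)
    return {word: i + 1 for i, word in enumerate(ranked)}


def texts_to_sequences(texts, word_index, num_words):
    """Replace words by their index, dropping unknown words and any index >= num_words."""
    return [
        [word_index[w] for w in text.lower().split() if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


corpus = ["the cat sat", "the dog sat on the mat"]
word_index = fit_word_index(corpus)
print(texts_to_sequences(corpus, word_index, num_words=4))  # → [[1, 3, 2], [1, 2, 1]]
```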

In [None]:
num_words = 10000
texts = train_data[column_text_value]
tokenizer = Tokenizer(num_words=num_words, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n\'', lower=True)
tokenizer.fit_on_texts(texts)
sequences_train = tokenizer.texts_to_sequences(texts)
sequences_valid = tokenizer.texts_to_sequences(val_data[column_text_value])

In [None]:
print(sequences_train[0])

Now we pad the sequences so that all of them have the same length; for the labels we create one-hot (categorical) vectors.
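With its default arguments, `pad_sequences` pads and truncates at the start of each sequence (`padding='pre'`, `truncating='pre'`) with the value 0, and uses the longest sequence as the length when `maxlen` is not given. `to_categorical` turns integer labels into one-hot rows. A pure-Python sketch of both, with hypothetical helper names:

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Left-pad and left-truncate every sequence to maxlen (default: the longest sequence)."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + list(seq))
    return padded


def one_hot(label_ids, num_classes=None):
    """Turn integer labels into one-hot rows; width defaults to max label + 1."""
    if num_classes is None:
        num_classes = max(label_ids) + 1
    return [[1 if c == y else 0 for c in range(num_classes)] for y in label_ids]


print(pad_pre([[1, 3, 2], [1, 2]]))  # → [[1, 3, 2], [0, 1, 2]]
print(one_hot([0, 2, 1]))            # → [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

The default width of `one_hot` depends on the largest label present, which is why passing `num_classes` explicitly keeps the training and validation label matrices the same shape even if a rare class is missing from one split.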

In [None]:
x_train = pad_sequences(sequences_train)
sequence_length = x_train.shape[1]
x_val = pad_sequences(sequences_valid, maxlen=sequence_length)
y_train = to_categorical(np.asarray(labels[train_data.index]), num_classes=num_classes)
y_val = to_categorical(np.asarray(labels[val_data.index]), num_classes=num_classes)

In [None]:
print(x_val[2])

In [None]:
print(y_train[1])

Now we build the word embeddings using transfer learning: instead of training embeddings from scratch, we reuse pretrained GloVe vectors (https://nlp.stanford.edu/projects/glove/) with 100 dimensions. GloVe files are not in word2vec format, so we first converted them with gensim's glove2word2vec utility. Words without a pretrained vector get a random initialization.
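The conversion is small: a GloVe text file has one `word v1 v2 ...` line per word and no header, while the word2vec text format starts with a `<vocab_size> <dimensions>` header line. A minimal sketch of that conversion, assuming every line has the same dimensionality (`glove_to_word2vec_lines` is a hypothetical helper, not gensim's implementation):

```python
def glove_to_word2vec_lines(glove_lines):
    """Prepend the '<vocab_size> <dim>' header that the word2vec text format expects."""
    lines = [line.rstrip("\n") for line in glove_lines if line.strip()]
    dim = len(lines[0].split()) - 1  # first token is the word itself
    return [f"{len(lines)} {dim}"] + lines


glove = ["the 0.1 0.2 0.3\n", "cat 0.4 0.5 0.6\n"]
print("\n".join(glove_to_word2vec_lines(glove)))
```

With gensim 4.x, `KeyedVectors.load_word2vec_format(path, no_header=True)` can also read a GloVe file directly, which avoids storing a converted copy.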

In [None]:
embedding_dim = 100
w2v_model_path = 'model/word2vec100d.txt'
w2v_model = word2vec.KeyedVectors.load_word2vec_format("gs://" + gcp_bucket + "/" + w2v_model_path)
word_vectors = w2v_model  # load_word2vec_format already returns a KeyedVectors object, so .wv is not needed
word_index = tokenizer.word_index
vocabulary_size = min(len(tokenizer.word_index) + 1, num_words)
embedding_matrix = np.zeros((vocabulary_size, embedding_dim))
for word, i in word_index.items():
    if i >= num_words:
        continue
    try:
        embedding_vector = word_vectors[word]
        embedding_matrix[i] = embedding_vector
    except KeyError:
        embedding_matrix[i] = np.random.normal(0, np.sqrt(0.25), embedding_dim)

In [None]:
w2v_model.most_similar('summer')

## 02.Model generation and training

In [None]:
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.layers import Input, Dense, Embedding, Conv2D, MaxPooling2D, Dropout, concatenate, Reshape, \
    Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import datetime
import os

We define the model using the Keras functional API. In essence, the model is made of Embedding + Conv + MaxPool + Dense layers. The tunable hyperparameters are:
- number of convolution + max-pooling branches (applied in parallel and concatenated)
- filter sizes (3, 5, 7, ...)
- number of filters
- dropout
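Each convolution filter spans `filter_size` consecutive words and the full embedding width, so it slides only along the sentence. Because the pooling window height is `sequence_length - filter_size + 1` (the number of positions the filter can take), the pooling keeps a single maximum per filter: the max-over-time pooling from Kim's paper. A pure-Python sketch for one filter on a toy embedded sentence (helper names are hypothetical):

```python
def conv_max_over_time(embedded, kernel, bias=0.0):
    """One filter of height len(kernel) over the full embedding width, then ReLU
    and max-over-time pooling, giving a single scalar feature."""
    h = len(kernel)
    n_positions = len(embedded) - h + 1  # = sequence_length - filter_size + 1
    activations = []
    for start in range(n_positions):
        window = embedded[start:start + h]
        s = bias + sum(w * x for k_row, e_row in zip(kernel, window) for w, x in zip(k_row, e_row))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)


def filter_sizes(num_conv_layers):
    """Filter heights 3, 5, 7, ... for each parallel branch."""
    return [3 + 2 * i for i in range(num_conv_layers)]


sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]  # 4 tokens, 2-dimensional embeddings
print(conv_max_over_time(sentence, [[1, 0], [0, 1]]))  # → 2.0
print(filter_sizes(3))                                  # → [3, 5, 7]
```

With `num_filters` filters per branch, each branch contributes `num_filters` features, so the dense layer sees `num_branches * num_filters` inputs.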

In [None]:
filter_size = 3
activation_conv = 'relu'
activation_output = 'softmax'
drop = 0.5
num_filters = 200

# Embedding layer initialized with the pretrained vectors and fine-tuned during training
embedding_layer = Embedding(vocabulary_size, embedding_dim, weights=[embedding_matrix], trainable=True)
inputs = Input(shape=(sequence_length,))
embedding = embedding_layer(inputs)
# Add a channel dimension so Conv2D can slide over (tokens x embedding dims)
reshape = Reshape((sequence_length, embedding_dim, 1))(embedding)
# Each filter spans filter_size consecutive words and the full embedding width
conv_layer_1 = Conv2D(num_filters, (filter_size, embedding_dim),
                      activation=activation_conv, kernel_regularizer=regularizers.l2(0.01))(reshape)
# Pooling over all (sequence_length - filter_size + 1) positions = max-over-time pooling
maxpool_layer_1 = MaxPooling2D((sequence_length - filter_size + 1, 1),
                               strides=(1, 1))(conv_layer_1)

flatten = Flatten()(maxpool_layer_1)  # shape: (batch, num_filters)
dropout = Dropout(drop)(flatten)
output = Dense(units=num_classes, activation=activation_output, kernel_regularizer=regularizers.l2(0.01))(dropout)
model = Model(inputs, output)
adam = Adam(learning_rate=1e-3)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
model.summary()

Finally, we train the model
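`EarlyStopping(monitor='val_loss')` with its defaults (`patience=0`, `min_delta=0`) stops training after the first epoch whose validation loss is not strictly lower than the best value seen so far. A pure-Python sketch of that default rule, to estimate how many of the configured epochs will actually run (the helper name is hypothetical):

```python
def epochs_run_with_early_stopping(val_losses, max_epochs):
    """Epochs actually run when training stops at the first non-improving val_loss."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
        else:
            return epoch  # this epoch did not improve, so training stops after it
    return min(len(val_losses), max_epochs)


print(epochs_run_with_early_stopping([1.2, 1.0, 0.9, 0.95, 0.8], max_epochs=20))  # → 4
```

With `patience=0` a single noisy epoch ends training, even if the loss would have improved again afterwards; raising `patience` and setting `restore_best_weights=True` makes the callback more tolerant.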

In [None]:
epochs = 20
batch_size = 500
now = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "model/tf_logs"
os.makedirs(root_logdir, exist_ok=True)  # also creates the parent "model" directory if needed
log_dir = "{}/run-{}/".format(root_logdir, now)
callback_tensorboard = TensorBoard(log_dir=log_dir, histogram_freq=1)
# Stop as soon as the validation loss stops improving (default patience=0)
callback_earlystopping = EarlyStopping(monitor='val_loss')

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_val, y_val),
          callbacks=[callback_earlystopping, callback_tensorboard])
# Evaluate on the held-out validation set; training accuracy would overstate performance
loss, acc = model.evaluate(x_val, y_val, verbose=2)
print("Validation accuracy = {:5.2f}%".format(100 * acc))
print("Validation loss = {:.4f}".format(loss))

## 03. Hyperparameter tuning

To improve accuracy, we use Katib to search for the optimal hyperparameters. First we need to build an image containing the training code. Note that we install a specific version of fairing; otherwise we may hit https://github.com/kubeflow/kubeflow/issues/3643.
This is adapted from https://github.com/jlewi/examples/blob/hptuning/xgboost_synthetic/build-train-deploy.ipynb

In [None]:
from kubeflow.fairing import cloud
from kubeflow.fairing.builders import append
from kubeflow.fairing.builders import cluster
from kubeflow.fairing.deployers import job
from kubeflow.fairing import utils
from kubeflow.fairing.preprocessors.converted_notebook import ConvertNotebookPreprocessorWithFire    

In [None]:
gcp_project = cloud.gcp.guess_project_name()
docker_registry = 'gcr.io/{}/text-cnn-class-dev'.format(gcp_project)

We need to define a wrapper class for launching the Kubernetes job and rearrange the code into functions. The main change is that the hyperparameters become configurable arguments of the generate_keras_model function.

In [None]:
# fairing:include-cell
import pandas as pd
import numpy as np
import io
import os
import json
import logging  # used by launch_rig; must be imported in the generated script
import datetime
import gensim.models.keyedvectors as word2vec
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Input, Dense, Embedding, Conv2D, MaxPooling2D, Dropout, concatenate, Reshape, \
    Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import TensorBoard

We will turn the next class into a CLI-callable Python file, so that we can run python3 <nb_name.py> launch_rig --param1=value1 ... --paramN=valueN
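fairing's ConvertNotebookPreprocessorWithFire exposes the class through python-fire. As a rough illustration of what that buys us, here is a hypothetical `dispatch` helper (not part of fire or fairing) that maps `launch_rig --name=value` arguments onto a method call, parsing numbers with `ast.literal_eval` and keeping everything else as strings. python-fire itself handles many more cases.

```python
import ast


def dispatch(obj, argv):
    """Call obj.<argv[0]>(**kwargs), where kwargs come from '--name=value' flags."""
    method_name, *flags = argv
    kwargs = {}
    for flag in flags:
        if not flag.startswith("--") or "=" not in flag:
            raise ValueError(f"expected --name=value, got {flag!r}")
        name, raw = flag[2:].split("=", 1)
        try:
            value = ast.literal_eval(raw)  # '120' -> 120, '0.5' -> 0.5
        except (ValueError, SyntaxError, TypeError):
            value = raw  # paths, bucket names, column names stay strings
        kwargs[name] = value
    return getattr(obj, method_name)(**kwargs)


class Demo:
    def launch_rig(self, num_filters, dropout, gcp_bucket):
        return num_filters, dropout, gcp_bucket


result = dispatch(Demo(), ["launch_rig", "--num_filters=120", "--dropout=0.5", "--gcp_bucket=velascoluis-test"])
print(result)  # → (120, 0.5, 'velascoluis-test')
```

This is why every launch_rig argument needs a name matching the flags used in the Katib trial template: the flag names become keyword arguments.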

In [None]:
# fairing:include-cell
class CNN():

    
    def __init__(self):
        self.sequence_length = None
        self.num_classes = None
        self.word_index = None
        self.class_values_list = None
        self.vocabulary_size = None

    def prepare_data_train(self, num_words, train_data_path, test_data_path, column_target_value, column_text_value,
                           val_data_pct, json_tokenizer_path,gcp_bucket):
        
        train_data_load = pd.read_csv("gs://" + gcp_bucket + "/" + train_data_path, sep=',')
        test_data_load = pd.read_csv("gs://" + gcp_bucket + "/" + test_data_path, sep=',')
        train_data = train_data_load.dropna().drop_duplicates(subset=column_text_value, keep='first', inplace=False)
        test_data = test_data_load.dropna().drop_duplicates(subset=column_text_value, keep='first', inplace=False)
        train_data = train_data[train_data.Genre != 'unknown']
        train_data = train_data.groupby(column_target_value).filter(lambda x : len(x)>900)
        train_data = train_data.drop(train_data[train_data[column_target_value] == 'drama'].sample(frac=.8, random_state=200).index)
        train_data = train_data.drop(train_data[train_data[column_target_value] == 'comedy'].sample(frac=.75, random_state=200).index)
        classifier_values = train_data[column_target_value].unique()
        dic = {}
        for i, class_value in enumerate(classifier_values):
            dic[class_value] = i
        labels = train_data[column_target_value].apply(lambda x: dic[x])
        val_data = train_data.sample(frac=val_data_pct, random_state=200)
        train_data = train_data.drop(val_data.index)
        texts = train_data[column_text_value]
        tokenizer = Tokenizer(num_words=num_words, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n\'', lower=True)
        tokenizer.fit_on_texts(texts)
        tokenizer_json = tokenizer.to_json()
        tokenizer_dir = os.path.dirname(json_tokenizer_path)
        if tokenizer_dir:
            os.makedirs(tokenizer_dir, exist_ok=True)
        with io.open(json_tokenizer_path, 'w', encoding='utf-8') as f:
            f.write(json.dumps(tokenizer_json, ensure_ascii=False))
        sequences_train = tokenizer.texts_to_sequences(texts)
        sequences_valid = tokenizer.texts_to_sequences(val_data[column_text_value])
        x_train = pad_sequences(sequences_train)
        x_val = pad_sequences(sequences_valid, maxlen=x_train.shape[1])
        y_train = to_categorical(np.asarray(labels[train_data.index]), num_classes=len(classifier_values))
        y_val = to_categorical(np.asarray(labels[val_data.index]), num_classes=len(classifier_values))
        self.sequence_length = x_train.shape[1]
        self.num_classes = len(classifier_values)
        self.word_index = tokenizer.word_index
        self.class_values_list = classifier_values
        return (x_train, x_val, y_train, y_val)

    def prepare_embeddings(self, num_words, w2v_model_path, embedding_dim,gcp_bucket):
        
        
        w2v_model = word2vec.KeyedVectors.load_word2vec_format("gs://" + gcp_bucket + "/" + w2v_model_path)
        word_vectors = w2v_model  # load_word2vec_format already returns a KeyedVectors object
        vocabulary_size = min(len(self.word_index) + 1, num_words)
        embedding_matrix = np.zeros((vocabulary_size, embedding_dim))
        for word, i in self.word_index.items():
            if i >= num_words:
                continue
            try:
                embedding_vector = word_vectors[word]
                embedding_matrix[i] = embedding_vector
            except KeyError:
                embedding_matrix[i] = np.random.normal(0, np.sqrt(0.25), embedding_dim)
        del w2v_model, word_vectors  # free the full embedding model once the matrix is built
        self.vocabulary_size = vocabulary_size
        return (embedding_matrix)

    def generate_keras_model(self, num_conv_layers, maxpool_strides, drop, num_filters, embedding_dim,
                             embedding_matrix):
        # Multi-branch text CNN: one Conv2D + max-over-time pooling branch per filter size
        embedding_layer = Embedding(self.vocabulary_size, embedding_dim, weights=[embedding_matrix], trainable=True)
        inputs = Input(shape=(self.sequence_length,))
        embedding = embedding_layer(inputs)
        reshape = Reshape((self.sequence_length, embedding_dim, 1))(embedding)
        # Filter heights 3, 5, 7, ...: one size per convolution branch
        filter_sizes = [3 + 2 * i for i in range(num_conv_layers)]
        convolutions = []
        for layer_index in range(num_conv_layers):
            conv_layer = Conv2D(num_filters, (filter_sizes[layer_index], embedding_dim), activation='relu',
                                kernel_regularizer=regularizers.l2(0.01))(reshape)
            maxpool_layer = MaxPooling2D((self.sequence_length - filter_sizes[layer_index] + 1, 1), strides=(maxpool_strides[0], maxpool_strides[1]))(
                conv_layer)
            convolutions.append(maxpool_layer)
        if (num_conv_layers > 1):
            merged_tensor = concatenate(convolutions, axis=1)
        else:
            merged_tensor = convolutions[0]
        flatten = Flatten()(merged_tensor)  # shape: (batch, num_conv_layers * num_filters)
        dropout = Dropout(drop)(flatten)
        output = Dense(units=self.num_classes, activation='softmax', kernel_regularizer=regularizers.l2(0.01))(dropout)
        model = Model(inputs, output)
        adam = Adam(learning_rate=1e-3)
        model.compile(loss='categorical_crossentropy',
                      optimizer=adam,
                      metrics=['acc'])
        
        return model

    def train_model(self, model, x_train, x_val, y_train, y_val, batch_size, epochs):
        now = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        root_logdir = "model/tf_logs"
        os.makedirs(root_logdir, exist_ok=True)
        log_dir = "{}/run-{}/".format(root_logdir, now)
        callback_tensorboard = TensorBoard(log_dir=log_dir, histogram_freq=1)
        callback_earlystopping = EarlyStopping(monitor='val_loss')
        model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_val, y_val),
                  callbacks=[callback_earlystopping, callback_tensorboard])
        # Report validation accuracy as a fraction in [0, 1], on the same scale as the
        # Experiment goal (0.99); Katib's StdOut collector reads this "accuracy=<value>" line
        loss, acc = model.evaluate(x_val, y_val, verbose=2)
        print("accuracy={:.4f}".format(acc))

    def launch_rig(self,train_data_path,test_data_path,column_target_value,column_text_value,
                   json_tokenizer_path,gcp_bucket,w2v_model_path,
                   num_conv_layers,dropout,num_filters,batch_size,epochs):
        
        logging.basicConfig(level=logging.INFO)
        logging.info('Arguments:')
        
        logging.info('train_data_path:{}'.format(train_data_path))
        logging.info('test_data_path:{}'.format(test_data_path))
        logging.info('column_target_value:{}'.format(column_target_value))
        logging.info('column_text_value:{}'.format(column_text_value))
        logging.info('json_tokenizer_path:{}'.format(json_tokenizer_path))
        logging.info('gcp_bucket:{}'.format(gcp_bucket))
        logging.info('w2v_model_path:{}'.format(w2v_model_path))
        logging.info('num_conv_layers:{}'.format(num_conv_layers))
        logging.info('dropout:{}'.format(dropout))
        logging.info('num_filters:{}'.format(num_filters))
        logging.info('batch_size:{}'.format(batch_size))
        logging.info('epochs:{}'.format(epochs))
        
        x_train, x_val, y_train, y_val = self.prepare_data_train(10000,train_data_path,test_data_path,column_target_value,column_text_value,0.2,json_tokenizer_path,gcp_bucket)
        embedding_matrix = self.prepare_embeddings(10000,w2v_model_path,100,gcp_bucket)
        model = self.generate_keras_model(num_conv_layers,[1,1],dropout,num_filters,100,embedding_matrix)
        self.train_model(model,x_train,x_val,y_train,y_val,batch_size,epochs)

 

We run a local test to make sure the launch_rig function works end to end.

In [None]:
#Parameters
num_words = 10000
train_data_path = 'data/wiki_movie_plots_deduped.csv'
test_data_path = 'data/wiki_movie_plots_deduped_test.csv'
gcp_bucket = 'velascoluis-test'
column_target_value = 'Genre'
column_text_value = 'Plot'
val_data_pct = 0.2
json_tokenizer_path = 'model/tokens.json'
w2v_model_path = 'model/word2vec100d.txt'
embedding_dim = 100
num_conv_layers = 3
maxpool_strides = [1,1]
dropout = 0.5
num_filters = 200
batch_size = 300
epochs = 15
#Sequence
CNN_instance = CNN()
CNN_instance.launch_rig(train_data_path,test_data_path,column_target_value,column_text_value,json_tokenizer_path,gcp_bucket,w2v_model_path,num_conv_layers,dropout,num_filters,batch_size,epochs)

We create the preprocessor, marking requirements.txt as an input file so that it is included in the image.

In [None]:
preprocessor = ConvertNotebookPreprocessorWithFire("CNN")
if not preprocessor.input_files:
    preprocessor.input_files = set()
input_files=["requirements.txt"]
preprocessor.input_files =  set([os.path.normpath(f) for f in input_files])
preprocessor.preprocess()

A small setup step to authenticate against the Docker registry (GCR):

In [None]:
print(os.getenv("GOOGLE_APPLICATION_CREDENTIALS"))

In [None]:
!gcloud auth configure-docker --quiet
!gcloud auth activate-service-account --key-file=/secret/gcp/user-gcp-sa.json

Here we build the base image and push it to GCR. Note that we use a custom Dockerfile for the image build.

In [None]:
base_image = "tensorflow/tensorflow:latest-py3"
namespace= "kubeflow-velascoluis"
cluster_builder = cluster.cluster.ClusterBuilder(registry=docker_registry,
                                                 base_image=base_image,
                                                 dockerfile_path='Dockerfile',
                                                 preprocessor=preprocessor,
                                                 pod_spec_mutators=[cloud.gcp.add_gcp_credentials_if_exists],
                                                 namespace=namespace,
                                                 context_source=cluster.gcs_context.GCSContextSource())
cluster_builder.build()

If we only changed the CNN class, we just need to run the AppendBuilder, which adds the updated code on top of the existing base image.

In [None]:
preprocessor.preprocess()
builder = append.append.AppendBuilder(registry=docker_registry,
                                      base_image=cluster_builder.image_tag, preprocessor=preprocessor)
builder.build()

In [None]:
builder.image_tag

In [None]:
preprocessor.executable.name

In [None]:
def set_image(raw_yaml, image):
    """Set the container image given raw yaml.
    
    Args:
      raw_yaml: A string containing raw YAML for a job
      image: The docker image to use
    """
    lines = raw_yaml.splitlines()
    
    for i, l in enumerate(lines):
        if l.strip().startswith("image:"):
            lines[i] = l.split(":", 1)[0] + ":" + " " + image
            
    return "\n".join(lines)

Description of the Katib experiment. We use the random algorithm, which samples random points (hyperparameter combinations) from the search space. For more algorithms (TPE, Hyperband, grid search, ...) see https://github.com/kubeflow/katib
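Katib runs its random algorithm as a separate suggestion service, so the following is only a conceptual sketch, not Katib's implementation. It assumes uniform sampling within the inclusive `feasibleSpace` bounds used in the Experiment below, and renders each assignment the way the trial template's `"{{.Name}}={{.Value}}"` loop does. `SEARCH_SPACE`, `sample_trial` and `to_cli_args` are hypothetical names.

```python
import random

# Search space mirrored from the Experiment spec; min/max are inclusive
SEARCH_SPACE = {
    "--num_filters": ("int", 80, 150),
    "--num_conv_layers": ("int", 2, 3),
    "--dropout": ("double", 0.45, 0.55),
}


def sample_trial(space, rng):
    """Draw one assignment: uniform integers for int parameters, uniform floats for doubles."""
    assignment = {}
    for name, (ptype, low, high) in space.items():
        assignment[name] = rng.randint(low, high) if ptype == "int" else rng.uniform(low, high)
    return assignment


def to_cli_args(assignment):
    """Render hyperparameters as '--name=value' arguments, like the trial template."""
    return [f"{name}={value}" for name, value in assignment.items()]


rng = random.Random(0)
for _ in range(4):  # maxTrialCount: 4
    print(to_cli_args(sample_trial(SEARCH_SPACE, rng)))
```

Random search is a reasonable default with only four trials: unlike grid search, it does not spend trials on every combination of one dimension before exploring the others.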

In [None]:
import yaml
hp_experiment_raw = """
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
    additionalMetricNames:
      - train-accuracy
  algorithm:
    algorithmName: random
  trialTemplate:
    goTemplate:
      rawTemplate:
  parallelTrialCount: 2
  maxTrialCount: 4
  metricsCollectorSpec:
    collector:
      kind: StdOut
    objective:
      additionalMetricNames:
        - accuracy
  maxFailedTrialCount: 1
  parameters:
    - name: "--num_filters"
      parameterType: int
      feasibleSpace:
        min: "80"
        max: "150"
    - name: "--num_conv_layers"
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "3"
    - name: "--dropout"
      parameterType: double
      feasibleSpace:
        min: "0.45"
        max: "0.55"      
"""        

# The batch job that will be launched on each trial
# 
trial_job_raw = """apiVersion: batch/v1
kind: Job
metadata:
  name: {{.Trial}}
  namespace: {{.NameSpace}}
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
    spec:
      containers:
      - name: {{.Trial}}
        image: xxx
        workingDir: /app
        command:
        - "python"
        - "TextClassifier.py"
        - launch_rig
        - "--train_data_path=data/wiki_movie_plots_deduped.csv"
        - "--test_data_path=data/wiki_movie_plots_deduped_test.csv"
        - "--column_target_value=Genre"
        - "--column_text_value=Plot"
        - "--json_tokenizer_path=model/tokens.json"
        - "--gcp_bucket=velascoluis-test"
        - "--w2v_model_path=model/word2vec100d.txt"
        - "--batch_size=300"
        - "--epochs=15"        
        {{- with .HyperParameters}}
        {{- range .}}
        - "{{.Name}}={{.Value}}"
        {{- end}}
        {{- end}}
      restartPolicy: Never
"""


hp_experiment = yaml.safe_load(hp_experiment_raw)
hp_experiment["metadata"]["namespace"] = utils.get_current_k8s_namespace()
trial_job_raw = set_image(trial_job_raw, builder.image_tag)
hp_experiment["spec"]["trialTemplate"]["goTemplate"]["rawTemplate"] = trial_job_raw


import datetime
now = datetime.datetime.now().strftime("%y%m%d-%H%M%S")
hp_experiment["metadata"]["name"] = "cnn-text-exp-gpu-{0}".format(now)
print(yaml.safe_dump(hp_experiment))

Launch the experiment:

In [None]:
from kubernetes import client as k8s_client
client = k8s_client.ApiClient()
crd_api = k8s_client.CustomObjectsApi(client)

group, version = hp_experiment['apiVersion'].split('/')

result = crd_api.create_namespaced_custom_object(
  group=group,
  version=version,
  namespace=hp_experiment["metadata"]["namespace"],
  plural='experiments',
  body=hp_experiment)

Check the experiment status; the experiment may take a while to complete.
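The `result` returned by `get_namespaced_custom_object` is a plain dict, so it can be summarized without dumping the whole YAML. The sketch below assumes field names from the Katib v1alpha3 Experiment status (`status.conditions`, `status.currentOptimalTrial`, `parameterAssignments`, `observation.metrics` with a `value` field) and that the most recent condition is last in the list; check both against `yaml.dump(result)` for your Katib version. `summarize_experiment` is a hypothetical helper.

```python
def summarize_experiment(experiment):
    """Return (latest condition type, best parameter assignments, best metrics)."""
    status = experiment.get("status") or {}
    conditions = status.get("conditions") or []
    latest = conditions[-1]["type"] if conditions else "Unknown"
    optimal = status.get("currentOptimalTrial") or {}
    params = {p["name"]: p["value"] for p in optimal.get("parameterAssignments") or []}
    observation = optimal.get("observation") or {}
    metrics = {m["name"]: m["value"] for m in observation.get("metrics") or []}
    return latest, params, metrics


# Usage with the cell's result:
# print(summarize_experiment(result))
```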

In [None]:
result = crd_api.get_namespaced_custom_object(
  group=group,
  version=version,
  namespace=hp_experiment["metadata"]["namespace"],
  plural='experiments',
  name=hp_experiment["metadata"]["name"])

print(yaml.dump(result))