**To run this notebook**, open the GCP console and go to Vertex AI / Workbench / New Notebook.

Make sure that you create a machine with TensorFlow already installed
(a small machine is enough).

## Stack Overflow text classifier

* [Github Repo](https://github.com/GoogleCloudPlatform/ai-platform-text-classifier-shap/blob/master/stackoverflow-classifier.ipynb)

* [Sara's Blog](https://sararobinson.dev/2019/04/23/interpret-bag-of-words-models-shap.html)

* [Youtube Video Tutorial](https://www.youtube.com/watch?v=_RPHiqF2bSs)

# Objective
Build a model on GCP to predict the tags of Stack Overflow questions. To keep things simple, our dataset only includes questions with one or more of 5 ML-related tags: <br> `tensorflow, keras, pandas, scikit-learn, matplotlib`

# Let's code!

In [1]:
# checking tensorflow libraries installed
!pip freeze | grep tensorflow

tensorflow @ file:///opt/conda/conda-bld/dlenv-tf-2-8-gpu_1643754343905/work/tensorflow-2.8.0-cp37-cp37m-linux_x86_64.whl
tensorflow-cloud==0.1.16
tensorflow-datasets==4.4.0
tensorflow-estimator==2.8.0
tensorflow-hub==0.12.0
tensorflow-io==0.23.1
tensorflow-io-gcs-filesystem==0.23.1
tensorflow-metadata==1.6.0
tensorflow-probability==0.14.1
tensorflow-serving-api==2.7.0
tensorflow-transform==1.6.0


In [2]:
import tensorflow as tf 
import pandas as pd
import numpy as np 

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.utils import shuffle

import pickle
import os

In [3]:
# getting data
!gsutil cp 'gs://cloudml-demo-lcm/SO_ml_tags_avocado_188k_v2.csv' ./

Copying gs://cloudml-demo-lcm/SO_ml_tags_avocado_188k_v2.csv...
\ [1 files][276.7 MiB/276.7 MiB]                                                
Operation completed over 1 objects/276.7 MiB.                                    


In [4]:
# checking the size of our dataset
file = os.path.join('.','SO_ml_tags_avocado_188k_v2.csv')
size = round(os.stat(file).st_size/(1024*1024))
print(f'File Size of {file} is {size} MegaBytes')

File Size of ./SO_ml_tags_avocado_188k_v2.csv is 277 MegaBytes 


In [6]:
# reading the data, naming the columns, and dropping rows with missing values
data = pd.read_csv('SO_ml_tags_avocado_188k_v2.csv', names=['tags', 'original_tags', 'text'], header=0)
data = data.dropna()

In [7]:
data.head()

Unnamed: 0,tags,original_tags,text
0,"matplotlib,pandas","python,matplotlib,pandas",setting xticks and yticks for scatter plot mat...
1,"scikitlearn,keras","python,numpy,scikit-learn,keras,grid-search",gridseachcv - valueerror: found input variable...
2,"matplotlib,scikitlearn","python,numpy,matplotlib,scikit-learn,nmf",non negative matrix factorisation in python on...
3,"pandas,tensorflow","python,pandas,tensorflow,time-series",avocado equivalent to avocado.dataframe.resamp...
4,"matplotlib,pandas","python,matplotlib,plot,pandas",how to plot on avocado python i have a data fr...


In [8]:
data = data.drop(columns=['original_tags'])

In [9]:
# shuffle the rows to remove any ordering inherited from the table
data = shuffle(data, random_state = 22)
data.head()

Unnamed: 0,tags,text
182914,"tensorflow,keras",avocado image captioning model not compiling b...
48361,pandas,return excel file from avocado with flask in f...
181447,"tensorflow,keras",validating with generator (avocado) i'm trying...
66307,pandas,avocado multiindex dataframe selecting data gi...
11283,pandas,get rightmost non-zero value position for each...


In [10]:
# What does the first row look like?
data.iloc[0].text

'avocado image captioning model not compiling because of concatenate layer when mask_zero=true in a previous layer i am new to avocado and i am trying to implement a model for an image captioning project.   i am trying to reproduce the model from image captioning pre-inject architecture (the picture is taken from this paper: where to put the image in an image captioning generator) (but with a minor difference: generating a word at each time step instead of only generating a single word at the end), in which the inputs for the lstm at the first time step are the embedded cnn features. the lstm should support variable input length and in order to do this i padded all the sequences with zeros so that all of them have maxlen time steps.  the code for the model i have right now is the following:    def get_model(model_name, batch_size, maxlen, voc_size, embed_size,          cnn_feats_size, dropout_rate):      # create input layer for the cnn features     cnn_feats_input = input(shape=(cnn_f

## Feature Engineering

In [11]:
# Split each comma-separated tag string into a list of tags (input for the multi-hot encoding)
tags_split = [tags.split(',') for tags in data['tags'].values]
print(tags_split[0])

['tensorflow', 'keras']


In [13]:
# Multi-hot encoding: one column per tag, and a row can have several columns set to 1
tag_encoder = MultiLabelBinarizer()
tags_encoded = tag_encoder.fit_transform(tags_split)
num_tags = len(tags_encoded[0])

print(f'The number of tags are {num_tags}:')
print(tag_encoder.classes_)

The number of tags are 5:
['keras' 'matplotlib' 'pandas' 'scikitlearn' 'tensorflow']


In [14]:
#label vector of the first row
print(tags_split[0])
tags_encoded[0]

['tensorflow', 'keras']


array([1, 0, 0, 0, 1])
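To make the encoding above less of a black box, here is a minimal pure-Python sketch of what a multi-hot encoder does. It is not scikit-learn's implementation; the `multi_hot` helper and the toy tag lists are made up for illustration. The only behavior it mirrors is that classes are sorted alphabetically and each row gets a 1 in the column of every tag it carries.

```python
def multi_hot(tag_lists):
    """Return (sorted class names, one multi-hot row per input tag list)."""
    # Collect every distinct tag and sort alphabetically, like classes_ above.
    classes = sorted({tag for tags in tag_lists for tag in tags})
    column = {tag: i for i, tag in enumerate(classes)}
    rows = []
    for tags in tag_lists:
        row = [0] * len(classes)
        for tag in tags:
            row[column[tag]] = 1  # several columns may be set in one row
        rows.append(row)
    return classes, rows


# Toy data standing in for tags_split (illustrative only).
toy_classes, toy_rows = multi_hot([['tensorflow', 'keras'], ['pandas']])
print(toy_classes)  # ['keras', 'pandas', 'tensorflow']
print(toy_rows[0])  # [1, 0, 1]
```

With the real encoder, `tag_encoder.inverse_transform(tags_encoded[:1])` goes the other way and returns the tag names for a row.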

## Modeling

In [15]:
# Split our data into train (80%) and test (20%) sets, starting with the label tags
train_size = int(len(data) * .8)
print ("Train size: %d" % train_size)
print ("Test size: %d" % (len(data) - train_size))

Train size: 150559
Test size: 37640


In [16]:
train_tags = tags_encoded[:train_size]
test_tags = tags_encoded[train_size:]

In [17]:
train_tags

array([[1, 0, 0, 0, 1],
       [0, 0, 1, 0, 0],
       [1, 0, 0, 0, 1],
       ...,
       [0, 0, 1, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 1]])



### **Creating a class to import in the future**
[Keras preprocessing text method](https://keras.io/preprocessing/text/)

In [18]:
%%writefile preprocess.py

# Pre-processing data: create our tokenizer class
from tensorflow.keras.preprocessing import text

class TextPreprocessor(object):
  def __init__(self, vocab_size):
    self._vocab_size = vocab_size
    self._tokenizer = None
  
  def create_tokenizer(self, text_list):
    """
    Fit a Keras Tokenizer on text_list, keeping only the most frequent words.
    The Tokenizer vectorizes a text corpus by turning each text into either a sequence of
    integers (each integer being the index of a token in a dictionary) or a vector whose
    coefficient for each token can be binary, a word count, or a tf-idf weight.
    """
    tokenizer = text.Tokenizer(num_words=self._vocab_size)
    tokenizer.fit_on_texts(text_list)
    self._tokenizer = tokenizer

  def transform_text(self, text_list):
    ## using simple binary bag of words -> https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer#texts_to_matrix
    text_matrix = self._tokenizer.texts_to_matrix(text_list)
    return text_matrix

Writing preprocess.py
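For intuition about what the preprocessor produces, the following is a rough pure-Python sketch of a binary bag of words. It is a simplification and not the Keras `Tokenizer` code: it only lowercases and splits on whitespace (Keras also filters punctuation), and the helper names `fit_vocab` and `to_binary_matrix` are hypothetical. It does mirror two Keras conventions: index 0 is reserved, and only word indices below the vocabulary size are kept.

```python
from collections import Counter


def fit_vocab(texts, vocab_size):
    """Map the most frequent words to indices 1..vocab_size-1 (0 is reserved)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = [word for word, _ in counts.most_common(vocab_size - 1)]
    return {word: rank + 1 for rank, word in enumerate(ranked)}


def to_binary_matrix(texts, word_index, vocab_size):
    """One row per text; a 1.0 marks that a vocabulary word occurs in the text."""
    matrix = []
    for text in texts:
        row = [0.0] * vocab_size
        for word in text.lower().split():
            idx = word_index.get(word)
            if idx is not None:  # words outside the vocabulary are ignored
                row[idx] = 1.0
        matrix.append(row)
    return matrix


# Toy corpus (illustrative only): 'plot' occurs 4 times, 'pandas' 2, 'keras' 1.
corpus = ['plot plot plot', 'pandas pandas plot', 'keras']
vocab = fit_vocab(corpus, vocab_size=3)
print(vocab)                                        # {'plot': 1, 'pandas': 2}
print(to_binary_matrix(['keras plot'], vocab, 3))   # [[0.0, 1.0, 0.0]]
```

Note how `keras` is dropped because it did not make the vocabulary cut, which is the same trade-off the vocabulary size controls in the notebook.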


In [21]:
# importing the class from the module we just wrote
from preprocess import TextPreprocessor

In [23]:
# splitting the question text into train/test at the same boundary used for the tags
train_qs = data['text'].values[:train_size]
test_qs = data['text'].values[train_size:]

In [27]:
print(type(train_qs))
print(train_qs[0])

<class 'numpy.ndarray'>
avocado image captioning model not compiling because of concatenate layer when mask_zero=true in a previous layer i am new to avocado and i am trying to implement a model for an image captioning project.   i am trying to reproduce the model from image captioning pre-inject architecture (the picture is taken from this paper: where to put the image in an image captioning generator) (but with a minor difference: generating a word at each time step instead of only generating a single word at the end), in which the inputs for the lstm at the first time step are the embedded cnn features. the lstm should support variable input length and in order to do this i padded all the sequences with zeros so that all of them have maxlen time steps.  the code for the model i have right now is the following:    def get_model(model_name, batch_size, maxlen, voc_size, embed_size,          cnn_feats_size, dropout_rate):      # create input layer for the cnn features     cnn_feats_inp

In [28]:
# initializing the class with the vocabulary size
VOCAB_SIZE = 400 # This is a hyperparameter -> keep only the most frequent words (the matrix has 400 columns)
processor = TextPreprocessor(VOCAB_SIZE)
type(processor)

preprocess.TextPreprocessor

In [29]:
#fitting the tokenizer (the vocabulary) on the corpus of training questions
processor.create_tokenizer(train_qs)

In [30]:
#Creating the bag of words
body_train = processor.transform_text(train_qs)
body_test = processor.transform_text(test_qs)

In [31]:
#print the size of the matrix & the first vector of the corpus in train
print(len(body_train[0]))
print(body_train[0])

400
[0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 0. 1. 1. 1. 0.
 0. 1. 1. 0. 1. 1. 1. 0. 1. 0. 1. 0. 0. 1. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0.
 1. 0. 0. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0.
 1. 0. 1. 1. 0. 1. 0. 0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1.
 0. 0. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 1. 0. 1. 1. 0. 1. 1. 1.
 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0.
 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0.
 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0.
 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 0.
 0. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0

In [32]:
# Save the tokenizer in a pickle file
import pickle

with open('./processor_state.pkl', 'wb') as f:
  pickle.dump(processor, f)

## Build and train our model

In [33]:
# defining the neural net 

def create_model(vocab_size, num_tags):
    
    #Model groups layers into an object with training and inference features.
    model = tf.keras.models.Sequential()
    
    #Input shape = size of our bag-of-words vector -> output 50 nodes
    model.add(tf.keras.layers.Dense(50, input_shape=(vocab_size,), activation='relu'))
    #A hidden layer from 50 to 25 nodes
    model.add(tf.keras.layers.Dense(25, activation='relu'))
    #Output layer to the number of tags that we want to predict
    model.add(tf.keras.layers.Dense(num_tags, activation='sigmoid'))
    
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model
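The model pairs a sigmoid output per tag with `binary_crossentropy`, so each tag is scored as its own yes/no problem. The sketch below computes that loss by hand for a single example. It assumes the usual definition (mean over tags of the negative log-likelihood) and clips probabilities with a small `eps`, which is an assumption about numerical safety rather than a copy of the Keras internals; the input numbers are invented.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean over tags of -[y*log(p) + (1-y)*log(1-p)] for one example."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Illustrative labels/probabilities for 5 tags (not taken from the dataset).
labels = [1, 0, 0, 0, 1]
confident = [0.95, 0.01, 0.01, 0.01, 0.90]
unsure = [0.5, 0.5, 0.5, 0.5, 0.5]

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))  # equals log(2), about 0.693
```

A prediction of 0.5 for every tag costs `log(2)` per tag, so a loss well below that means the model is doing better than guessing.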

In [34]:
model = create_model(VOCAB_SIZE, num_tags)
model.summary()

2022-03-18 00:26:20.584488: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-03-18 00:26:20.585284: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-18 00:26:20.585341: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (vm-1a1132b4-6adb-45c9-b1ce-76bd1b8b9bc5): /proc/driver/nvidia/version does not exist
2022-03-18 00:26:20.623080: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                20050     
                                                                 
 dense_1 (Dense)             (None, 25)                1275      
                                                                 
 dense_2 (Dense)             (None, 5)                 130       
                                                                 
Total params: 21,455
Trainable params: 21,455
Non-trainable params: 0
_________________________________________________________________
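The parameter counts in the summary can be checked by hand: a dense layer has one weight per input/output pair plus one bias per output. A small sketch (the `dense_params` helper is just for illustration):

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output unit."""
    return n_inputs * n_outputs + n_outputs


# Layer widths of the model above: 400 inputs -> 50 -> 25 -> 5 tags.
layer_sizes = [400, 50, 25, 5]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [20050, 1275, 130]
print(sum(per_layer))  # 21455
```

Almost all of the parameters sit in the first layer, so the vocabulary size is what drives the size of this model.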


In [35]:
# Train

#body_train = bag-of-words array -> features
#train_tags = multi-hot labels
#epochs = number of times the model iterates through the entire training set
#batch_size = how many examples the model looks at before each weight update
#validation_split = fraction of the training data held out for validation

model.fit(body_train, train_tags, epochs=3, batch_size=128, validation_split=0.1)


2022-03-18 00:27:19.764213: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 216804800 exceeds 10% of free system memory.


Epoch 1/3

2022-03-18 00:27:26.850995: W tensorflow/core/framework/cpu_allocator_impl.cc:82] Allocation of 24089600 exceeds 10% of free system memory.


Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7ff057726a90>

In [36]:
print('Eval loss/accuracy:{}'.format(model.evaluate(body_test, test_tags, batch_size=128)))

Eval loss/accuracy:[0.10137631744146347, 0.8972901105880737]
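Going back to the `model.fit` call, the arithmetic behind `validation_split` and `batch_size` can be sketched as follows. It assumes that Keras holds out the last 10% of the rows for validation and that the features are fed as float32 (4 bytes per value); under those assumptions the byte counts come out equal to the two allocation sizes in the warnings printed during training.

```python
import math


def fit_sizes(n_samples, validation_split, batch_size):
    """Rows used for fitting, rows held out, and weight updates per epoch."""
    n_fit = math.floor(n_samples * (1 - validation_split))
    n_val = n_samples - n_fit
    steps_per_epoch = math.ceil(n_fit / batch_size)
    return n_fit, n_val, steps_per_epoch


n_fit, n_val, steps = fit_sizes(150559, validation_split=0.1, batch_size=128)
print(n_fit, n_val, steps)  # 135503 15056 1059

# Size of the feature matrices: rows * 400 columns * 4 bytes (float32 assumed).
BYTES_PER_ROW = 400 * 4
print(n_fit * BYTES_PER_ROW)  # 216804800
print(n_val * BYTES_PER_ROW)  # 24089600
```

So each epoch performs 1059 weight updates, and the warnings are simply about copying the training and validation matrices into memory.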


In [37]:
# Export the model to a file
model.save('keras_saved_model.h5')

## Test our model (locally)

1. Instantiate the saved model from the file
2. Instantiate the tokenizer
3. Preprocess the input text and transform it into a bag-of-words matrix
4. Predict (the sigmoid probability array)

In [39]:
%%writefile model_prediction.py
import pickle
import os
import numpy as np

class CustomModelPrediction(object):

  def __init__(self, model, processor):
    self._model = model
    self._processor = processor
  
  def predict(self, instances, **kwargs):
    preprocessed_data = self._processor.transform_text(instances)
    predictions = self._model.predict(preprocessed_data)
    return predictions.tolist()

  @classmethod
  def from_path(cls, model_dir):
    import tensorflow.keras as keras
    model = keras.models.load_model(os.path.join(model_dir,'keras_saved_model.h5'))
    with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
      processor = pickle.load(f)

    return cls(model, processor)

Overwriting model_prediction.py


In [40]:
# Taking two external questions

test_requests = [
  "How to preprocess strings in Keras models Lambda layer? I have the problem that the value passed on to the Lambda layer (at compile time) is a placeholder generated by keras (without values). When the model is compiled, the .eval () method throws the error: You must feed a value for placeholder tensor 'input_1' with dtype string and shape [?, 1] def text_preprocess(x): strings = tf.keras.backend.eval(x) vectors = [] for string in strings: vector = string_to_one_hot(string.decode('utf-8')) vectors.append(vector) vectorTensor = tf.constant(np.array(vectors),dtype=tf.float32) return vectorTensor input_text = Input(shape=(1,), dtype=tf.string) embedding = Lambda(text_preprocess)(input_text) dense = Dense(256, activation='relu')(embedding) outputs = Dense(2, activation='softmax')(dense) model = Model(inputs=[input_text], outputs=outputs) model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=['accuracy']) model.summary() model.save('test.h5') If I pass a string array into the input layer statically, I can compile the model, but I get the same error if I want to convert the model to tflite. #I replaced this line: input_text = Input(shape=(1,), dtype=tf.string) #by this lines: test = tf.constant(['Hello', 'World']) input_text = Input(shape=(1,), dtype=tf.string, tensor=test) #but calling this ... converter = TFLiteConverter.from_keras_model_file('string_test.h5') tfmodel = converter.convert() #... still leads to this error: InvalidArgumentError: You must feed a value for placeholder tensor 'input_3' with dtype string and shape [2] [[{{node input_3}}]] ",
  "Change the bar item name in Pandas I have a test excel file like: df = pd.DataFrame({'name':list('abcdefg'), 'age':[10,20,5,23,58,4,6]}) print (df) name  age 0    a   10 1    b   20 2    c    5 3    d   23 4    e   58 5    f    4 6    g    6 I use Pandas and matplotlib to read and plot it: import pandas as pd import numpy as np import matplotlib.pyplot as plt import os excel_file = 'test.xlsx' df = pd.read_excel(excel_file, sheet_name=0) df.plot(kind='bar') plt.show() the result shows: enter image description here it use index number as item name, how can I change it to the name, which stored in column name?"
]

In [44]:
from model_prediction import CustomModelPrediction

classifier = CustomModelPrediction.from_path('.')
print(classifier)
print(model)
print(processor)

<model_prediction.CustomModelPrediction object at 0x7ff0486d9e90>
<keras.engine.sequential.Sequential object at 0x7ff04f594bd0>
<preprocess.TextPreprocessor object at 0x7ff04f5b9b50>


In [45]:
results = classifier.predict(test_requests)

In [46]:
results[0]

[0.9614534378051758,
 3.75794525098172e-06,
 0.002194911241531372,
 0.0005911886692047119,
 0.7085863351821899]

In [47]:
results[1]

[4.6565291995648295e-05,
 0.7481787204742432,
 0.7220635414123535,
 0.00103721022605896,
 1.2635499842872377e-05]

In [48]:
for i in range(len(results)):
  print('Predicted labels for text-{}:'.format(i))
  for idx, val in enumerate(results[i]):
    if val > 0.7:
      print(tag_encoder.classes_[idx])
  print('\n')

Predicted labels for text-0:
keras
tensorflow


Predicted labels for text-1:
matplotlib
pandas
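The 0.7 cut-off used above is a choice, not something the model dictates. A small sketch of the same decoding step as a reusable function, using the probabilities printed for the first request; the `decode` helper is hypothetical, and the class order is copied from `tag_encoder.classes_`:

```python
CLASSES = ['keras', 'matplotlib', 'pandas', 'scikitlearn', 'tensorflow']


def decode(probabilities, classes=CLASSES, threshold=0.7):
    """Return the tag names whose sigmoid probability exceeds the threshold."""
    return [name for name, p in zip(classes, probabilities) if p > threshold]


# Probabilities shown above for the first request (results[0]).
first = [0.9614534378051758, 3.75794525098172e-06, 0.002194911241531372,
         0.0005911886692047119, 0.7085863351821899]

print(decode(first))                  # ['keras', 'tensorflow']
print(decode(first, threshold=0.75))  # ['keras']
```

The `tensorflow` tag only just clears 0.7 here, so a slightly stricter threshold would drop it; the threshold is worth tuning against held-out data.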




## Package our model and deploy to AI Platform

## GCP Configuration

In [49]:
!gcloud config set project itam-dpa-2022

Updated property [core/project].


In [50]:
!gcloud config set ai_platform/region global

Updated property [ai_platform/region].


In [52]:
#Copying model and preprocessor to GCS Bucket

!gsutil cp keras_saved_model.h5 gs://itam-dpa-2022-text-classifier/v1
!gsutil cp processor_state.pkl gs://itam-dpa-2022-text-classifier/v1

Copying file://keras_saved_model.h5 [Content-Type=application/x-hdf5]...
/ [1 files][282.8 KiB/282.8 KiB]                                                
Operation completed over 1 objects/282.8 KiB.                                    
Copying file://processor_state.pkl [Content-Type=application/octet-stream]...
- [1 files][ 32.0 MiB/ 32.0 MiB]                                                
Operation completed over 1 objects/32.0 MiB.                                     


### Source Distribution (or “sdist”)
A distribution format (usually generated using `python setup.py sdist`) that provides metadata and the essential source files needed for installing by a tool like pip, or for generating a Built Distribution. <br>
A **source distribution**, or more commonly sdist, is a distribution that contains all of the Python source code (i.e. `.py` files), any data files that the library requires, and a `setup.py` file which describes to the setuptools module how your Python code should be packaged.
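An sdist is an ordinary gzipped tar archive, so its contents can be listed with the standard library before uploading it anywhere. The sketch below builds a tiny stand-in archive in memory so that it runs on its own; the member names imitate the layout of an sdist for this project and are assumptions, not a listing of the real file.

```python
import io
import tarfile


def list_members(fileobj):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(fileobj=fileobj, mode='r:gz') as archive:
        return sorted(archive.getnames())


# Build a stand-in archive in memory (placeholder contents, illustrative names).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w:gz') as archive:
    for name in ['so_predict-0.1/setup.py',
                 'so_predict-0.1/preprocess.py',
                 'so_predict-0.1/model_prediction.py']:
        payload = b'# placeholder\n'
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
buffer.seek(0)

members = list_members(buffer)
print(members)
```

For a real archive on disk, `tarfile.open(path, mode='r:gz')` with the path of the `.tar.gz` file works the same way.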


In [53]:
%%writefile setup.py

from setuptools import setup

setup(
  name="so_predict",
  version="0.1",
  include_package_data=True,
  scripts=["preprocess.py", "model_prediction.py"]
)

Writing setup.py


In [54]:
!python setup.py sdist

running sdist
running egg_info
creating so_predict.egg-info
writing so_predict.egg-info/PKG-INFO
writing dependency_links to so_predict.egg-info/dependency_links.txt
writing top-level names to so_predict.egg-info/top_level.txt
writing manifest file 'so_predict.egg-info/SOURCES.txt'
reading manifest file 'so_predict.egg-info/SOURCES.txt'
writing manifest file 'so_predict.egg-info/SOURCES.txt'

running check


creating so_predict-0.1
creating so_predict-0.1/so_predict.egg-info
copying files to so_predict-0.1...
copying model_prediction.py -> so_predict-0.1
copying preprocess.py -> so_predict-0.1
copying setup.py -> so_predict-0.1
copying so_predict.egg-info/PKG-INFO -> so_predict-0.1/so_predict.egg-info
copying so_predict.egg-info/SOURCES.txt -> so_predict-0.1/so_predict.egg-info
copying so_predict.egg-info/dependency_links.txt -> so_predict-0.1/so_predict.egg-info
copying so_predict.egg-info/top_level.txt -> so_predict-0.1/so_predict.egg-info
Writing so_predict-0.1/setup.cfg
creating di

In [55]:
!gsutil cp ./dist/so_predict-0.1.tar.gz gs://itam-dpa-2022-text-classifier/v1/packages/so_predict-0.1.tar.gz

Copying file://./dist/so_predict-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  1.3 KiB/  1.3 KiB]                                                
Operation completed over 1 objects/1.3 KiB.                                      


In [57]:
# To configure and give access to the API
#!gcloud init


To review details visit https://cloud.google.com/ai-platform/prediction/docs/deploying-models#gcloud

https://cloud.google.com/ai-platform/prediction/docs/custom-prediction-routines

In [59]:
!gcloud ai-platform models create itam_dpa_2022_text_classifier


Learn more about regional endpoints and see a list of available regions: https://cloud.google.com/ai-platform/prediction/docs/regional-endpoints
Using endpoint [https://ml.googleapis.com/]
Created ai platform model [projects/itam-dpa-2022/models/itam_dpa_2022_text_classifier].


In [60]:
!pip freeze | grep tensorflow

tensorflow @ file:///opt/conda/conda-bld/dlenv-tf-2-8-gpu_1643754343905/work/tensorflow-2.8.0-cp37-cp37m-linux_x86_64.whl
tensorflow-cloud==0.1.16
tensorflow-datasets==4.4.0
tensorflow-estimator==2.8.0
tensorflow-hub==0.12.0
tensorflow-io==0.23.1
tensorflow-io-gcs-filesystem==0.23.1
tensorflow-metadata==1.6.0
tensorflow-probability==0.14.1
tensorflow-serving-api==2.7.0
tensorflow-transform==1.6.0


https://cloud.google.com/ai-platform/prediction/docs/runtime-version-list

In [61]:
!python --version

Python 3.7.12


In [66]:
!gcloud beta ai-platform versions create v1 --model itam_dpa_2022_text_classifier --python-version 3.7 --runtime-version 2.7  --origin gs://itam-dpa-2022-text-classifier/v1/ --package-uris gs://itam-dpa-2022-text-classifier/v1/packages/so_predict-0.1.tar.gz --prediction-class model_prediction.CustomModelPrediction

Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......done.                    


# Generating predictions from our deployed model

In [67]:
# https://stackoverflow.com/questions/55517871/how-to-preprocess-strings-in-keras-models-lambda-layer
# https://stackoverflow.com/questions/55508547/plot-histogram-for-feature-of-array-with-known-and-limited-values

In [68]:
%%writefile predictions.txt
"How to preprocess strings in Keras models Lambda layer? I have the problem that the value passed on to the Lambda layer (at compile time) is a placeholder generated by keras (without values). When the model is compiled, the .eval () method throws the error"
"I have a test excel file like:df = pd.DataFrame({'name':list('abcdefg'), 'age':[10,20,5,23,58,4,6]})print (df)name  age0    a   101    b   202    c    53    d   234    e   585    f    46    g    6I use Pandas and matplotlib to read and plot it:import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport osexcel_file = 'test.xlsx'df = pd.read_excel(excel_file, sheet_name=0)df.plot(kind='bar')plt.show()the result shows: enter image description hereit use index number as item name, how can I change it to the name, which stored in column name?"

Writing predictions.txt


In [69]:
# Getting predictions for our model
predictions = !gcloud ai-platform predict --model=itam_dpa_2022_text_classifier --version=v1 --text-instances=predictions.txt 

In [70]:
predictions

['Using endpoint [https://ml.googleapis.com/]',
 '[[0.35895562171936035, 0.001184225082397461, 0.014009594917297363, 0.03241389989852905, 0.7719103097915649], [7.194868521764874e-05, 0.7333732843399048, 0.7804746627807617, 0.0009480714797973633, 2.0992163626942784e-05]]']
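The captured output is a list of text lines: a status line followed by the predictions as a bracketed list. Since that list is plain numbers, one way to turn it into Python floats without evaluating arbitrary text is `json.loads`. A minimal sketch, using a copy of the two lines above; the `parse_predictions` helper is hypothetical and assumes the predictions line is the one starting with `[[`:

```python
import json

# Copy of the captured CLI output shown above.
raw_lines = [
    'Using endpoint [https://ml.googleapis.com/]',
    '[[0.35895562171936035, 0.001184225082397461, 0.014009594917297363, '
    '0.03241389989852905, 0.7719103097915649], [7.194868521764874e-05, '
    '0.7333732843399048, 0.7804746627807617, 0.0009480714797973633, '
    '2.0992163626942784e-05]]',
]


def parse_predictions(lines):
    """Find the line holding the nested list and parse it as JSON."""
    for line in lines:
        line = line.strip()
        if line.startswith('[['):
            return json.loads(line)
    raise ValueError('no predictions line found in the CLI output')


parsed = parse_predictions(raw_lines)
print(len(parsed))   # 2
print(parsed[0][4])  # 0.7719103097915649
```

Searching for the line instead of hard-coding index 1 keeps the parsing working if the CLI prints additional status lines.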

In [71]:
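import ast

# predictions[1] holds the sigmoid arrays as text; literal_eval parses it without executing code
for sigmoid_arr in ast.literal_eval(predictions[1]):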
  print(sigmoid_arr)
  for idx,probability in enumerate(sigmoid_arr):
    if probability > 0.7:
      print(tag_encoder.classes_[idx])
  print('\n')

[0.35895562171936035, 0.001184225082397461, 0.014009594917297363, 0.03241389989852905, 0.7719103097915649]
tensorflow


[7.194868521764874e-05, 0.7333732843399048, 0.7804746627807617, 0.0009480714797973633, 2.0992163626942784e-05]
matplotlib
pandas




In [72]:
import ast

print(tag_encoder.classes_, '\n')

for sigmoid_arr in ast.literal_eval(predictions[1]):
  print(sigmoid_arr)
  for idx,probability in enumerate(sigmoid_arr):
    if probability > 0.7:
      print(tag_encoder.classes_[idx])
  print('\n')

['keras' 'matplotlib' 'pandas' 'scikitlearn' 'tensorflow'] 

[0.35895562171936035, 0.001184225082397461, 0.014009594917297363, 0.03241389989852905, 0.7719103097915649]
tensorflow


[7.194868521764874e-05, 0.7333732843399048, 0.7804746627807617, 0.0009480714797973633, 2.0992163626942784e-05]
matplotlib
pandas


