# Predictions based on the fine-tuned CLIP model

This notebook uses the already fine-tuned model (```./clip_features_model_kf5cnvvi```) to make predictions on a larger set of images from a commercial database.

The steps followed were:
1. The image features were precomputed with CLIP (using ```python save_clip_and_resnet_features.py --data_base_path my_data_folder/ --model clip```).
    * Folders were created with the same class names as those used during the model train/validation/test process.
    * The images from the commercial database were distributed randomly across the class folders (as the true values were not known).
2. The model was run to create predictions for all images.
    * The comparison between predicted and true values was skipped, as it was not relevant (the true values were not known).
3. The dataset was saved so it can later be consolidated with the commercial dataset information.
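
The folder setup in step 1 could be sketched roughly as follows. This is only an illustration, not the procedure actually used: the helper name `scatter_into_class_folders`, its arguments, and the `out_dir/<class name>/<image>` layout are assumptions, and the layout expected by `save_clip_and_resnet_features.py` is not shown in this notebook.

```python
import random
import shutil
from pathlib import Path


def scatter_into_class_folders(image_paths, out_dir, class_names, seed=0):
    """Copy each image into an arbitrarily chosen class folder.

    Hypothetical helper: the folder a file lands in is only a placeholder
    label, since the true class of the image is unknown.
    """
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    # Create every class folder, even if no image ends up inside it.
    for name in class_names:
        (out_dir / name).mkdir(parents=True, exist_ok=True)
    placed = []
    for path in map(Path, image_paths):
        target = out_dir / rng.choice(class_names) / path.name
        shutil.copy2(path, target)  # copy, so the source database is untouched
        placed.append(target)
    return placed
```

Images that share a file name can overwrite each other if they land in the same folder, so unique file names are assumed here.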

    

In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame as df
import tensorflow as tf


### Load validation set in tensorflow format

In [3]:
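def load_dataset(path):
    """Load precomputed CLIP features from a pickle into a batched tf.data.Dataset."""
    data = pd.read_pickle(path)
    n_features = data['clip_features'].iloc[0].shape[-1]
    # Class indices follow the order of first appearance in this file; that order
    # must match the one used in training for predicted labels to map correctly.
    class_names = data['y'].unique()
    y = np.array([np.where(class_names == e)[0][0] for e in data['y']])
    x = np.concatenate(data['clip_features'].to_numpy())
    print(f'Loaded {len(x)} instances from {path}.')
    print(f'X shape {x.shape} and y shape {y.shape} with labels:\n{data["y"].value_counts()}.')
    # No shuffling, so predictions keep the row order of the pickle.
    ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(16)
    return ds, class_names, n_features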


validation_ds, class_names, n_features = load_dataset('/home/theo/robot_images/validation.pk')


Loaded 22480 instances from /home/theo/robot_images/validation.pk.
X shape (22480, 512) and y shape (22480,) with labels:
y
class_Superior human                     22464
class_Thinking machine                       4
class_None of the above                      2
class_Mysterious AI                          2
class_Collaborative or Interactive AI        2
class_Complex AI                             2
class_Learning or recognition machine        2
class_Acting or Performing machine           2
Name: count, dtype: int64.
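
Because `load_dataset` derives the integer labels from `data['y'].unique()`, the class indices depend on the order in which classes first appear in the pickle. If the model was trained with a different order, the predicted indices would be mapped to the wrong names. A minimal sketch of encoding labels against a fixed order is shown below; `TRAIN_CLASS_ORDER` and `encode_labels` are hypothetical, and the order listed is illustrative only (the real one has to come from the training code).

```python
import numpy as np
import pandas as pd

# Hypothetical: the class order used when the model was trained.
# Illustrative (alphabetical) only; take the real order from the training code.
TRAIN_CLASS_ORDER = [
    'class_Acting or Performing machine',
    'class_Collaborative or Interactive AI',
    'class_Complex AI',
    'class_Learning or recognition machine',
    'class_Mysterious AI',
    'class_None of the above',
    'class_Superior human',
    'class_Thinking machine',
]


def encode_labels(labels, class_order):
    """Map string labels to integer ids using a fixed class order."""
    labels = list(labels)
    codes = np.asarray(pd.Categorical(labels, categories=class_order).codes)
    # pandas encodes values missing from `categories` as -1.
    if (codes < 0).any():
        unknown = sorted(set(labels) - set(class_order))
        raise ValueError(f'Labels not in class_order: {unknown}')
    return codes
```

With a fixed order, `class_list` can be set to the same list, so that `class_list[i]` always names the class the model was trained to output at index `i`.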


In [4]:
class_list = class_names

In [5]:
trues = [class_list[int(y)] for _x, y in validation_ds.unbatch()]

### Print class counts for the validation split

In [6]:
df(trues).value_counts()

class_Superior human                     22464
class_Thinking machine                       4
class_Acting or Performing machine           2
class_Collaborative or Interactive AI        2
class_Learning or recognition machine        2
class_Complex AI                             2
class_None of the above                      2
class_Mysterious AI                          2
Name: count, dtype: int64

These counts are not true labels: they only reflect the placeholder folders the commercial images were placed in. Nearly all images (22,464 of 22,480) sit under 'Superior human', while every other class holds 2 to 4 images, so the class balance shown here says nothing about the data itself.

### Load our trained model

In [7]:
import tensorflow as tf

# model = tf.keras.models.load_model('./fine_tuned_model_3m6herki')
# model = tf.keras.models.load_model('./fine_tuned_model_3f0vzk68')
# model = tf.keras.models.load_model('./fine_tuned_model_qdesgan9')
# model = tf.keras.models.load_model('./clip_features_model_kuw3ehqp')
model = tf.keras.models.load_model('./clip_features_model_kf5cnvvi')

### Make predictions

In [8]:
logits = model.predict(validation_ds)
# Map each row's highest-scoring class index back to its class name.
predicted = [class_list[v] for v in np.argmax(logits, axis=1)]

print(df(predicted).value_counts())
print(df(trues).value_counts())


class_Complex AI                         8557
class_Mysterious AI                      4418
class_Acting or Performing machine       3849
class_Thinking machine                   2824
class_None of the above                  1737
class_Learning or recognition machine     828
class_Superior human                      267
Name: count, dtype: int64
class_Superior human                     22464
class_Thinking machine                       4
class_Acting or Performing machine           2
class_Collaborative or Interactive AI        2
class_Learning or recognition machine        2
class_Complex AI                             2
class_None of the above                      2
class_Mysterious AI                          2
Name: count, dtype: int64
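
Since the true labels are unknown, a per-image confidence can help decide which predictions to trust when they are consolidated later. The sketch below assumes that `model.predict` returns raw logits, as the variable name suggests; if the final layer already applies a softmax, the `softmax` step should be skipped. The function names are illustrative.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def top1_with_confidence(logits, class_list):
    """Return the top-1 class name and its softmax probability for each row."""
    probs = softmax(np.asarray(logits, dtype=np.float64))
    idx = probs.argmax(axis=1)
    labels = [class_list[i] for i in idx]
    confidence = probs[np.arange(len(idx)), idx]
    return labels, confidence
```

For example, `predicted, confidence = top1_with_confidence(logits, class_list)` would give the same labels as the `argmax` above, plus one probability per image.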


In [16]:
df_predicted = df(predicted).reset_index()
df_trues = df(trues).reset_index()

In [18]:
df_predicted = df_predicted.rename(columns={0:'predictions'})
df_trues = df_trues.rename(columns={0:'folder_validation'})

In [22]:
# Join on the shared 'index' column (the row position), stated explicitly.
results = df_predicted.merge(df_trues, on='index')

In [23]:
results

| | index | predictions | folder_validation |
|---|---|---|---|
| 0 | 0 | class_Mysterious AI | class_Mysterious AI |
| 1 | 1 | class_Mysterious AI | class_Mysterious AI |
| 2 | 2 | class_Acting or Performing machine | class_None of the above |
| 3 | 3 | class_Learning or recognition machine | class_None of the above |
| 4 | 4 | class_None of the above | class_Thinking machine |
| ... | ... | ... | ... |
| 22475 | 22475 | class_Thinking machine | class_Complex AI |
| 22476 | 22476 | class_Thinking machine | class_Learning or recognition machine |
| 22477 | 22477 | class_Thinking machine | class_Learning or recognition machine |
| 22478 | 22478 | class_Acting or Performing machine | class_Acting or Performing machine |
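
The same three-column table can be built in one step, without the `reset_index`, `rename`, and `merge` calls, because `predicted` and `trues` are already aligned by position. This is a sketch of an alternative, not the code that produced the output above; `build_results` is a hypothetical name.

```python
import pandas as pd


def build_results(predicted, trues):
    """Pair predictions and folder labels by position in a single DataFrame."""
    if len(predicted) != len(trues):
        raise ValueError('predicted and trues must have the same length')
    results = pd.DataFrame({
        'predictions': list(predicted),
        'folder_validation': list(trues),
    })
    # Expose the row position as an explicit 'index' column, as in the merge.
    results.index.name = 'index'
    return results.reset_index()
```

The length check makes a mismatch between the two lists fail loudly instead of silently dropping rows, which an inner merge would do.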


In [24]:
results.to_csv('predictions_commercial.csv', index=False)
results.to_pickle('predictions_commercial.pkl')
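
For the later consolidation step, the saved `index` column is the row position in `validation.pk`, since the dataset is batched without shuffling. One way to attach the predictions to the commercial records is sketched below. It assumes a hypothetical `metadata` table that carries the same `index` column (for instance, built from the pickle in its original row order) alongside an identifier such as `image_id`; neither is defined in this notebook.

```python
import pandas as pd


def attach_predictions(metadata, results, key='index'):
    """Left-join predictions onto a metadata table that shares a `key` column.

    `validate='one_to_one'` makes pandas raise if the key is duplicated on
    either side, which would signal a misaligned join.
    """
    return metadata.merge(results, on=key, how='left', validate='one_to_one')


# Illustrative data only; `image_id` is a hypothetical identifier column.
metadata = pd.DataFrame({'index': [0, 1, 2], 'image_id': ['a', 'b', 'c']})
example_results = pd.DataFrame({
    'index': [0, 1, 2],
    'predictions': ['class_Complex AI', 'class_Mysterious AI', 'class_Complex AI'],
})
consolidated = attach_predictions(metadata, example_results)
```

In practice `results` would be read back with `pd.read_csv('predictions_commercial.csv')` or `pd.read_pickle('predictions_commercial.pkl')`.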
