# TASK 2

In this task, you will build an ML pipeline that consists of two models responsible for
entirely different tasks. The main goal is to understand what the user is asking (NLP) and to check
whether the user's claim is correct (Computer Vision).

You will need to:
1) Find or collect an animal classification/detection dataset that contains at least 10
classes of animals.
2) Train a NER model for extracting animal names from the text. Please use a
transformer-based model (not an LLM).
3) Train the animal classification model on your dataset.
4) Build a pipeline that takes a text message and an image as inputs.

In general, the flow should be the following:
1. The user provides a text similar to “There is a cow in the picture.” and an image that
contains any animal.
2. Your pipeline should decide whether the statement is true and provide a boolean value as the output.
Keep in mind that the text input will not always match the example; the
user may phrase it differently.


The solution should contain:
● Jupyter notebook with exploratory data analysis of your dataset;
● Parametrized train and inference .py files for the NER model;
● Parametrized train and inference .py files for the Image Classification model;
● Python script for the entire pipeline that takes 2 inputs (text and image) and provides
1 boolean value as an output.
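
The last requirement (a script with two inputs and one boolean output) could be wired up roughly as follows. This is only a sketch: the flag names `--text` and `--image`, the helper `run`, and the stub predictor are assumptions made for illustration, not part of the task statement.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line interface with the two required inputs."""
    parser = argparse.ArgumentParser(
        description="Check whether the animal named in a text appears in an image."
    )
    # Flag names are hypothetical; the task only asks for a text and an image input.
    parser.add_argument("--text", required=True, help="User message about the image.")
    parser.add_argument("--image", required=True, help="Path to the image file.")
    return parser


def run(argv, predict_fn) -> bool:
    """Parse argv and delegate to predict_fn(img_path, text), returning a bool."""
    args = build_parser().parse_args(argv)
    return bool(predict_fn(args.image, args.text))


# Stub predictor (hypothetical) standing in for the real pipeline object.
def always_true(img_path, text):
    return True


result = run(
    ["--text", "There is a cow in the picture.", "--image", "cow.jpeg"],
    always_true,
)
print(result)
```

In a real script, `predict_fn` would be the `predict` method of the pipeline object, and `argv` would come from `sys.argv[1:]`.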

In [1]:
import spacy

import tensorflow as tf



In [2]:
from abc import ABC, abstractmethod
import pathlib

In [None]:
NER_MODEL_PATH = 'models/custom_ner_model'
CLASSIFICATION_MODEL_PATH = 'models/classification_model.keras'

In [4]:
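class CustomModelInterface(ABC):
  """Common interface that every model wrapper in this notebook implements."""

  # Plain abstract instance methods: the subclasses below define fit/predict on self.
  @abstractmethod
  def fit(self):
    pass

  @abstractmethod
  def predict(self, *args, **kwargs):
    pass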

In [5]:
class NERCustomModel(CustomModelInterface):
  def __init__(self, model_path = NER_MODEL_PATH):
    self.nlp = spacy.load(model_path)
  
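  def fit(self):
    # Training is not implemented in this wrapper.
    raise NotImplementedError()

  def predict(self, text):
    # Returns a tuple of spaCy Span objects, one per detected entity.
    return self.nlp(text).ents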

In [6]:
class ImageHandler:
  img_height, img_width = 252, 320 # got from the notebook
  
  @staticmethod
  def prepare(img_path):
    new_img_path = pathlib.Path(img_path)

    img = tf.keras.utils.load_img(
        new_img_path, 
        target_size=(ImageHandler.img_height, ImageHandler.img_width)
    )
    img_array = tf.keras.utils.img_to_array(img)
    img_array = tf.expand_dims(img_array, 0) # Create a batch
    return img_array

In [None]:
# The translation dict provided with the data is meant to help users, but it does not cover every class name.
from data.translate import translate
# import importlib
# imported_module = importlib.import_module("data.translate")
# importlib.reload(imported_module)

class ClassificationCustomModel(CustomModelInterface):
  class_names = ['cane', 'cavallo', 'elefante', 'farfalla', 'gallina', 'gatto', 'mucca', 'pecora', 'ragno', 'scoiattolo']
  translation = translate
  
  def __init__(self, model_path = CLASSIFICATION_MODEL_PATH):
    self.model = tf.keras.models.load_model(model_path)
  
  def fit(self):
    raise NotImplementedError()
  def predict(self, img_path, num_of_examples=3) -> list:
    img_array = ImageHandler.prepare(img_path)
    
    predictions = self.model.predict(img_array)
    score = tf.nn.softmax(predictions[0])

    most_prob = ClassificationCustomModel.most_probable(score, num_of_examples)
    return most_prob
  
  @staticmethod
  def most_probable(score, num_of_units = 2) -> list:
    score = score.numpy()
    # Fall back to the original class name when the translation dict has no entry for it.
    ranks = [(ClassificationCustomModel.translation.get(n, n), int(s * 100)) for n, s in zip(ClassificationCustomModel.class_names, score)]
    ranks.sort(key=lambda x: x[1], reverse=True)
    return ranks[:num_of_units]
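
The ranking step above (softmax over the model output, then the top few translated labels) can be sketched without TensorFlow. This is a minimal pure-Python illustration, not the notebook's implementation: the function names, the three-class list, the toy translation dict, and the logits are all made up for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, translation, k=2):
    """Return the k most probable (label, percent) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    # Untranslated names fall back to the original class name.
    return [(translation.get(name, name), int(p * 100)) for name, p in ranked[:k]]


# Toy inputs (hypothetical): 'ragno' is deliberately missing from the dict.
toy_names = ["cane", "gatto", "ragno"]
toy_translation = {"cane": "dog", "gatto": "cat"}
toy_logits = [2.0, 1.0, 0.0]

print(top_k(toy_logits, toy_names, toy_translation, k=3))
# [('dog', 66), ('cat', 24), ('ragno', 9)]
```

Truncating with `int(p * 100)` mirrors the notebook's behaviour, so the percentages may not sum to 100.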

In [8]:
ner_model = NERCustomModel()



In [9]:
res = ner_model.predict("This is a cat")
str(res[0])

'cat'

In [10]:
class ResultWrapper:
  def __init__(self, res_list):
    self.res = res_list
    
  def __str__(self):
    return '\n'.join(f"> {res[0][0]}, {res[0][1]}% - {'✔' if res[1] else '❌'}" for res in self.res)

In [11]:
class CustomPipeline:
  def __init__(self, ner_model = None, classification_model = None):
    self.ner_model = ner_model or NERCustomModel()
    self.classification_model = classification_model or ClassificationCustomModel()
    
  def predict(self, img_path, text, num_probable_guesses=1) -> bool:
    ner_result = self.ner_model.predict(text)
    print(ner_result)
    
    if not ner_result:
      raise LookupError("Couldn't detect animal names")
    
    # Only the first detected entity is compared against the classifier output.
    ner_result = str(ner_result[0])
    classification_result = self.classification_model.predict(img_path, num_of_examples=num_probable_guesses)
    
    result = [(res, res[0] == ner_result) for res in classification_result]
    result_bool = result[0][0][0] == ner_result
    print(f"ner_result - {ner_result}, res {result[0][0]}")
    print(ResultWrapper(result))
    return result_bool
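
The comparison in `predict` is an exact string match, so "Cat" or "cats" in the text would not match the label "cat". One possible way to loosen it is sketched below. The helpers `normalize_animal` and `labels_match` are hypothetical and are not used by the pipeline above; the plural handling is deliberately naive.

```python
def normalize_animal(name: str) -> str:
    """Lowercase, trim, and drop a trailing plural 's' (naive heuristic)."""
    cleaned = name.strip().lower()
    # Very rough plural handling: 'cats' -> 'cat'. It does not handle
    # irregular plurals such as 'butterflies' or 'sheep'.
    if cleaned.endswith("s") and len(cleaned) > 3:
        cleaned = cleaned[:-1]
    return cleaned


def labels_match(entity_text: str, predicted_label: str) -> bool:
    """Compare the NER entity and the classifier label after normalization."""
    return normalize_animal(entity_text) == normalize_animal(predicted_label)


print(labels_match("Cats ", "cat"))   # True
print(labels_match("cat", "sheep"))   # False
```

A lemmatizer would be a more robust choice than the suffix rule if irregular plurals matter.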
  

In [12]:
pipe = CustomPipeline(
  #ner_model=NERCustomModel(),
  #classification_model=ClassificationCustomModel(),
)

In [14]:
animals_path = pathlib.Path('./data/animals_test_img')
for pic in animals_path.glob('*.jpeg'):
  name = pic.stem  # file name without the extension, used as the expected animal
  
  res = pipe.predict(img_path=pic, text=f"this is {name}", num_probable_guesses=1)

(butterfly,)
ner_result - butterfly, res ('butterfly', 99)
> butterfly, 99% - ✔
(cat,)
ner_result - cat, res ('sheep', 20)
> sheep, 20% - ❌
(chicken,)
ner_result - chicken, res ('chicken', 99)
> chicken, 99% - ✔
(cow,)
ner_result - cow, res ('cow', 53)
> cow, 53% - ✔
(dog,)
ner_result - dog, res ('dog', 47)
> dog, 47% - ✔
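
To put a number on the run above, the printed (expected, predicted) pairs can be tallied by hand. This is a small sketch: the pairs are copied from the output above, and the `accuracy` helper is an illustrative name, not part of the notebook.

```python
# (expected animal from the file name, top-1 predicted label), copied from the output above.
pairs = [
    ("butterfly", "butterfly"),
    ("cat", "sheep"),
    ("chicken", "chicken"),
    ("cow", "cow"),
    ("dog", "dog"),
]


def accuracy(results):
    """Fraction of pairs where the prediction equals the expected label."""
    correct = sum(1 for expected, predicted in results if expected == predicted)
    return correct / len(results)


print(accuracy(pairs))  # 0.8
```

Five images is far too small a sample to say much about the model; it only summarizes this particular run.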


My god, I have been training and retraining this model for three days, and it finally produces more or less decent results (4 of the 5 test images above are classified correctly).
It isn't perfect, but it works. Additional data analysis wouldn't hurt, but I may retrain the model at any time, so I'll focus on the rest of the things to implement. I am so happy it works, though.

In [None]:
res = pipe.predict(
  img_path='./data/animals_test_img/doggy.jpeg',
  text="this is a dog",
  num_probable_guesses=3
)
print(res)

(dog,)
ner_result - dog, res ('dog', 60)
> dog, 60% - ✔
> sheep, 15% - ❌
> cow, 9% - ❌
True
