# AutoMM for Image + Text + Tabular - Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/multimodal_prediction/beginner_multimodal.ipynb)
[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/autogluon/autogluon/blob/master/docs/tutorials/multimodal/multimodal_prediction/beginner_multimodal.ipynb)



AutoMM is a deep learning "model zoo" of model zoos. It can automatically build deep learning models that are suitable for multimodal datasets. You only need to convert the data into the multimodal dataframe format, and AutoMM can predict the values of one column conditioned on the features from the other columns, including images, text, and tabular data.
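As a rough illustration of that format, the sketch below builds a tiny pandas DataFrame in which each row is one sample: an image column holding file paths, a free-text column, a couple of tabular columns, and the label column. The column names mirror the PetFinder data used later in this tutorial, but the rows and image paths are made up for illustration.

```python
import pandas as pd

# A toy multimodal dataframe: one row per sample.
# The values and image paths are hypothetical placeholders, not real PetFinder rows.
toy_df = pd.DataFrame({
    "Images": ["images/pet_a-1.jpg", "images/pet_b-1.jpg;images/pet_b-2.jpg"],  # image file paths
    "Description": ["Friendly kitten, loves to play.", "Calm adult dog, house trained."],  # free text
    "Age": [4, 36],           # numeric tabular feature
    "Type": [2, 1],           # tabular feature stored as integer codes
    "AdoptionSpeed": [1, 0],  # label column to predict
})

label_col = "AdoptionSpeed"
# Every column other than the label is available as an input feature.
feature_cols = [c for c in toy_df.columns if c != label_col]
print(feature_cols)  # ['Images', 'Description', 'Age', 'Type']
```

How each column is interpreted (image, text, categorical, or numerical) is left to AutoMM's own type inference, so the comments above only describe the toy data.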

In [None]:
# !pip install autogluon.multimodal

In [1]:
import os
import numpy as np
import warnings
warnings.filterwarnings('ignore')
np.random.seed(123)

## Dataset

For demonstration, we use a simplified and subsampled version of [PetFinder dataset](https://www.kaggle.com/c/petfinder-adoption-prediction). The task is to predict the animals' adoption rates based on their adoption profile information. In this simplified version, the adoption speed is grouped into two categories: 0 (slow) and 1 (fast).

To get started, let's download and prepare the dataset.

In [2]:
download_dir = './ag_automm_tutorial'
zip_file = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'
from autogluon.core.utils.loaders import load_zip
load_zip.unzip(zip_file, unzip_dir=download_dir)

Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip...


100%|██████████| 18.8M/18.8M [00:20<00:00, 904kiB/s] 


Next, we will load the CSV files.

In [3]:
import pandas as pd
dataset_path = download_dir + '/petfinder_for_tutorial'
train_data = pd.read_csv(f'{dataset_path}/train.csv', index_col=0)
test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
label_col = 'AdoptionSpeed'

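We keep only the first image listed for each animal and expand its path, which is relative to the dataset folder, into an absolute path so the images can be loaded during training.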

In [4]:
image_col = 'Images'
train_data[image_col] = train_data[image_col].apply(lambda ele: ele.split(';')[0]) # Use the first image for a quick tutorial
test_data[image_col] = test_data[image_col].apply(lambda ele: ele.split(';')[0])


def path_expander(path, base_folder):
    # Split ';'-separated paths, make each one absolute under base_folder, and re-join them.
    path_l = path.split(';')
    return ';'.join([os.path.abspath(os.path.join(base_folder, p)) for p in path_l])

train_data[image_col] = train_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))
test_data[image_col] = test_data[image_col].apply(lambda ele: path_expander(ele, base_folder=dataset_path))

train_data[image_col].iloc[0]

'/Users/elnath/004_deep_learning/AutoGloun-Official/v1_0_0/docs/tutorials/multimodal/multimodal_prediction/ag_automm_tutorial/petfinder_for_tutorial/images/7d7a39d71-1.jpg'
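The same two steps can be tried on a toy column without downloading the dataset. This sketch assumes a hypothetical base folder `data_root` and made-up file names; it shows both the first-image-only variant used above and a variant that keeps every `;`-separated image.

```python
import os
import pandas as pd

def path_expander(path, base_folder):
    # Expand every ';'-separated relative path into an absolute path.
    return ';'.join(os.path.abspath(os.path.join(base_folder, p)) for p in path.split(';'))

data_root = './data_root'  # hypothetical base folder
images = pd.Series(['images/a-1.jpg;images/a-2.jpg', 'images/b-1.jpg'])

# Variant 1: keep only the first image per row, as in the cell above.
first_only = images.apply(lambda ele: ele.split(';')[0])
expanded_first = first_only.apply(lambda ele: path_expander(ele, base_folder=data_root))

# Variant 2: keep all images per row, still joined by ';'.
expanded_all = images.apply(lambda ele: path_expander(ele, base_folder=data_root))

print(expanded_first.tolist())
print(expanded_all.iloc[0].count(';'))  # 1 separator, i.e. two paths in the first row
```

The original `path_expander` already handles `;`-joined lists, which suggests multiple images per row can be passed in; keeping only the first image is simply the faster choice for a quick tutorial. Check the AutoMM documentation before relying on multi-image input.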

Each animal's adoption profile includes pictures, a text description, and various tabular features such as age, breed, name, color, and more. Let's look at an example row of data and display the text description and a picture.

In [5]:
example_row = train_data.iloc[0]

example_row

Type                                                             2
Name                                                 Yumi Hamasaki
Age                                                              4
Breed1                                                         292
Breed2                                                         265
Gender                                                           2
Color1                                                           1
Color2                                                           5
Color3                                                           7
MaturitySize                                                     2
FurLength                                                        2
Vaccinated                                                       1
Dewormed                                                         3
Sterilized                                                       2
Health                                                        

In [6]:
example_row['Description']

"I rescued Yumi Hamasaki at a food stall far away in Kelantan. At that time i was on my way back to KL, she was suffer from stomach problem and looking very2 sick.. I send her to vet & get the treatment + vaccinated and right now she's very2 healthy.. About yumi : - love to sleep with ppl - she will keep on meowing if she's hugry - very2 active, always seeking for people to accompany her playing - well trained (poo+pee in her own potty) - easy to bathing - I only feed her with these brands : IAMS, Kittenbites, Pro-formance Reason why i need someone to adopt Yumi: I just married and need to move to a new house where no pets are allowed :( As Yumi is very2 special to me, i will only give her to ppl that i think could take care of her just like i did (especially on her foods things).."

In [7]:
example_image = example_row[image_col]

from IPython.display import Image, display
pil_img = Image(filename=example_image)
display(pil_img)


## Training
Now let's fit the predictor with the training data. Here we set a tight time budget for a quick demo.

In [8]:
from autogluon.multimodal import MultiModalPredictor
predictor = MultiModalPredictor(label=label_col)
predictor.fit(
    train_data=train_data,
    time_limit=120, # seconds
)

No path specified. Models will be saved in: "AutogluonModels/ag-20231228_050640"
AutoGluon Version:  1.0.0
Python Version:     3.10.13
Operating System:   Darwin
Platform Machine:   x86_64
Platform Version:   Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:10 PST 2023; root:xnu-10002.61.3~2/RELEASE_X86_64
CPU Count:          16
Pytorch Version:    2.0.0.post104
CUDA Version:       CUDA is not available
Memory Avail:       43.46 GB / 64.00 GB (67.9%)
Disk Space Avail:   468.98 GB / 931.55 GB (50.3%)
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [0, 1]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have insta

Epoch 0:   2%|▏         | 1/60 [01:38<1:36:50, 98.49s/it]                  

INFO: Time limit reached. Elapsed time is 0:02:11. Signaling Trainer to stop.


Epoch 0:   3%|▎         | 2/60 [02:11<1:03:19, 65.52s/it]

AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/Users/elnath/004_deep_learning/AutoGloun-Official/v1_0_0/docs/tutorials/multimodal/multimodal_prediction/AutogluonModels/ag-20231228_050640")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).




<autogluon.multimodal.predictor.MultiModalPredictor at 0x1a6184d30>

Under the hood, AutoMM automatically infers the problem type (classification or regression), detects the data modalities, selects suitable models from the multimodal model pools, and trains the selected models. If multiple backbones are available, AutoMM appends a late-fusion model (MLP or transformer) on top of them.
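To make the late-fusion idea concrete, here is a minimal NumPy sketch, not AutoMM's implementation. It assumes each backbone has already produced one feature vector per sample, concatenates those vectors, and passes the result through a small two-layer MLP with randomly initialized weights that outputs two class logits. The feature sizes (768, 512, 64) and the hidden width are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4

# Hypothetical per-modality features, as if produced by separate backbones.
image_feat = rng.normal(size=(batch, 768))
text_feat = rng.normal(size=(batch, 512))
tabular_feat = rng.normal(size=(batch, 64))

# Late fusion: concatenate the modality features along the feature axis.
fused = np.concatenate([image_feat, text_feat, tabular_feat], axis=1)  # (4, 1344)

# A tiny MLP fusion head with random (untrained) weights.
hidden = 128
w1 = rng.normal(scale=0.02, size=(fused.shape[1], hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.02, size=(hidden, 2))
b2 = np.zeros(2)

h = np.maximum(fused @ w1 + b1, 0.0)  # ReLU activation
logits = h @ w2 + b2                  # (4, 2): one logit per class

# Numerically stable softmax turns logits into per-row class probabilities.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(fused.shape, probs.shape)  # (4, 1344) (4, 2)
```

In a real model these weights would be learned during training; here they are random, so the probabilities only demonstrate the data flow.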


## Evaluation
Then we can evaluate the predictor on the test data.

In [9]:
scores = predictor.evaluate(test_data, metrics=["roc_auc"])
scores

Predicting DataLoader 0: 100%|██████████| 4/4 [01:52<00:00, 28.15s/it]


{'roc_auc': 0.6516000000000001}
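ROC AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half), so 0.5 is chance level and 1.0 is a perfect ranking. The pairwise sketch below computes it from scratch on toy labels and scores. It is quadratic in the number of samples, so it is meant as an illustration rather than a replacement for a library metric.

```python
import numpy as np

def roc_auc_pairwise(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # toy predicted probabilities of class 1
print(roc_auc_pairwise(y_true, y_score))  # 0.75
```

Read this way, the test score of about 0.65 above means the predictor ranks a fast-adoption animal above a slow-adoption one roughly 65% of the time, after only a two-minute training budget.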

## Prediction
Given a multimodal dataframe without the label column, we can predict the labels.

In [10]:
predictions = predictor.predict(test_data.drop(columns=label_col))
predictions[:5]

Predicting DataLoader 0: 100%|██████████| 4/4 [02:29<00:00, 37.32s/it]


8     0
70    0
82    0
28    0
63    1
Name: AdoptionSpeed, dtype: int64

For classification tasks, we can get the probabilities of all classes.

In [11]:
probas = predictor.predict_proba(test_data.drop(columns=label_col))
probas[:5]

Predicting DataLoader 0: 100%|██████████| 4/4 [01:59<00:00, 29.78s/it]


|    | 0        | 1        |
|----|----------|----------|
| 8  | 0.513769 | 0.486231 |
| 70 | 0.774978 | 0.225022 |
| 82 | 0.891982 | 0.108018 |
| 28 | 0.96565  | 0.03435  |
| 63 | 0.311051 | 0.688949 |


Note that calling `.predict_proba()` on a regression task raises an exception.
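For this binary task, the labels returned by `.predict()` agree with the more probable class in each row of the probability table. The pandas sketch below copies those five probability rows and picks the column with the larger probability per row. It assumes the default decision rule is an argmax over classes (equivalently, a 0.5 threshold on class 1); that matches these outputs but is an assumption here, not a documented guarantee.

```python
import pandas as pd

# The five rows of predict_proba output shown above (index = row id, columns = classes).
probas = pd.DataFrame(
    {0: [0.513769, 0.774978, 0.891982, 0.96565, 0.311051],
     1: [0.486231, 0.225022, 0.108018, 0.03435, 0.688949]},
    index=[8, 70, 82, 28, 63],
)

# Pick the class with the highest probability in each row.
labels_from_probas = probas.idxmax(axis=1)
print(labels_from_probas.tolist())  # [0, 0, 0, 0, 1]
```

The result matches the predicted labels printed earlier for rows 8, 70, 82, 28, and 63.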


## Extract Embeddings

Extracting embeddings is also useful in many cases: it converts each sample (one row of the dataframe) into an embedding vector.

In [12]:
embeddings = predictor.extract_embedding(test_data.drop(columns=label_col))
embeddings.shape

Predicting DataLoader 0: 100%|██████████| 4/4 [01:59<00:00, 29.92s/it]


(100, 128)
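The result holds one 128-dimensional vector per test row. A common use of such vectors is similarity search: normalize them and rank rows by cosine similarity. The sketch below uses a random stand-in array with the same shape as `embeddings`, so the neighbors it finds are meaningless; with the real array, the same code would return the test rows closest to a query row.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 128))  # random stand-in with the shape of `embeddings`

# L2-normalize each row so that dot products equal cosine similarities.
unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sims = unit @ unit.T  # (100, 100) cosine similarity matrix

query = 0
order = np.argsort(-sims[query])  # most similar first; the query itself ranks first
neighbors = order[1:6]            # five nearest rows, excluding the query
print(neighbors)
```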

## Save and Load
It is also convenient to save a predictor and re-load it.

```{warning}

`MultiModalPredictor.load()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust.**

```

In [None]:
import uuid

model_path = f"./tmp/{uuid.uuid4().hex}-saved_model"
predictor.save(model_path)
loaded_predictor = MultiModalPredictor.load(model_path)
scores2 = loaded_predictor.evaluate(test_data, metrics=["roc_auc"])
scores2
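One practical way to act on the warning above is to record a checksum of the saved directory right after `predictor.save()` and compare it before calling `load()`. The sketch below hashes the relative path and contents of every file under a directory with SHA-256 in a deterministic order. The helper name `directory_sha256` is hypothetical, and the demo uses a throwaway temporary directory with a stand-in file rather than a real model. A matching checksum only shows the files are unchanged since you hashed them; it does not make a model from an untrusted source safe to load.

```python
import hashlib
import os
import tempfile

def directory_sha256(root):
    """Hash relative paths and contents of all files under root, in sorted order."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the os.walk traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            digest.update(os.path.relpath(full, root).encode("utf-8"))
            digest.update(b"\0")  # separate the path from the file contents
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "model.ckpt"), "wb") as f:  # stand-in file
        f.write(b"weights")
    before = directory_sha256(tmp)
    again = directory_sha256(tmp)  # unchanged directory gives the same digest
    with open(os.path.join(tmp, "model.ckpt"), "ab") as f:  # simulate tampering
        f.write(b"!")
    after = directory_sha256(tmp)

print(before == again, before == after)  # True False
```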

## Other Examples

You may go to [AutoMM Examples](https://github.com/autogluon/autogluon/tree/master/examples/automm) to explore more AutoMM examples.

## Customization
To learn how to customize AutoMM, please refer to [Customize AutoMM](../advanced_topics/customization.ipynb).