# Multi-label classification encoding with TensorFlow
> From integers to multi-hot encoding

- toc: true 
- badges: true
- comments: true
- categories: [tensorflow, multi-label classification]

Multi-label classification problems arise when an observation can belong to more than one class. They are common in practice; [video classification](https://thigm85.github.io/blog/youtube%208m/video%20data/tensorflow/2021/10/08/youtube-8m-video-level.html) is one example. To solve multi-label classification problems with TensorFlow, we need to express the label variable using multi-hot encoding: a vector with one entry per class, set to 1 for every class the observation belongs to and 0 otherwise. For the sake of this post, assume our classification problem has 5 classes and that each observation can belong to one or more of them.
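As a plain-Python reference for the target format, here is a small sketch (not TensorFlow code; the helper name `multi_hot` is hypothetical). It sets position `i` to 1 for every class index `i` the observation belongs to and leaves every other position at 0:

```python
def multi_hot(indices, depth):
    """Return a length-`depth` list with 1.0 at each class index in `indices`."""
    encoding = [0.0] * depth
    for i in indices:
        # Reject indices outside [0, depth) instead of letting Python wrap negatives.
        if not 0 <= i < depth:
            raise ValueError(f"class index {i} is outside [0, {depth})")
        encoding[i] = 1.0  # a repeated index simply sets the same slot again
    return encoding


# An observation that belongs to classes 1 and 2 out of 5 classes.
print(multi_hot([1, 2], depth=5))  # [0.0, 1.0, 1.0, 0.0, 0.0]
```

The range check is a choice made for this sketch; `tf.one_hot` does not raise and instead maps an index such as -1 to an all-zero row.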

## Requirement

```python
import tensorflow as tf
```

TensorFlow version used:

```python
print(tf.__version__)
```

```
2.6.0
```


## Single observation

Here is a single observation that belongs to the second and third classes (zero-based indices 1 and 2).

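```python
indices = tf.constant([1, 2])  # target multi-hot vector: [0, 1, 1, 0, 0]
```

```python
# One one-hot row per label: shape (2, 5).
one_hot = tf.one_hot(indices=indices, depth=5)
# Element-wise max over the label axis (axis 0) merges the rows into one vector.
multi_hot = tf.reduce_max(one_hot, axis=0)
```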

```python
one_hot.shape
```

```
TensorShape([2, 5])
```

```python
multi_hot
```

```
<tf.Tensor: shape=(5,), dtype=float32, numpy=array([0., 1., 1., 0., 0.], dtype=float32)>
```
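To see what the two TensorFlow calls above compute, here is a NumPy mirror of the same steps (an illustration that assumes NumPy is installed; it is not part of the TensorFlow workflow). Indexing an identity matrix with the class indices builds the one-hot rows, and a max over axis 0 merges them. Using a max rather than a sum matters when a label is repeated:

```python
import numpy as np

depth = 5
np_indices = np.array([1, 2])

# Row i of the identity matrix is the one-hot vector for class i,
# so fancy indexing yields one row per label: shape (2, 5).
np_one_hot = np.eye(depth, dtype=np.float32)[np_indices]
np_multi_hot = np_one_hot.max(axis=0)  # plays the role of tf.reduce_max(..., axis=0)
print(np_multi_hot)  # [0. 1. 1. 0. 0.]

# With a repeated label, summing double-counts while max stays binary.
repeated = np.eye(depth, dtype=np.float32)[np.array([1, 1, 2])]
print(repeated.sum(axis=0))  # [0. 2. 1. 0. 0.]
print(repeated.max(axis=0))  # [0. 1. 1. 0. 0.]
```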

## Batch of observations

```python
# Observations have different numbers of labels, so a ragged tensor holds them.
indices = tf.ragged.constant([[1, 2], [1], [3, 2]])
# Target multi-hot matrix:
# [[0, 1, 1, 0, 0],
#  [0, 1, 0, 0, 0],
#  [0, 0, 1, 1, 0]]
```

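```python
# Shape (3, None, 5): one one-hot row per label, ragged along axis 1.
one_hot = tf.one_hot(indices=indices, depth=5)
# Max over the ragged label axis (axis 1) gives one multi-hot row per observation.
multi_hot = tf.reduce_max(one_hot, axis=1)
```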

```python
one_hot.shape
```

```
TensorShape([3, None, 5])
```
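The `None` in that shape is the ragged dimension: each observation holds a different number of labels. If your labels arrive as a padded dense tensor instead, the same two steps still work, assuming the padding value is -1. The `tf.one_hot` documentation shows an index of -1 producing an all-zero row, so padding never switches a class on. The sketch below uses `RaggedTensor.to_tensor(default_value=-1)` to build such a padded tensor from the ragged labels:

```python
import tensorflow as tf

# Ragged labels from the batch example, padded to a dense (3, 2) tensor with -1.
ragged = tf.ragged.constant([[1, 2], [1], [3, 2]])
padded = ragged.to_tensor(default_value=-1)  # [[1, 2], [1, -1], [3, 2]]

# -1 is outside [0, depth), so its one-hot row is all zeros.
one_hot_padded = tf.one_hot(indices=padded, depth=5)  # dense shape (3, 2, 5)
multi_hot_padded = tf.reduce_max(one_hot_padded, axis=1)  # shape (3, 5)
print(multi_hot_padded.numpy())
```

The printed matrix should match the `multi_hot` result of the ragged version.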

```python
multi_hot
```

```
<tf.Tensor: shape=(3, 5), dtype=float32, numpy=
array([[0., 1., 1., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 1., 0.]], dtype=float32)>
```
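As an independent check of that matrix, and as an alternative that skips the intermediate one-hot tensor, the labels can be scattered directly into a zero matrix. This NumPy sketch (an illustration, not a TensorFlow API) stores the labels as flat values plus row lengths, mirroring what `tf.RaggedTensor` exposes through `values` and `row_lengths()`, and uses `np.repeat` to recover the row of each label:

```python
import numpy as np

depth = 5
# Flat representation of [[1, 2], [1], [3, 2]]: all labels plus each row's length.
values = np.array([1, 2, 1, 3, 2])
row_lengths = np.array([2, 1, 2])

# Row id of every label: [0, 0, 1, 2, 2].
row_ids = np.repeat(np.arange(len(row_lengths)), row_lengths)

labels = np.zeros((len(row_lengths), depth), dtype=np.float32)
labels[row_ids, values] = 1.0  # a repeated (row, class) pair just writes 1.0 again
print(labels)
```

Because the assignment writes 1.0 rather than adding, repeated labels stay binary, which is the same property `tf.reduce_max` provides in the TensorFlow version.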