In [1]:
from typing import List


def label_process(labels: List[str], sample_y: List[str]) -> List[List[int]]:
    """
    Transforms a list of sample labels into a format suitable for classification tasks.

    The function creates a binary list for each sample label, where the position
    of the label in the 'labels' list is marked as 1 and all other positions are 0.
    This is known as one-hot encoding.

    Args:
        labels (List[str]): List of unique labels/classes in the dataset.
        sample_y (List[str]): List of sample labels to be transformed.

    Returns:
        List[List[int]]: Transformed labels, each represented as a binary list corresponding
        to the positions in the 'labels' list.
    """
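    train_y = []
    for y in sample_y:
        # Start from an all-zero vector with one slot per known label.
        train = [0] * len(labels)
        # Mark the slot at the label's position; raises ValueError if y is not in labels.
        train[labels.index(y)] = 1
        train_y.append(train)
    return train_y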

if __name__ == "__main__":
    labels = ["Python", "Java", "Tensorflow", "Springboot", "Keras"]
    sample_y = ["Python", "Python", "Python", "Java", "Java", "Keras"]
    train_y = label_process(labels, sample_y)
    print(train_y)


[[1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 1]]


Introduction
Classification is one of the most common tasks in machine learning.

The goal of a classification task is to determine the label(s) of a given example.

For example, in the LabEx course classification task, given the title of a course, the task is to determine the type of the course.

There are different types of classification tasks, such as:

Single-label classification: Each example has exactly one label. For example, "Your First Python Lab" has the label "Beginner", chosen from the labels "Beginner", "Intermediate" and "Advanced".
Multi-label classification: Each example can have several labels at once. For example, "Random Number Generation with NumPy" carries all three of the skills "random", "sqrt" and "sum".
Multi-class classification: Each example can belong to multiple categories, and each category has its own set of labels. For example, "Linear Regression" has the label "Pro" in the course type category, and the labels "linear_model", "sklearn", "numpy", "matplotlib" and "pandas" in the course direction category. (Elsewhere this setting is usually called multi-output classification, while "multi-class" usually means single-label classification with more than two classes.)
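
To make the difference between single-label and multi-label targets concrete, here is a minimal sketch. The helper name multi_hot and the extra "mean" skill are illustrative assumptions and are not part of the challenge files; the sketch also assumes every sample label appears in the label list.

```python
from typing import List


def multi_hot(labels: List[str], sample_labels: List[str]) -> List[int]:
    """Encode the labels of ONE example as a 0/1 vector (hypothetical helper)."""
    encoded = [0] * len(labels)
    for label in sample_labels:
        # Set the slot that belongs to this label.
        encoded[labels.index(label)] = 1
    return encoded


levels = ["Beginner", "Intermediate", "Advanced"]
skills = ["random", "sqrt", "sum", "mean"]  # "mean" is added only for illustration

# Single-label: exactly one position is 1.
single = multi_hot(levels, ["Beginner"])

# Multi-label: several positions can be 1 at the same time.
multi = multi_hot(skills, ["random", "sqrt", "sum"])

print(single)  # [1, 0, 0]
print(multi)   # [1, 1, 1, 0]
```

A single-label target is simply the special case in which the list of labels for an example has length one.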

In this challenge, we focus on one-hot encoding for label data. Given the list of all labels and the labels of the examples, the task is to encode each example's label as a one-hot vector: a binary sequence in which every label owns a unique position, and only the position of the example's label is set to 1. The goal is to complete the label_process function in the provided label_process.py file so that it performs this encoding.

Encoding Label to One-Hot
The first step in any supervised learning task is to process the labeled data and convert it into the format required by the algorithm for training.

In single-label classification tasks, the most common way to convert labels into numeric form is one-hot encoding.

For example, in the course type classification task, the label list is ["Beginner", "Intermediate", "Advanced"]. To one-hot encode a label, first determine the number of labels in the list (3 in this case) and create a zero-filled sequence of that length. Then place a 1 at the position that the label occupies in the label list.

For example:
The encoding for the label "Beginner" is [1,0,0], indicating that the label "Beginner" is in position 0 in the label list.
The encoding for the label "Intermediate" is [0,1,0], indicating that the label "Intermediate" is in position 1 in the label list.
The encoding for the label "Advanced" is [0,0,1], indicating that the label "Advanced" is in position 2 in the label list.
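
The three encodings above can be reproduced with a short sketch. This is one possible variant and not necessarily the intended solution: the function name one_hot is hypothetical, it assumes the label list contains no duplicates, and it builds a label-to-position dictionary so that each lookup does not rescan the list.

```python
from typing import Dict, List


def one_hot(labels: List[str], sample_y: List[str]) -> List[List[int]]:
    """One-hot encode sample labels (illustrative variant, assumes unique labels)."""
    # Map each label to its position once, up front.
    index_of: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    encoded = []
    for y in sample_y:
        if y not in index_of:
            # Fail loudly instead of silently producing an all-zero row.
            raise ValueError(f"unknown label: {y!r}")
        row = [0] * len(labels)
        row[index_of[y]] = 1
        encoded.append(row)
    return encoded


levels = ["Beginner", "Intermediate", "Advanced"]
encoded_levels = one_hot(levels, ["Beginner", "Intermediate", "Advanced"])
print(encoded_levels)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Each row contains exactly one 1, and its index is the position of the label in the label list.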