#  Multi-label Classification

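**Multi-class vs. multi-label classification**

| Aspect        | Multi-class classification                               | Multi-label classification                                  |
| ------------- | -------------------------------------------------------- | ----------------------------------------------------------- |
| Definition    | Each sample belongs to exactly **one** of several classes | Each sample can belong to **several classes at once**       |
| Example       | One of cat, dog, or bird                                 | A movie tagged Action + Sci-Fi + Drama                      |
| Target format | Integer index (`y=3`)                                    | Binary vector (`y=[1, 0, 1, 0, 1]`)                         |
| Model output  | `argmax` over the class scores                           | `sigmoid`, then an **independent binary decision per class** |

**Typical multi-label problems**

* Text classification (one news article → several topics)
* Movie/music genre classification
* Object detection in images (an image can contain several objects)
* Disease diagnosis (several conditions present at the same time)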
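
The "Model output" row of the table above can be made concrete with a small sketch. The logits below are hypothetical scores for five classes, not output from any model in this notebook; the point is only that the same scores lead to one winner under `argmax` but to a set of labels under a per-class `sigmoid` threshold.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, -1.0, 0.5, -3.0, 1.2]  # hypothetical scores for 5 classes

# Multi-class: the probabilities compete (they sum to 1) and one class wins
probs_mc = softmax(logits)
multiclass_pred = max(range(len(logits)), key=lambda i: probs_mc[i])

# Multi-label: every class gets its own probability and its own 0.5 threshold
probs_ml = [sigmoid(z) for z in logits]
multilabel_pred = [int(p >= 0.5) for p in probs_ml]

print(multiclass_pred)   # index of the single predicted class
print(multilabel_pred)   # multi-hot vector of predicted labels
```

With these scores the multi-class decision picks class 0, while the multi-label decision turns on every class whose logit is non-negative.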

## MultiLabelBinarizer

In [None]:
import pandas as pd

data = pd.DataFrame({
    'plot': [
        "A man fights crime in a futuristic city.",
        "A love story set in wartime.",
        "Aliens invade Earth and a war begins.",
        "A detective solves a complicated crime case.",
        "A dramatic romance in the midst of a tragedy."
    ],  # movie plot summaries
    'genres': [
        ['Action', 'Sci-Fi'],
        ['Romance', 'Drama'],
        ['Action', 'Sci-Fi', 'War'],
        ['Crime', 'Mystery'],
        ['Drama', 'Romance']
    ]   # genres (one list of labels per movie)
})
data

,plot,genres
0,A man fights crime in a futuristic city.,"[Action, Sci-Fi]"
1,A love story set in wartime.,"[Romance, Drama]"
2,Aliens invade Earth and a war begins.,"[Action, Sci-Fi, War]"
3,A detective solves a complicated crime case.,"[Crime, Mystery]"
4,A dramatic romance in the midst of a tragedy.,"[Drama, Romance]"


In [2]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   plot    5 non-null      object
 1   genres  5 non-null      object
dtypes: object(2)
memory usage: 212.0+ bytes


In [None]:
# Multi-label preprocessing
from sklearn.preprocessing import MultiLabelBinarizer  # converts label lists into multi-hot (0/1) vectors

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(data['genres'])  # genre lists -> multi-hot matrix
print(y)
print(mlb.classes_)    # column (class) order

label_df = pd.DataFrame(
    y,
    columns=mlb.classes_,    # columns: genre names
    index=data['plot']       # rows: movie plots
)
label_df

[[1 0 0 0 0 1 0]
 [0 0 1 0 1 0 0]
 [1 0 0 0 0 1 1]
 [0 1 0 1 0 0 0]
 [0 0 1 0 1 0 0]]
['Action' 'Crime' 'Drama' 'Mystery' 'Romance' 'Sci-Fi' 'War']


plot,Action,Crime,Drama,Mystery,Romance,Sci-Fi,War
A man fights crime in a futuristic city.,1,0,0,0,0,1,0
A love story set in wartime.,0,0,1,0,1,0,0
Aliens invade Earth and a war begins.,1,0,0,0,0,1,1
A detective solves a complicated crime case.,0,1,0,1,0,0,0
A dramatic romance in the midst of a tragedy.,0,0,1,0,1,0,0
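
To make the transformation above less of a black box, here is a minimal pure-Python sketch of the same idea: build a sorted label vocabulary, then set a 1 in each label's column. The helper names `to_multi_hot` and `from_multi_hot` are made up for illustration and are not part of scikit-learn.

```python
genres = [
    ['Action', 'Sci-Fi'],
    ['Romance', 'Drama'],
    ['Action', 'Sci-Fi', 'War'],
    ['Crime', 'Mystery'],
    ['Drama', 'Romance'],
]

# Sorted, de-duplicated label vocabulary (plays the role of mlb.classes_)
classes = sorted({g for row in genres for g in row})
col = {name: j for j, name in enumerate(classes)}

def to_multi_hot(labels):
    # One slot per class; set the slot to 1 for every label present
    vec = [0] * len(classes)
    for name in labels:
        vec[col[name]] = 1
    return vec

def from_multi_hot(vec):
    # Inverse direction: keep the class names whose slot is set
    return tuple(name for name, flag in zip(classes, vec) if flag)

y_manual = [to_multi_hot(row) for row in genres]
print(classes)
print(y_manual[2])
print(from_multi_hot(y_manual[2]))
```

The third row should come out as `[1, 0, 0, 0, 0, 1, 1]`, matching the `MultiLabelBinarizer` output above, and decoding it gives back Action, Sci-Fi, and War.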


## Multi-label classification model

In [None]:
# Input preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(data['plot'])  # learn the vocabulary and convert the plots into a sparse TF-IDF matrix

input_df = pd.DataFrame(
    X.toarray(),           # densify the sparse matrix into an (n_samples, n_terms) array
    columns=vectorizer.get_feature_names_out(),  # term (feature) names
    index=data['plot']                           # rows: plots
)
input_df

plot,aliens,and,begins,case,city,complicated,crime,detective,dramatic,earth,...,midst,of,romance,set,solves,story,the,tragedy,war,wartime
A man fights crime in a futuristic city.,0.0,0.0,0.0,0.0,0.442832,0.0,0.357274,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
A love story set in wartime.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.474125,0.0,0.474125,0.0,0.0,0.0,0.474125
Aliens invade Earth and a war begins.,0.408248,0.408248,0.408248,0.0,0.0,0.0,0.0,0.0,0.0,0.408248,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.408248,0.0
A detective solves a complicated crime case.,0.0,0.0,0.0,0.463693,0.0,0.463693,0.374105,0.463693,0.0,0.0,...,0.0,0.0,0.0,0.0,0.463693,0.0,0.0,0.0,0.0,0.0
A dramatic romance in the midst of a tragedy.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.393795,0.0,...,0.393795,0.393795,0.393795,0.0,0.0,0.0,0.393795,0.393795,0.0,0.0
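
The weights in the table above can be reproduced by hand. The sketch below assumes the `TfidfVectorizer` defaults: lowercasing, tokens of at least two word characters (so "a" is dropped), smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization per document. It is a simplified re-derivation for this toy corpus, not the library's implementation.

```python
import math
import re
from collections import Counter

plots = [
    "A man fights crime in a futuristic city.",
    "A love story set in wartime.",
    "Aliens invade Earth and a war begins.",
    "A detective solves a complicated crime case.",
    "A dramatic romance in the midst of a tragedy.",
]

def tokenize(text):
    # Lowercase and keep tokens with 2+ word characters
    return re.findall(r"\b\w\w+\b", text.lower())

docs = [Counter(tokenize(p)) for p in plots]         # term counts per document
n_docs = len(docs)
df = Counter(term for doc in docs for term in doc)   # document frequency per term

def idf(term):
    # Smoothed inverse document frequency
    return math.log((1 + n_docs) / (1 + df[term])) + 1

def tfidf(doc):
    raw = {t: tf * idf(t) for t, tf in doc.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))  # L2 norm of the row
    return {t: v / norm for t, v in raw.items()}

weights = [tfidf(d) for d in docs]
print(round(weights[0]['city'], 6))
print(round(weights[0]['crime'], 6))
print(round(weights[2]['war'], 6))
```

"crime" appears in two plots, so its IDF is lower than that of "city", which appears in only one; that is why it gets the smaller weight in the first row.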


In [None]:
# Model
from sklearn.multiclass import OneVsRestClassifier    # extends a binary classifier to multi-label/multi-class (one binary classifier per class)
from sklearn.linear_model import LogisticRegression

clf = OneVsRestClassifier(LogisticRegression())  # trains one logistic-regression binary classifier per genre
clf.fit(X, y)    # fit on the TF-IDF inputs (X) and multi-hot labels (y)

OneVsRestClassifier(estimator=LogisticRegression())
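
As a rough check on what "one binary classifier per class" means, the sketch below fits a separate `LogisticRegression` on each label column and compares the resulting probabilities with those of `OneVsRestClassifier`. It rebuilds the toy data so that it runs on its own; the variable names (`X_demo`, `Y_demo`, `manual`) are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

plots = [
    "A man fights crime in a futuristic city.",
    "A love story set in wartime.",
    "Aliens invade Earth and a war begins.",
    "A detective solves a complicated crime case.",
    "A dramatic romance in the midst of a tragedy.",
]
genres = [
    ['Action', 'Sci-Fi'],
    ['Romance', 'Drama'],
    ['Action', 'Sci-Fi', 'War'],
    ['Crime', 'Mystery'],
    ['Drama', 'Romance'],
]

X_demo = TfidfVectorizer().fit_transform(plots)
Y_demo = MultiLabelBinarizer().fit_transform(genres)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X_demo, Y_demo)

# One independent binary classifier per label column;
# column 1 of predict_proba is the probability of the positive class
manual = np.column_stack([
    LogisticRegression().fit(X_demo, Y_demo[:, j]).predict_proba(X_demo)[:, 1]
    for j in range(Y_demo.shape[1])
])

print(manual.shape)
print(np.allclose(manual, ovr.predict_proba(X_demo)))
```

Because each label is modeled independently, the per-row probabilities are not forced to sum to 1, which is exactly what allows several genres to be predicted for the same plot.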


In [None]:
# Prediction
test_plot = ["An alien spaceship lands in the middle of a war."]

X_test = vectorizer.transform(test_plot)    # vectorize with the TF-IDF vocabulary learned during training
y_pred = clf.predict(X_test)                # default decision boundary (probability 0.5)
print(y_pred)

y_pred_proba = clf.predict_proba(X_test)    # predicted probability for each genre
print(y_pred_proba)

# Adjust the decision threshold
y_pred = (y_pred_proba >= 0.3).astype(int)  # assign a genre when its probability is at least 0.3
print(y_pred)

y_pred_label = mlb.inverse_transform(y_pred)  # multi-hot -> tuples of genre labels
y_pred_label

[[0 0 0 0 0 0 0]]
[[0.38623748 0.17121135 0.44411123 0.17121135 0.44411123 0.38623748
  0.20204508]]
[[1 0 1 0 1 1 0]]


[('Action', 'Drama', 'Romance', 'Sci-Fi')]

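Lowering the threshold raises recall: more labels are predicted, so there are fewer false negatives.  
The trade-off is that false positives are likely to increase.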
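
The trade-off can be seen by sweeping the threshold over the probabilities printed above. The ground-truth label set used here (Action, Sci-Fi, War) is a hypothetical assumption made only for this illustration; the notebook does not define a true label for the test plot.

```python
classes = ['Action', 'Crime', 'Drama', 'Mystery', 'Romance', 'Sci-Fi', 'War']
proba = [0.38623748, 0.17121135, 0.44411123, 0.17121135,
         0.44411123, 0.38623748, 0.20204508]   # probabilities from the output above
true_labels = {'Action', 'Sci-Fi', 'War'}      # hypothetical ground truth

def evaluate(threshold):
    predicted = {c for c, p in zip(classes, proba) if p >= threshold}
    tp = len(predicted & true_labels)   # correctly predicted labels
    fp = len(predicted - true_labels)   # false positives
    fn = len(true_labels - predicted)   # false negatives (missed labels)
    return tp, fp, fn

for t in (0.5, 0.4, 0.3, 0.2):
    tp, fp, fn = evaluate(t)
    print(f"threshold={t}: TP={tp} FP={fp} FN={fn} recall={tp / len(true_labels):.2f}")
```

Under this assumed ground truth, misses fall from 3 to 0 as the threshold drops from 0.5 to 0.2, while two false positives (Drama and Romance) appear as soon as the threshold goes below their probability.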

## RNN-based multi-label classification

In [None]:
# Data preparation
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import torch

tokenizer = Tokenizer(oov_token='OOV')
tokenizer.fit_on_texts(data['plot'])             # build the word index from the plot texts
X = tokenizer.texts_to_sequences(data['plot'])   # convert each plot into a sequence of word indices
X = pad_sequences(X, maxlen=10)                  # pad/truncate every sequence to length 10
X = torch.tensor(X, dtype=torch.long)            # LongTensor, as required by the embedding layer
X

tensor([[ 0,  0,  2,  5,  6,  4,  3,  2,  7,  8],
        [ 0,  0,  0,  0,  2,  9, 10, 11,  3, 12],
        [ 0,  0,  0, 13, 14, 15, 16,  2, 17, 18],
        [ 0,  0,  0,  2, 19, 20,  2, 21,  4, 22],
        [ 0,  2, 23, 24,  3, 25, 26, 27,  2, 28]])
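
The zeros on the left of each row come from the default behavior of `pad_sequences`, which pads and truncates at the front of the sequence. A minimal pure-Python sketch of that behavior follows; `pad_pre` is a hypothetical helper written for illustration, not a Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    # Truncate from the front: keep only the last `maxlen` tokens
    seq = list(seq)[-maxlen:]
    # Pad at the front so the real tokens sit at the end of the row
    return [value] * (maxlen - len(seq)) + seq

first_plot = [2, 5, 6, 4, 3, 2, 7, 8]      # indices of the first plot shown above
print(pad_pre(first_plot, 10))
print(pad_pre(list(range(1, 13)), 10))     # a 12-token sequence loses its first 2 tokens
```

Front padding suits the GRU model below, which reads the final hidden state: the last time steps are real tokens rather than padding.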

In [None]:
mlb = MultiLabelBinarizer()             # label lists -> multi-hot converter
y = mlb.fit_transform(data['genres'])   # genre lists -> 0/1 multi-hot matrix
y = torch.tensor(y, dtype=torch.float)  # FloatTensor, as required by BCE-style losses
y

tensor([[1., 0., 0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 1., 1.],
        [0., 1., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 1., 0., 0.]])

In [None]:
# Model definition
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class MultiLabelNet(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)  # token IDs -> embeddings
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)  # final hidden state -> one logit per label

    def forward(self, x):
        x = self.embedding(x)         # (batch_size, seq_len) -> (batch_size, seq_len, embedding_dim)
        _, hidden = self.gru(x)       # hidden: (num_layers, batch_size, hidden_dim)
        output = self.fc(hidden[-1])  # (batch_size, hidden_dim) -> (batch_size, output_dim)
        return output
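
The shapes in `forward` can be checked in isolation. The sketch below uses arbitrary small sizes (not the notebook's hyperparameters) and shows that, for a single-layer unidirectional GRU, `hidden[-1]` is the same tensor as the last time step of the per-step outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=8, hidden_size=4, batch_first=True)  # arbitrary demo sizes
x = torch.randn(5, 10, 8)              # (batch_size, seq_len, embedding_dim)

outputs, hidden = gru(x)
print(outputs.shape)                   # hidden state at every time step
print(hidden.shape)                    # final hidden state for each layer

# With one layer and one direction, hidden[-1] equals the last step of outputs
same = torch.allclose(hidden[-1], outputs[:, -1, :])
print(same)
```

Using `hidden[-1]` therefore summarizes the whole sequence in one vector per sample, which is what the linear layer turns into label logits.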


In [None]:
# Train the GRU multi-label model
vocab_size = len(tokenizer.word_index) + 1 # +1 for the padding index 0
embedding_dim = 100
hidden_dim = 64
output_dim = len(mlb.classes_) # 7

model = MultiLabelNet(vocab_size, embedding_dim, hidden_dim, output_dim)
criterion = nn.BCEWithLogitsLoss() # applies a per-class sigmoid internally
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 100
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f'Epoch ({epoch + 1}/{epochs}): Loss = {loss.item():.4f}')


Epoch (10/100): Loss = 0.5157
Epoch (20/100): Loss = 0.3570
Epoch (30/100): Loss = 0.2413
Epoch (40/100): Loss = 0.1631
Epoch (50/100): Loss = 0.1153
Epoch (60/100): Loss = 0.0864
Epoch (70/100): Loss = 0.0681
Epoch (80/100): Loss = 0.0557
Epoch (90/100): Loss = 0.0470
Epoch (100/100): Loss = 0.0404
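
For intuition about the loss values printed above, here is a pure-Python sketch of what a binary cross-entropy on logits computes: a sigmoid per label followed by binary cross-entropy, averaged over all elements. It uses the standard numerically stable form `max(x, 0) - x*y + log(1 + exp(-|x|))`; the function name `bce_with_logits` and the example logits are made up for illustration.

```python
import math

def bce_with_logits(logits, targets):
    total, count = 0.0, 0
    for row_x, row_y in zip(logits, targets):
        for x, y in zip(row_x, row_y):
            # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count   # mean over every (sample, label) pair

targets = [[1, 0, 0, 0, 0, 1, 0]]                      # multi-hot row for Action + Sci-Fi
undecided = [[0.0] * 7]                                # every label at probability 0.5
confident = [[4.0, -4.0, -4.0, -4.0, -4.0, 4.0, -4.0]]  # hypothetical well-trained logits

print(bce_with_logits(undecided, targets))
print(bce_with_logits(confident, targets))
```

All-zero logits give a loss of ln 2 (about 0.693), a useful reference point: the training log above drops well below it as the logits move toward the correct side for each label.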


In [None]:
# Prediction
test_plot = ["An alien spaceship lands in the middle of a war."]
X_test = tokenizer.texts_to_sequences(test_plot)
X_test = pad_sequences(X_test, maxlen=10)
X_test = torch.tensor(X_test, dtype=torch.long)

model.eval()
with torch.no_grad():
    output = model(X_test)     # one logit per label
    p = torch.sigmoid(output)  # logits -> per-label probabilities (0 to 1)
    pred = (p >= 0.5).int()    # multi-hot prediction (0/1) at threshold 0.5
    pred_label = mlb.inverse_transform(pred)  # multi-hot -> tuples of labels
    print(pred_label)

[('Action',)]


## Applying the BERT tokenizer and embeddings

In [12]:
%pip install transformers huggingface_hub -q

Note: you may need to restart the kernel to use updated packages.


In [None]:
# Load the pretrained BERT tokenizer and model
from transformers import BertTokenizer, BertModel

model_name = 'bert-base-uncased'    # lowercased English BERT checkpoint
bert_tokenizer = BertTokenizer.from_pretrained(model_name)  # load the pretrained tokenizer
bert_model = BertModel.from_pretrained(model_name)          # load the pretrained BERT encoder



In [None]:
# BERT embeddings: 768 dimensions
import torch

# Feed a list of sentences to BERT and return the token-level contextual embeddings (last_hidden_state)
def get_bert_embedding(plots):
    encoded = bert_tokenizer(plots, padding=True, truncation=True, return_tensors='pt')  # tokenize, pad, truncate, and convert to tensors
    # print(encoded.input_ids)
    with torch.no_grad():
        output = bert_model(**encoded) # passes input_ids, token_type_ids, attention_mask

    return output.last_hidden_state

plots = data['plot'].values.tolist()
X_tensor = get_bert_embedding(plots)
print(X_tensor.shape) # (batch_size, seq_len, embedding_dim)

torch.Size([5, 12, 768])


In [15]:
# Multi-label preprocessing
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(data['genres'])
y_tensor = torch.tensor(y, dtype=torch.float)
y_tensor

tensor([[1., 0., 0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 1., 1.],
        [0., 1., 0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 1., 0., 0.]])

In [16]:
# Model design
import torch.nn as nn

class MultiLabelNet(nn.Module):
    """
    The input is already embedded by BERT, so no separate Embedding layer is used.
    """
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        _, hidden = self.gru(x)
        output = self.fc(hidden[-1])
        return output

In [17]:
# Model training
import torch.optim as optim

input_dim = X_tensor.shape[-1] # 768, the BERT embedding dimension
hidden_dim = 64
output_dim = y_tensor.shape[-1] # 7, the number of classes to predict

model = MultiLabelNet(input_dim, hidden_dim, output_dim)
criterion = nn.BCEWithLogitsLoss() # applies a per-class sigmoid internally
optimizer = optim.Adam(model.parameters(), lr=0.001)

epochs = 100
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X_tensor)
    loss = criterion(output, y_tensor)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f'Epoch ({epoch + 1}/{epochs}): Loss = {loss.item():.4f}')


Epoch (10/100): Loss = 0.3896
Epoch (20/100): Loss = 0.2451
Epoch (30/100): Loss = 0.1690
Epoch (40/100): Loss = 0.1238
Epoch (50/100): Loss = 0.0955
Epoch (60/100): Loss = 0.0768
Epoch (70/100): Loss = 0.0635
Epoch (80/100): Loss = 0.0537
Epoch (90/100): Loss = 0.0462
Epoch (100/100): Loss = 0.0404


In [18]:
# Prediction
test_plot = ["An alien spaceship lands in the middle of a war."]
X_test = get_bert_embedding(test_plot)

model.eval()
with torch.no_grad():
    output = model(X_test)
    p = torch.sigmoid(output)
    pred = (p >= 0.5).int()
    pred_label = mlb.inverse_transform(pred)
    print(pred_label)

[('Action', 'Sci-Fi')]
