KLUE_Relation Extraction

🔥 Getting Started

Dependencies

torch==1.7.1
transformers==4.10.0
pandas==1.1.5
scikit-learn==0.24.1
matplotlib==3.3.4
tqdm==4.51.0
numpy==1.19.2
glob2==0.7

Install Requirements

pip install -r requirements.txt

Training

SM_CHANNEL_TRAIN=[train csv data dir] SM_MODEL_DIR=[model saving dir] python train.py

Inference

SM_CHANNEL_EVAL=[test csv dir] SM_CHANNEL_MODEL=[model saved dir] SM_OUTPUT_DATA_DIR=[inference output dir] python inference.py
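
Both commands above pass their paths through SageMaker-style environment variables. As a rough illustration of how a script could pick them up, here is a minimal sketch. The helper name `resolve_dirs` and every fallback path are assumptions made for this example, not the actual code or defaults in train.py or inference.py.

```python
import os


def resolve_dirs(env=None):
    """Collect the directories named in the commands above from environment variables.

    `resolve_dirs` is a hypothetical helper; the fallback paths are placeholders
    for illustration only, not the repository's defaults.
    """
    env = os.environ if env is None else env
    return {
        "train_dir": env.get("SM_CHANNEL_TRAIN", "./data/train"),
        "model_dir": env.get("SM_MODEL_DIR", "./model"),
        "eval_dir": env.get("SM_CHANNEL_EVAL", "./data/eval"),
        "saved_model_dir": env.get("SM_CHANNEL_MODEL", "./model"),
        "output_dir": env.get("SM_OUTPUT_DATA_DIR", "./output"),
    }


if __name__ == "__main__":
    # Print whichever directories the current shell environment resolves to.
    for name, path in resolve_dirs().items():
        print(f"{name}: {path}")
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching the real environment.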

🔍 Overview

Background

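Relation extraction is the task of predicting the attributes of, and the relations between, the entities (words) in a sentence.
Relation extraction, which identifies how the words in a sentence relate to one another, is a core component of knowledge graph construction and matters for natural language processing applications such as structured search, sentiment analysis, question answering, and summarization. Extracting structured triples from unstructured natural-language sentences makes it possible to summarize the information and pinpoint its key elements.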

Problem definition

Given a sentence and two words in it (the subject entity and the object entity),
build a system or model that predicts the relation between the subject entity and the object entity.

Development environment

Remote server with a V100 GPU
PyCharm or Visual Studio Code | Python 3.7 (or later)

Evaluation

Dataset Preparation

Prepare Data

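train.csv: 32,470 examples in total
test_data.csv: 7,765 examples in total (the gold labels are hidden and arbitrarily set to blind = 100)
Input: a sentence and the positions (start_idx, end_idx) of its two entities
Target: one of 30 categories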
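
To make the input format concrete, the sketch below wraps the two entity spans in marker tokens using their (start_idx, end_idx) positions, with an optional typed variant. Everything here is an assumption for illustration: the function name `mark_entities`, the marker strings, the entity type names, and the choice to treat end_idx as an inclusive character offset. It is not the preprocessing in load_data.py.

```python
def mark_entities(sentence, subj_span, obj_span, subj_type=None, obj_type=None):
    """Wrap the subject and object spans of `sentence` in marker tokens.

    Spans are (start_idx, end_idx) character offsets with an inclusive end
    (an assumption of this sketch). Marker strings are hypothetical.
    Overlapping spans are not handled.
    """
    subj_open = f"[SUBJ:{subj_type}]" if subj_type else "[SUBJ]"
    obj_open = f"[OBJ:{obj_type}]" if obj_type else "[OBJ]"
    spans = [
        (subj_span[0], subj_span[1], subj_open, "[/SUBJ]"),
        (obj_span[0], obj_span[1], obj_open, "[/OBJ]"),
    ]
    # Insert markers into the rightmost span first so earlier offsets stay valid.
    for start, end, open_tok, close_tok in sorted(spans, reverse=True):
        sentence = (
            sentence[:start]
            + open_tok + " " + sentence[start:end + 1] + " " + close_tok
            + sentence[end + 1:]
        )
    return sentence


if __name__ == "__main__":
    text = "Steve Jobs founded Apple."
    print(mark_entities(text, (0, 9), (19, 23)))
    print(mark_entities(text, (0, 9), (19, 23), subj_type="PER", obj_type="ORG"))
```

Python strings are indexed by character, so the same slicing applies to Korean sentences as long as the offsets in the data are character offsets.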

Data Labeling

The 30 classes fall into three broad groups: no-relation, org, and per.

🏃 Training

# To train a single model
$ python new_mlm.py

Train Models

RoBERTa
- klue/roberta-small(https://huggingface.co/klue/roberta-small)
- klue/roberta-base(https://huggingface.co/klue/roberta-base)
- klue/roberta-large(https://huggingface.co/klue/roberta-large/tree/main)
BERT
- klue/bert-base(https://huggingface.co/klue/bert-base)
xlm-roberta-base
koelectra-base

# To train a single model
$ python train.py

Stratified K-fold

from sklearn.model_selection import StratifiedKFold

# To train with cross-validation
$ python train.py --cv True

The cross_validation function in train.py
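
The repository's own function is not reproduced here. As a stand-in, the following minimal sketch shows how `StratifiedKFold` splits a label list so that every fold keeps the class proportions of the full set. The helper name `stratified_folds`, the seed, and the toy labels are assumptions for this example; 5 folds is chosen to match the 5 models mentioned in the Inference section.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def stratified_folds(labels, n_splits=5, seed=42):
    """Return a list of (train_indices, valid_indices) pairs, one per fold.

    `stratified_folds` is a hypothetical helper, not the function in train.py.
    """
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # split() only uses X for its length, so a zero placeholder is enough.
    return list(skf.split(np.zeros(len(labels)), labels))


if __name__ == "__main__":
    # Toy, imbalanced labels standing in for the 30 relation classes.
    toy_labels = np.array([0] * 50 + [1] * 30 + [2] * 20)
    for fold, (train_idx, valid_idx) in enumerate(stratified_folds(toy_labels)):
        counts = np.bincount(toy_labels[valid_idx], minlength=3)
        print(f"fold {fold}: train={len(train_idx)} valid={len(valid_idx)} "
              f"valid class counts={counts.tolist()}")
```

In a real run, one model would be trained per fold on the train indices and validated on the held-out indices, which is what produces one checkpoint per fold.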

💭 Inference

# Without cross-validation
$ python inference.py \
  --model_name={kinds of models} \
  --model_dir={model_filepath} \
  --output_name={output_filename} \
  --inference_type=default \
  --run_name=exp \
  --cv=False \
  --tem={True if using typed entity, otherwise False}

# To run inference with the 5 models produced by cross-validation
$ python inference.py \
  --model_name={kinds of models} \
  --model_dir={model_filepath} \
  --output_name={output_filename} \
  --inference_type=cv \
  --run_name=exp \
  --cv=True \
  --tem={True if using typed entity, otherwise False}
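
How the 5 fold models are combined is not spelled out above. One common option is soft voting: convert each model's logits to probabilities, average them, and take the most probable class. The sketch below assumes that approach; `softmax` and `ensemble_predict` are hypothetical names for this example, and inference.py may combine the models differently.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(fold_logits):
    """Soft-vote over fold models.

    fold_logits: list of arrays shaped (num_examples, num_classes), one per fold.
    Returns (predicted class indices, averaged probabilities).
    """
    probs = np.mean(
        [softmax(np.asarray(logits, dtype=float)) for logits in fold_logits],
        axis=0,
    )
    return probs.argmax(axis=-1), probs


if __name__ == "__main__":
    # Two toy "fold models" scoring two examples over three classes.
    fold_a = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    fold_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
    preds, probs = ensemble_predict([fold_a, fold_b])
    print(preds.tolist())
    print(probs.round(3).tolist())
```

Averaging probabilities rather than taking a majority vote lets a confident fold outweigh an uncertain one, as in the second toy example where the folds disagree.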

whatchang/klue-level2-nlp-04
