Ranked 2nd (team: baseIine) out of 450 teams in E-commerce Product Classification | Kakao AI Challenge 2018
data
docs
models
scripts
.dockerignore
.gitignore
Dockerfile
LICENSE.md
README.md
build.sh
eval.py
inference.py
input_generator.py
logger.py
losses.py
misc.py
reproduce.sh
run.sh
train.py
utils.py


Kakao Arena - Product Classification

Code for 'Shopping Mall Product Category Classification' (쇼핑몰 상품 카테고리 분류)

  • Team: baseIine

Public Leaderboard (2019/01/07)

[Image: public leaderboard]

Features

  • Fully dockerized environment
  • Input Pipeline
    • Tokenize product metadata with Okt POS Tagger
    • Use TFRecord
  • 5 classifiers, each a 2-layer MLP
    • 1 classifier for the concatenated label of b, m, s, d
    • 4 classifiers, one for each category (b, m, s, d)
  • Adversarial Training
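The label layout behind the five classifiers can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `build_targets`, the `>` separator used to join the four ids, and the sample ids are all assumptions made for the example.

```python
def build_targets(records):
    """Build targets for five classifiers from (b, m, s, d) id tuples.

    Returns a vocabulary for the concatenated label and, per record,
    one joint class index plus the four per-category ids.
    """
    joint_vocab = {}  # concatenated label string -> class index
    targets = []
    for b, m, s, d in records:
        # Hypothetical encoding: join the four ids into one label string.
        joint_key = "%d>%d>%d>%d" % (b, m, s, d)
        # Assign the next free index the first time a combination is seen.
        joint_id = joint_vocab.setdefault(joint_key, len(joint_vocab))
        targets.append({"joint": joint_id, "b": b, "m": m, "s": s, "d": d})
    return joint_vocab, targets


# Made-up category ids, for illustration only.
vocab, targets = build_targets([(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 3, 4)])
print(len(vocab))   # number of distinct (b, m, s, d) combinations
print(targets[1])   # targets for the second record
```

Under this sketch, the "joint" index would feed the classifier for the concatenated label, while "b", "m", "s", and "d" would feed the four per-category classifiers.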

Results

  • The metric 'score' is calculated as follows, where b_acc, m_acc, s_acc, and d_acc are the accuracies for the b, m, s, and d categories:
    • score = (1.0 * b_acc + 1.2 * m_acc + 1.3 * s_acc + 1.4 * d_acc) / 4
  • The model marked *Final was used to report our final results on dev and test
  • Download trained weights here
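| Model        | Dev score | Test score (TBD) | File size |
|--------------|-----------|------------------|-----------|
| Intermediate | 1.07799   | -                | 966MB     |
| Ensemble     | 1.080755  | -                | 5*966MB   |
| *Final       | 1.077696  | -                | 966MB     |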
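The score metric can be computed with a few lines of Python. This is a sketch under one assumption: the weighted sum is divided by 4 (averaged over the four category levels), which is consistent with dev scores of about 1.08 in the table. The names `WEIGHTS` and `arena_score` are hypothetical and not taken from the repository.

```python
# Per-level weights from the score formula.
WEIGHTS = {"b": 1.0, "m": 1.2, "s": 1.3, "d": 1.4}


def arena_score(acc):
    """Weighted accuracy score.

    acc maps each category level ("b", "m", "s", "d") to an accuracy
    in [0, 1]. Assumption: the weighted sum is divided by 4.
    """
    weighted = sum(WEIGHTS[level] * acc[level] for level in ("b", "m", "s", "d"))
    return weighted / 4.0


# Upper bound: perfect accuracy on every level gives roughly 1.225.
print(arena_score({"b": 1.0, "m": 1.0, "s": 1.0, "d": 1.0}))
```

Because the finer-grained levels carry larger weights, an error on d costs more than an error on b.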

Requirements

  • Docker
  • Python >= 2.7
    • TensorFlow >= 1.12
    • Keras
    • Others: h5py, tqdm, easydict
  • At least 400GB of free storage space

Reproduce results

Setup

  1. Download the datasets from Kakao Arena

Run a Docker container

$ bash build.sh
$ bash run.sh

[Note] Edit DATA_PATH in run.sh so that it points to the directory containing the downloaded dataset

For example,

$ ls $DATA_PATH
|- dev.chunk.01
|- test.chunk.01
|- test.chunk.02
|- train.chunk.01
|- train.chunk.02
|- train.chunk.03
|- train.chunk.04
|- train.chunk.05
|- train.chunk.06
|- train.chunk.07
|- train.chunk.08
`- train.chunk.09
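Before starting a long run, it can help to confirm that the data directory matches the listing above. The following is a minimal sketch, assuming the twelve chunk files shown are exactly the ones needed; `EXPECTED_CHUNKS` and `missing_chunks` are hypothetical names, not part of the repository.

```python
import os

# File names taken from the example listing: 9 train, 1 dev, 2 test chunks.
EXPECTED_CHUNKS = (
    ["train.chunk.%02d" % i for i in range(1, 10)]
    + ["dev.chunk.01"]
    + ["test.chunk.%02d" % i for i in range(1, 3)]
)


def missing_chunks(data_path):
    """Return the expected chunk files that are absent from data_path."""
    present = set(os.listdir(data_path))
    return [name for name in EXPECTED_CHUNKS if name not in present]
```

For example, `missing_chunks(os.environ["DATA_PATH"])` returns an empty list when all chunks are in place.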

Option 1: Use pretrained weights

  1. Download the weights (Dropbox link)
  2. Copy the weights to /data/output/interim and /data/output/final
$ bash scripts/eval.sh 0 interim 70 # for validation
$ bash scripts/inference.sh 0 interim 70 dev # for submission
$ bash scripts/inference.sh 0 interim 70 test # for submission

$ bash scripts/inference.sh 0 final 12 dev # for submission
$ bash scripts/inference.sh 0 final 12 test # for submission

Option 2: Train a model from scratch

$ bash reproduce.sh

Reference

  1. Baseline code
  2. KoNLPy: Korean natural language processing in Python
  3. Adversarial Training

License

© Taekmin Kim, 2019. Licensed under the MIT License.