Seed_Cup_TextCNN

https://github.com/Dedsec-Xu/Seed_Cup_TextCNN

Code for the Seed Cup machine learning competition. We developed a CNN-based NLP model that processes the masked text of a product description and predicts the product's category. The final score is 86.04%.

Example

By analyzing this:

We get the category of the product

Category

data/: stores dataset

model/: stores model

config.py: configuration file

class_idx.py: maps the output number to the product category

word_idx.py: uses one-hot encoding to generate numbers

accurancy.py: the F1 class uses save_data to save data from each batch, and caculate_f1 to calculate the F1 score

DataLoader.py: loads files

main.py: includes train() and val()

test.py: generates test results

word_preprocess.py: processes the dataset and generates npy files for data analysis

word_plot.py: visualizes words

word_delete.py: deletes a specific word from the dataset

word_delete_high_freq.py: finds words whose frequency exceeds a threshold and saves them to an npy file

connect_all.py: puts two datasets together. This gives us a 0.82 improvement
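The batch-wise F1 bookkeeping described for accurancy.py can be sketched as follows. This is a minimal illustration, not the repository's code: the names F1, save_data and caculate_f1 come from the list above, but their signatures, the internal counters, and the choice of macro-averaged F1 are assumptions.

```python
from collections import defaultdict


class F1:
    """Hypothetical stand-in for the F1 class in accurancy.py (assumed design)."""

    def __init__(self):
        # Per-class counts accumulated over all batches seen so far.
        self.tp = defaultdict(int)  # true positives
        self.fp = defaultdict(int)  # false positives
        self.fn = defaultdict(int)  # false negatives

    def save_data(self, predicted, target):
        """Accumulate counts from one batch of predicted and true labels."""
        for p, t in zip(predicted, target):
            if p == t:
                self.tp[t] += 1
            else:
                self.fp[p] += 1
                self.fn[t] += 1

    def caculate_f1(self):
        """Return the macro-averaged F1 over every class seen so far."""
        labels = set(self.tp) | set(self.fp) | set(self.fn)
        scores = []
        for label in labels:
            tp, fp, fn = self.tp[label], self.fp[label], self.fn[label]
            denom = 2 * tp + fp + fn
            scores.append(2 * tp / denom if denom else 0.0)
        return sum(scores) / len(scores) if scores else 0.0


# Two toy batches of (predicted, target) labels.
metric = F1()
metric.save_data([0, 1, 1], [0, 1, 2])
metric.save_data([2, 0], [2, 0])
macro_f1 = metric.caculate_f1()
print(round(macro_f1, 4))
```

Keeping only running counts means no per-sample predictions need to be stored between batches, which is one plausible reason for splitting the work into a save step and a final calculation step.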

Output Score

The output achieved an f_point of 0.8604

f1_cate1: 0.9577

f1_cate2: 0.8884

f1_cate3: 0.8302
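The README does not say how the three per-level F1 values combine into the overall score. As a hedged check, a weighted sum with weights 0.1, 0.3 and 0.6 reproduces the reported 0.8604. These weights are an assumption chosen because they match the figures above, not something stated in this document.

```python
# Per-level F1 values as reported above.
f1_scores = {"cate1": 0.9577, "cate2": 0.8884, "cate3": 0.8302}

# Assumed weights (not stated in the README); finer category levels count more.
weights = {"cate1": 0.1, "cate2": 0.3, "cate3": 0.6}

# Weighted sum of the three per-level scores.
overall = sum(weights[name] * f1_scores[name] for name in f1_scores)
print(f"{overall:.4f}")  # prints 0.8604
```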

Instruction

Use python main.py to train

Use python test.py to run test

Environment

Ubuntu 18.04

Python 3.7.6

CUDA 9.0.176

PyTorch 0.4.1
