This repository was archived by the owner on Apr 10, 2019. It is now read-only.

chakki-works/elephant_sense

elephant-sense

Evaluates the quality of the content itself using machine learning.

You can try it from here.

Setup

Get a Qiita API token and set it as an environment variable.

$ export QiitaToken=xxx

(Only the read_qiita scope is required.)

Then build and run the application using the Dockerfile.

For Training the Model

Data Preparation

  • Place the Qiita posts in data/raw/items.
    • You can get Qiita posts through the Qiita API.
    • Each post is one JSON file named after its post ID (like 0a0000aa0a0000a00aa0.json).
  • Place the annotated file labeled_qiita_posts.csv in data/raw.
    • Its format is No,url,Title, followed by annotator1, annotator2, ... (column names can be anything you like).
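
The raw-data layout described above can be sketched as follows. This is a minimal sketch, not the repository's own tooling: the helper names `fetch_items` and `save_items` are hypothetical, and it assumes the Qiita API v2 `GET /api/v2/items` endpoint with a bearer token read from the `QiitaToken` environment variable set during Setup.

```python
import json
import os
import urllib.request
from pathlib import Path

# Qiita API v2 item listing endpoint (assumed for this sketch).
API_URL = "https://qiita.com/api/v2/items"


def fetch_items(page=1, per_page=20):
    """Fetch one page of posts from the Qiita API (needs network access)."""
    token = os.environ["QiitaToken"]  # token exported during Setup
    url = f"{API_URL}?page={page}&per_page={per_page}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def save_items(items, out_dir="data/raw/items"):
    """Write each post to <out_dir>/<post id>.json and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        # One post per file, named after the post ID.
        path = out_path / f"{item['id']}.json"
        path.write_text(json.dumps(item, ensure_ascii=False), encoding="utf-8")
        written.append(path)
    return written
```

With the token exported, `save_items(fetch_items(page=1))` would write one page of posts into data/raw/items.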

Data Preprocessing

Run the following script.

python scripts/data/make_data.py

The labeled JSON files are then stored in data/processed/items.

Next, run the preprocessing script.

python scripts/data/preprocessing.py

posts.json will be created in data/processed/.
posts.json contains the split tokens of each post. You can use it to get the words in the posts.
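
As a rough illustration of using posts.json to get the words in the posts, the sketch below counts token frequencies. The layout of posts.json is not documented here, so the code assumes a JSON list of objects that each hold a list of token strings under a `tokens` key. Both that key and the helper names `load_posts` and `count_words` are hypothetical and may need adjusting to the real file.

```python
import json
from collections import Counter


def load_posts(path="data/processed/posts.json"):
    """Load the preprocessed posts from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_words(posts, token_key="tokens"):
    """Count how often each token appears across all posts.

    ASSUMPTION: each post is a dict with a list of token strings under
    `token_key`; posts without that key are skipped.
    """
    counts = Counter()
    for post in posts:
        counts.update(post.get(token_key, []))
    return counts
```

For example, `count_words(load_posts()).most_common(10)` would list the ten most frequent tokens under these assumptions.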
