elephant-sense

Evaluating the quality of the content itself with machine learning

Setup

Get a Qiita API token and set it as an environment variable.

$ export QiitaToken=xxx

(Only the read_qiita scope is required.)
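As a minimal sketch, assuming the public Qiita API v2 endpoint for a single post (`/api/v2/items/<item_id>`) and a bearer-token `Authorization` header, a Python script could pick up the token from the `QiitaToken` variable like this. The helper names `build_item_request` and `fetch_item` are hypothetical and not part of this repository.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumption: the project talks to the public Qiita API v2.
QIITA_API = "https://qiita.com/api/v2"


def build_item_request(item_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for one Qiita post.

    Hypothetical helper: falls back to the QiitaToken environment variable
    when no token is passed explicitly.
    """
    token = token or os.environ.get("QiitaToken")
    if not token:
        raise RuntimeError("Set the QiitaToken environment variable first")
    return urllib.request.Request(
        f"{QIITA_API}/items/{item_id}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_item(item_id: str) -> dict:
    """Send the request and decode the post's JSON body."""
    with urllib.request.urlopen(build_item_request(item_id)) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the token handling testable without network access.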

Then build the Docker image from the Dockerfile and run it.

For Training the Model

Data Preparation

Place the Qiita posts in data/raw/items.
- You can fetch Qiita posts with the Qiita API.
- Each post is stored as one JSON file named after its post ID (e.g. 0a0000aa0a0000a00aa0.json).
Place the annotation file labeled_qiita_posts.csv in data/raw.
- Its columns are No, url, Title, followed by one column per annotator (annotator1, annotator2, ...). The column names can be whatever you like.
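The layout above can be sketched in Python as follows. This is an illustration, not code from the repository: `save_post` and `read_annotations` are hypothetical names, the post ID is assumed to live under the `"id"` key (as it does in Qiita API v2 post objects), and the CSV is read by column position because the column names are arbitrary.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def save_post(post: dict, items_dir: str = "data/raw/items") -> Path:
    """Write one post as <post id>.json inside items_dir."""
    out_dir = Path(items_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{post['id']}.json"
    path.write_text(json.dumps(post, ensure_ascii=False), encoding="utf-8")
    return path


def read_annotations(csv_path: str = "data/raw/labeled_qiita_posts.csv") -> Dict[str, List[str]]:
    """Map each post URL to its list of annotator labels.

    Uses column positions, not names: No, url, Title come first and every
    remaining column is treated as one annotator.
    """
    labels: Dict[str, List[str]] = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row; its names are arbitrary
        for row in reader:
            if len(row) < 4:
                continue  # row has no annotator columns
            labels[row[1]] = row[3:]
    return labels
```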

Data Preprocessing

Run the following script.

python scripts/data/make_data.py

The labeled JSON files are then stored in data/processed/items.
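The contents of make_data.py are not reproduced here. As a hypothetical sketch of this kind of step, assuming each CSV url contains /items/<post id> and that the labels are stored under a new `"labels"` key (both assumptions), joining the annotations to the raw posts could look like this. It expects a mapping from post URL to annotator labels.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumption: Qiita post URLs contain /items/<post id>, the same ID used as
# the raw JSON file name.
ITEM_URL = re.compile(r"/items/([^/?#]+)")


def post_id_from_url(url: str) -> Optional[str]:
    """Extract the post ID from a Qiita post URL, or None if absent."""
    match = ITEM_URL.search(url.strip())
    return match.group(1) if match else None


def attach_labels(labels: Dict[str, List[str]],
                  raw_dir: str = "data/raw/items",
                  out_dir: str = "data/processed/items") -> int:
    """Write a labeled copy of each annotated raw post into out_dir.

    The "labels" key is hypothetical. Returns the number of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for url, annotations in labels.items():
        post_id = post_id_from_url(url)
        if post_id is None:
            continue  # URL does not look like a post URL
        src = Path(raw_dir) / f"{post_id}.json"
        if not src.exists():
            continue  # annotated post was never downloaded
        post = json.loads(src.read_text(encoding="utf-8"))
        post["labels"] = annotations
        dst = out / src.name
        dst.write_text(json.dumps(post, ensure_ascii=False), encoding="utf-8")
        written += 1
    return written
```

Annotated rows whose post file is missing are skipped rather than treated as errors, so a partial download still produces usable output.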

Next, run the preprocessing script.

python scripts/data/preprocessing.py

This creates posts.json in data/processed/.
posts.json contains the split tokens of each post, which you can use to get the words in the posts.
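For example, word frequencies across all posts could be computed as below. This is a sketch under assumptions: it treats posts.json as a JSON array of objects with each post's tokens in a list under a `"tokens"` key, and that key name is a guess, so check the generated file for the actual structure.

```python
import json
from collections import Counter
from typing import Iterable, List


def load_posts(path: str = "data/processed/posts.json") -> List[dict]:
    """Load posts.json, assumed here to be a JSON array of post objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def word_counts(posts: Iterable[dict], field: str = "tokens") -> Counter:
    """Count how often each token appears across all posts.

    `field` is the (assumed) key holding each post's token list; posts
    without it contribute nothing.
    """
    counts: Counter = Counter()
    for post in posts:
        counts.update(post.get(field, []))
    return counts
```

Usage: `word_counts(load_posts()).most_common(10)` lists the ten most frequent tokens.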

chakki-works/elephant_sense

Folders and files

Latest commit

History

Repository files navigation

elephant-sense

Setup

For Training the Model

Data Preparation

Data Preprocessing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages