A personal implementation of CNN text (news) classifier with TensorFlow and Keras for Chinese Corpus.

tonyzhang95/Text-Classifier

CNN Text Classifier for Chinese Corpus: News

I trained this CNN text classifier on Chinese news during my internship with Alibaba Cloud and achieved about 99% accuracy on the test sets.

The source news texts were collected from internal APIs and labeled with their corresponding categories. The deep learning architecture used in this project is a CNN. We also tried an RNN with LSTM, but the results were worse and training took longer. We believe a CNN works well for this task because a news article is inherently short, self-contained, and focused on one topic. The contextual significance is also reasonably constant throughout a single article, unlike in chatbots, which usually use RNNs because newer turns of a conversation carry more weight than older ones.
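The intuition about position-independent significance can be illustrated with a toy sketch of a convolution filter followed by max-over-time pooling, the usual core of a text CNN. This is not the code from this project: the two-dimensional embeddings, the filter weights, and the function name `conv1d_max_pool` are all made up for illustration. The filter responds to a particular two-token pattern, and the pooled feature is the same whether that pattern appears early or late in the text.

```python
def conv1d_max_pool(embeddings, kernel):
    """Slide `kernel` over the token sequence and keep the strongest response.

    embeddings: list of token vectors (one list of floats per token)
    kernel:     list of weight vectors, one per position in the window
    """
    width = len(kernel)
    responses = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of tokens.
        score = sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        responses.append(max(score, 0.0))  # ReLU activation
    # Max-over-time pooling: position of the match is discarded.
    return max(responses)


# Hypothetical 2-dimensional embeddings for three token types.
PAD = [0.0, 0.0]
KEY_A = [1.0, 0.0]
KEY_B = [0.0, 1.0]

# A filter of width 2 that responds most strongly to KEY_A followed by KEY_B.
kernel = [[1.0, 0.0], [0.0, 1.0]]

early = [KEY_A, KEY_B, PAD, PAD, PAD]    # pattern at the start
late = [PAD, PAD, PAD, KEY_A, KEY_B]     # same pattern at the end
absent = [PAD, KEY_B, KEY_A, PAD, PAD]   # tokens present, wrong order

feature_early = conv1d_max_pool(early, kernel)
feature_late = conv1d_max_pool(late, kernel)
feature_absent = conv1d_max_pool(absent, kernel)
```

Here `feature_early` and `feature_late` are equal, while `feature_absent` is smaller, which matches the idea that a topical phrase counts the same wherever it occurs in the article.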

I used TensorFlow for the first implementation, which achieved ~95% accuracy, and later switched to Keras, which is built on top of TensorFlow, reaching ~99% accuracy.
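For readers who want a starting point, here is a minimal sketch of what a Keras text CNN of this kind might look like. It is not the model from this repository: the function name `build_text_cnn`, the vocabulary size, sequence length, number of classes, and all layer sizes are assumptions chosen for illustration. It also assumes the Chinese text has already been tokenized and mapped to integer ids padded to a fixed length.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_text_cnn(vocab_size=5000, seq_len=200, num_classes=10, embed_dim=64):
    """Build a small 1D-CNN classifier over integer token ids.

    All default sizes are illustrative assumptions, not values from the project.
    """
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    # Map each token id to a dense vector.
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # Filters of width 5 detect local token patterns.
    x = layers.Conv1D(128, 5, activation="relu")(x)
    # Keep each filter's strongest response, regardless of position.
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    # One probability per news category.
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels
        metrics=["accuracy"],
    )
    return model


model = build_text_cnn()
```

Training would then be a call such as `model.fit(x_train, y_train, validation_split=0.1)`, where `x_train` is an integer array of shape `(num_samples, seq_len)` and `y_train` holds integer category labels; both names are placeholders.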
