twitter_sentiment_bert_scikit

Sentiment analysis on the Twitter US Airline dataset, using BERT sentence encodings as features. Several classifiers are implemented, including SVM, XGBoost, and Random Forest, and each is evaluated with cross-validation.

The data comes from the Kaggle Twitter US Airline Sentiment dataset.

Preinstallation

The project runs in a Python 3 environment. We recommend using Anaconda 3 to install the required packages with the commands below; you can also install them with pip.

 conda create -n tweet_sentiment -c anaconda python=3.7 numpy scikit-learn xgboost pandas tensorflow tensorflow-gpu

 conda activate tweet_sentiment

 pip install bert-serving-server # bert-service-server

 pip install bert-serving-client # bert-service-client

Make sure the installed TensorFlow version is >= 1.10.
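The version requirement can be checked from Python before starting the service. The sketch below is illustrative only: `version_at_least` is a hypothetical helper that is not part of this repository, and it compares only the major and minor components of the version string.

```python
def version_at_least(version, minimum=(1, 10)):
    """Return True if a dotted version string is >= `minimum` (major, minor)."""
    numbers = []
    for part in version.split(".")[:2]:
        # Keep only the leading digits, so a string like "2.0.0-rc1" still parses.
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits or 0))
    while len(numbers) < 2:
        numbers.append(0)
    # Compare as integers: as plain strings, "1.9" would sort after "1.10".
    return tuple(numbers) >= tuple(minimum)


if __name__ == "__main__":
    # In the real environment, pass tf.__version__ instead of a literal.
    print(version_at_least("1.15.0"))
```

In the conda environment you would call it as `version_at_least(tf.__version__)` after `import tensorflow as tf`.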

Download BERT Model and Start BERT-Service

Go to the BERT project page and download a pre-trained model.

In this project we use BERT-Large, Uncased (Whole Word Masking).

Download the model, unzip it, and start the BERT service. You can also refer to bert-as-service for details, but for this project the commands below are all you need.

 mkdir /tmp/bert_models

 unzip wwm_uncased_L-24_H-1024_A-16.zip -d /tmp/bert_models/ # or another model zip file you downloaded

 bert-serving-start -model_dir /tmp/bert_models/wwm_uncased_L-24_H-1024_A-16/ -num_worker=4 -max_seq_len 256
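Once the service is running, a client can request sentence vectors from it. The following is a minimal sketch, assuming the server runs on the local machine with its default ports. `encode_in_batches`, `connect_and_encode`, and the batch size are hypothetical and are not taken from `gen_vec.py`; `BertClient` and its `encode` method come from the `bert-serving-client` package.

```python
def encode_in_batches(texts, encode_fn, batch_size=64):
    """Encode `texts` batch by batch with `encode_fn`; return one vector per text."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # encode_fn takes a list of strings and returns one vector per string.
        vectors.extend(encode_fn(batch))
    return vectors


def connect_and_encode(texts):
    """Send `texts` to a locally running bert-serving-start process."""
    # Imported here so the helper above stays usable without the package installed.
    from bert_serving.client import BertClient

    client = BertClient()  # assumption: localhost and the default ports
    return encode_in_batches(texts, client.encode)
```

With a BERT-Large model, each returned vector should have 1024 dimensions, matching the `H-1024` in the model name.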

Launch the Sentiment Classification Script

Open another terminal or screen session while the BERT service is running.

 conda activate tweet_sentiment

 python gen_vec.py # generate the sentence vectors and save to the npy file

 python model.py -d svm # svm model

 python model.py -d rf # random forest model

 python model.py -d xb # xgboost model
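To illustrate the kind of evaluation these commands perform, here is a hedged sketch of cross-validating classifiers on fixed-length feature vectors with scikit-learn. It is not the contents of `model.py`: the synthetic data stands in for the saved BERT vectors, and the `evaluate` helper, the fold count, and the hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def evaluate(model, features, labels, folds=5):
    """Return the mean cross-validated accuracy of `model`."""
    scores = cross_val_score(model, features, labels, cv=folds, scoring="accuracy")
    return float(scores.mean())


# Synthetic stand-in for the sentence vectors: 3 classes, 16 dimensions.
rng = np.random.RandomState(0)
labels = np.repeat([0, 1, 2], 40)
features = rng.normal(size=(120, 16)) + labels[:, None] * 3.0

svm_score = evaluate(SVC(), features, labels)
rf_score = evaluate(RandomForestClassifier(n_estimators=50, random_state=0), features, labels)
print(f"SVM: {svm_score:.2f}  Random Forest: {rf_score:.2f}")
```

`XGBClassifier` from the `xgboost` package follows the same estimator interface, so it could be passed to `evaluate` in the same way.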

RESULT

SVM Precision: 88%

Random Forest Precision: 76%

XGBoost Precision: 79%
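This README does not state how the figures above are computed. For reference, the sketch below shows per-class precision and its macro average in plain Python. Whether `model.py` reports macro precision, weighted precision, or accuracy is not stated, so treat this as an illustration of the metric, not of the script; both function names are hypothetical.

```python
def per_class_precision(y_true, y_pred):
    """Precision per class: correct predictions of a class / all predictions of it."""
    predicted = {}
    correct = {}
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] = predicted.get(guess, 0) + 1
        if truth == guess:
            correct[guess] = correct.get(guess, 0) + 1
    return {label: correct.get(label, 0) / count for label, count in predicted.items()}


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision over the classes that were predicted."""
    scores = per_class_precision(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["neg", "neg", "neu", "pos", "pos", "neg"]
    y_pred = ["neg", "neu", "neu", "pos", "neg", "neg"]
    print(per_class_precision(y_true, y_pred))
    print(macro_precision(y_true, y_pred))
```

Note that this version averages only over classes that appear among the predictions; a class that is never predicted has undefined precision and is left out.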
