HashtagGen

Model

// Model Description

Usage

Step 1: install the requirements

conda create -n topic python=3.6
source activate topic
pip install -r requirements.txt

Step 2: train, test, and evaluate the model

./run.sh lists the commands for training, testing, and evaluating the model.
./bert/[sample|topic|topic_ltp]/bert_config.json are the training configuration files; you can reuse our settings.
./bert/[sample|topic|topic_ltp]/vocab.txt are the BERT vocabulary files.
See ./run.py for further usage details.
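
Before training, it can help to check what a model directory actually contains. The following is a minimal sketch, not part of this repository: `load_bert_assets` is a hypothetical helper that reads the two files named above. It assumes only that bert_config.json is plain JSON and that vocab.txt follows the usual BERT layout of one token per line, where the line index is the token id. It makes no assumption about which keys the config contains.

```python
import json
import os
from collections import OrderedDict


def load_bert_assets(model_dir):
    """Hypothetical helper: read bert_config.json and vocab.txt from model_dir."""
    config_path = os.path.join(model_dir, "bert_config.json")
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # Assumption: standard BERT vocabulary layout, one token per line,
    # with the zero-based line index used as the token id.
    vocab = OrderedDict()
    vocab_path = os.path.join(model_dir, "vocab.txt")
    with open(vocab_path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            vocab[line.rstrip("\n")] = index

    return config, vocab
```

For example, `config, vocab = load_bert_assets("./bert/sample")` followed by `print(sorted(config))` and `print(len(vocab))` shows the available settings and the vocabulary size for the sample configuration.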

Data

We construct a large-scale Chinese topic hashtag generation dataset (WHG) from Weibo that covers multiple domains. It can be downloaded from Google Drive. We also construct an English dataset (THG) from Twitter.

Preview

Here is one example from each dataset:

weibo:
src: 天猫2017年双11成交额在今日零时40分20秒左右时突破500亿元。亿邦动力网注意到，2016年凌晨2点钟时，天猫双11成交额达到486亿元。
dst: 2017天猫双11
twitter:
src: former pl ams2 la reina adams credits her time with peo eis in her development as a leader . talent management is one of ms. smiths key priorities as peo . usa as c us army army acquisition
dst: talent management

Table 1: Statistics of the WHG dataset

Weibo: WHG Dataset	Train	Dev	Test
Count	312,762	20,000	20,000
AvgSourceLen(+W)	75.1	75.3	75.6
CovSourceLen(95%)(+W)	141	137	145
AvgTargetLen(+W)	54.2	4.2	4.2
CovTargetLen(95%)(+W)	8	8	8

Table 2: Statistics of the THG dataset

Twitter: THG Dataset	Train	Dev	Test
Count	204,039	11,335	11,336
AvgSourceLen	23.5	23.8	23.5
CovSourceLen(95%)	46	47	46
AvgTargetLen	10.1	10.0	10.0
CovTargetLen(95%)	30	30	30
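
The Avg and Cov(95%) rows in the tables above can be reproduced on your own split with a few lines of code. This is a sketch under stated assumptions, not the script used to build the tables: `length_stats` is a hypothetical helper, the 95% coverage length is taken to mean the smallest length that at least 95% of the samples do not exceed, and the default whitespace tokenizer is an assumption because this README does not say how lengths were counted.

```python
import math


def length_stats(texts, tokenize=str.split, coverage=0.95):
    """Return (average length, coverage length) for a list of strings.

    tokenize: callable mapping a string to a list of tokens. Whitespace
    splitting is an assumption; pass `list` to count characters instead.
    """
    lengths = sorted(len(tokenize(text)) for text in texts)
    if not lengths:
        raise ValueError("texts must not be empty")

    average = sum(lengths) / len(lengths)

    # Smallest length L such that at least `coverage` of the samples
    # have length <= L (assumed meaning of the Cov rows).
    rank = max(1, math.ceil(coverage * len(lengths)))
    cov_len = lengths[rank - 1]
    return average, cov_len
```

Calling `length_stats(sources)` on the src fields and `length_stats(targets)` on the dst fields gives the two pairs of numbers per split. For Chinese text, `length_stats(sources, tokenize=list)` counts characters rather than whitespace-separated words.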
