
whxf/HashtagGen

 
 


HashtagGen

Model

// Model Description

Usage

step 1: install requirements

conda create -n topic python=3.6
source activate topic
pip install -r requirements.txt

step 2: train, test, and evaluate the model

  1. ./run.sh lists the commands for training, testing, and evaluating the model.
  2. ./bert/[sample|topic|topic_ltp]/bert_config.json holds the training configuration for each directory; you can start from our settings.
  3. ./bert/[sample|topic|topic_ltp]/vocab.txt holds the corresponding BERT vocabulary.
  4. See ./run.py for more usage details.
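
To check a configuration before training, the config and vocabulary files can be inspected directly. The following is a minimal sketch, assuming bert_config.json is plain JSON and vocab.txt lists one token per line with the line index as the token id (the usual BERT layout). The helper name `load_bert_assets` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def load_bert_assets(model_dir):
    """Return (config dict, token -> id mapping) for one ./bert/<name>/ directory.

    Assumption: bert_config.json is plain JSON and vocab.txt has one token
    per line, with the zero-based line index used as the token id.
    """
    model_dir = Path(model_dir)

    # Read the training configuration as a plain dictionary.
    with open(model_dir / "bert_config.json", encoding="utf-8") as f:
        config = json.load(f)

    # Build the vocabulary lookup from line order.
    vocab = {}
    with open(model_dir / "vocab.txt", encoding="utf-8") as f:
        for index, line in enumerate(f):
            vocab[line.rstrip("\n")] = index

    return config, vocab
```

For example, `config, vocab = load_bert_assets("./bert/topic")` would load the files from the topic directory, after which `sorted(config)` lists the available settings.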

Data

We construct a large-scale Chinese topic hashtag generation dataset (WHG) from Weibo, covering multiple domains. It can be downloaded from Google Drive. We also construct an English dataset (THG) from Twitter.

Preview

Here is an example from each dataset:

weibo:
src: 天猫2017年双11成交额在今日零时40分20秒左右时突破500亿元。亿邦动力网注意到,2016年凌晨2点钟时,天猫双11成交额达到486亿元。
dst: 2017天猫双11
twitter:
src: former pl ams2 la reina adams credits her time with peo eis in her development as a leader . talent management is one of ms. smiths key priorities as peo . usa as c us army army acquisition
dst: talent management
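
Each example pairs a source post (src) with its target hashtag (dst). The sketch below reads pairs written in the preview layout above. The on-disk format of the released files is not described here, so the `src:`/`dst:` line prefixes are an assumption taken from the preview, and the names `Example` and `parse_examples` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Example(NamedTuple):
    src: str  # source post text
    dst: str  # target hashtag


def parse_examples(lines: Iterable[str]) -> List[Example]:
    """Pair each 'src:' line with the next 'dst:' line that follows it.

    Assumption: the input uses the 'src:' / 'dst:' prefixes shown in the
    preview; any other line (such as 'weibo:') is ignored.
    """
    examples = []
    pending_src = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("src:"):
            pending_src = line[len("src:"):].strip()
        elif line.startswith("dst:") and pending_src is not None:
            examples.append(Example(pending_src, line[len("dst:"):].strip()))
            pending_src = None
    return examples
```

Keeping src and dst together in one record makes it harder for the two sides to drift out of alignment when the data is shuffled or filtered.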

Table 1: Data of WHG

Weibo: WHG Dataset        Train     Dev      Test
Count                     312,762   20,000   20,000
AvgSourceLen (+W)         75.1      75.3     75.6
CovSourceLen(95%) (+W)    141       137      145
AvgTargetLen (+W)         54.2      4.2      4.2
CovTargetLen(95%) (+W)    8         8        8

Table 2: Data of THG

Twitter: THG Dataset      Train     Dev      Test
Count                     204,039   11,335   11,336
AvgSourceLen              23.5      23.8     23.5
CovSourceLen(95%)         46        47       46
AvgTargetLen              10.1      10.0     10.0
CovTargetLen(95%)         30        30       30
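
The Avg and Cov(95%) rows can be recomputed from a list of sequence lengths. The sketch below assumes that "Cov(95%)" means the smallest length L such that at least 95% of the samples have length at most L; that reading is an interpretation, not something stated above. The function name `length_stats` is hypothetical, and the unit of length (characters, words, or subword tokens) is left to the caller.

```python
from typing import Dict, Sequence


def length_stats(lengths: Sequence[int], coverage_percent: int = 95) -> Dict[str, float]:
    """Return the mean length and the length covering a given share of samples.

    Assumption: the coverage length is the smallest observed length L such
    that at least `coverage_percent` percent of samples have length <= L.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < coverage_percent <= 100:
        raise ValueError("coverage_percent must be in (0, 100]")

    ordered = sorted(lengths)
    n = len(ordered)
    # Integer ceiling of coverage_percent * n / 100, avoiding float rounding.
    rank = (coverage_percent * n + 99) // 100

    return {
        "avg": sum(ordered) / n,
        "cov": ordered[rank - 1],
    }
```

A coverage length is a common way to choose a maximum sequence length: truncating at that value affects only the longest few percent of samples.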
