For the abstractive summarization task, I wanted to experiment with the transformer model. I recreated a transformer model (thanks to the TensorFlow transformer tutorial) and added a pointer module (have a look at this paper for more information on the pointer-generator network: https://arxiv.org/abs/1704.04368).
PS: I will soon add a section explaining the integration of the pointer module into the transformer.
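In the meantime, here is a minimal sketch of one way to graft a pointer module onto a transformer decoder, roughly following the paper above. All names (`PointerGenerator`, the argument names) and the choice to average the cross-attention weights over heads are my own illustration, not necessarily this repository's implementation:

```python
import tensorflow as tf

class PointerGenerator(tf.keras.layers.Layer):
    """Sketch of a pointer module for a transformer decoder (See et al., 2017).

    Mixes the decoder's vocabulary distribution with a copy distribution
    built from the decoder's cross-attention over the source tokens.
    """

    def __init__(self, vocab_size):
        super().__init__()
        self.vocab_size = vocab_size
        # p_gen in [0, 1]: probability of generating from the vocabulary
        # rather than copying from the source, computed per decoder step.
        self.p_gen_layer = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, dec_output, context, attn_weights, vocab_dist, enc_input_ids):
        # dec_output:    (batch, dec_len, d_model) decoder hidden states
        # context:       (batch, dec_len, d_model) attention-weighted encoder outputs
        # attn_weights:  (batch, dec_len, enc_len) cross-attention of the last
        #                decoder layer, averaged over heads (an assumption)
        # vocab_dist:    (batch, dec_len, vocab_size) softmax over the vocabulary
        # enc_input_ids: (batch, enc_len) source token ids; for simplicity this
        #                sketch assumes they all fit in the fixed vocabulary
        #                (real OOV handling extends the distribution with
        #                temporary ids, as in the paper)
        p_gen = self.p_gen_layer(tf.concat([context, dec_output], axis=-1))

        # Scatter each source position's attention weight onto the vocabulary
        # id of its token, yielding a copy distribution over the vocabulary.
        one_hot_src = tf.one_hot(enc_input_ids, depth=self.vocab_size)
        copy_dist = tf.einsum("bde,bev->bdv", attn_weights, one_hot_src)

        # Final distribution: mixture of generating and copying.
        return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist
```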
Please follow these steps to launch the project:
- Download the data (chunk files in TFRecord format): https://drive.google.com/open?id=1uHrMWd7Pbs_-DCl0eeMxePbxgmSce5LO
- Use this project to build the TFRecord chunks from the CNN/DailyMail bin files: https://github.com/steph1793/CNN-DailyMail-Bin-To-TFRecords (a sketch for reading these records appears after the notes below)
- Run the training script:

```
python main.py --max_enc_len=400 \
--max_dec_len=100 \
--batch_size=16 \
--vocab_size=50000 \
--num_layers=3 \
--model_depth=512 \
--num_heads=8 \
--dff=2048 \
--seed=123 \
--log_step_count_steps=1 \
--max_steps=230000 \
--mode=train \
--save_summary_steps=10000 \
--checkpoints_save_steps=10000 \
--model_dir=model_folder \
--data_dir=data_folder \
--vocab_path=vocab
```
PS: Feel free to change the hyperparameters.
Run `python main.py --help` for more details on the hyperparameters.
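For reference, here is a minimal sketch of reading the TFRecord chunks with `tf.data`. The feature keys (`article`, `abstract`) and the file layout are assumptions; check the conversion project linked above for the actual keys it writes:

```python
import glob
import tensorflow as tf

# Hypothetical feature keys; verify against the TFRecord conversion project.
feature_spec = {
    "article": tf.io.FixedLenFeature([], tf.string),
    "abstract": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(serialized):
    # Decode one serialized tf.train.Example into its raw string features.
    return tf.io.parse_single_example(serialized, feature_spec)

files = glob.glob("data_folder/train/*.tfrecords")  # hypothetical layout
dataset = (tf.data.TFRecordDataset(files)
           .map(parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE)
           .batch(16))
```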
- python >= 3.6
- tensorflow 2.0.0
- argparse
- os
- glob
- numpy