tangg555/acl-anthology-helper

acl-anthology-helper

License: MIT

A tool to help search, filter, and download papers from the ACL Anthology (https://aclanthology.org/).

Main Features

  • Retrieve papers from the ACL Anthology.
    Retrieve directly from the ACL Anthology website,
    e.g. Retriever.acl(2021, ConfConsts.LONG)
    or download the metadata of all papers to a local MySQL database,
    e.g.
    db = AnthologyMySQL(cache_enable=True)
    db.create_tables()
    db.load_data()  # load the data and put it into the database
  • ABuilder is integrated to support chained query operations on MySQL.
    e.g.
    data = ABuilder().table('paper').where({"year": ["in", years_limit]}).where({"venue": ["in", venue_limit]}).query()
  • Filter papers by keyword.
    e.g. filtered = papers.filter('title', 'xxx') | papers.filter('abstract', 'xxx')
    e.g. filtered = papers.and_containing_filter(attr, [keyword1, keyword2])
  • Download papers.
    e.g. downloader.multi_download(filtered, download_path)
  • Local caching is available.
  • Logging is available.
  • Statistics are available (currently only the total number of papers is counted).
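
The keyword filters listed above can be illustrated with a small standalone sketch. This is not the project's implementation: the `Paper` record, the `contains_any` and `contains_all` helpers, and the case-insensitive substring matching are all assumptions made for illustration. It only shows the intended semantics of combining a title match with an abstract match (a union) and of requiring every keyword to appear (an "and containing" filter).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Paper:
    # Hypothetical record type; the project's own paper class may differ.
    title: str
    abstract: str


def contains_any(papers: Iterable[Paper], attr: str, keywords: List[str]) -> List[Paper]:
    """Keep papers whose `attr` contains at least one keyword (case-insensitive, assumed)."""
    lowered = [k.lower() for k in keywords]
    return [p for p in papers if any(k in getattr(p, attr).lower() for k in lowered)]


def contains_all(papers: Iterable[Paper], attr: str, keywords: List[str]) -> List[Paper]:
    """Keep papers whose `attr` contains every keyword (case-insensitive, assumed)."""
    lowered = [k.lower() for k in keywords]
    return [p for p in papers if all(k in getattr(p, attr).lower() for k in lowered)]


papers = [
    Paper("Neural text generation survey", "We review decoding methods."),
    Paper("Parsing with transformers", "A study of text generation baselines."),
    Paper("Speech recognition", "Acoustic models."),
]

# Union of a title match and an abstract match, preserving the original order.
title_hits = contains_any(papers, "title", ["generation"])
abstract_hits = contains_any(papers, "abstract", ["generation"])
either = [p for p in papers if p in title_hits or p in abstract_hits]

# Papers whose title contains both keywords.
both_words = contains_all(papers, "title", ["neural", "generation"])
```

Here `either` plays the role of `papers.filter('title', ...) | papers.filter('abstract', ...)`, and `both_words` plays the role of `papers.and_containing_filter(attr, [keyword1, keyword2])`; whether the real filters ignore case is not stated in this README.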

Get Started

  • First, MySQL is required (I use MySQL 8).
    Configure your MySQL database and add a src/configuration/mysql_cfg.py file.
    An example src/configuration/mysql_cfg.py is as follows:
class MySQLCFG(object):
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"

Also create the corresponding database (the one named in DB) on your MySQL server.

- Second, if you want to use ABuilder:
Earlier versions required you to create a tasks/database.py containing your MySQL configuration (see the ABuilder homepage for details).

In the latest version, tasks/database.py reads its settings from the configuration above, so you no longer need to create this file.

  • Download and decompress the code, open a terminal, and change to the root directory.
    Then run:
pip install -r requirements.txt
cd tasks
python basic_task.py

Running basic_task.py first downloads all papers within a given time span from the ACL Anthology to the local disk, and then searches those papers using the keywords you enter.
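
For the database-creation step in Get Started, the following is a minimal sketch of how the SQL statement could be derived from the configuration class. The `create_database_sql` and `client_command` helpers are hypothetical and not part of this project, and the `utf8mb4` character set is an assumption; the sketch only builds strings and does not connect to MySQL.

```python
from typing import List


class MySQLCFG(object):
    # Same shape as the example src/configuration/mysql_cfg.py above.
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"


def create_database_sql(cfg, charset: str = "utf8mb4") -> str:
    """Build a CREATE DATABASE statement for cfg.DB (hypothetical helper)."""
    # Backticks quote the identifier; an embedded backtick is escaped by doubling it.
    name = cfg.DB.replace("`", "``")
    return f"CREATE DATABASE IF NOT EXISTS `{name}` CHARACTER SET {charset};"


def client_command(cfg) -> List[str]:
    """Arguments for the mysql command-line client; -p makes it prompt for the password."""
    return ["mysql", "-h", cfg.HOST, "-P", str(cfg.PORT), "-u", cfg.USER, "-p"]


statement = create_database_sql(MySQLCFG)
command = client_command(MySQLCFG)
print(" ".join(command))
print(statement)
```

Running the printed client command and then pasting the printed statement at the prompt is one way to create the database before running basic_task.py.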

Note

1. Comments

I developed this project with Python 3.6; it does not support Python 2.

2023.6.14: The code was updated to support the latest ACL Anthology pages. The current Python version is 3.10.
2023.7.2: Updated the README.

2. A survey paper was written with this tool

@article{tang2022recent,
  title={Recent advances in neural text generation: A task-agnostic survey},
  author={Tang, Chen and Guerin, Frank and Li, Yucheng and Lin, Chenghua},
  journal={arXiv preprint arXiv:2203.03047},
  year={2022}
}

3. Others

homepage

The ACL Anthology homepage lists many conferences and the contents belonging to each.

Choose one to see its list of papers.
