[WIP] Convert to python package with cli #2

Draft: wants to merge 9 commits into main


@ahmed-shariff commented Mar 26, 2022

Following the discussion in #1, this PR aims to do the following:

  • Convert the project into a package
    I am using Poetry to manage packaging. The package is called paperscraper.
  • Wrap the package functions in a CLI. The CLI command uses the same name as the package.
    • process: command group for the data-processing steps. Every subcommand accepts one optional flag, -f/--force. When the flag is not given, a step runs only if its output does not already exist.
      • run_all: run all the processes.
      • db: clean the XML file
      • venues: extract unique venues
      • data-extraction: extract the data from the DBLP snapshot
      • collect-data: scrape additional information
      • postprocess: clean and extract unique data
    • search: takes a pattern string and returns all entries whose title or abstract matches it. Uses fuzzy matching by default. Has the following options:
      • --venue: filter by venue. Can be given multiple times. Each value can be a partial match to either the full name or the short name.
      • --author: filter by author. Can be given multiple times. Each value can be a partial match.
      • --re: a flag; when set, the pattern is treated as a regular expression.
      • --fuzzy-max-difference: the maximum number of differences allowed between the pattern and the text for a match.
    • list: summary of the data (lists venues)
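The PR does not say which CLI library the package uses. As a minimal sketch of the command layout described above, here is a stdlib `argparse` version: `process` is a nested subcommand group whose steps all take `-f/--force`, and `search` collects repeated `--venue`/`--author` values into lists. The `should_run` helper (the `--force` rule), the `build_parser` name, and the `--fuzzy-max-difference` default of 2 are hypothetical, not taken from the PR.

```python
import argparse
from pathlib import Path

# `process` subcommands, named as in the PR description.
PROCESS_STEPS = ["run_all", "db", "venues", "data-extraction", "collect-data", "postprocess"]


def should_run(output_path: Path, force: bool) -> bool:
    """Hypothetical helper: run a step when forced or when its output is missing."""
    return force or not output_path.exists()


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse layout mirroring the proposed `paperscraper` CLI."""
    parser = argparse.ArgumentParser(prog="paperscraper")
    commands = parser.add_subparsers(dest="command", required=True)

    # `process` is a command group; every step accepts -f/--force.
    process = commands.add_parser("process", help="run the data-processing steps")
    steps = process.add_subparsers(dest="step", required=True)
    for name in PROCESS_STEPS:
        step = steps.add_parser(name)
        step.add_argument("-f", "--force", action="store_true",
                          help="re-run even if the step's output already exists")

    search = commands.add_parser("search", help="search titles and abstracts")
    search.add_argument("pattern")
    # action="append" lets --venue / --author be repeated; values collect in a list.
    search.add_argument("--venue", action="append", default=[],
                        help="partial match on the full or short venue name")
    search.add_argument("--author", action="append", default=[],
                        help="partial match on an author name")
    search.add_argument("--re", dest="use_regex", action="store_true",
                        help="treat the pattern as a regular expression")
    # The default of 2 is an assumption; the PR does not state one.
    search.add_argument("--fuzzy-max-difference", type=int, default=2,
                        help="maximum edits allowed for a fuzzy match")

    commands.add_parser("list", help="summarize the data (lists venues)")
    return parser
```

For example, `build_parser().parse_args(["search", "graph", "--venue", "CHI", "--venue", "VIS"])` gives `venue == ["CHI", "VIS"]`. A click-based implementation would express the same tree with a group nested inside the top-level group.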
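The PR also does not say how fuzzy matching or the venue/author filters are implemented. Here is a minimal sketch of the `search` semantics, under several assumptions. Each entry is a dict with hypothetical keys `title`, `abstract`, `venue`, `venue_short` and `authors`. Repeated `--venue` or `--author` values combine with OR. Fuzzy matching is a plain-Python approximate substring match (Sellers' edit-distance algorithm), swapped in for whatever library the package actually uses; `max_diff` plays the role of `--fuzzy-max-difference`.

```python
import re
from typing import Dict, Iterable, List, Optional, Sequence


def fuzzy_contains(pattern: str, text: str, max_diff: int) -> bool:
    """True if some substring of `text` is within `max_diff` edits of `pattern`.

    Sellers' algorithm: prev[i] / cur[i] hold the smallest edit distance between
    pattern[:i] and any substring of `text` ending at the current character.
    Comparison is case-insensitive.
    """
    pattern, text = pattern.lower(), text.lower()
    m = len(pattern)
    prev = list(range(m + 1))  # before any text, pattern[:i] costs i deletions
    for ch in text:
        cur = [0]  # a match may start at any position, so row 0 is free
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character missing from the text
        if cur[m] <= max_diff:
            return True
        prev = cur
    return prev[m] <= max_diff


def matches_any(needles: Sequence[str], haystacks: Iterable[str]) -> bool:
    """Case-insensitive partial match; an empty filter accepts every entry."""
    if not needles:
        return True
    hay = [h.lower() for h in haystacks]
    return any(n.lower() in h for n in needles for h in hay)


def search(entries: Iterable[Dict], pattern: str, venues: Sequence[str] = (),
           authors: Sequence[str] = (), use_regex: bool = False,
           max_diff: int = 2) -> List[Dict]:
    """Return entries whose title or abstract matches `pattern` (hypothetical schema)."""
    compiled: Optional[re.Pattern] = re.compile(pattern, re.IGNORECASE) if use_regex else None
    results = []
    for entry in entries:
        if not matches_any(venues, [entry["venue"], entry["venue_short"]]):
            continue
        if not matches_any(authors, entry["authors"]):
            continue
        # Title and abstract are checked separately so a match cannot span both.
        fields = (entry["title"], entry["abstract"])
        if compiled is not None:
            hit = any(compiled.search(f) for f in fields)
        else:
            hit = any(fuzzy_contains(pattern, f, max_diff) for f in fields)
        if hit:
            results.append(entry)
    return results
```

With `max_diff=1`, the pattern "visualization" matches a title containing "Visualisation" (one substitution), while `max_diff=0` requires an exact, case-insensitive substring.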

@arpitnarechania (Member)

Thanks @ahmed-shariff, especially for the kind of API you have in mind - looks very promising. I will set it up locally this weekend and get back to you with a more detailed response.
