[WIP] Convert to python package with cli #2
Following the discussion in #1, this PR aims for the following:

- I am using poetry to manage the packaging aspect. The package is called `paperscraper`.
- `process`: command group for the different processes that process the data. All subcommands take one optional flag, `-f`/`--force`. When this flag is not given, a process runs only if its corresponding output does not exist yet.
  - `run_all`: run all the processes.
  - `db`: clean the XML file.
  - `venues`: extract unique venues.
  - `data-extraction`: extract the data from the dblp snapshot.
  - `collect-data`: scrape additional information.
  - `postprocess`: clean and extract unique data.
- `search`: takes a `pattern` string and returns any entries that have a match in the title or abstract. Uses fuzzy matching by default. Has the following options:
  - `--venue`: filter by venue. Can be given multiple times. Each value can be a partial match to either the full name or the short name.
  - `--author`: filter by author. Can be given multiple times. Each value can be a partial match.
  - `--re`: a flag; when set, `pattern` is treated as a regex.
  - `--fuzzy-max-difference`: the maximum number of differences from `pattern` allowed for a match.
- `list`: summary of the data (lists venues).
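For illustration, the command layout listed above could be declared as in the sketch below. It uses the standard library's `argparse` subparsers only to stay dependency-free; this PR does not say which CLI library backs `paperscraper` (a `click` or `typer` command group would be a common choice), and the program name, the `None` default for `--fuzzy-max-difference`, and the step order used by `run_all` are all assumptions.

```python
import argparse

# Process steps in the order the PR lists them; assumed (not confirmed) to be
# the order in which `run_all` executes them.
PROCESS_STEPS = ["db", "venues", "data-extraction", "collect-data", "postprocess"]


def build_parser() -> argparse.ArgumentParser:
    # prog name is an assumption; the PR does not state the entry-point name.
    parser = argparse.ArgumentParser(prog="paperscraper")
    commands = parser.add_subparsers(dest="command", required=True)

    # `process` group: every subcommand accepts the same -f/--force flag.
    process = commands.add_parser("process", help="process the data")
    steps = process.add_subparsers(dest="step", required=True)
    for name in ["run_all", *PROCESS_STEPS]:
        step = steps.add_parser(name)
        step.add_argument("-f", "--force", action="store_true",
                          help="rerun even if the output already exists")

    # `search`: positional pattern plus repeatable, partial-match filters.
    search = commands.add_parser("search", help="search titles and abstracts")
    search.add_argument("pattern")
    search.add_argument("--venue", action="append", default=[])
    search.add_argument("--author", action="append", default=[])
    search.add_argument("--re", dest="regex", action="store_true",
                        help="treat PATTERN as a regular expression")
    search.add_argument("--fuzzy-max-difference", type=int, default=None)

    commands.add_parser("list", help="summarise the data (lists venues)")
    return parser


def steps_to_run(args: argparse.Namespace) -> list[str]:
    """Expand `process run_all` into the individual process steps."""
    return list(PROCESS_STEPS) if args.step == "run_all" else [args.step]
```

Parsing `process run_all -f` with this parser yields `command="process"`, `step="run_all"` and `force=True`, and `steps_to_run` expands `run_all` into the five individual steps. Repeated `--venue`/`--author` options accumulate into lists via `action="append"`.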
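The `-f`/`--force` rule for the `process` subcommands boils down to a "skip if the output already exists" check. A minimal sketch, assuming each step writes a single output file; `run_step` and the file name are hypothetical and not part of this PR:

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_step(output: Path, step: Callable[[Path], None], force: bool = False) -> bool:
    """Produce `output` by calling `step`, unless it already exists.

    Hypothetical helper: without `force`, an existing output is treated as
    up to date and the step is skipped. Returns True if the step ran.
    """
    if output.exists() and not force:
        print(f"skipping, {output} already exists (use -f/--force to rerun)")
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    step(output)
    return True


def write_venues(path: Path) -> None:
    # Stand-in for a real processing step.
    path.write_text("NeurIPS\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "venues.txt"  # hypothetical output file
        run_step(out, write_venues)              # runs: output is missing
        run_step(out, write_venues)              # skipped: output exists
        run_step(out, write_venues, force=True)  # runs again: forced
```

Checking only for existence keeps the rule simple, but it cannot tell when an output is stale because its input changed; that is the case `--force` is there to cover.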
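The `search` behaviour can be approximated with the standard library alone. In this sketch, fuzzy matching is taken to mean "the pattern occurs somewhere in the title or abstract with at most N edits" (the Levenshtein distance to the best-matching substring), which is one reading of `--fuzzy-max-difference`; the PR may rely on a dedicated library instead. The `Entry` record, the case-insensitive comparisons, the default of one allowed difference, and OR-ing repeated `--venue`/`--author` values are all assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical record shape; the real field names in paperscraper may differ.
    title: str
    abstract: str
    venue: str
    venue_short: str
    authors: list[str] = field(default_factory=list)


def fuzzy_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Sellers' variant of Levenshtein: row 0 is all zeros, so a match may start
    anywhere in `text`; taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j] + 1,             # pattern char missing from text
                         cur[j - 1] + 1,          # extra char in text
                         prev[j - 1] + (p != t))  # match or substitution
        prev = cur
    return min(prev)


def matches(entry: Entry, pattern: str, *, use_regex: bool = False,
            max_difference: int = 1, venues: tuple[str, ...] = (),
            authors: tuple[str, ...] = ()) -> bool:
    # --venue: partial, case-insensitive match on the full or short name.
    if venues and not any(v.lower() in entry.venue.lower()
                          or v.lower() in entry.venue_short.lower()
                          for v in venues):
        return False
    # --author: partial match against any of the entry's author names.
    if authors and not any(a.lower() in name.lower()
                           for a in authors for name in entry.authors):
        return False
    # The pattern must match the title or the abstract.
    for text in (entry.title, entry.abstract):
        if use_regex:
            if re.search(pattern, text, flags=re.IGNORECASE):
                return True
        elif fuzzy_distance(pattern.lower(), text.lower()) <= max_difference:
            return True
    return False
```

The pure-Python distance loop costs O(m·n) per field (pattern length times text length), which is fine for a sketch but would be slow across a full dblp snapshot.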