New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue99 #1946
base: main
Are you sure you want to change the base?
Issue99 #1946
Conversation
If the DOI cannot be found in the pdf metadata itself, the recognizer now also tries to parse the file name of the pdf for something that looks like a DOI. Since the `/` character cannot be used in file names, the method (`_extractDOIFromFileName`) assumes that this character will be replaced by `@`.
Added a similar method to check for (13-digit) ISBNs in the file name of an added pdf.
The year function looks for four digit numbers starting with 20 or 19 in the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrtcode: Any problems you see with this? If a DOI or ISBN is recognized by the server, or if a year is recognized/retrieved, these won't be used.
Sorry these changes took so long, if you're still considering this PR, I went through and made the whitespace changes. It looks like the testing is not running automatically anymore but I merged the latest updates to the master branch in so that there are no conflicts. Thanks. |
2cad5b0
to
08213eb
Compare
Hello,
I took a crack at issue 99. I added three functions to the pdf recognition process that try to extract the DOI, ISBN, and/or year from the file name of an added pdf if the information can't be found in the pdf metadata. Right now these are three separate functions, but they could easily be merged into one function that just also accepts a regular expression. I wasn't sure what would work better stylistically (I thought it would be ugly to have regular expressions floating around in the normal pdf recognition code, but the current implementation is also quite repetitive).
I looked at the contribution guide but didn't see anything about formatting the commit history. Right now these edits are in three separate commits (and there's a merge commit that I pulled from the upstream master). Would you prefer that I squash these down into just one change?
Thanks for keeping this project open source. I use Zotero pretty much everyday and really appreciate the work you all do.
P.S. I think my IDE also inadvertently eliminated a lot of trailing whitespace...
Closes #99