Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue99 #1946

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Issue99 #1946

wants to merge 13 commits into from

Conversation

patricknaughton01
Copy link

@patricknaughton01 patricknaughton01 commented Jan 23, 2021

Hello,

I took a crack at issue 99. I added three functions to the pdf recognition process that try to extract the DOI, ISBN, and/or year from the file name of an added pdf if the information can't be found in the pdf metadata. Right now these are three separate functions, but they could easily be merged into one function that just also accepts a regular expression. I wasn't sure what would work better stylistically (I thought it would be ugly to have regular expressions floating around in the normal pdf recognition code, but the current implementation is also quite repetitive).

I looked at the contribution guide but didn't see anything about formatting the commit history. Right now these edits are in three separate commits (and there's a merge commit that I pulled from the upstream master). Would you prefer that I squash these down into just one change?

Thanks for keeping this project open source. I use Zotero pretty much everyday and really appreciate the work you all do.

P.S. I think my IDE also inadvertently eliminated a lot of trailing whitespace...

Closes #99

If the DOI cannot be found in the pdf metadata itself, the recognizer
now also tries to parse the file name of the pdf for something
that looks like a DOI. Since the `/` character cannot be used in
file names, the method (`_extractDOIFromFileName`) assumes that this
character will be replaced by `@`.
Added a similar method to check for (13-digit) ISBNs in the file
name of an added pdf.
The year function looks for four digit numbers starting with
20 or 19 in the file name.
Copy link
Member

@dstillman dstillman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrtcode: Any problems you see with this? If a DOI or ISBN is recognized by the server, or if a year is recognized/retrieved, these won't be used.

chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Outdated Show resolved Hide resolved
chrome/content/zotero/xpcom/recognizePDF.js Show resolved Hide resolved
@patricknaughton01
Copy link
Author

Sorry these changes took so long, if you're still considering this PR, I went through and made the whitespace changes. It looks like the testing is not running automatically anymore but I merged the latest updates to the master branch in so that there are no conflicts. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Use filename components in metadata retrieval
2 participants