Issue99 #1946

patricknaughton01 · 2021-01-23T03:24:41Z

Hello,

I took a crack at issue 99. I added three functions to the PDF recognition process that try to extract the DOI, ISBN, and/or year from the file name of an added PDF when that information can't be found in the PDF metadata. Right now these are three separate functions, but they could easily be merged into one function that also accepts a regular expression. I wasn't sure which would work better stylistically: it seemed ugly to have regular expressions floating around in the normal PDF recognition code, but the current implementation is also quite repetitive.
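A minimal sketch of what the merged variant could look like, assuming one helper that takes the file name and a regular expression. The names `FILENAME_PATTERNS` and `extractFromFilename` and both patterns are hypothetical and are not taken from the PR's code.

```javascript
// Hypothetical merged helper: one function, one pattern per field.
// Names and patterns are illustrative assumptions, not the PR's code.
const FILENAME_PATTERNS = {
    // "@" stands in for the "/" that cannot appear in a file name.
    DOI: /10\.\d{4,9}@\S+/,
    // Four digits starting with 19 or 20, not embedded in a longer number.
    year: /(?:^|\D)((?:19|20)\d{2})(?!\d)/
};

function extractFromFilename(filename, pattern) {
    // Strip the extension so it cannot be captured as part of a match.
    const base = filename.replace(/\.pdf$/i, "");
    const match = base.match(pattern);
    if (!match) return null;
    // Use the first capture group when the pattern defines one,
    // otherwise the whole match. Patterns must not use the "g" flag.
    return match[1] !== undefined ? match[1] : match[0];
}
```

With this shape the regular expressions live in one table rather than in the recognition code itself, which is one way to address both the repetition and the stylistic concern.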

I looked at the contribution guide but didn't see anything about formatting the commit history. Right now these edits are in three separate commits (and there's a merge commit that I pulled from the upstream master). Would you prefer that I squash these down into just one change?

Thanks for keeping this project open source. I use Zotero pretty much every day and really appreciate the work you all do.

P.S. I think my IDE also inadvertently eliminated a lot of trailing whitespace...

Closes #99

If the DOI cannot be found in the pdf metadata itself, the recognizer now also tries to parse the file name of the pdf for something that looks like a DOI. Since the `/` character cannot be used in file names, the method (`_extractDOIFromFileName`) assumes that this character will be replaced by `@`.
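As a rough illustration of the `@`-for-`/` convention described above, a sketch might look like the following. The function name `extractDOIFromFilename` and the regular expression are assumptions made for this example and may differ from what `_extractDOIFromFileName` actually does.

```javascript
// Hypothetical sketch; the regex is an assumption, not the PR's pattern.
function extractDOIFromFilename(filename) {
    // Drop a trailing ".pdf" so the extension is not captured as part of the DOI.
    const base = filename.replace(/\.pdf$/i, "");
    // "10." plus a 4-9 digit prefix, then "@" standing in for "/", then the suffix.
    const match = base.match(/10\.\d{4,9}@\S+/);
    if (!match) return null;
    // Restore every "/" that was written as "@" in the file name.
    return match[0].replace(/@/g, "/");
}
```

For example, a file named `10.1000@xyz123.pdf` would yield `10.1000/xyz123` under these assumptions. A DOI whose suffix genuinely contains `@` would be misread, which is a limitation of the convention itself.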

Added a similar method to check for (13-digit) ISBNs in the file name of an added pdf.
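A sketch of the ISBN case, assuming the 13 digits appear as one unhyphenated run starting with 978 or 979. The check-digit validation is an addition for illustration and is not stated in the commit message; the function name is hypothetical.

```javascript
// Hypothetical sketch; not the PR's implementation.
function extractISBN13FromFilename(filename) {
    // 13 digits starting with 978 or 979, not embedded in a longer digit run.
    const match = filename.match(/(?:^|\D)(97[89]\d{10})(?!\d)/);
    if (!match) return null;
    const isbn = match[1];
    // ISBN-13 check: digits weighted 1,3,1,3,... must sum to a multiple of 10.
    let sum = 0;
    for (let i = 0; i < 13; i++) {
        sum += Number(isbn[i]) * (i % 2 === 0 ? 1 : 3);
    }
    return sum % 10 === 0 ? isbn : null;
}
```

Validating the check digit reduces false positives from arbitrary 13-digit numbers in file names, at the cost of rejecting ISBNs that were mistyped.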

The year function looks for four-digit numbers starting with 20 or 19 in the file name.
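The year lookup can be sketched as a single regular expression. The requirement that the four digits not be part of a longer number is an assumption added here, and the function name is hypothetical.

```javascript
// Hypothetical sketch; not the PR's implementation.
function extractYearFromFilename(filename) {
    // Four digits starting with 19 or 20, with no digit directly before or after.
    const match = filename.match(/(?:^|\D)((?:19|20)\d{2})(?!\d)/);
    return match ? match[1] : null;
}
```

Under these assumptions `Smith_2019_paper.pdf` gives `2019`, while a name such as `scan120195.pdf` gives no year because the digits belong to a longer number.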

dstillman

@mrtcode: Any problems you see with this? If a DOI or ISBN is recognized by the server, or if a year is recognized/retrieved, these won't be used.
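One reading of the precedence described here, sketched as a merge step: values already recognized or retrieved win, and file-name values only fill fields that are still empty. The function name, the field names, and the object shapes are hypothetical and do not reflect the recognizer's actual data structures.

```javascript
// Hypothetical sketch of the fallback order; not the recognizer's real code.
function mergeWithFilenameFallback(recognized, fromFilename) {
    const merged = { ...recognized };
    for (const field of ["DOI", "ISBN", "year"]) {
        // Only fill a field that the earlier recognition steps left empty.
        if (!merged[field] && fromFilename[field]) {
            merged[field] = fromFilename[field];
        }
    }
    return merged;
}
```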

chrome/content/zotero/xpcom/recognizePDF.js

patricknaughton01 · 2021-06-06T04:02:45Z

Sorry these changes took so long. If you're still considering this PR, I went through and made the whitespace changes. It looks like the tests are no longer running automatically, but I merged in the latest updates to the master branch so that there are no conflicts. Thanks.

patricknaughton01 added 4 commits January 22, 2021 20:08

Added ISBN checking

627f9af

Added similar function for finding the year

702b418

Merge branch 'master' into issue99

663244e

dstillman requested changes Jan 31, 2021

patricknaughton01 added 2 commits February 1, 2021 22:41

Remove debugs, fix formatting, declare vars

9605eb4

Using convenience functions for DOI and ISBN

448da02

dstillman reviewed Feb 10, 2021

chrome/content/zotero/xpcom/recognizePDF.js (outdated)

dstillman requested changes May 16, 2021

chrome/content/zotero/xpcom/recognizePDF.js (outdated)

patricknaughton01 added 7 commits June 5, 2021 22:41

File name -> filename

3f2e59b

Cleaner year regex checking

8412fd9

Merge branch 'master' of https://github.com/zotero/zotero into issue99

eda8bcc

Undoing translator changes

f4b2a0f

Undoing changes to resource/schema/global and chrome/content/zotero/locale/csl

2168532

Undoing whitespace changes

c15059e

Fine grained whitespace edits

ebcd933

patricknaughton01 requested a review from dstillman June 6, 2021 18:52

dstillman force-pushed the master branch 2 times, most recently from 2cad5b0 to 08213eb Compare May 21, 2023 22:56
