Show modified save icon if URL/DOI exists in library #1007

Open
dstillman opened this issue May 23, 2016 · 20 comments

Comments

Member

@dstillman dstillman commented May 23, 2016

Warning somehow when people are about to save an existing item is a common request (most recently here). It's problematic in general, since data isn't available without running the translation process, which we can't really do during normal browsing (too slow, too many site requests, etc.). But there's some low-hanging fruit here that we should probably do — specifically, checking for an existing URL or DOI in the library.

One option here would be to have detectWeb() return an object that included one or more properties with arrays of ids, in addition to itemType. But we'd have to add support for that in advance of rolling out any translators that used it, or else we'd break detection for older versions. Another option would be to have detectWeb() call a certain function with ids it found.

The connector would then have to check the target library for those ids via a call to Zotero or the API.
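A minimal sketch of the first option, assuming a hypothetical return shape with `dois` and `urls` arrays (these property names and both helper functions are illustrative, not part of Zotero). It shows how the connector side could accept both the legacy string result and the richer object, which is the compatibility concern raised above:

```javascript
// Hypothetical shape: none of these property names come from Zotero itself.
// Older translators return a plain item-type string (or a falsy value)
// from detectWeb(); a newer one might return an object with identifiers.
function normalizeDetectResult(result) {
  if (!result) {
    return null; // nothing detected on this page
  }
  if (typeof result === "string") {
    // Legacy form: just the item type, no identifiers
    return { itemType: result, dois: [], urls: [] };
  }
  return {
    itemType: result.itemType,
    dois: Array.isArray(result.dois) ? result.dois : [],
    urls: Array.isArray(result.urls) ? result.urls : [],
  };
}

// Collect the identifiers the connector would then look up in the
// target library (the lookup itself is not shown here).
function identifiersToCheck(result) {
  const detected = normalizeDetectResult(result);
  return detected ? [...detected.dois, ...detected.urls] : [];
}
```

With this kind of normalization, a translator that still returns `"journalArticle"` yields an empty identifier list, so detection keeps working and the icon is simply left unmodified.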

Contributor

@bwiernik bwiernik commented May 24, 2016

This feature should probably be limited to the current library, rather than any library in the user's client.

Member Author

@dstillman dstillman commented May 24, 2016

Yes, this would only apply to the target library.

Contributor

@zuphilip zuphilip commented May 24, 2016

In detectWeb() we normally have only the URL we are trying to translate. Thus, we could check exactly this URL against the database. But it is not uncommon that there is some sort of permanent link which looks different, or that users see another URL because they are using a proxy, or that the website adds variable information in URL parameters.

The DOI translator currently shows the multiple icon by default, so it would be impossible (?) to show visually that some of the DOIs already exist in the library and others do not.

Maybe we should also save, in each translation process, the URL exactly as it appears in the browser in an extra field urlOriginal (maybe with some normalization). The next time the user visits the same website, we could compare the URL with this urlOriginal and show another icon. (I would suggest excluding pages with multiple from this anyway.)
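As a rough illustration of what "some normalization" could mean, here is a sketch that drops the fragment, removes a few parameters, and sorts the rest. The `normalizeUrl` helper and the parameter list are assumptions made for this example; proxy-rewritten hostnames are not handled at all.

```javascript
// Parameter names below are examples only, not a list that any site
// or Zotero defines.
const IGNORED_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "sessionid"];

function normalizeUrl(rawUrl) {
  const url = new URL(rawUrl); // WHATWG URL, available in browsers and Node.js
  url.hash = "";               // drop the fragment
  for (const name of IGNORED_PARAMS) {
    url.searchParams.delete(name);
  }
  url.searchParams.sort();     // make parameter order irrelevant
  return url.toString();
}
```

Two visits to the same page would then compare equal if they differ only in fragment, parameter order, or the listed parameters. Any site-specific variable parameter not on the list would still defeat the comparison.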

Member Author

@dstillman dstillman commented May 24, 2016

My point was that detectWeb() would, where appropriate, pull out identifiers it could get from the page without much of a performance impact (meaning mostly additional HTTP requests). So, for example, it could identify the DOIs on the page, which it's looking for anyway, and if the only one(s) it found were already in the database, the icon could reflect that. Same with URLs — if detectWeb could determine the canonical URL easily, it could return that.
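A sketch of that kind of cheap extraction, using only the already-loaded document and no additional HTTP requests. The `findPageIdentifiers` helper and the simplified DOI pattern are assumptions for illustration; note that it cannot tell whether a DOI belongs to the page itself or to a cited reference.

```javascript
// Simplified DOI pattern (an assumption; real DOIs can need more
// careful handling than this).
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"'<>]+/g;

function findPageIdentifiers(doc) {
  // Canonical URL, if the page declares one
  const link = doc.querySelector('link[rel="canonical"]');
  const canonicalUrl = link ? link.getAttribute("href") : null;

  // Every distinct DOI-like string in the page text, with trailing
  // sentence punctuation stripped
  const text = doc.body ? doc.body.textContent : "";
  const matches = text.match(DOI_PATTERN) || [];
  const dois = [...new Set(matches.map((d) => d.replace(/[.,;:)\]]+$/, "")))];

  return { canonicalUrl, dois };
}
```

In a translator, `doc` would be the page's DOM document, so both lookups are in-memory operations.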

I don't think storing the exact current URL would be useful, given query parameters, etc.

But that's a fair point about multiple and the icon, at least when there are some that exist and some that don't. An alternative for that would be to give some indication in the Select Items dialog.

Member Author

@dstillman dstillman commented May 24, 2016

Basically, I don't think we want to change the DOI/multiple behavior, and if we're going to do this sort of detection at all I don't think we can exclude DOIs, since those provide the best mechanism for this. So I think we need solutions to handle some/all matches on a page existing.
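To make the some/all distinction concrete, a small sketch that classifies a page's DOIs against the DOIs already in the target library. The `matchStatus` helper and its return values are hypothetical; the comparison is case-insensitive because DOI names are case-insensitive.

```javascript
// Returns "none", "some", or "all" depending on how many of the DOIs
// found on the page already exist in the target library.
function matchStatus(pageDois, libraryDois) {
  const inLibrary = new Set(libraryDois.map((d) => d.toLowerCase()));
  const matched = pageDois.filter((d) => inLibrary.has(d.toLowerCase()));
  if (matched.length === 0) return "none";
  return matched.length === pageDois.length ? "all" : "some";
}
```

The "some" case is the one a single icon cannot express well, which is why an indication per row in the Select Items dialog was suggested as an alternative.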

Contributor

@bwiernik bwiernik commented May 26, 2016

If this feature were implemented, I think the most intuitive behavior for multiple would be to indicate the matching items in the Select Items dialog (perhaps by setting the text color to be the same color as the icon for non-multiple detection, e.g., blue instead of black).

@JessRiedel JessRiedel commented Oct 31, 2017

Note that, in addition to preventing accidental duplicates in the library, this feature would help the user to more quickly access any relevant PDF comments they had made when a colleague passes them a link to a paper the user knows is in their library.

@gurdas gurdas commented Oct 31, 2017

The prevent duplicates plug-in capability is my most (and maybe only) missed feature from Zotero for Firefox. Dan, I second the low hanging fruit comment. Just checking for URL and DOI will hit most of the use cases, making that a solid minimum viable product/feature.

Member Author

@dstillman dstillman commented Nov 1, 2017

Unfortunately I think I was mistaken on the "low-hanging fruit" part. URL is probably doable — I would guess that we can usually determine the canonical URL in detectWeb() — but we don't know whether a given DOI on the page corresponds to the current page or is just a citation, which is the whole reason we always display the folder icon for DOI. (Maybe we need to start encouraging sites to embed DOIs in embedded metadata in page headers.)

So we probably need to rethink the goal here. Instead of trying to show a modified icon, which we could do at best sporadically, the goal may just be to help people avoid ending up with duplicate items, which we can do by showing a notification after translation — when we have all the data — that the item exists in the target library.

Possible options in that dialog:

  1. Save a new item
  2. Replace the existing item's metadata with the new metadata (skipping new files?) and add it to the current collection
  3. Skip the save but add existing item to the current collection

It seems like 2 is better handled in the client via item-merging — where you have the field-level merge UI — or via some sort of Update Metadata function. And there are probably other, better solutions we can implement for adding a PDF to an existing item (e.g., when you save a PDF directly via the save button, have it retrieve metadata and then offer to add the PDF to an existing matching item, or allow you to choose an existing item manually (which isn't really much better than drag and drop but some people would find easier)). So that leaves 1 and 3: Save Anyway and Use Existing Item(s), say. ("Use Existing Item(s)" isn't quite right for the second button. Add Existing Item(s) to “[Collection]”/Skip — depending on whether the items were in the current collection — would be better, but we're going to explore adding a target collection picker to the save popup, and that wouldn't show up until after this dialog. So either the dialog would need the same collection picker or the second button couldn't reflect the target.)
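A sketch of the reduced two-button flow, run after translation when the full item data is available. The `DOI` and `url` field names follow Zotero item fields; the `findDuplicates` and `promptOptions` helpers, and the exact-match URL comparison, are assumptions for illustration.

```javascript
// Find library items that share a DOI or URL with the newly translated item.
function findDuplicates(newItem, libraryItems) {
  const doi = newItem.DOI ? newItem.DOI.toLowerCase() : null;
  return libraryItems.filter((existing) => {
    const sameDoi = doi && existing.DOI && existing.DOI.toLowerCase() === doi;
    const sameUrl = newItem.url && existing.url === newItem.url;
    return Boolean(sameDoi || sameUrl);
  });
}

// Decide which buttons the notification would offer.
function promptOptions(duplicates) {
  // No match: save as usual, with no prompt
  if (duplicates.length === 0) return [];
  return ["Save Anyway", "Use Existing Item(s)"];
}
```

Because the check runs on translated data, it works the same way for every translator, at the cost of not warning the user before they click the save button.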

(For focusing the item in Zotero, as @JessRiedel suggests, I think we can make that an option from the save popup in all cases, so we can leave that aside.)

@gurdas gurdas commented Nov 1, 2017

"the goal may just be to help people avoid ending up with duplicate items, which we can do by showing a notification after translation — when we have all the data — that the item exists in the target library."

That is exactly what I was thinking when I referred to the low-hanging fruit - grab the data and use just the URL or DOI to confirm a match (could be more sophisticated, but URL and DOI are a great start). And that is how the prevent duplicates plug-in worked. I would click on the translator to save an item, and if the item was pre-existing, the plug-in would display a popup showing the existing item citation (so I can verify if it's a real match) and provide two choices: cancel the save or continue to save.

There are times when I wish the plug-in had the update existing item option, but it's not a big deal since Zotero has a solution for duplicate merge.

@JessRiedel JessRiedel commented Nov 1, 2017

As a novice, I second @bwiernik on the multiple behavior ("the most intuitive behavior for multiple would be to indicate the matching items in the Select Items dialog"), which seems to answer @dstillman's concern ("but we don't know whether a given DOI on the page corresponds to the current page or is just a citation, which is the whole reason we always display the folder icon for DOI."). There could also be a modified folder icon for multiples that have at least one DOI already in the library.

I think the feature could be aggressive in matching URLs and single-DOIs in the library (i.e., non-multiple) without running the translator. If there's a pre-translator match, modify the icon. When it's clicked on, do two things: (1) start the translator service and (2) Open a dialog that says "This item appears to already be in the library: <Insert item's bibliographic information from library>. Do you want to?... [save anyway], [focus in library]". That means that a false positive is at most a second click for the user, with no time lost. If the UI was really slick, it would show a blank comparison field for the bibliographic information of the current webpage that would start with a "busy spinner" and then get filled in when the translator finished.

Depending on the UI complexity that it introduces, it might also be useful to add other options like "[merge in Zotero]" to start the merge process immediately (say, if the user is missing the PDF from the current item in their library and they just want to download it), and "[open item in library]" to automatically open the PDF (or whatever is the default file opened on double-click in the library).

Contributor

@bwiernik bwiernik commented Nov 1, 2017

Dan, would it just be easier to bring up the merge items dialog in this case, with options to keep original data, keep new data, or create a new item? And a check box to skip identical attachments?

This would seem to be a lot less confusing to me than new options 1 & 3 above (particularly since 3 is hard to describe succinctly). And it has the benefit of accomplishing 2 through another mechanism that is likely to be intuitively adopted by many users who don’t realize a context menu option exists. A context menu option could also simply launch this same dialog.

Member Author

@dstillman dstillman commented Nov 1, 2017

> I second @bwiernik on the multiple behavior ("the most intuitive behavior for multiple would be to indicate the matching items in the Select Items dialog")

Same problem there, though — for most translators, we haven't actually done translation in Select Items, even though we have the titles. DOI is the exception, since we get the metadata from CrossRef before displaying the titles, but it wouldn't work for most other multiples. I think it makes more sense to go with a solution that works reliably for all saves, even if it means you don't know ahead of time if you have the item.

> I think the feature could be aggressive in matching URLs and single-DOIs in the library (i.e., non-multiple) without running the translator. […] That means that a false positive is at most a second click for the user, with no time lost.

Except a false positive could mean that the user wouldn't click the button, because they would think they already had it, even though what they actually had was just another reference mentioned on the page. (This is why DOIs are always multiple.)

So as a first step I think it makes sense to focus on preventing duplicates during all saves. We can revisit icon modification later (particularly if a metadata standard emerges for identifying the DOI for the current page), but it's a lot more complicated, might require modifying lots of translators (e.g., to extract the canonical URL in detectWeb()), and would be a lot less reliable.

Member Author

@dstillman dstillman commented Nov 1, 2017

> Dan, would it just be easier to bring up the merge items dialog in this case, with options to keep original data, keep new data, or create a new item? And a check box to skip identical attachments?

Maybe better, but definitely not easier. We'd have to either port one of the merge dialogs (sync conflict resolution or duplicate merging) to HTML and add it to the connector or bring a Zotero client window to the front (like the word processor plugin does), and in either case we'd have to make a decent number of modifications to the merge dialog to support this use case. We do have to port those to HTML eventually for Electron, but in the meantime we could do a simple prompt much more easily.

@gurdas gurdas commented Nov 1, 2017

> So as a first step I think it makes sense to focus on preventing duplicates during all saves.

That gets my vote and for me, that's the minimum viable product. Everything else is additional functionality that can happen over time.

@JessRiedel JessRiedel commented Nov 1, 2017

Thanks @dstillman, points taken. However, is the minimum viable product people are talking about now equivalent to this?: "Run everything like normal, except right before the item is added to the library, thereby creating an apparent duplicate, give the user options to [1. Cancel], [2. Create new item], or [3. Merge] where the merge option adds the existing item to the current collection and has checkboxes to update metadata, add new files, etc.".

Would it at least be possible to add a "just open the existing PDF" option that would be accessible after the translators have run but before the new PDF finished downloading? (Waiting for the download is slow enough that it's faster to find it in your library by hand.) Or a right-click option on the unmodified icon that says "find best match in library" based on DOI/URL?

@JessRiedel JessRiedel commented Nov 1, 2017

(I don't mean to make perfect the enemy of the good. The most demanded feature appears to be duplicate prevention, and a minimal version of this would be a valuable addition to Zotero.)

@RNAer RNAer commented Jun 5, 2019

Agreed, this is a very desirable feature requested by lots of Zotero users.

@lshaheen lshaheen commented Jan 17, 2020

This would be very helpful. Has there been any headway?
