Wikimedia Commons: Image file doesn't attach #3136

Open
adam3smith opened this issue Sep 14, 2023 · 5 comments

@adam3smith
Collaborator

adam3smith commented Sep 14, 2023

Example: https://commons.wikimedia.org/wiki/File:Baryon_decuplet.png
Reported: https://forums.zotero.org/discussion/107766/unable-to-download-file-from-wikiedia-commons#latest

To be clear: the metadata imports fine, as does the snapshot, but the translator is supposed to also download the image on the page, which it doesn't anymore.

I haven't looked at this in any more detail, but it wouldn't be surprising if Commons changed how image files are made available.

@AbeJellinek
Member

@zoe-translates: Apparently this translator still uses FW - any interest in rewriting?

@zoe-translates zoe-translates self-assigned this Sep 15, 2023
@zoe-translates
Collaborator

This is the most recently-updated translator that still relies on FW. I'm going to stop that :)

One question about downloading the images themselves -

Some files hosted by the Wikimedia Commons could be super large. Should we impose an upper-limit on the attachment's file size, above which the file is saved as a link (snapshot: false) rather than a download?

@AbeJellinek
Member

Should we impose an upper-limit on the attachment's file size, above which the file is saved as a link (snapshot: false) rather than a download?

Sure. 10 MB?
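
The size cutoff discussed above could be sketched roughly as follows. This is only an illustration, not the translator's actual code: `buildImageAttachment` and the `fileInfo` shape (`url`, `mimeType`, `sizeBytes`) are hypothetical names, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```javascript
// Threshold suggested in the thread. Interpreting "10 MB" as
// 10 * 1024 * 1024 bytes is an assumption.
const MAX_DOWNLOAD_BYTES = 10 * 1024 * 1024;

// Build a Zotero-style attachment object.
// `fileInfo` is a hypothetical shape: { url, mimeType, sizeBytes }.
function buildImageAttachment(fileInfo, maxBytes = MAX_DOWNLOAD_BYTES) {
	const attachment = {
		title: "Wikimedia Image",
		url: fileInfo.url,
		mimeType: fileInfo.mimeType
	};
	// If the size is unknown or above the limit, save a link instead of
	// downloading the file (snapshot: false).
	const sizeKnown = Number.isFinite(fileInfo.sizeBytes);
	if (!sizeKnown || fileInfo.sizeBytes > maxBytes) {
		attachment.snapshot = false;
	}
	return attachment;
}
```

Treating an unknown size as "too large" is a conservative choice made for this sketch; it could just as reasonably be flipped so that files of unknown size are still downloaded.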

@zoe-translates
Collaborator

In fact, the reported problem was caused by the XPath here (and its second appearance, on line 121):

{
	url : FW.Xpath('//div[@id="file"]/a/@href').text().prepend("http:"),
	title : "Wikimedia Image"
}],

The XPath //div[@class="fullMedia"]//a[@class="internal"] would have matched the a tag.

However, there are so many other problems with this translator that I'm unwilling to submit just this XPath change as a hotfix (it won't actually "fix" anything, because the other metadata fields are still broken, and the attachment itself lacks a mimeType).
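
As a rough sketch of what that lookup might look like without FW, assuming the link's href may be protocol-relative (which the old prepend("http:") suggests): `getOriginalFileURL` is a hypothetical helper name, and the CSS selector is only an approximation of the XPath, since it matches the class token rather than the exact attribute string.

```javascript
// Find the link to the original file on a Commons file page.
// `doc` is any object offering querySelector (a DOM Document in practice);
// `pageURL` is the URL of the page, used to resolve relative hrefs.
function getOriginalFileURL(doc, pageURL) {
	// CSS approximation of //div[@class="fullMedia"]//a[@class="internal"]
	const link = doc.querySelector('div.fullMedia a.internal');
	if (!link) return null;
	const href = link.getAttribute('href');
	if (!href) return null;
	// Resolving against the page URL handles protocol-relative hrefs
	// ("//host/path") without hard-coding "http:".
	return new URL(href, pageURL).href;
}
```

Resolving the href against the page URL keeps the scheme of the page itself, so an https page yields an https file URL.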

So I think this is an opportunity to re-write the translator.

@zoe-translates
Collaborator

Todo:

  • Better type determination (not just "artwork" for everything; other content types: book, audio recording, video, or even "document"?)
  • Determine mimeType and file size (maybe complicated by the interface language, e.g. "Mio" for mébioctet vs. "MB")
  • What actually goes in the title? For example, using the current FW-based translator, the page https://commons.wikimedia.org/wiki/File:The_Day_the_Earth_Smiled_-_PIA17172.jpg produces the title "English: On July 19, 2013, in an event celebrated the world over, NASA's Cassini spacecraft slipped into Saturn's shadow and turned to image the planet, seven of its moons, its inner rings -- and, in the background, our home planet, Earth." None of this appears in the "cite this page" output provided by WMC itself.
  • Authorship: should I simply use "Wikimedia Commons contributors" as recommended by the "cite this page" output, or go for the actual author of an artwork (e.g. Caravaggio, for the first test case)? I think it's OK to cite "Wikimedia Commons contributors" if the item is treated as a web page (which happens to host artwork), but the original intent seems to be going for the artwork itself by piercing the hosting-on-web-page veil (see also this "How To Credit Images Found in the Wikimedia Commons")
  • Is there a more structured way to do this?
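
One candidate for a more structured route, not settled anywhere in this thread, would be to ask the MediaWiki Action API for the file's imageinfo instead of scraping the localized page text. The sketch below assumes the API endpoint at https://commons.wikimedia.org/w/api.php and a formatversion=2 response; `buildImageInfoURL` and `extractFileInfo` are hypothetical helper names.

```javascript
// Build an Action API query URL asking for the file's URL, size and MIME type.
// `fileTitle` is the page title including the namespace, e.g. "File:Example.png".
function buildImageInfoURL(fileTitle) {
	const params = new URLSearchParams({
		action: "query",
		format: "json",
		formatversion: "2",
		prop: "imageinfo",
		iiprop: "url|size|mime",
		titles: fileTitle
	});
	return "https://commons.wikimedia.org/w/api.php?" + params.toString();
}

// Pull the fields of interest out of a parsed JSON response.
// Assumes the formatversion=2 shape, where query.pages is an array.
function extractFileInfo(response) {
	const pages = response && response.query && response.query.pages;
	const page = Array.isArray(pages) ? pages[0] : null;
	const info = page && page.imageinfo && page.imageinfo[0];
	if (!info) return null;
	return {
		url: info.url,
		mimeType: info.mime,
		sizeBytes: info.size // size in bytes, independent of interface language
	};
}
```

Because the size comes back as a plain number of bytes, this would sidestep the "Mio" vs. "MB" parsing question, at the cost of one extra request per item.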
