Preprint item type #88

Closed
dstillman opened this issue Nov 12, 2021 · 35 comments

Comments

@dstillman
Member

@adam3smith, @bwiernik, is there anything I should be consulting for this? Anything this needs to be mapped to on the CSL side? I'm not seeing any existing issues for it.

@adam3smith
Collaborator

adam3smith commented Nov 12, 2021

Huh, surprised we don't have a ticket.
Preprint should map to CSL article.
Preprint server (not wedded to that label) should be publisher.
I think we'll want series and series number to accommodate working papers in series. Beyond that, only standard fields.

Edit: just looking at arXiv and wondering if we should try to get the ID into a number field? It needs to be citeable

@bwiernik
Collaborator

How about "repository" for publisher?

@bwiernik
Collaborator

Type mapped to genre.

APA style wants the archive ID. We settled on CSL archive_location for that back when we discussed it @adam3smith when I was writing APA 7.

@adam3smith
Collaborator

> APA style wants the archive ID. We settled on CSL archive_location for that back when we discussed it @adam3smith when I was writing APA 7.

Do you remember why? number, as used e.g. for patent, seems a better fit. I'm just a bit worried that we have a fair number of styles citing archive and location across all item types.

@bwiernik
Collaborator

Let me look into it

@dstillman
Member Author

dstillman commented Nov 12, 2021

The ids actually get a little tricky. We currently put arXiv IDs (from arXiv.org or Mendeley import) into Extra as arXiv (which maybe should've been arXiv ID), and I assumed we'd want to migrate that to a dedicated field, which later might be part of a more flexible many-to-one id system like we've talked about in the past. But then we'd probably need special logic everywhere to get that to the processor as number or whatever it needs to be — a regular CSL mapping wouldn't work because an import back from CSL-JSON would be ambiguous, with multiple possible fields (number, arXivID, or any other repo-specific ones).

Can we just assume that all preprint archives will use an unambiguous id format, with an identifiable prefix like arXiv:, and we can just store them in a single archiveID field, mapped bidirectionally to an appropriate CSL field? And any automated handling will just use the prefix to identify it?
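A minimal sketch of the prefix idea above, assuming a single archiveID string and a small table of known prefixes. The function name, the table, and the sample IDs are hypothetical; only the arXiv: prefix comes from this thread.

```python
from typing import Optional, Tuple

# Hypothetical table of recognized prefixes (lowercased) and their display names.
# Only "arXiv:" is mentioned in the discussion; others would be added as needed.
KNOWN_PREFIXES = {"arxiv": "arXiv"}


def split_archive_id(archive_id: str) -> Tuple[Optional[str], str]:
    """Split "arXiv:1234.5678" into ("arXiv", "1234.5678").

    Returns (None, original_value) when no known prefix is present, so
    unprefixed IDs pass through untouched.
    """
    prefix, sep, rest = archive_id.partition(":")
    key = prefix.strip().lower()
    if sep and key in KNOWN_PREFIXES and rest.strip():
        return KNOWN_PREFIXES[key], rest.strip()
    return None, archive_id.strip()
```

With this shape, automated handling only needs the prefix to decide which repository an ID belongs to, and the stored value can still round-trip through a single CSL variable.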

@adam3smith
Collaborator

I like the archiveID. Not sure if all servers have that - e.g. OSF preprints technically have an ID but they never use it, but leaving the field empty is fine of course.
Where IDs are essential, I think assuming a prefix and unique ID is plausible.

@bwiernik
Collaborator

Would we maybe want to add archive ID to all types alongside archive, location in archive, and the new archive place and archival collection? That would unambiguously separate physical and digital locations. CSL could add an archive_id variable

@adam3smith
Collaborator

adam3smith commented Nov 12, 2021 via email

@dstillman
Member Author

Does document stay mapped to article too? Should CSL-JSON article import to document or preprint?

Re: archiveID on everything, a concrete example that I've been unsure about: if you have a preprint with an arXiv ID, and then you update metadata and it now is published and has a DOI, we presumably convert that item to journalArticle. Do we keep the archiveID on the item? Throwing it out seems bad, but it also seems a little conceptually fuzzy, since the item no longer really represents that version. arXiv.org obviously keeps the page and lists the DOI, but the canonical source of metadata would be the publisher, and that metadata wouldn't have the arXiv ID.

More practically, do styles know not to use the archiveID for published articles?

@adam3smith
Collaborator

> Does document stay mapped to article too? Should CSL-JSON article import to document or preprint?

CSL 1.0.2, which we are hoping to release on Dec 1, has document, so preprint should map to article and document to document.

I'm not sure about the answer to the arXiv questions, but as a data point, arXiv's own BibTeX no longer includes the arXiv ID once an item is published in a journal.
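To illustrate the type mapping described above, here is a small sketch. The dictionary and function names are made up for the example, and treating CSL article as importing back to preprint is one reading of the answer, not a confirmed decision.

```python
# Export direction as stated in the comment above:
# preprint -> article, document -> document.
ZOTERO_TO_CSL_TYPE = {
    "preprint": "article",
    "document": "document",
}

# Assumed import direction: simply the inverse of the export table.
# This only works because each CSL type has exactly one Zotero type here.
CSL_TO_ZOTERO_TYPE = {csl: zotero for zotero, csl in ZOTERO_TO_CSL_TYPE.items()}


def round_trip(zotero_type: str) -> str:
    """Export a Zotero type to CSL and import it back."""
    return CSL_TO_ZOTERO_TYPE[ZOTERO_TO_CSL_TYPE[zotero_type]]
```

If document had stayed mapped to article as well, the inverse table would have had to pick one of the two types arbitrarily on import.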

@bwiernik
Collaborator

Perhaps converting archiveID to an attached link would be a good way to keep the information but also avoid including the ID in citations to published items?

@dstillman
Member Author

That's a good idea.

@dstillman
Member Author

But then do we still need archiveID on all item types?

@bwiernik
Collaborator

A lot of items might have an electronic archive that should be cited instead of/in addition to a URL. APA for example, wants archive and archive IDs to be included when the item is not widely available (e.g., articles, reports, manuscripts, books, documents). Examples given in the manual are ProQuest ID numbers and ERIC ID numbers.

9.30 Database and Archive Sources
Database and archive information is seldom needed in reference list entries. The purpose of a reference list entry is to provide readers with the details they will need to perform a search themselves if necessary, not to replicate the path the author of the work personally used. Most periodical and book content is available through a variety of databases or platforms, and different readers will have different methods or points of access. Additionally, URLs from databases or library-provided services usually require a login and/or are session specific, meaning they will not be accessible to most readers and are not suitable to include in a reference list.

  • Provide database or other online archive information in a reference only when it is necessary for readers to retrieve the cited work from that exact database or archive.

    • Provide the name of the database or archive when it publishes original, proprietary works available only in that database or archive (e.g., Cochrane Database of Systematic Reviews or UpToDate; see Chapter 10, Examples 13–14). References for these works are similar to journal article references; the name of the database or archive is written in italic title case in the source element, the same as a periodical title.

    • Provide the name of the database or archive for works of limited circulation, such as

      • dissertations and theses published in ProQuest Dissertations and Theses Global,
      • works in a university archive,
      • manuscripts posted in a preprint archive like PsyArXiv (see Chapter 10, Example 73),
      • works posted in an institutional or government repository, and
      • monographs published in ERIC or primary sources published in JSTOR (see Chapter 10, Example 74).

    These references are similar to report references; the name of the database or archive is provided in the source element (in title case without italics), the same as a publisher name.

  • Do not include database information for works obtained from most academic research databases or platforms because works in these resources are widely available. Examples of academic research databases and platforms include APA PsycNET, PsycINFO, Academic Search Complete, CINAHL, Ebook Central, EBSCOhost, Google Scholar, JSTOR (excluding its primary sources collection because these are works of limited distribution), MEDLINE, Nexis Uni, Ovid, ProQuest (excluding its dissertations and theses databases, because dissertations and theses are works of limited circulation), PubMed Central (excluding authors’ final peer-reviewed manuscripts because these are works of limited circulation), ScienceDirect, Scopus, and Web of Science. When citing a work from one of these databases or platforms, do not include the database or platform name in the reference list entry unless the work falls under one of the exceptions.

  • If you are in doubt as to whether to include database information in a reference, refer to the template for the reference type in question (see Chapter 10).

  • Finish the database or archive component of the source element with a period, followed by a DOI or URL as applicable (see Sections 9.34–9.36).

@dstillman
Member Author

OK, so use the same archiveID field for preprint and journalArticle/others, but move known preprint-server ids to attached links on metadata updating, and translators/people can populate the non-preprint archiveID fields as needed.

The only problem would be if you manually changed the item type from Preprint to Journal Article. If it's the same field, the archiveID value would be preserved and potentially affect citations, which would be different behavior from metadata updating. Or we could override the default behavior and convert to an attached link at that point, to make it the same as during metadata updating, but we wouldn't do that going in the other direction, so it's a little weird.
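A rough sketch of the metadata-update behavior described above, assuming items are plain dictionaries. The attachments list, the linkMode value, and the function name are hypothetical stand-ins rather than the client's real data model, and the arXiv abstract URL pattern is an assumption about how the link would be built.

```python
from copy import deepcopy

ARXIV_PREFIX = "arXiv:"


def publish_preprint(item: dict, doi: str) -> dict:
    """Return a journalArticle copy of a preprint.

    A known preprint-server ID is moved out of archiveID into an attached
    link, so it no longer affects citations. IDs from unrecognized archives
    stay in archiveID.
    """
    updated = deepcopy(item)
    updated["itemType"] = "journalArticle"
    updated["DOI"] = doi

    archive_id = updated.pop("archiveID", "")
    if archive_id.startswith(ARXIV_PREFIX):
        arxiv_id = archive_id[len(ARXIV_PREFIX):].strip()
        # Hypothetical attachment shape; the URL pattern is an assumption.
        updated.setdefault("attachments", []).append({
            "linkMode": "linked_url",
            "title": "arXiv preprint",
            "url": f"https://arxiv.org/abs/{arxiv_id}",
        })
    elif archive_id:
        # Not a known preprint server: keep the ID on the item.
        updated["archiveID"] = archive_id
    return updated
```

The open question in this comment is whether a manual item type change should call the same routine or leave archiveID in place.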

@dstillman
Member Author

This is what I have so far:

{
  "itemType": "preprint",
  "fields": [
    {
      "field": "title"
    },
    {
      "field": "abstractNote"
    },
    {
      "field": "date"
    },
    {
      "field": "repository",
      "baseField": "publisher"
    },
    {
      "field": "place"
    },
    {
      "field": "archiveID"
    },
    {
      "field": "DOI"
    },
    {
      "field": "citationKey"
    },
    {
      "field": "url"
    },
    {
      "field": "accessDate"
    },
    {
      "field": "archive"
    },
    {
      "field": "archiveLocation"
    },
    {
      "field": "shortTitle"
    },
    {
      "field": "language"
    },
    {
      "field": "libraryCatalog"
    },
    {
      "field": "callNumber"
    },
    {
      "field": "rights"
    },
    {
      "field": "extra"
    }
  ],
  "creatorTypes": [
    {
      "creatorType": "author",
      "primary": true
    },
    {
      "creatorType": "contributor"
    },
    {
      "creatorType": "editor"
    },
    {
      "creatorType": "translator"
    },
    {
      "creatorType": "reviewedAuthor"
    }
  ]
}

Some more questions:

  • Do preprints need "Place" (publisher-place)?
  • @bwiernik, what would type (mapped to genre) be for, if journal articles (which many/most of these will become) don't have that?
  • A little weird to have "Repository" (mapped to publisher) and "Archive ID" next to it, when there are existing "Archive" and "Loc. in Archive" fields down below. And I'm a bit confused about how "Archive ID" interacts with "Archive" on other types. Would "Archive" be used for digital archives as well, and you use either "Archive ID" or "Loc. in Archive" depending on electronic vs. physical? But we can't use "Archive" here because we need it to map to publisher?
  • Should I map archiveID to number for now?

@bwiernik
Collaborator

bwiernik commented Nov 16, 2021

  1. The preprint type will encompass things like Working papers (eg, in economics) which are sometimes cited with a place, so I think yes

  2. genre would hold descriptions like "Working paper". It can be dropped if converted to a journal article

  3. That's correct. It's a little funky I agree. In most cases archiveID would pair up with the other Archive variables. Preprints are an unusual case where the archive and the publisher are the same thing.

  4. Hmm, I think so. One concern might be if many styles are written to render number indiscriminately.

@adam3smith Would number generally work as the electronic archive ID, or might items, eg, in ERIC or ProQuest have both, such as a working paper series number and archive ID?

@denismaier @bdarcus What do you think of adding an archive_id variable to CSL to distinguish between physical locations (archive_location) and electronic ones (archive_id)?

@adam3smith
Collaborator

adam3smith commented Nov 16, 2021

Agree with Brenton on the above. I think we'll do fine with number - if we want series numbers, we'll use collection-number

Edit: which does mean we'll want series and series number added to the above
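Pulling the answers so far together, the field mapping could be sketched roughly like this. The Zotero-side keys genre, series, and seriesNumber are assumed names for the fields discussed, the series to collection-title pairing is the usual convention rather than something stated in this thread, and the function is illustrative only.

```python
# Zotero preprint field -> CSL variable, per the discussion above.
PREPRINT_TO_CSL = {
    "title": "title",
    "repository": "publisher",            # preprint server / repository
    "place": "publisher-place",
    "archiveID": "number",                # mapped to number "for now"
    "genre": "genre",                     # e.g. "Working paper" (assumed key name)
    "series": "collection-title",         # assumed key name and pairing
    "seriesNumber": "collection-number",  # assumed key name
    "DOI": "DOI",
    "url": "URL",
}


def preprint_to_csl(item: dict) -> dict:
    """Build a CSL-JSON-like dict from a preprint item, skipping empty fields."""
    csl = {"type": "article"}
    for zotero_field, csl_variable in PREPRINT_TO_CSL.items():
        value = item.get(zotero_field)
        if value:
            csl[csl_variable] = value
    return csl
```

Keeping archiveID on number and the series number on collection-number means a working paper can carry both without either overwriting the other.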

dstillman added a commit to zotero/zotero-schema that referenced this issue Nov 17, 2021
And map Document to CSL 1.0.2 `document`
@dstillman
Member Author

Anyone have an idea for an icon for preprints?

We'll need both a custom one in the new style for iOS/web and something based on famfamfam or Fugue for the desktop client:

http://www.famfamfam.com/lab/icons/silk/previews/index_abc.png
https://p.yusukekamiyamane.com/icons/preview/fugue.png

(Could be a combination of icons if necessary.)

@dstillman
Member Author

dstillman commented Mar 3, 2022

"script" is sort of funny for this, in a Martin-Luther-nailing-theses-to-the-door sort of way. We're using that for Bill in the client, but our custom icon for Bill is the § symbol, so we could repurpose the script concept for this.

For now, I'm going with "receipt", which doesn't make a ton of sense but looks vaguely unfinished — like a piece of paper ripped off a dot matrix printer.

(Image: preprint-icon)

@AbeJellinek
Member

What about famfamfam's page_white_gear or page_white_go? Or "receipt" but converted to grayscale to match other print-ish types. Something about the blue just feels off to me.

@dstillman
Member Author

"receipt" is the top row above. "bill" is the second. I was just saying we could use the bill concept, but we'd definitely do it in white/gray to be closer to the journal article icon.

@AbeJellinek
Member

Oh, right.

@bwiernik
Collaborator

bwiernik commented Mar 3, 2022

Maybe it's just been too many years of seeing the scroll/script used for Bill, but it looks a little weird to me for preprint

For famfamfam, I think both page_white_lightning and page_white_go are interesting and emphasize the rapidity of preprints.

From Fugue, I really like report or report-share. The notebook fringes on the left side of the page feel like a draft or unfinished paper (like receipt but better). The version with the sharing hand emphasizes the sharing/feedback solicitation of preprints/working papers.

@adam3smith
Collaborator

How about page_white_wrench, because they're (often) still being worked on?

@adam3smith
Collaborator

You could also pick your four favorite options and make it a Twitter poll, create some preprint buzz.

dstillman added a commit to zotero/zotero that referenced this issue Mar 5, 2022
@bwiernik
Collaborator

bwiernik commented Mar 6, 2022

Trying out the client on macOS with the new Preprint type. I think the current receipt icon is visually too similar to the Journal Article icon. On the macOS color scheme, I can barely see the fringes at the top and bottom, so the Journal Article and Preprint items look really similar.


@dstillman
Member Author

Yes, we'll be changing it. Priority was just getting this out.

@bwiernik
Collaborator

bwiernik commented Mar 6, 2022

Cool, just wanted to give some feedback in case that wasn't the plan

dstillman added a commit to zotero/zotero that referenced this issue Mar 8, 2022
Journal article plus pencil, similar to manuscript (blank page plus
pencil)

Follow-up to zotero/zotero-bits#88
@adam3smith
Collaborator

Starting to work on preprint citations -- I'm not getting Archive ID mapped to CSL number (testing in the style editor in 6.0.8-beta.4+1e3959020) -- could someone else check whether that's me or a general issue?

@dstillman
Member Author

Can you provide a sample minimal style to test that?

@adam3smith
Collaborator

MWE:
https://gist.github.com/adam3smith/786485597971865e2a99687f5401841d
Displays patentNumber for patent but [CSL STYLE ERROR: reference with no printed form.] for Preprint

ArchiveID also doesn't show up in CSL JSON from preprints, but I think that's expected?
FWIW, I'm testing with https://www.nber.org/papers/w14560 as imported using the NBER translator.

dstillman added a commit to zotero/zotero-schema that referenced this issue May 24, 2022
The Zotero field is `archiveID`, not `archive_id`, but this also isn't
necessary after making `archiveID` a base-mapped field in 4277955.

See zotero/zotero-bits#88 and zotero/zotero#2481
@dstillman
Member Author

Sorry about that — didn't update a submodule. Try in the latest beta.

@adam3smith
Collaborator

Yup, working, thank you!
