Translator for clinicaltrials.gov #2153

Merged
merged 12 commits into from May 5, 2020

Conversation

rdvelazquez
Contributor

@rdvelazquez rdvelazquez commented Apr 1, 2020

TODO:

  • Decide the itemType to use. Currently using journalArticle, but there have been discussions of using dataset or creating a new type. Using report as recommended by @bwiernik.
  • Set up a local development environment so I can test this out using the Zotero modules and tests. Got it set up and everything works except for an issue with matching the "extra" field. Update: all three tests are passing now.
  • Implement the search (if we think it's a feature this should have), item parsing and other TODOs. I don't see the need for being able to cite all the trials from a particular search of clinicaltrials.gov at this point (it could always be added later if needed).
  • Determine the values for translatorType and browserSupport in the metadata. I think these are correct; just following these docs.
  • Fix the lint errors and warnings

closes #1952
relates to manubot/manubot#216

@rdvelazquez
Contributor Author

@bwiernik and @adam3smith any input on what item type to use for clinical trials?

@bwiernik
Contributor

bwiernik commented Apr 1, 2020

I would use Report for this.

@rdvelazquez
Contributor Author

@dhimmel and @agitter two quick questions that I thought you may have answers to:

  1. Date: Clinical trials seem to have lots of dates; I'm currently using the "LastUpdateSubmitDate". Is this the date we would want in the citation or would we want "StudyFirstSubmitDate" or some other date? Here's an example of the date info available:

"StatusVerifiedDate":"March 2020",
"OverallStatus":"Completed",
"ExpandedAccessInfo":{
  "HasExpandedAccess":"No"
},
"StartDateStruct":{
  "StartDate":"February 6, 2020",
  "StartDateType":"Actual"
},
"PrimaryCompletionDateStruct":{
  "PrimaryCompletionDate":"February 25, 2020",
  "PrimaryCompletionDateType":"Actual"
},
"CompletionDateStruct":{
  "CompletionDate":"February 25, 2020",
  "CompletionDateType":"Actual"
},
"StudyFirstSubmitDate":"February 6, 2020",
"StudyFirstSubmitQCDate":"February 6, 2020",
"StudyFirstPostDateStruct":{
  "StudyFirstPostDate":"February 7, 2020",
  "StudyFirstPostDateType":"Actual"
},
"LastUpdateSubmitDate":"March 22, 2020",
"LastUpdatePostDateStruct":{
  "LastUpdatePostDate":"March 24, 2020",
  "LastUpdatePostDateType":"Actual"
}

  2. Author: I'm using the "ResponsiblePartyInvestigatorFullName" if one exists and the "LeadSponsorName" if it doesn't. Is this standard / ok? It seems like the "LeadSponsorName" will sometimes be a company.

I looked at https://www.who.int/ictrp/How_to_cite.pdf and https://blogs.uoregon.edu/annie/2017/10/25/clinical-trial-apa-format/ but they didn't seem to be conclusive.

@bwiernik
Contributor

bwiernik commented Apr 1, 2020

For things like preprints, Zotero translators typically save the last updated date (i.e., the date of the version of the item actually being viewed) as the date. The first submit date could be stored in Extra with the label "Original date:"

@adam3smith
Collaborator

adam3smith commented Apr 1, 2020

Agree on the regular date, but I'd be careful with using original date too widely. Its most common use is for historical publication dates of reprinted works, which are often rendered in citation styles. I don't really see that that'd be true for clinical trials (or preprints, for that matter).

@bwiernik
Contributor

bwiernik commented Apr 1, 2020

That's a really good point. "Submitted" might be a better (and rarely used in citation styles) variable.

@dhimmel

dhimmel commented Apr 1, 2020

I'm using the "ResponsiblePartyInvestigatorFullName" if one exists and the "LeadSponsorName" if it doesn't. Is this standard / ok? It seems like the "LeadSponsorName" will sometimes be a company.

Just looking at a random record NCT04291053:

            "SponsorCollaboratorsModule":{
              "ResponsibleParty":{
                "ResponsiblePartyType":"Principal Investigator",
                "ResponsiblePartyInvestigatorFullName":"Chen Xiaoping",
                "ResponsiblePartyInvestigatorTitle":"Principal Investigator",
                "ResponsiblePartyInvestigatorAffiliation":"Tongji Hospital"
              },
              "LeadSponsor":{
                "LeadSponsorName":"Tongji Hospital",
                "LeadSponsorClass":"OTHER"
              }
            },
Some documentation

From https://prsinfo.clinicaltrials.gov/definitions.html:

3. Sponsor/Collaborators

Responsible Party, by Official Title *
Definition: An indication of whether the responsible party is the sponsor, the sponsor-investigator, or a principal investigator designated by the sponsor to be the responsible party. Select one.

  • Sponsor: The entity (for example, corporation or agency) that initiates the study
  • Principal Investigator: The individual designated as responsible party by the sponsor (see Note)
  • Sponsor-Investigator: The individual who both initiates and conducts the study
    Note: The sponsor may designate a principal investigator as the responsible party if such principal investigator meets all of the following requirements: is responsible for conducting the study; has access to and control over the data from the study; has the right to publish the results of the study; and has the ability to meet all of the requirements for submitting and updating clinical study information.

Investigator Information [*]
If the Responsible Party, by Official Title is either "Principal Investigator" or "Sponsor-Investigator," the following is required:

  • Investigator Name: Name of the investigator, including first and last name
  • Investigator Official Title: The official title of the investigator at the primary organizational affiliation
    Limit: 254 characters.
  • Investigator Affiliation: Primary organizational affiliation of the individual;
    Limit: 160 characters.

Name of the Sponsor *
Definition: The name of the entity or the individual who is the sponsor of the clinical study.
Limit: 160 characters.

Note: When a clinical study is conducted under an investigational new drug application (IND) or investigational device exemption (IDE), the IND or IDE holder is considered the sponsor. When a clinical study is not conducted under an IND or IDE, the single person or entity who initiates the study, by preparing and/or planning the study, and who has authority and control over the study, is considered the sponsor.

Collaborators
Definition: Other organizations (if any) providing support. Support may include funding, design, implementation, data analysis or reporting. The responsible party is responsible for confirming all collaborators before listing them.
Limit: 160 characters.

I think perhaps we want everything: the lead investigator, the sponsor, and collaborators. Each one of these could be different authors. I don't know too much about clinical trials however, so would be interested in what others think

@rdvelazquez
Contributor Author

Thank you all for the quick responses. Much appreciated!

My last commit attempts to incorporate that feedback.

  • I'm including the initial submission date in extra.submittedDate
  • I'm including the lead investigator, sponsor, and collaborators as creators and, as a way to be able to tell who is who, I'm also including this info in the extra section explicitly stating what type of creator they were.
    (there aren't many creatorTypes for report so I'm just listing everyone as an author but including this info in the extra's will let downstream analyses be able to untangle who was a collaborator vs. sponsor, etc.)

@rdvelazquez
Contributor Author

When I run the tests locally it says that the extra field doesn't match. Here's the testing output:

         -   "extra": {
         -     "submittedDate": "February 6, 2020"
         -     "responsiblePartyInvestigator": "undefined"
         -     "sponsor": "Gilead Sciences"
         -   }
         +   "extra": "[object Object]"

i.e. it's getting "[object Object]"

@adam3smith or someone else experienced with Zotero, any input on this?

@adam3smith
Collaborator

Haven't looked at your code, but Extra needs to be a string with the different values newline separated. It looks like you have it as an array?
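
A minimal sketch of that fix, assuming the values were first collected in a plain object. The helper name `buildExtra` is hypothetical and the keys are copied from the failing test output above; this is not the code from the merged translator.

```javascript
// Hypothetical helper: flatten label/value pairs into the newline-separated
// string that the Extra field expects, instead of assigning the object itself
// (which is what gets coerced to "[object Object]").
function buildExtra(fields) {
  return Object.entries(fields)
    // Skip missing values so the literal text "undefined" is never stored.
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([label, value]) => `${label}: ${value}`)
    .join("\n");
}

const extra = buildExtra({
  submittedDate: "February 6, 2020",
  responsiblePartyInvestigator: undefined,
  sponsor: "Gilead Sciences"
});
// extra === "submittedDate: February 6, 2020\nsponsor: Gilead Sciences"
```

The result can then be assigned to the item's `extra` field as a single string.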

@rdvelazquez
Contributor Author

Thanks @adam3smith! That worked. All three tests are now passing. Ready for review.

@bwiernik
Contributor

bwiernik commented Apr 2, 2020

A few immediate comments:

  1. I would enter submittedDate just as submitted. That is the actual CSL variable. This will make it accessible for citations if needed.
  2. I would suggest not saving the same information in multiple places; that has a decent chance of producing unexpected results in citations. I'll defer to @adam3smith as to where Principal Investigators, other collaborators, and the lead sponsor should go. My intuition is that the Principal Investigator should be stored as Author, any collaborators stored as Contributor, and the Sponsor stored in Extra (and only in Extra) labeled "Sponsor:". Perhaps conditionally if there is no Principal Investigator, then Sponsor could instead be stored as an author.
  3. Use proper narrative capitalization and spacing when storing data in Extra, rather than camelCase (e.g., Principal investigator: rather than responsiblePartyInvestigator:).
  4. Undefined or missing values should be dropped rather than being stored as undefined
  5. "ClinicalTrials.gov" should be stored in the institution (publisher) field, and "Clinical trial registration" in the reportType field.
  6. The registration number (I think NCTId) should be stored in reportNumber
  7. The BriefSummary part of the DescriptionModule should be stored in abstract
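
The field mapping in points 1 and 4 to 7 could look roughly like the sketch below. It assumes a hypothetical, already-flattened `study` object whose keys reuse the API names quoted in this thread; the real API response is nested and is not reproduced here. `studyToItem` is an illustrative name, dates are passed through unconverted, and the result is a plain object rather than a real Zotero item.

```javascript
// Sketch only: `study` is a hypothetical flat view of one registry record.
function studyToItem(study) {
  const item = {
    itemType: "report",
    institution: "ClinicalTrials.gov",         // point 5: publisher
    reportType: "Clinical trial registration", // point 5
    reportNumber: study.NCTId,                 // point 6
    abstractNote: study.BriefSummary,          // point 7
    date: study.LastUpdateSubmitDate           // date of the version being viewed
  };

  // Point 1: store the first submission date under the CSL variable name.
  const extraLines = [];
  if (study.StudyFirstSubmitDate) {
    extraLines.push("submitted: " + study.StudyFirstSubmitDate);
  }
  if (extraLines.length > 0) {
    item.extra = extraLines.join("\n");
  }

  // Point 4: drop fields that have no value instead of keeping undefined.
  for (const key of Object.keys(item)) {
    if (item[key] === undefined) {
      delete item[key];
    }
  }
  return item;
}

// Example values copied from snippets in this thread; BriefSummary is omitted
// on purpose to show that the missing field is dropped.
const item = studyToItem({
  NCTId: "NCT04292899",
  LastUpdateSubmitDate: "March 22, 2020",
  StudyFirstSubmitDate: "February 6, 2020"
});
```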

@rdvelazquez
Contributor Author

Thank you for the review @bwiernik! I've addressed everything except point 2.

I think your recommendation makes sense. Just one point of clarification: we would store the sponsor in Extra as "Sponsor: " no matter what, even if there was no principal investigator and the sponsor was also listed as the author?

Pro: You could tell whether the sponsor was listed as the author. (If clinical trials always have a sponsor then this is negated, as you could just check whether "Sponsor: " was missing from Extra and, if so, assume that the "author" was actually the sponsor.)
Con: You would be storing the same info in two places

@bwiernik
Contributor

bwiernik commented Apr 2, 2020

It would be fairly clear that the sponsor was listed as an author by virtue of it being an organizational (single-field) author. Another option rather than placing in Extra at all would be to include it as a "Contributor" or "Author" depending on whether there was a personal author (PI).
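
One way to sketch that rule is shown below. The function names are hypothetical, and the name splitting is a naive stand-in that assumes "given names, then family name" order; a real translator would rely on Zotero's own name utilities. `fieldMode: 1` marks a single-field (organizational) creator.

```javascript
// Hypothetical helper: split a personal name on its final space.
// Assumes Western name order, which will be wrong for some records.
function personCreator(fullName, creatorType) {
  const name = fullName.trim();
  const idx = name.lastIndexOf(" ");
  if (idx === -1) {
    return { lastName: name, creatorType, fieldMode: 1 };
  }
  return {
    firstName: name.slice(0, idx),
    lastName: name.slice(idx + 1),
    creatorType
  };
}

// Organizations are stored as single-field creators.
function orgCreator(name, creatorType) {
  return { lastName: name, creatorType, fieldMode: 1 };
}

// PI becomes the author when present; the sponsor is an author only when
// there is no PI, otherwise a contributor. Collaborators are contributors.
function buildCreators({ investigator, sponsor, collaborators = [] }) {
  const creators = [];
  if (investigator) {
    creators.push(personCreator(investigator, "author"));
  }
  if (sponsor) {
    creators.push(orgCreator(sponsor, investigator ? "contributor" : "author"));
  }
  for (const name of collaborators) {
    creators.push(orgCreator(name, "contributor"));
  }
  return creators;
}

const withPI = buildCreators({ investigator: "Chen Xiaoping", sponsor: "Tongji Hospital" });
const withoutPI = buildCreators({ sponsor: "Gilead Sciences" });
```

With this layout nothing is duplicated in Extra, and a sponsor acting as author is recognizable because it is a single-field creator.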

@rdvelazquez
Contributor Author

@bwiernik That makes sense to me. I implemented that and we can see if anyone has other opinions on how it should be handled. Thanks again for looking at this!

@dhimmel

dhimmel commented Apr 15, 2020

Just wanted to check in here. @rdvelazquez are you good on your end and just waiting for review?

@rdvelazquez
Contributor Author

Yep. Things are good on my end and just waiting for review.

@dhimmel

dhimmel commented Apr 15, 2020

@bwiernik / @adam3smith any chance either of you has time to re-review? Would be greatly appreciated.

@rdvelazquez
Contributor Author

Just checking on the status of getting this reviewed and incorporated into Zotero. This would be helpful for the https://github.com/greenelab/covid19-review project.

Collaborator

@adam3smith adam3smith left a comment

Sorry for the delay -- difficult to keep up with these during wfh. A couple of questions and comments -- I'll try to be quick to get back to you once you reply

*/

function detectWeb(doc, url) {
if (url.includes("https://clinicaltrials.gov/ct2/results")) {
Collaborator

can we not make multiples work or have you just not gotten to it?
We strongly prefer having multiples available -- especially for things like systematic reviews this would seem super useful.
In any case, we should then just exclude them via target regex - and make the target regex as well as detectWeb() more restrictive overall: in its current form the translator would detect e.g. the homepage and help pages as reports and then fail on import. This should never happen.

Contributor Author

I made the target more restrictive. I wasn't sure if any additional checks should go in detectWeb(); it seemed like the target was the preferred place to make it more restrictive.

I thought the single case was more important than the multiples and wanted to focus my limited time on getting that done first. I know we need the single use case for the active https://github.com/greenelab/covid19-review and wasn't aware of a use case for the multiples (I assume this would be getting all the citations for what the clinicalTrials website returns from a search?). I'd be willing to help get that incorporated but I'm not sure how much time I'll have in the next few weeks.
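
For illustration, a stricter single-study check could look like the sketch below. The pattern is an assumption based on the test URL in this PR (`/ct2/show/` followed by an NCT number, which is "NCT" plus eight digits); it is not the target regex that was actually committed, and other study URL variants are not covered.

```javascript
// Assumed URL shape for a single study page; search results live under
// /ct2/results and are deliberately not matched.
const STUDY_URL = /^https?:\/\/(www\.)?clinicaltrials\.gov\/ct2\/show\/NCT\d{8}\b/;

function detectWeb(doc, url) {
  // Only claim pages that can actually be imported as a report.
  return STUDY_URL.test(url) ? "report" : false;
}

// The homepage and search pages are no longer detected.
const single = detectWeb(null, "https://clinicaltrials.gov/ct2/show/NCT04292899");
const search = detectWeb(null, "https://clinicaltrials.gov/ct2/results?cond=COVID-19");
const home = detectWeb(null, "https://clinicaltrials.gov/");
```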

return dateTime.split(" ")[0].split(":").join("-");
}

function nameToFirstAndLast(rawName) {
Collaborator

Could you add a comment on what exactly this does, ideally with a data example. If I'm reading this correctly, this would also remove authors initials/second names which doesn't seem ideal. Note that Zotero has a built-in ZU.cleanAuthor function that might be helpful here depending on the format.

Contributor Author

I felt like I was likely reinventing the wheel when I wrote that function but couldn't find a good replacement. Thanks for pointing out ZU.cleanAuthor; that was just what I needed!

"shortTitle": "Study to Evaluate the Safety and Antiviral Activity of Remdesivir (GS-5734™) in Participants With Severe Coronavirus Disease (COVID-19)",
"url": "https://clinicaltrials.gov/ct2/show/NCT04292899",
"institution": "clinicaltrials.gov",
"reportNumber": "NCT04292899",
Collaborator

Are the tests generated with Scaffold? It's odd that the indenting is off for report details. Let's fix this.

Contributor Author

I fixed the indenting issue. I generated the tests myself without using scaffold; I think I tried using it but had some issues getting it running if I remember correctly.

Member

Just to clarify, Scaffold is now just Tools → Developer → Translator Editor in Zotero, not a separate plugin. The test needs to be generated by Scaffold to be valid. (If you generated it by hand, how did you test it?)

If you're having trouble with something in Scaffold, let us know.

Contributor Author

Thanks for clarifying that @dstillman. That's how I tested it (and now that I'm looking at it again I can see that it says "Scaffold" at the top of the window... ). That was very helpful for developing the translator and how I tested it (I also used a simple node.js script to test the parsing while developing it so that I could get more instantaneous feedback)

rdvelazquez and others added 2 commits May 4, 2020 09:16
@adam3smith
Collaborator

(sorry for the linting problems -- linting is a mess on Windows, no idea how people do this. Will fix asap or feel free to fix on your end)

@rdvelazquez
Contributor Author

@adam3smith Thanks so much for the edits! I fixed the linting issues by running yarn lint --fix clinicaltrials.js and it looks like it took care of it. JS development on Windows is tough (I've been there).

@adam3smith adam3smith merged commit dee2b36 into zotero:master May 5, 2020
@adam3smith
Collaborator

Terrific, thanks -- this looks great!

MylesFTOP pushed a commit to MylesFTOP/translators that referenced this pull request Aug 23, 2020
using report item type for now; 
test for multiple  fails, so not adding, but tested successfully in Firefox
Successfully merging this pull request may close these issues.

Add clinicaltrials.gov
5 participants