Create departmentScoringStrategy #229

Closed
paulalbert1 opened this issue Jun 30, 2018 · 0 comments

paulalbert1 commented Jun 30, 2018

Prerequisite: #200. Right now, we're only getting academic departments. But divisions and academic programs are strong signals, so we should import them into identity as well.

  1. For each targetAuthor, grab article.affiliation.

  2. Retrieve org units from identity.departments.

  3. Create org unit alias if possible.

  • Substitute "and" for "&" and vice versa in identity.departments.
  • Remove any commas or dashes from identity.departments. Remove any commas or dashes from article.affiliation.
  • Substitute "Tri-I" for "Tri-Institutional" and vice versa.
  • Look up all possible org unit synonyms in identity.organizationalUnits (see Create organizationalUnitSynonym property in application.properties #264). Use case: nes3001 and 10499040.
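
The alias rules above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `org_unit_aliases` and `strip_punctuation` are hypothetical, and the synonym lookup in identity.organizationalUnits is left out because its format is not described here.

```python
import re


def strip_punctuation(text: str) -> str:
    """Remove commas and dashes, then collapse any doubled whitespace."""
    return " ".join(re.sub(r"[,\-]", "", text).split())


def org_unit_aliases(department: str) -> set:
    """Return the org unit name plus its simple spelling variants."""
    aliases = {department}
    # Swap "and" and "&" in both directions.
    aliases.add(re.sub(r"\band\b", "&", department))
    aliases.add(department.replace("&", "and"))
    # Swap "Tri-I" and "Tri-Institutional" in both directions.
    for alias in list(aliases):
        aliases.add(re.sub(r"\bTri-I\b", "Tri-Institutional", alias))
        aliases.add(alias.replace("Tri-Institutional", "Tri-I"))
    # Commas and dashes are stripped last so the Tri-I swap still sees its dash.
    return {strip_punctuation(alias) for alias in aliases}
```

Because punctuation is stripped from the aliases, `strip_punctuation` would also need to be applied to article.affiliation before comparing, as the second bullet requires.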
  4. We will now attempt a match on each available org unit from identity.departments to article.affiliation.

  5. What is the type of the identity.department?

  • If "program", go to the program match (the last step).
  • If "department" or other type, go to 6.
  6. Does the org unit meet both of the following criteria:
    a. identity.department like "%Center%" or like "%Program%" or like "%Institute%"
    b. stringLength(identity.department) > 14 characters
  • If yes, go to 7
  • If no, go to 9
  7. Does article.affiliation match "%" + identity.department + "%"?
  • If yes, output the following:
articleAffiliation: "Center for Integrative Medicine, Weill Cornell Medicine, New York, NY, USA."
identityDepartment: "Center for Integrative Medicine"
departmentMatchingScore: 2

Unit test case: 29247405 mecharl Center for integrative medicine

  • If no, continue with other potential matches.
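
A minimal sketch of steps 6 and 7, treating the LIKE patterns as substring tests. The function names are hypothetical, and case-insensitive comparison is an assumption (the unit test case is labelled "Center for integrative medicine" in lower case, but the issue does not say how case should be handled).

```python
from typing import Optional

KEYWORDS = ("center", "program", "institute")


def is_distinctive_unit(department: str) -> bool:
    """Step 6: name contains Center/Program/Institute and is longer than 14 characters."""
    lowered = department.lower()
    return any(keyword in lowered for keyword in KEYWORDS) and len(department) > 14


def direct_match(affiliation: str, department: str) -> Optional[dict]:
    """Step 7: the full org unit name appears somewhere in the affiliation string."""
    if is_distinctive_unit(department) and department.lower() in affiliation.lower():
        return {
            "articleAffiliation": affiliation,
            "identityDepartment": department,
            "departmentMatchingScore": 2,
        }
    return None  # caller continues with other potential matches
```

The length check keeps short, generic names such as "Cancer Center" (13 characters) out of the bare substring match.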
  8. Get keywords from strategy.authorAffiliationScoringStrategy.institutionStopwords. Remove any of these words from identity.department, e.g., Brain and Mind Research Institute --> Brain Mind Research Institute. Does the result match article.affiliation?
  • If yes, output the following:
articleAffiliation: "Brain Mind Research Institute, Weill Cornell Medicine, New York, NY, USA. diw2004@med.cornell.edu."
identityDepartment: "Brain and Mind Research Institute"
departmentMatchingScore: 2

Test case: diw2004, 26984475, Brain Mind Research Institute

  • If no, continue with other potential matches.
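
The stopword step might look like the sketch below. The stopword set is a placeholder: the real values live in strategy.authorAffiliationScoringStrategy.institutionStopwords and are not listed in this issue. Function names are hypothetical and the comparison is assumed to be case-insensitive.

```python
from typing import Optional

# Placeholder values; the real list comes from institutionStopwords.
STOPWORDS = frozenset({"and", "of", "the"})


def remove_stopwords(name: str, stopwords=STOPWORDS) -> str:
    """Drop stopwords from an org unit name, keeping the remaining word order."""
    return " ".join(word for word in name.split() if word.lower() not in stopwords)


def stopword_match(affiliation: str, department: str, stopwords=STOPWORDS) -> Optional[dict]:
    """Match the stopword-free org unit name against the affiliation string."""
    reduced = remove_stopwords(department, stopwords)
    if reduced and reduced.lower() in affiliation.lower():
        return {
            "articleAffiliation": affiliation,
            # Report the original name, as in the sample output.
            "identityDepartment": department,
            "departmentMatchingScore": 2,
        }
    return None
```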
  9. Attempt to match article.affiliation to identity.department with any one of the following:
    a. article.affiliation = "%" + "Department of" + identity.department + "%"
    b. article.affiliation = "%" + "Dept of" + identity.department + "%"
    c. article.affiliation = "%" + "Division of" + identity.department + "%"
    d. article.affiliation = "%" + "Departments of" + identity.department + "%"
    e. article.affiliation = "%" + "Depts of" + identity.department + "%"
    f. article.affiliation = "%" + "Divisions of" + identity.department + "%"
  • If there is a match, output the following:
articleAffiliation: "Department of Pharmacology, Weill Cornell Medical College. New York, NY 10021, USA. jobuck@med.cornell.edu"
identityDepartment: "Pharmacology"
departmentMatchingScore: 2

Unit test case: 21544217 Pharmacology jobuck

  • If no, continue with other potential matches.
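
A sketch of the prefixed match, with two assumptions that go beyond the patterns as written: a single space is inserted between the prefix and the org unit name, and the comparison ignores case. `prefixed_match` is a hypothetical name.

```python
from typing import Optional

PREFIXES = (
    "Department of",
    "Dept of",
    "Division of",
    "Departments of",
    "Depts of",
    "Divisions of",
)


def prefixed_match(affiliation: str, department: str) -> Optional[dict]:
    """Look for '<prefix> <org unit name>' anywhere in the affiliation string."""
    haystack = affiliation.lower()
    for prefix in PREFIXES:
        if f"{prefix} {department}".lower() in haystack:
            return {
                "articleAffiliation": affiliation,
                "identityDepartment": department,
                "departmentMatchingScore": 2,
            }
    return None
```

As written, a combined phrase such as "Departments of Medicine and Pharmacology" would match "Medicine" but not "Pharmacology", since the name has to follow the prefix directly.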
  10. Is the matching department "Medicine"? (Many people claim this as a departmental affiliation.)
  • If yes, add modifier:
organizationalUnitMatchModifierScore: Medicine ,-1
  • If no, stop.
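
One way to attach the modifier, assuming it is kept as a separate field next to the match score. The lookup table and the name `apply_modifier` are hypothetical; how the modifier is stored and later combined with the score is not specified in this issue.

```python
# Hypothetical lookup table; only "Medicine" is named in the issue.
MODIFIERS = {"Medicine": -1}


def apply_modifier(match: dict) -> dict:
    """Return a copy of the match with a modifier score added when one applies."""
    result = dict(match)
    modifier = MODIFIERS.get(match["identityDepartment"])
    if modifier is not None:
        result["organizationalUnitMatchModifierScore"] = modifier
    return result
```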
  11. Attempt to match article.affiliation to identity.department with any one of the following:
    a. article.affiliation = "%" + "Program in " + identity.department + "%"
    b. article.affiliation = "%" + identity.department + "%" + " Program" + "%"
    c. article.affiliation = "%" + identity.department + "%" + " Graduate Program" + "%"
  • If there is a match, output the following:
articleAffiliation: "Program in Biochemistry, Cell and Molecular Biology, Weill Cornell Graduate School of Medical Sciences, New York, NY 10065, USA; Program in Developmental Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA."
identityDepartment: "Program in Biochemistry, Cell and Molecular Biology"
programMatchingScore: 3

Unit test case: mrb2006 29689191 Program in Biochemistry, Cell and Molecular Biology
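
The three program patterns can be sketched with regular expressions, where each inner "%" becomes `.*`. The name `program_match` is hypothetical and case-insensitive matching is an assumption.

```python
import re
from typing import Optional


def program_match(affiliation: str, program: str) -> Optional[dict]:
    """Try the three program patterns (a, b, c) against the affiliation string."""
    name = re.escape(program)  # treat the program name as literal text
    patterns = (
        rf"Program in {name}",          # a. "Program in <name>"
        rf"{name}.* Program",           # b. "<name> ... Program"
        rf"{name}.* Graduate Program",  # c. "<name> ... Graduate Program"
    )
    flags = re.IGNORECASE | re.DOTALL
    if any(re.search(pattern, affiliation, flags) for pattern in patterns):
        return {
            "articleAffiliation": affiliation,
            "identityDepartment": program,
            "programMatchingScore": 3,
        }
    return None
```

With the sample value above, which already begins with "Program in", it is pattern b that matches in this sketch: the name is followed later in the string by "; Program in Developmental Biology". Pattern c can only match where b does, so it is kept here only to mirror the list.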
