Leverage departmental affiliation string matching for phase two matching #79

Closed
paulalbert1 opened this issue May 23, 2015 · 3 comments

@paulalbert1

Use "Department of [target author's department(s) as tracked in rc_identity]" to match target co-authors

Sometimes the affiliation of the target co-author in PubMed will explicitly state the department. For example:

  • Take Rita Shankovich (ris9004).
  • Here - http://www.ncbi.nlm.nih.gov/pubmed/25867473 - is a candidate article in which R. Shankovich is listed as an author
  • When you go to the "primary_department" and "primary_affiliation" of that author, it says, "Pathology and Laboratory Medicine" and "Weill Cornell Medical College, Cornell University," respectively.
  • Go to rc_identity and look up primary and other departments. They are "Pathology and Laboratory Medicine" and "Medicine"
  • Does either "Department of Pathology and Laboratory Medicine" or "Department of Medicine" appear in the PubMed affiliation for the target author?
  • The affiliation also says "Weill Cornell"; however, the department can match even when the institution does not (see Helen Fernandes's case below).
  • A department match should therefore increase the likelihood that the article belongs to the target author.
  • Please be sure to translate "and" into different ways it's represented. "Pathology and Laboratory Medicine" should become:
    • Pathology and Laboratory Medicine
    • Pathology/Laboratory Medicine
    • Pathology & Laboratory Medicine
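
A minimal sketch of the "and" expansion described above, assuming a plain substring comparison against the PubMed affiliation string. The class and method names (`DepartmentVariants`, `andVariants`, `affiliationMentions`) are hypothetical and are not part of ReCiter.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class DepartmentVariants {

    // Hypothetical helper: expand " and " into the three spellings listed above.
    // A department without " and " yields a single variant.
    static Set<String> andVariants(String department) {
        Set<String> variants = new LinkedHashSet<>();
        variants.add(department);
        variants.add(department.replace(" and ", "/"));
        variants.add(department.replace(" and ", " & "));
        return variants;
    }

    // Hypothetical helper: true if the affiliation contains
    // "Department of <variant>" for any variant, ignoring case.
    static boolean affiliationMentions(String affiliation, String department) {
        String haystack = affiliation.toLowerCase(Locale.ROOT);
        for (String variant : andVariants(department)) {
            String needle = ("Department of " + variant).toLowerCase(Locale.ROOT);
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

Because this is a substring test, a short department name such as "Medicine" would also match a longer affiliation like "Department of Medicine and Pathology", so a stricter comparison may be needed in practice.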

Other examples:

  • Shahin Rafii (srafii) is in the Department of Genetic Medicine, which is what these publications say - 24733255, 24799717
  • Fuqiang Geng (fug2001) is in the "Department of Genetic Medicine" and that is listed as this author's affiliation: 24799717 (currently false negative)
  • Helen Fernandes (hef9020) is in the "Department of Pathology and Laboratory Medicine" and that is listed as this author's affiliation (currently false negative): 9682005 1382908 11140879 21881595 1467538. Note that these publications were authored while she was at another institution! For example:
    • Department of Pathology, New Jersey Medical School, UMDNJ, Newark 07103, USA. (21881595)
    • Department of Pathology and Lab Medicine, University of Medicine and Dentistry/New Jersey Medical School, Newark, NJ 07103, USA. (11140879)
    • Department of Laboratory Medicine and Pathology, UMDNJ-New Jersey Medical School, Newark 07103. (1382908)

Related to #46

@michaelbales1 changed the title from 'Use "Department of [target author's department(s) as tracked in rc_identity]" to match target co-authors' to 'Leverage departmental affiliation string matching for phase 2 matching' on Jun 5, 2015
@michaelbales1 changed the title from 'Leverage departmental affiliation string matching for phase 2 matching' to 'Leverage departmental affiliation string matching for phase two matching' on Jun 5, 2015
@jl987-Jie

if (selectingTarget) {
    // Grab columns "primary_department" and "other_departments" from table "rc_identity".
    // Expand "and" into the ways it is commonly written.
    // "Pathology and Laboratory Medicine" should become:
    //   1. Pathology and Laboratory Medicine
    //   2. Pathology/Laboratory Medicine
    //   3. Pathology & Laboratory Medicine
    for (ReCiterArticle article : finalCluster.get(id).getArticleCluster()) {
        // Compare the department information above with the article's affiliation information.
        // Increase the sim score if the departments match.
    }
}
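
The outline above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: the article's affiliations and the target's departments are passed in as plain strings rather than ReCiter types, and `DepartmentScoreBoost`, `boostedScore`, and the `DEPARTMENT_BOOST` value are all hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class DepartmentScoreBoost {

    // Hypothetical constant: how much a department match adds to the sim score.
    static final double DEPARTMENT_BOOST = 1.0;

    // Returns simScore plus the boost if any affiliation contains
    // "Department of <dept>" for one of the target's departments (ignoring case);
    // otherwise returns simScore unchanged. The boost is applied at most once.
    static double boostedScore(double simScore,
                               List<String> articleAffiliations,
                               List<String> targetDepartments) {
        for (String affiliation : articleAffiliations) {
            String lower = affiliation.toLowerCase(Locale.ROOT);
            for (String dept : targetDepartments) {
                if (lower.contains("department of " + dept.toLowerCase(Locale.ROOT))) {
                    return simScore + DEPARTMENT_BOOST;
                }
            }
        }
        return simScore;
    }
}
```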

jl987-Jie added a commit that referenced this issue Jun 28, 2015
ghost pushed a commit that referenced this issue Jul 16, 2015
@jl987-Jie

    /**
     * Extract Department information from string of the form "Department of *,".
     * 
     * @param department Department string
     * @return Department name.
     */
    private String extractDepartment(String department) {
        final Pattern pattern = Pattern.compile("Department of (.+?),");
        final Matcher matcher = pattern.matcher(department);
        if (matcher.find()) {
            return matcher.group(1);
        } else {
            return "";
        }
    }

    /**
     * Leverage departmental affiliation string matching for phase two matching.
     * 
     * If the ReCiterAuthor has affiliation information, extract the "Department of ***" string and compare it
     * (ignoring case) with the target author's primary department and other department fields. If the extracted
     * department matches either field, return true; otherwise return false.
     * 
     * (Github issue: https://github.com/wcmc-its/ReCiter/issues/79)
     * @return True if the department of the ReCiterAuthor and TargetAuthor match.
     */
    public boolean departmentMatch(ReCiterAuthor reCiterAuthor, TargetAuthor targetAuthor) {

        if (reCiterAuthor.getAffiliation() != null) {
            String affiliation = reCiterAuthor.getAffiliation().getAffiliationName();
            String extractedDept = extractDepartment(affiliation);
            String targetAuthorDept = targetAuthor.getDepartment();
            String targetAuthorOtherDept = targetAuthor.getOtherDeparment();
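            // An empty extraction means no "Department of ...," was found; do not let it match an empty target field.
            if (!extractedDept.isEmpty()
                    && (extractedDept.equalsIgnoreCase(targetAuthorDept) || extractedDept.equalsIgnoreCase(targetAuthorOtherDept))) {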
                return true;
            }
        }
        return false;
    }
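
The code above compares the extracted department verbatim, so the "/" and "&" spellings requested in the issue would not match. One possible way to cover them is to normalize both sides before comparing. The sketch below is an assumption, not the committed implementation: it takes plain strings instead of `ReCiterAuthor` and `TargetAuthor`, and `DepartmentNormalizer` and its methods are hypothetical names. It also accepts affiliations where the department is not followed by a comma.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DepartmentNormalizer {

    // Capture everything after "Department of" up to the next comma, period, or semicolon.
    private static final Pattern DEPARTMENT =
            Pattern.compile("Department of ([^,.;]+)", Pattern.CASE_INSENSITIVE);

    // Rewrite "/" and "&" as " and ", collapse whitespace, and lower-case.
    static String normalize(String name) {
        return name.replaceAll("\\s*[/&]\\s*", " and ")
                   .replaceAll("\\s+", " ")
                   .trim()
                   .toLowerCase(Locale.ROOT);
    }

    // Returns the department name, or "" if the affiliation has no "Department of ...".
    static String extractDepartment(String affiliation) {
        Matcher matcher = DEPARTMENT.matcher(affiliation);
        return matcher.find() ? matcher.group(1).trim() : "";
    }

    // True if the normalized department in the affiliation equals either target department.
    static boolean departmentMatch(String affiliation, String primaryDept, String otherDept) {
        if (affiliation == null) {
            return false;
        }
        String extracted = normalize(extractDepartment(affiliation));
        if (extracted.isEmpty()) {
            return false;
        }
        return (primaryDept != null && extracted.equals(normalize(primaryDept)))
                || (otherDept != null && extracted.equals(normalize(otherDept)));
    }
}
```

Exact comparison after normalization still misses abbreviations and reordered names such as "Pathology and Lab Medicine" or "Laboratory Medicine and Pathology" from the Helen Fernandes examples; those would need a looser comparison.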

@michaelbales1

Hanumantha has implemented this and reports that Jie has integrated his code into the ReCiterAlgorithmRevisionJul2015 branch.
