ReCiter

Purpose
Technical
Installation
- Local
- Amazon AWS
Configuration
Functionality
Future work
Funding acknowledgment
Follow up

Purpose

ReCiter is a highly accurate system for guessing which publications in PubMed a given person has authored. ReCiter includes a Java application, a DynamoDB-hosted database, and a set of RESTful microservices which collectively allow institutions to maintain accurate and up-to-date author publication lists for thousands of people. This software is optimized for disambiguating authorship in PubMed and, optionally, Scopus.

ReCiter accurately identifies articles by a given person, including those published at previous affiliations. It does this by leveraging institutionally maintained identity data (e.g., departments, relationships, email addresses, and year of degree). Combining these types of data makes searches more complete and efficient, which saves you time and makes your institution more productive. If you run ReCiter daily, you can ensure that the desired users are the first to learn when a new publication has appeared in PubMed.

ReCiter is fast. It uses an advanced multi-threading strategy known as a work-stealing pool to make up to 10 retrieval requests at a time.
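As a rough illustration of the work-stealing approach, the sketch below runs several retrieval tasks on a pool created with Java's Executors.newWorkStealingPool, with a target parallelism of 10. It is not ReCiter's actual retrieval code: the class name, the retrieve method, and the sample queries are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkStealingSketch {

    // Hypothetical stand-in for one retrieval request; a real call would query PubMed or Scopus.
    static String retrieve(String query) {
        return "results for " + query;
    }

    // Runs every query on a work-stealing pool with the given target parallelism.
    static List<String> retrieveAll(List<String> queries, int parallelism) {
        ExecutorService pool = Executors.newWorkStealingPool(parallelism);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String query : queries) {
                tasks.add(() -> retrieve(query));
            }
            List<String> results = new ArrayList<>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("retrieval failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Sample queries are illustrative only.
        List<String> queries = List.of("query-1", "query-2", "query-3");
        for (String result : retrieveAll(queries, 10)) {
            System.out.println(result);
        }
    }
}
```

In a pool of this kind, idle worker threads take queued tasks from busy ones, which helps when some requests return much faster than others.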

ReCiter is freely available and open source under the Apache 2.0 license.

Please see the ReCiter wiki for more information.

Technical

Prerequisites

Java 11
Latest version of Maven. To install Maven with Homebrew, run brew install maven. Then, from the directory where ReCiter is installed, run mvn clean install to build the project.

To use Java 8 instead, set <java.version>1.8</java.version> in pom.xml.

It is not necessary to install ReCiter in order to use the API.

Technological stack

Key technologies include:

ReCiter stores data about researchers and publications in DynamoDB, which can be hosted on Amazon AWS or installed locally.
Its main computation logic is written in Java.
It employs the Spring Framework, a Java-based application framework designed to manage RESTful web services and server requests.
ReCiter uses Swagger, a toolset that provides a user interface with helpful cues for how to interact with the application's RESTful APIs.

You may choose to run ReCiter on either:

A server - ReCiter will run on Linux, Mac OS X, and Windows versions 7 and higher. A minimum of 4 GB of RAM is required; 16 GB is recommended. An Internet connection is required to download article data from scholarly databases.
A local machine - ReCiter's APIs may be run in a browser on any modern machine. The ReCiter server must be accessible to the local machine via a local area network or internet connection.

Architecture

Related code repositories

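The ReCiter application depends on the following separate GitHub-hosted repositories:

PubMed Retrieval Tool - required; retrieves candidate articles from PubMed
Scopus Retrieval Tool - optional; retrieves candidate articles from Scopus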

Optionally, users can install:

ReCiter Publication Manager - a powerful user interface / web application that streamlines the process of updating and reporting on the publications of an institution's scholars
ReCiterDB - the back-end data store for Publication Manager; in addition to the schema and stored procedures, this repository contains a set of scripts that retrieve data from ReCiter and import them into this MySQL database

Installation

ReCiter can be installed to run locally or in AWS via a CloudFormation template. A required dependency is the PubMed Retrieval Tool. The Scopus Retrieval Tool is optional, but can improve overall accuracy by several percent.

Local

Clone the repository to a local folder using git clone https://github.com/wcmc-its/ReCiter.git
In the cloned repository, open src/main/resources/application.properties and set the port and log location as needed. In the same file:

Change aws.DynamoDb.local=false to aws.DynamoDb.local=true
Update the location of the DynamoDB database, e.g., aws.DynamoDb.local.dbpath=/Users/Paul/Documents/ReCiter/dynamodb_local_latest
Application security is turned on by default. To turn it off, change spring.security.enabled=true to spring.security.enabled=false
If security is enabled, you must set the following environment variables:

export ADMIN_API_KEY=<api-key>
export CONSUMER_API_KEY=<api-key>

If you do not have a Scopus subscription, change use.scopus.articles=true to use.scopus.articles=false.

Set the ports for the server and services on the command line. The PubMed service must be set up before this step; the Scopus service is optional. Enter the appropriate hostnames and port numbers.

export SERVER_PORT=5000
export SCOPUS_SERVICE=http://localhost:5001
export PUBMED_SERVICE=http://localhost:5002

Run mvn spring-boot:run. To pass JVM options such as the maximum heap size, set them beforehand with export MAVEN_OPTS=-Xmx1024m (use -Xms for the minimum heap size).
Go to http://localhost:<port-number>/swagger-ui/index.html or http://localhost:<port-number>/swagger-ui/ (shorthand Swagger URL) to test and run any API.

Amazon AWS

The ReCiter CDK installs the entire infrastructure for ReCiter and its components, and it is highly configurable. There you will find instructions for installing ReCiter and its components.

Configuration

PubMed API key - Recommended for performance reasons and to reduce the likelihood that the National Library of Medicine will throttle your requests, but otherwise not necessary.
Scopus API key and instoken - Use of Scopus is optional. It can improve overall accuracy by several percent; Scopus is helpful because it provides disambiguated organizational affiliations and full first names, especially for earlier articles. Use of the Scopus API is available only to Scopus subscribers.
Security - Each of ReCiter's APIs can be configured to restrict access to only those requests which provide the correct API key.
application.properties - All remaining configurations are stored here.
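To make the security setting concrete, here is a minimal sketch of a client attaching an API key to a request using Java 11's built-in HTTP client classes. The header name api-key, the use of GET, the base URL, and the class and method names are assumptions for illustration; confirm the actual header name and HTTP method in the Swagger UI of your own ReCiter instance. The request is built but not sent, so the sketch runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ApiKeyRequestSketch {

    // Assumption: the key travels in a request header named "api-key".
    static final String API_KEY_HEADER = "api-key";

    // Builds (but does not send) a GET request to a ReCiter endpoint with the key attached.
    static HttpRequest buildRequest(String baseUrl, String path, String apiKey) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + path))
                .header(API_KEY_HEADER, apiKey)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // ADMIN_API_KEY is the environment variable named in the local installation steps;
        // "example-key" is a placeholder used only when the variable is unset.
        String key = System.getenv().getOrDefault("ADMIN_API_KEY", "example-key");
        HttpRequest request = buildRequest("http://localhost:5000", "/reciter/find/all/identity", key);
        System.out.println(request.method() + " " + request.uri());
        boolean hasKey = request.headers().firstValue(API_KEY_HEADER).isPresent();
        System.out.println("key header attached: " + hasKey);
    }
}
```

Reading the key from an environment variable rather than hard-coding it keeps the secret out of source control.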

Functionality

How ReCiter works

The wiki article, How ReCiter works, contains a more detailed description of how the application works. In brief:

Populate identity information for target users
Optional: populate Gold Standard of already accepted or rejected publications; note that this system currently does not offer a user interface for collecting this feedback
Lookup candidate articles in PubMed and, optionally, Scopus
Compute suggestions
Retrieve suggestions

Using the APIs

The wiki article, Using the APIs, contains a full description of how to use the ReCiter APIs.

| Category | Function | Relevant API(s) |
| --- | --- | --- |
| Manage identity of target users | Add or update identity data for target user(s) from Identity table | `/reciter/identity/` or `/reciter/save/identities/` |
| Manage identity of target users | Retrieve identity data for target user(s) from Identity table | `/reciter/find/identity/by/uid/` or `/reciter/find/identity/by/uids/` or `/reciter/find/all/identity` |
| Gold standard | Update the GoldStandard table (includes both accepted and rejected PMIDs) for a single user | `/reciter/goldstandard/` |
| Gold standard | Update the GoldStandard table (includes both accepted and rejected PMIDs) for multiple users | `/reciter/goldstandard/` |
| Gold standard | Read from the GoldStandard table (includes both accepted and rejected PMIDs) for target user(s) | `/reciter/goldstandard/{uid}` |
| Look up candidate articles | Trigger lookup of candidate articles for a given user | `/reciter/retrieve/articles/by/uid` |
| Retrieve suggested articles | Read suggested articles from the Analysis table for target user | `/reciter/article-retrieval/by/uid` |
| Retrieve suggested articles | Read suggested articles and see supporting evidence from the Analysis table for target user(s) | `/reciter/feature-generator/by/uid` or `/reciter/feature-generator/by/group` |
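The endpoints in the table above could be assembled into a per-user call plan along the following lines. This is only a sketch: the paths come from the table, but the uid query parameter, the base URL, the sample user identifier, and the class and method names are assumptions; the Using the APIs wiki article and the Swagger UI remain the authority on parameters and HTTP methods.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class WorkflowSketch {

    // Assumption: each endpoint accepts the user identifier as a "uid" query parameter.
    static URI endpoint(String baseUrl, String path, String uid) {
        String encodedUid = URLEncoder.encode(uid, StandardCharsets.UTF_8);
        return URI.create(baseUrl + path + "?uid=" + encodedUid);
    }

    // Lists the lookup and retrieval endpoints from the table in the order a client might call them.
    static Map<String, URI> plan(String baseUrl, String uid) {
        Map<String, URI> steps = new LinkedHashMap<>();
        steps.put("Look up candidate articles",
                endpoint(baseUrl, "/reciter/retrieve/articles/by/uid", uid));
        steps.put("Retrieve suggested articles with supporting evidence",
                endpoint(baseUrl, "/reciter/feature-generator/by/uid", uid));
        steps.put("Retrieve suggested articles",
                endpoint(baseUrl, "/reciter/article-retrieval/by/uid", uid));
        return steps;
    }

    public static void main(String[] args) {
        // "abc1234" is a made-up user identifier.
        for (Map.Entry<String, URI> step : plan("http://localhost:5000", "abc1234").entrySet()) {
            System.out.println(step.getKey() + ": " + step.getValue());
        }
    }
}
```

A LinkedHashMap is used so that the steps print in insertion order, mirroring the sequence in which they would be called.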

Published articles

Albert PJ, Dutta S, Lin J, Zhu Z, Bales M, Johnson SB, Mansour M, Wright D, Wheeler TR, Cole CL. (2021) ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions. PLoS ONE 16(4): e0244641. https://doi.org/10.1371/journal.pone.0244641

Future work

Both the issue queue and the Roadmap include some areas where we want to improve ReCiter.

Funding acknowledgment

Various components in the ReCiter suite of applications have been funded by:

The National Institutes of Health National Center for Advancing Translational Sciences through grant number UL1TR002384
National Library of Medicine, National Institutes of Health under a cooperative agreement with Region 7
Lyrasis through its Catalyst fund

Follow up

Please submit any questions to Paul Albert. You may expect a response within one to two business days.

We use GitHub issues to track bugs and feature requests. If you find a bug, please feel free to open an issue.

Contributions welcome!
