0a. User selects one of these refreshFlags for retrieval (this should be an option in the Swagger UI):
If "All publications" - re-import all publications
If "Only newly added publications" - go to 0b
If "False" (default) - retrieve existing records from eSearchResults
0b. Construct date filter.
If "Only newly added publications" is selected, we're going to do an incremental lookup. Grab latestRetrievalDate from esearchresults and construct the query using that date. Suppose the latest lookup date was August 1. The modifier would look like this. Note that 3000 is PubMed's suggested maximum year, but it could be anything.
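A minimal sketch of how the date filter could be built. The function name, the 2018 year, and the choice of the `edat` field tag are assumptions for illustration only; this issue does not say which PubMed date field the modifier should use.

```python
from datetime import date


def build_date_filter(latest_retrieval_date, field="edat", max_year=3000):
    """Return a PubMed date-range modifier, or None for a full lookup.

    `field` is an assumed PubMed date tag; swap in whichever tag applies.
    """
    if latest_retrieval_date is None:
        # No previous retrieval: leave dateFilter unset.
        return None
    start = latest_retrieval_date.strftime("%Y/%m/%d")
    # PubMed range syntax: start:end[tag]
    return f"{start}:{max_year}[{field}]"


# Hypothetical example: latest lookup on August 1 of an assumed year.
date_filter = build_date_filter(date(2018, 8, 1))
```

Keeping the upper bound as a parameter reflects the note that 3000 is only a suggested maximum.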
3. Retrieve all unique forms of identity.name.lastName and identity.name.firstInitial from Identity table for targetAuthor.
Look at both primaryName and alternateNames.
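As a rough sketch of step 3, the unique name forms could be collected like this. The dictionary shape and the function name are assumptions; only the `lastName` and `firstInitial` field names come from the text above.

```python
def unique_name_forms(primary_name, alternate_names):
    """Return distinct 'lastName firstInitial' strings, primary name first."""
    seen = set()
    forms = []
    for name in [primary_name, *alternate_names]:
        last = (name.get("lastName") or "").strip()
        initial = (name.get("firstInitial") or "").strip()
        if not last or not initial:
            continue  # skip incomplete names
        form = f"{last} {initial[0].upper()}"
        key = form.lower()  # compare case-insensitively
        if key not in seen:
            seen.add(key)
            forms.append(form)
    return forms
```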
4. Derive additional names, if possible.
Logic:
A. For any name in primaryName or alternateNames, does targetAuthor have a surname that satisfies both of these conditions: it contains a space or dash, and breaking the name at the first space or dash yields two strings of four or more characters each?
If yes, go to B
If no, there is no need to derive name aliases.
B. Attempt to derive additional name aliases by breaking up any surnames, splitting on space or dash.
For example: ses9022 has a primaryName of Selin Somersan-Karakaya. This would translate into:
Somersan-Karakaya S
Somersan S
Karakaya S
Some users have multiple spaces (e.g., alg9037 - Gonzalez Della Valle). In such cases, split only on the first space or dash.
Gonzalez A
Della Valle A
Test cases (CWID, surname in Enterprise Directory, surname in article)
lveeck, Veeck Gosden, Veeck
nlt2002, Tottenham-Delafield, Tottenham
sinhana, Sinha Gregory, Gregory
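The rules in A and B can be sketched as follows. The function name and the `min_len` parameter are illustrative, not part of the existing code base.

```python
import re


def derive_surname_aliases(surname, min_len=4):
    """Split a compound surname at the first space or dash.

    Returns the two derived surnames, or an empty list when the surname
    has no space/dash or either part is shorter than `min_len`.
    """
    parts = re.split(r"[ -]", surname.strip(), maxsplit=1)
    if len(parts) != 2:
        return []  # nothing to split on
    first, rest = parts
    if len(first) < min_len or len(rest) < min_len:
        return []  # condition A not satisfied
    return [first, rest]


def derived_lookup_names(surname, first_initial):
    """Combine each derived surname with the author's first initial."""
    return [f"{alias} {first_initial}" for alias in derive_surname_aliases(surname)]
```

For example, `derived_lookup_names("Gonzalez Della Valle", "A")` gives `Gonzalez A` and `Della Valle A`, because only the first space is used as the split point.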
C. Does user have a first initial followed by a period or space in givenName for primaryName like any of the following?
- W. Clay[firstName] Bracken[lastName]
- W.[firstName] Clay[middleName] Bracken[lastName]
- W Clay[firstName] Bracken[lastName]
- W[firstName] Clay[middleName] Bracken[lastName]
If yes, attempt to derive new name alias for lookup. In all the above cases, we derive the name “Bracken C”. When we do the lookup, let’s call this, “abbreviatedFirstNameRetrievalStrategy”
use case: wcb2001
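One way to sketch the check in C, covering the four variants listed above. The function name and the use of a regular expression are assumptions.

```python
import re

# A single leading initial, followed by a period and/or whitespace,
# then the first letter of the next given name.
_INITIAL_THEN_NAME = re.compile(r"^[A-Za-z](?:\.\s*|\s+)([A-Za-z])")


def derive_abbreviated_name(first_name, middle_name, last_name):
    """Return e.g. 'Bracken C' for 'W. Clay Bracken', else None."""
    given = " ".join(part for part in (first_name, middle_name) if part).strip()
    match = _INITIAL_THEN_NAME.match(given)
    if match is None:
        return None  # given name does not start with a bare initial
    return f"{last_name} {match.group(1).upper()}"
```

Joining firstName and middleName before matching lets the same pattern handle both the case where "W. Clay" is stored as firstName and the case where "Clay" is stored as middleName.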
5. Sanitize
Deaccent for all names.
Remove suffixes from all names, referring to the suffixes listed in nameScoringStrategy-excludedSuffixes, which is stored in application.properties.
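A minimal sketch of the sanitizing step. The suffix list passed in the example is hypothetical; the real values live in nameScoringStrategy-excludedSuffixes. This sketch also assumes suffixes appear only at the end of a name.

```python
import unicodedata


def deaccent(text):
    """Strip diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_suffixes(name, excluded_suffixes):
    """Remove trailing tokens that match an excluded suffix (case-insensitive)."""
    excluded = {s.lower().rstrip(".") for s in excluded_suffixes}
    tokens = name.replace(",", " ").split()
    while tokens and tokens[-1].lower().rstrip(".") in excluded:
        tokens.pop()
    return " ".join(tokens)


def sanitize(name, excluded_suffixes):
    return strip_suffixes(deaccent(name), excluded_suffixes)


# Hypothetical suffix list, for illustration only.
example = sanitize("Núñez, Jr.", ["Jr", "Sr", "III"])
```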
6. Does this search return more than the value set in searchStrategy-leninent-threshold? (Do not include derived names at this point.)
[lastName firstInitial for primaryName] OR [lastName firstInitial for alternateName1] OR [lastName firstInitial for alternateName2]...
If no, do the above search AND dateFilter. Return results. We're done with primaryName and alternateNames. Then go to strictRetrievalStrategy for any cases where name type=derived. If there are no such cases, stop.
If yes, we're going to look these names up using strictRetrievalStrategy-*.
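The decision in step 6 might be sketched like this. The function names and the threshold value in the example are assumptions; the result count would come from a count-only PubMed search, which is not shown here.

```python
def build_name_query(name_forms):
    """OR together the primary and alternate name forms (no derived names)."""
    return " OR ".join(name_forms)


def choose_strategy(result_count, lenient_threshold):
    """Use strict retrieval only when the name search is too broad."""
    return "strict" if result_count > lenient_threshold else "lenient"


def lenient_query(name_forms, date_filter=None):
    """The name search, ANDed with dateFilter when one is set."""
    query = f"({build_name_query(name_forms)})"
    return f"{query} AND {date_filter}" if date_filter else query


# Hypothetical numbers: 350 hits against a threshold of 2000.
strategy = choose_strategy(350, 2000)
```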
strictRetrievalStrategy
7. Preprocessing: we need to construct a series of parameters which limit our result set.
A. strictRetrievalStrategy-knownRelationships
Source: identity.knownRelationships where:
firstInitial and lastName of knownRelationship does not match targetAuthor, and
The full output using the existing value in application.properties should look like this.
AND (weill AND cornell) OR (weill AND medicine) OR (cornell AND medicine) OR (cornell AND medical) OR (weill AND medical) OR (weill AND bugando) OR (weill AND graduate) OR (cornell AND presbyterian) OR (weill AND presbyterian) OR (10065 AND cornell) OR (10065 AND presbyterian) OR (10021 AND cornell) OR (10021 AND presbyterian) OR (weill AND qatar) OR (cornell AND qatar) OR @med.cornell.edu OR @qatar-med.cornell.edu
Example: "OR (University of Milwaukee)[affiliation] OR (University of Shanghai (China))[affiliation] OR (weill AND cornell) OR (weill AND medicine)..."
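A sketch of the keyword preprocessing for the home institution, assuming the property value uses "," to separate alternatives (OR) and "|" to join required terms (AND). The function name and the sample value are illustrative.

```python
def institution_clause(keywords):
    """Turn 'weill|cornell, weill|medicine' into a boolean expression."""
    groups = []
    for group in keywords.split(","):
        terms = [term.strip() for term in group.split("|") if term.strip()]
        if not terms:
            continue  # ignore empty entries such as a trailing comma
        if len(terms) == 1:
            groups.append(terms[0])
        else:
            groups.append("(" + " AND ".join(terms) + ")")
    return " OR ".join(groups)


# Shortened sample value, not the full list from application.properties.
clause = institution_clause("weill|cornell, weill|medicine, @med.cornell.edu")
```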
E. strictRetrievalStrategy-departments
Source: identity.departments
Example: for ajg9004 - "OR Radiology[affiliation]"
F. strictRetrievalStrategy-secondInitial
Look up by the first two capital letters found across the user's first name and middle name. Examples:
Warren,James,David --> Warren JD
Choi,Augustine,M.K. --> Choi AM
Choi,Hyo Kyoung,NULL --> Choi HK
Moore,John,P --> Moore JP
Do this across all primary and alternate names.
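The second-initial rule could be sketched as below. The function name is an assumption, and a missing middle name (NULL in the examples) is represented as `None`.

```python
def second_initial_name(last_name, first_name, middle_name=None):
    """Return 'LastName XY' from the first two capitals in first + middle name.

    Returns None when fewer than two capital letters are available.
    """
    given = f"{first_name or ''} {middle_name or ''}"
    capitals = [ch for ch in given if ch.isupper()]
    if len(capitals) < 2:
        return None
    return f"{last_name} {''.join(capitals[:2])}"
```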
G. ON HOLD: strictRetrievalStrategy-meshMajor
Source: meshMajor from any pub retrieved during goldStandardRetrievalStrategy or emailRetrievalStrategy, and that has meshTerm.count of 50,000 or less
Example: "OR cardiac arrest[Majr] OR Aneurysm, Dissecting[majr]"
8. Prepare searches
Strict retrieval strategy searches consist of three pieces: the name(s), one additional parameter from step 7, and the dateFilter.
Suppose we've identified three distinct names that need to be retrieved using strict retrieval. Let's call them A, B, and C. Our searches would be:
(A OR B OR C) AND 7A AND dateFilter
(A OR B OR C) AND 7B AND dateFilter
(A OR B OR C) AND 7C AND dateFilter
(A OR B OR C) AND 7D AND dateFilter
(A OR B OR C) AND 7E AND dateFilter
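A sketch of how these searches could be assembled. The function name is illustrative, and each parameter is wrapped in parentheses here (an addition, not something this issue specifies) so that OR terms inside a parameter stay grouped.

```python
def prepare_strict_searches(names, parameters, date_filter=None):
    """Build one query per step-7 parameter: (names) AND (param) AND dateFilter."""
    name_clause = "(" + " OR ".join(names) + ")"
    searches = []
    for parameter in parameters:
        if not parameter:
            continue  # skip parameters with no data for this author
        query = f"{name_clause} AND ({parameter})"
        if date_filter:
            query += f" AND {date_filter}"
        searches.append(query)
    return searches
```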
9. Conduct searches
For each search, first get a count of results. If the result count is lower than the value set in searchStrategy-strict-threshold, proceed with storing the results. If it is not, skip over that search to the next one.
For ses9022: 10903715 11673488 12496375 16614246 23012453 24197888 27144688
yiwang
For ccole: 30009991
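The count-then-store loop from step 9 might look like the sketch below. `count_results` and `fetch_results` are hypothetical callables standing in for whatever PubMed client is used; they are not real library functions.

```python
def conduct_searches(queries, count_results, fetch_results, strict_threshold):
    """Run each query only if its result count is below the strict threshold.

    Returns a dict mapping each executed query to its stored results.
    """
    stored = {}
    for query in queries:
        if count_results(query) < strict_threshold:
            stored[query] = fetch_results(query)
        # otherwise skip this search and move on to the next one
    return stored
```

Passing the two callables in keeps the threshold logic testable without a live PubMed connection.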
Future development
Identify additional name aliases from targetAuthor in goldStandardRetrievalStrategy and in cases where known email is a match. We need to be storing data in the Analysis table before we can do this.
application.properties
Add these to application.properties. We will describe how to use these later.
Define refreshFlag
0a. Set variable "dateFilter" equal to null.
0a. User selects one of these refreshFlags for retrieval (this should be an option in the Swagger UI):
0b. Construct date filter.
If "Only newly added publications" is selected, we're going to do an incremental lookup. Grab latestRetrievalDate from esearchresults and construct the query using that date. Suppose the latest lookup date was August 1. The modifier would look like this. Note that 3000 is PubMed's suggested maximum year, but it could be anything.
Store this as dateFilter.
goldStandardRetrievalStrategy
1. Is use.gold.standard.evidence=true?
Go to 2.
emailRetrievalStrategy
2. If user has one or more emails, run emailRetrievalStrategy.
Test case: mcr2004@med.cornell.edu
lastNameFirstInitialRetrievalStrategy
3. Retrieve all unique forms of identity.name.lastName and identity.name.firstInitial from Identity table for targetAuthor.
Look at both primaryName and alternateNames.
4. Derive additional names, if possible.
Logic:
A. For any name in primaryName or alternateNames, does targetAuthor have a surname that satisfies both of these conditions: it contains a space or dash, and breaking the name at the first space or dash yields two strings of four or more characters each?
B. Attempt to derive additional name aliases by breaking up any surnames, splitting on space or dash.
For example: ses9022 has a primaryName of Selin Somersan-Karakaya. This would translate into:
Some users have multiple spaces (e.g., alg9037 - Gonzalez Della Valle). In such cases, split only on the first space or dash.
Test cases (CWID, surname in Enterprise Directory, surname in article)
C. Does user have a first initial followed by a period or space in givenName for primaryName like any of the following?
use case: wcb2001
5. Sanitize
Remove any suffixes listed in nameScoringStrategy-excludedSuffixes, which is stored in application.properties.
6. Does this search return more than the value set in searchStrategy-leninent-threshold? (Do not include derived names at this point.)
[lastName firstInitial for primaryName] OR [lastName firstInitial for alternateName1] OR [lastName firstInitial for alternateName2]...
strictRetrievalStrategy
7. Preprocessing: we need to construct a series of parameters which limit our result set.
A. strictRetrievalStrategy-knownRelationships
B. strictRetrievalStrategy-fullName
C. strictRetrievalStrategy-grants
D. strictRetrievalStrategy-institutions
homeInstitution-keywords
For preprocessing, take the above and:
treat "," as OR
treat "|" as AND
The full output using the existing value in application.properties should look like this.
E. strictRetrievalStrategy-departments
F. strictRetrievalStrategy-secondInitial
G. ON HOLD: strictRetrievalStrategy-meshMajor
8. Prepare searches
Strict retrieval strategy searches consist of three pieces: the name(s), one additional parameter from step 7, and the dateFilter.
Suppose we've identified three distinct names that need to be retrieved using strict retrieval. Let's call them A, B, and C. Our searches would be:
9. Conduct searches
For each search, first get a count of results. If the result count is lower than the value set in searchStrategy-strict-threshold, proceed with storing the results. If it is not, skip over that search to the next one.
Test cases
Future development