Create new strategy: genderStrategyScore #357

paulalbert1 · 2019-05-31T15:58:29Z

Background

There are a number of cases where ReCiter is suggesting articles for someone of the opposite gender. For example...

We can take advantage of the fact that certain names are more often associated with a particular gender, especially in cases where the inferred gender of our target person's name does not match the inferred gender of the name of the target author of our candidate article.

Caveat: yes, gender is a social construct, but people named Richard tend to be male more often than not (99.6% of the time, according to the SSA), and people named Susan tend to be female more often than not (99.8%). If a person of interest named Susan happens to be male, ReCiter would not entirely fail to suggest a candidate article where the targetAuthor is "Susananne." It would merely downweight that result slightly as a possible match.

Data source for gender

Howarder downloaded name and gender data from the Social Security Administration covering 1930-2015. He then computed percentages by gender.
name_gender.json.txt

Some sample data:

Michaeel,M,1
Michael,M,0.9950199847246701
Michaela,F,0.9985816477553034
...
Susaa,F,1
Susan,F,0.9977389059509503

Consistent with other data sets, this data could be loaded as a DynamoDB table named "Gender".
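For checking values locally before anything is loaded into DynamoDB, the rows can be read into a plain dictionary. This is a minimal sketch that assumes every row has the name,gender,proportion shape of the sample above; the function name `load_gender_table` and the inline `SAMPLE` string are illustrative and not part of ReCiter.

```python
import csv
import io

# Rows copied from the sample data in this issue (the "..." line is omitted).
SAMPLE = """\
Michaeel,M,1
Michael,M,0.9950199847246701
Michaela,F,0.9985816477553034
Susaa,F,1
Susan,F,0.9977389059509503
"""

def load_gender_table(text):
    """Return {name: (gender, proportion)} from name,gender,proportion rows."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        name, gender, proportion = row
        table[name] = (gender, float(proportion))
    return table

GENDER_TABLE = load_gender_table(SAMPLE)
print(GENDER_TABLE["Susan"])
```

Keys keep the capitalization found in the file, so a lookup only succeeds on an exact match.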

How this would work

1. Add to application.properties

strategy.genderStrategyScore.minimumScore=-5
strategy.genderStrategyScore.rangeScore=6

2. Attempt to infer the gender of our target person.

Get firstName and middleName from primaryName and alternateName.
Split names on space and dash. Names need to be two characters or more.
Attempt to do an exact lookup in the Gender table of all these names:
- primaryName(s) from firstName
- alternateName(s) from firstName
- primaryName(s) from middleName
- alternateName(s) from middleName
If there's no exact match, stop.

3. Identify the gender and percentage for the target person. For example, for Ben:

Ben,M,0.9943987100059407

3a. Take the average gender score of all available names.
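The identity-side steps (split, exact lookup, average) could be sketched as follows. The issue does not say how names of different genders should be averaged, so this sketch assumes each match is first converted to a single scale, the proportion of people with that name who are male, using the "subtract from 1 for female" rule described later in this issue. `GENDER_TABLE` is a hypothetical in-memory stand-in for the Gender table, and all function names are illustrative.

```python
import re

# Hypothetical stand-in for the Gender table; values are taken from this issue.
GENDER_TABLE = {
    "Ben": ("M", 0.9943987100059407),
    "Beth": ("F", 0.9979603107858765),
}

def split_names(raw):
    """Split on spaces and dashes; keep parts of two or more characters."""
    return [part for part in re.split(r"[ \-]+", raw) if len(part) >= 2]

def male_scale(gender, proportion):
    """Assumption: put both genders on one scale before averaging."""
    return proportion if gender == "M" else 1.0 - proportion

def identity_gender_score(names, table=GENDER_TABLE):
    """Average the scores of all name parts with an exact match, else None."""
    scores = []
    for raw in names:
        for part in split_names(raw):
            if part in table:
                scores.append(male_scale(*table[part]))
    return sum(scores) / len(scores) if scores else None

# First and middle names from primaryName and alternateName go in one list.
print(identity_gender_score(["Ben", "B"]))
```

Returning `None` when nothing matches corresponds to the "If there's no exact match, stop" rule.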

4. Attempt to infer the gender of our targetAuthor.

Get firstName from article.
Split name on space.
Attempt to do an exact lookup in the Gender table.
Example: 24795040 (Y. Claire Wang) --> Claire

5. Identify the gender and percentage for the targetAuthor of the article. For example, for Beth:

Beth,F,0.9979603107858765

5a. Take the average gender score of all available names.
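On the article side, the name is split only on spaces, which is how "Y. Claire" reduces to "Claire". A small sketch of that selection step, assuming the set `KNOWN_NAMES` stands in for the names present in the Gender table (the issue's example implies "Claire" is one of them); the function name is illustrative.

```python
def matching_first_names(first_name, known_names):
    """Split the article's firstName on spaces and keep exact matches."""
    return [part for part in first_name.split(" ") if part in known_names]

# Assumed to be present in the Gender table for this illustration.
KNOWN_NAMES = {"Claire", "Beth"}

print(matching_first_names("Y. Claire", KNOWN_NAMES))  # ['Claire']
print(matching_first_names("Y.", KNOWN_NAMES))         # []
```

An empty result means no gender can be inferred for the targetAuthor.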

6. Compute gender score discrepancy between article and identity.

For any female gender, subtract the score from 1. For example, articleGender for Beth would be 1 - 0.9979 = 0.0021.
For a male gender, leave the score as is. For example, for Ben: 0.994.
Subtract the absolute value of the difference between the two scores from 1. For example: 1 - |0.994 - 0.0021| = 0.0081. We'll call this scoreDifference.
Compute the genderScore: (scoreDifference * strategy.genderStrategyScore.rangeScore) + strategy.genderStrategyScore.minimumScore
For example, with illustrative values rangeScore = 4 and minimumScore = -3 (not the application.properties values above): (0.0081 * 4) + (-3) ≈ -2.967
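The arithmetic above can be checked with a few lines of Python. This is a sketch, not ReCiter's implementation: the function and parameter names are illustrative. It reproduces the worked example, which uses a rangeScore of 4 and a minimumScore of -3, and then repeats it with the values proposed for application.properties (6 and -5).

```python
def gender_score(identity_score, article_score, range_score, minimum_score):
    """Scale the agreement between the two scores onto the configured range."""
    score_difference = 1.0 - abs(identity_score - article_score)
    return score_difference * range_score + minimum_score

# Worked example from the text: Ben (0.994) versus Beth (0.0021).
print(round(gender_score(0.994, 0.0021, 4, -3), 4))  # -2.9676

# Same pair with rangeScore=6 and minimumScore=-5.
print(round(gender_score(0.994, 0.0021, 6, -5), 4))  # -4.9514

# Identical scores give the top of the range: minimumScore + rangeScore.
print(gender_score(0.994, 0.994, 6, -5))  # 1.0
```

With the proposed properties, the score therefore runs from -5 for a complete mismatch to +1 for a complete match.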

7. Output the scores.

genderScore-Article = 0.0021
genderScore-Identity = 0.994
genderScore-IdentityArticleDiscrepancy = -2.967

8. Handling null cases.

There may be cases where a gender is null, e.g., A. Rifkind, where only an initial is available as the first name.
In this case, the output should look like this.
- genderScore-Article = null
- genderScore-Identity = 0.02
- genderScore-IdentityArticleDiscrepancy = null
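One way to express the null rule is to compute the discrepancy only when both sides have a score. The sketch below assumes Python's `None` plays the role of null and that the defaults mirror the proposed properties (rangeScore 6, minimumScore -5); the function name `gender_scores` is illustrative.

```python
from typing import Optional

def gender_scores(identity_score: Optional[float],
                  article_score: Optional[float],
                  range_score: float = 6,
                  minimum_score: float = -5) -> dict:
    """Return the three output fields; the discrepancy is None if either input is."""
    if identity_score is None or article_score is None:
        discrepancy = None
    else:
        agreement = 1.0 - abs(identity_score - article_score)
        discrepancy = agreement * range_score + minimum_score
    return {
        "genderScore-Article": article_score,
        "genderScore-Identity": identity_score,
        "genderScore-IdentityArticleDiscrepancy": discrepancy,
    }

# Author listed only with an initial: no article-side score is available.
print(gender_scores(identity_score=0.02, article_score=None))
```

Leaving the discrepancy empty, rather than substituting a default number, keeps a missing name from counting for or against a candidate article.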

paulalbert1 · 2019-06-11T14:59:10Z

Test case for splitting dashes...

paulalbert1 added the enhancement label May 31, 2019

sarbajitdutta self-assigned this Jun 5, 2019

sarbajitdutta added this to To do in ReCiter enhancements via automation Jun 5, 2019

sarbajitdutta moved this from To do to Testing in ReCiter enhancements Jun 12, 2019

sarbajitdutta mentioned this issue Jun 12, 2019

Gender strategy and misc improvements #358

Merged

paulalbert1 closed this as completed Jun 12, 2019

ReCiter enhancements automation moved this from Testing to Done Jun 12, 2019

sarbajitdutta mentioned this issue Jun 12, 2019

Release of 1.1 #359

Merged
