As the number of government algorithms grows, so does the need to evaluate algorithmic fairness. This paper has three goals. First, we ground the notion of algorithmic fairness in the context of disparate impact, arguing that for an algorithm to be fair, its predictions must generalize across different protected groups. Second, we present two algorithmic use cases, with code examples showing how to evaluate fairness. Finally, we promote the concept of an open-source repository of government algorithmic "scorecards," allowing stakeholders to compare across algorithms and use cases.
We extend our sincerest appreciation to Dr. Dyann Daley and Predict Align Prevent for their generous support of this work. We are also extremely grateful to Matt Harris for ensuring that our code is, in fact, replicable. Finally, we are immensely appreciative of the time and expertise of several reviewers, including Drs. Dennis Culhane, Dyann Daley, John Landis, and Tony Smith.