### A debiased model

Then their task was to try to determine, for words that differ on this
gender dimension, whether that gender difference is appropriate or
inappropriate. Let’s say “king” and “queen” are appropriately separated by
gender, and ditto “father” and “mother,” but maybe we don’t want to regard
(as word2vec by default does) “Home Depot” as the gender-flipped version of
“JC Penney,” or “alcoholism” and “eating disorders,” or “pilot” and “flight
attendant.”

- How, then, to tease apart the problematic from the unproblematic gender associations for not just a handful but for hundreds of thousands of different words?
- How to know which analogies should be kept, which should be adjusted, and which should be purged entirely?

The team did their best. They identified 218 such gender-specific
words in a subset of their model’s dictionary and let their system
**extrapolate** to the rest of the dictionary. For all words outside this set, they set the gender component of the word’s representation to zero. Then they adjusted the
representations of the gender-specific words so that pairs of equivalent
terms, say “brother” and “sister,” were “centered” around this zero
point. Said another way, they were adjusted so that neither term was
represented in the model as more “gender-specific” or more
“gender-neutral” than the other.
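The two steps described above, zeroing the gender component of neutral words and then centering gender-specific pairs on that zero point, can be sketched in a few lines. This is a minimal illustration and not the team's code. It assumes unit-length vectors stored as plain Python lists and a gender direction `g` that is already known and normalized. In practice, that direction has to be estimated, for example from pairs such as “she” and “he”. The function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(u, s):
    return [a * s for a in u]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def normalize(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))

def neutralize(v, g):
    """Remove the component of v along the unit gender direction g, then renormalize."""
    return normalize(sub(v, scale(g, dot(v, g))))

def equalize(a, b, g):
    """Center a gender-specific pair (e.g. 'brother', 'sister') on the zero point.

    Both outputs share the same gender-neutral part and sit at equal and
    opposite positions along g. Assumes a, b, and g are unit vectors.
    """
    mu = scale(add(a, b), 0.5)
    nu = sub(mu, scale(g, dot(mu, g)))              # shared, gender-neutral part
    offset = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))  # keeps outputs unit length
    # Keep each word on its original side of the gender axis.
    sign = 1.0 if dot(a, g) >= dot(b, g) else -1.0
    return add(nu, scale(g, sign * offset)), add(nu, scale(g, -sign * offset))

# Hypothetical 3-D vectors; the first coordinate stands in for "gender".
g = [1.0, 0.0, 0.0]
pilot = neutralize(normalize([0.3, 0.9, 0.3]), g)
brother, sister = equalize(normalize([0.6, 0.7, 0.2]), normalize([-0.4, 0.8, 0.3]), g)
print(dot(pilot, g), dot(brother, g), dot(sister, g))
```

After `neutralize`, “pilot” has no gender component. After `equalize`, “brother” and “sister” differ only along `g`, by equal and opposite amounts.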

This neutralization came at a small cost—the model now, for instance,
thought it just as likely that someone could be “grandmothered in” as
“grandfathered in” to a legal exemption. But maybe this was a price worth
paying, and you could always decide how much prediction error you were
willing to trade for how much debiasing.

But later research showed that this debiasing was just “lipstick on a pig,” as the title of a 2019 paper by Hila Gonen and Yoav Goldberg put it. Implicit connections remained among the stereotypically “feminine” professions themselves, such as between “nurse” and “receptionist.” In fact, such partial debiasing may actually
make the problem worse, they argue. It leaves the majority
of these stereotypical associations intact while removing the ones that are
the most visible and easiest to measure.
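A toy example shows how removing the gender component can leave stereotypes intact. Two words that were close partly because of a shared gender bias can stay close after neutralization, because that bias is also spread across other dimensions. The three-dimensional vectors below are invented for illustration and do not come from any real embedding. The first coordinate stands in for the gender direction.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def neutralize(v, g):
    """Project v onto the subspace orthogonal to the unit vector g."""
    p = dot(v, g)
    return [a - p * b for a, b in zip(v, g)]

# Hypothetical toy embeddings; coordinate 0 plays the role of "gender".
g = [1.0, 0.0, 0.0]
toy = {
    "nurse":        [0.6, 0.8, 0.0],
    "receptionist": [0.5, 0.7, 0.1],
    "engineer":     [-0.5, 0.1, 0.8],
}
debiased = {w: neutralize(v, g) for w, v in toy.items()}

# Every word now has zero projection on g...
for w, v in debiased.items():
    print(w, "gender projection:", dot(v, g))

# ...yet "nurse" is still far closer to "receptionist" than to "engineer".
print("nurse~receptionist:", round(cosine(debiased["nurse"], debiased["receptionist"]), 3))
print("nurse~engineer:", round(cosine(debiased["nurse"], debiased["engineer"]), 3))
```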

A classic test of unconscious bias in humans used in the social sciences is the “implicit association test,” which measures how quickly people pair two groups of words. A team of computer scientists at Princeton (postdoc Aylin Caliskan and professors Joanna Bryson and Arvind Narayanan) found that the distances between embeddings in word2vec and other widely used word-embedding models uncannily mirror this human reaction-time data. The slower people are to associate two groups of words, the farther apart those word vectors are in the model. The model’s biases, in other words, are, for better or worse, very much our own.
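Caliskan, Bryson, and Narayanan carried the implicit association test over to embeddings as the Word Embedding Association Test (WEAT). Below is a minimal sketch of its effect-size statistic, assuming embeddings are given as plain lists. For each target word, it measures how much closer the word sits to one attribute set than to the other. It then compares the two target sets, scaled by the spread of those scores across all target words. Using the sample standard deviation is an assumption here. The toy 2-D vectors are hypothetical and were chosen only so the result is easy to check.

```python
import math
import statistics

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return (statistics.mean(cosine(w, a) for a in A)
            - statistics.mean(cosine(w, b) for b in B))

def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, in units of the
    (sample) standard deviation of associations over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)

# Hypothetical toy vectors: X = "flowers", Y = "insects",
# A = "pleasant" words, B = "unpleasant" words.
A = [[1.0, 0.0], [0.9, 0.1]]
B = [[0.0, 1.0], [0.1, 0.9]]
X = [[0.8, 0.2], [0.7, 0.3]]
Y = [[0.2, 0.8], [0.3, 0.7]]

print(round(weat_effect_size(X, Y, A, B), 3))
```

A positive effect size means the first target set is more strongly associated with the first attribute set. Swapping `X` and `Y` flips the sign.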

### The model’s biases, in other words, are, for better or worse, very much our own.

The fact that the embeddings that emerge from this “magical”
optimization process are so uncannily and discomfitingly useful as a mirror
for society means that we have, in effect, added a diagnostic tool to the
arsenal of social science. We can use these embeddings to measure, in precise
detail, aspects of a society at a given snapshot in time. And
regardless of causation, we can use these snapshots to watch society change.
It may be that changes in objective reality change the way we speak, or vice
versa, or that both are driven by some deeper cause.
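Watching society change in this way means computing the same measurement on embeddings trained on text from different periods and comparing the results. Below is a minimal sketch. It assumes one embedding per decade is available as a dictionary of word vectors, and it uses about the simplest possible measure: the projection of a word onto that snapshot's “he” minus “she” direction. Because each score is computed relative to its own snapshot's “he” and “she,” the separately trained snapshots do not need to share a coordinate system. The snapshot data and function names are invented for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def gender_lean(embedding, word):
    """Projection of `word` onto one snapshot's unit 'he' - 'she' direction.

    Positive values lean toward 'he', negative toward 'she'.
    """
    direction = unit([h - s for h, s in zip(embedding["he"], embedding["she"])])
    return dot(unit(embedding[word]), direction)

def track(snapshots, word):
    """Return (period, lean) pairs in chronological order."""
    return [(period, gender_lean(emb, word)) for period, emb in sorted(snapshots.items())]

# Hypothetical snapshots; real ones would come from models trained per decade.
snapshots = {
    1950: {"he": [1.0, 0.0], "she": [-1.0, 0.0], "engineer": [0.8, 0.6]},
    1990: {"he": [1.0, 0.0], "she": [-1.0, 0.0], "engineer": [0.3, 0.95]},
}

for period, lean in track(snapshots, "engineer"):
    print(period, round(lean, 3))
```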

- First, computer scientists are reaching out to the social sciences as they begin to think more broadly about what goes into the models they build. Likewise, social scientists are reaching out to the machine-learning community and are finding they now have a powerful new microscope at their disposal.
- Second, biases and connotations are real. They are now measurable, in detail and with precision, and they are dynamic: the story of our language is the story of our culture.
- Third, these models should absolutely be used with caution.
- Fourth, there is the matter of time. As Princeton’s Arvind Narayanan puts it: “Contrary to the ‘tech moves too fast for society to keep up’ cliché, commercial deployments of tech often move glacially—just look at the banking and airline mainframes still running. ML [machine-learning] models being trained today might still be in production in 50 years, and that’s terrifying.”
- Fifth, it is false to assume that a model will not change the world. Indeed, careless deployment of these models might produce a feedback loop from which recovery becomes ever more difficult or requires ever greater interventions.
- Lastly, these models offer us a digital sextant as we look ahead as a society. They **must** be used descriptively rather than prescriptively.

#### intractability: not easily managed or cured (a theoretical study)

What they and their colleagues began to find was that not only were
there enormous complexities in translating our philosophical and legal ideas
about fairness into hard mathematical constraints but, in fact, much of the
leading thought and practice, some of it decades old, was deeply misguided
—and had the potential to be downright harmful.

If we hear in the press that a model “uses race” (or gender, etc.) as an attribute, we are led to believe something has already gone deeply wrong. Yet simply removing the “protected attribute” is not enough. As long as the model takes in features that are correlated with, say, gender or race, avoiding explicit mention of the attribute will do little good. This is known as “redundant encoding”: the protected attribute is redundantly encoded across other variables. For example, a seemingly neutral variable such as the number of previous convictions can become a redundant encoding of race. In fact, one of the perverse upshots of redundant
encodings is that being blind to these attributes may make things worse.
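A quick check for a redundant encoding is to ask how well the supposedly removed attribute can be recovered from the features that remain. The sketch below uses a made-up dataset in which a single hypothetical feature, a neighborhood code, is correlated with group membership. A trivial majority-vote rule recovers the protected attribute far better than chance, even though the attribute itself is never used as an input. The function name and data are illustrative assumptions. The accuracy is measured in-sample, which is enough to show the leak.

```python
from collections import Counter, defaultdict

def proxy_accuracy(records):
    """How well a single remaining feature predicts the protected attribute.

    records: list of (feature_value, protected_attribute) pairs.
    Predict the most common attribute value for each feature value and
    return the fraction of records predicted correctly.
    """
    by_feature = defaultdict(Counter)
    for feature, attribute in records:
        by_feature[feature][attribute] += 1
    majority = {f: counts.most_common(1)[0][0] for f, counts in by_feature.items()}
    correct = sum(1 for f, a in records if majority[f] == a)
    return correct / len(records)

# Hypothetical data: neighborhood "A" is 90% group 1, neighborhood "B" is 90% group 0.
# Overall the groups are 50/50, so guessing without the proxy would be right half the time.
records = ([("A", 1)] * 45 + [("A", 0)] * 5 +
           [("B", 0)] * 45 + [("B", 1)] * 5)

print(proxy_accuracy(records))
```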


For instance, a machine-learning model used in a recruiting context might penalize a candidate for not having had a job in the prior year. We might not want this penalty applied to pregnant women or recent mothers, however. That will be difficult if the model must be “gender-blind” and can’t include gender itself, or anything as strongly
connected to it as pregnancy. Thus, omitting the attribute and its redundant encodings makes it impossible both to measure this bias and to mitigate it.
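The measurement half of that argument can be made concrete. To check whether a gap-in-employment penalty falls disproportionately on one group, you have to tabulate the penalty by group. That is only possible if the group label is kept somewhere, even if it is never fed to the model. The field names and records below are hypothetical.

```python
from collections import defaultdict

def penalty_rate_by_group(candidates, group_key="gender"):
    """Fraction of candidates in each group who received the gap penalty.

    candidates: list of dicts with a boolean 'penalized' field and a group field.
    Raises KeyError if the group field was dropped, which is the point:
    a fully 'blind' dataset cannot be audited this way.
    """
    totals = defaultdict(int)
    penalized = defaultdict(int)
    for c in candidates:
        group = c[group_key]
        totals[group] += 1
        penalized[group] += int(c["penalized"])
    return {group: penalized[group] / totals[group] for group in totals}

candidates = [
    {"gender": "F", "penalized": True},
    {"gender": "F", "penalized": True},
    {"gender": "F", "penalized": False},
    {"gender": "M", "penalized": False},
    {"gender": "M", "penalized": True},
    {"gender": "M", "penalized": False},
]
print(penalty_rate_by_group(candidates))
```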

#### fairness through blindness doesn’t work!

If an algorithm’s decision takes into account an attribute that is “protected” by law, that use is presumed to be legally harmful.

A tool that is calibrated, she writes, “cannot have equal false positive and negative rates across groups, when the recidivism prevalence differs across those groups.” Put the other way around, the impossibility proofs show that equalizing the false positive and false negative rates means giving up on calibration. Without calibration, it becomes unclear what a given risk score even means. Even those who emphasize the importance of calibration, however, think
that it alone isn’t enough. As Corbett-Davies says, “Calibration, though generally desirable, provides little guarantee that decisions are equitable.”
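The arithmetic behind this kind of impossibility result is short. One standard version (Chouldechova, 2017) works with a condition closely related to calibration: equal positive predictive value (PPV) across groups. From the definitions of a confusion matrix, any classifier satisfies FPR = p / (1 − p) × (1 − PPV) / PPV × (1 − FNR), where p is the group's prevalence. The sketch below plugs in hypothetical numbers. PPV and FNR are held equal across two groups and only the prevalence changes, which forces the false positive rates apart.

```python
def false_positive_rate(prevalence, ppv, fnr):
    """FPR implied by prevalence p, positive predictive value, and false negative rate.

    Follows from the confusion-matrix definitions:
        FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical groups with identical PPV and FNR but different prevalence.
ppv, fnr = 0.7, 0.3
for name, prevalence in [("group 1", 0.5), ("group 2", 0.3)]:
    print(name, round(false_positive_rate(prevalence, ppv, fnr), 4))
```

Equalizing the false positive rates here would require changing PPV or FNR for one of the groups.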

Because this data is collected as a by-product of police activity,
predictions made on the basis of patterns learned from this data do not
pertain to future instances of crime on the whole. They pertain to future
instances of crime that becomes known to police. In this sense, predictive
policing is aptly named: it is predicting future policing, not future crime.

- What happens if a parolee commits a crime but is not caught?
- What happens if arrests of over-policed individuals, who are already labeled high risk, become the ground truth?

This is particularly worrisome in the context of predictive policing,
where this training data is used to determine the very police activity that, in
turn, generates arrest data. This sets up a potential long-term feedback
loop. The model becomes increasingly confident
that the locations most likely to experience further criminal activity are
exactly the locations it had previously believed to be high in crime:
selection bias meets confirmation bias. This feedback loop, in turn, further biases
its training data.
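The loop can be reproduced with a deliberately crude, deterministic sketch. It is a simplification in the spirit of Ensign et al.'s “runaway feedback loop” analysis of predictive policing, not a model of any real system. Two hypothetical districts have identical true crime. Each day the single patrol goes wherever the most crime has been recorded so far, and only crime in the patrolled district gets recorded. A small initial imbalance in the records is all it takes.

```python
def simulate_patrols(initial_records, daily_crime, days):
    """Greedy patrol allocation driven only by recorded crime.

    initial_records: recorded incidents per district at the start.
    daily_crime: true incidents per district per day.
    Crime is recorded only in the district that is patrolled that day.
    Returns the recorded counts after `days` days.
    """
    records = list(initial_records)
    for _ in range(days):
        # Patrol the district that looks most criminal in the data
        # (ties go to the lower index).
        target = max(range(len(records)), key=lambda i: records[i])
        records[target] += daily_crime[target]
    return records

# Identical true crime (3 per day) in both districts; a one-incident head start.
records = simulate_patrols(initial_records=[6, 5], daily_crime=[3, 3], days=10)
print(records, round(records[0] / sum(records), 2))
```

District 0 ends up with nearly 90% of the recorded crime. District 1 is never patrolled again, so its records never grow, even though the true crime rates are identical.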

At a minimum, it seems clear that we should
know exactly what it is that our predictive tools are designed to predict,
and we should be very cautious about using them outside of those
parameters: “USE ONLY AS DIRECTED.”

Do better predictions lead to better public safety? Not necessarily: “improvements in the accuracy of predictions alone may not result in a reduction in crime . . . perhaps more importantly, law
enforcement needs better information about what to do with the predictions.” Predictions are not an end in themselves. Which is better: a world in which we can be 99% sure where and when a crime will occur, or a world in which there is simply 99% less crime? Are we missing something larger?

“So this brings me to my main point,” Moritz Hardt tells me. A
machine-learning model, trained by data, “is by definition a tool to predict
the future, given that it looks like the past. . . . That’s why it’s
fundamentally the wrong tool for a lot of domains, where you’re trying to
design interventions and mechanisms to change the world.” Seen this way, prediction offers a somewhat dystopian perspective on the problem.

# Transparency

It’s often observed in the field that the most powerful models are on the
whole the least intelligible, and the most intelligible are among the least
accurate. As one researcher puts it: “Neural nets are really good, they’re
accurate; but they’re completely opaque and unintelligible, and I think
that’s dangerous now.”

So we need models that offer the best of both worlds.

For one, these findings seem to suggest that, whatever
myriad issues we face in turning decision-making over to statistical models,
human judgment alone is not a viable alternative. At the same time, perhaps
complex, elaborate models really aren’t necessary to match or exceed this
human baseline.

A tantalizing question lurks, however: what explains this
surprising verdict? Is human judgment really that bad? Are simple linear
models of a handful of variables really that good? Or . . . a third possibility:
Has human expertise somehow managed to enter into the simple models
where we least expect it? Were we looking for it in the wrong place?
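The “simple linear models of a handful of variables” in question can be strikingly crude. One well-known form is the unit-weighted, or “improper,” linear model associated with the psychologist Robyn Dawes. Standardize each variable, flip its sign if lower is better, and add the results with equal weight. Below is a minimal sketch with hypothetical data. Using the population standard deviation is an assumption.

```python
import statistics

def unit_weighted_scores(rows, signs):
    """Score each row by summing standardized variables with equal weight.

    rows: list of equal-length numeric lists (one per case).
    signs: +1 if higher is better for that variable, -1 if lower is better.
    Columns with zero spread are skipped.
    """
    stats = [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*rows)]
    scores = []
    for row in rows:
        total = 0.0
        for value, (mean, sd), sign in zip(row, stats, signs):
            if sd > 0:
                total += sign * (value - mean) / sd
        scores.append(total)
    return scores

# Hypothetical applicants: (GRE score, undergraduate GPA); higher is better for both.
applicants = [[150, 3.1], [160, 3.6], [155, 3.9]]
print([round(s, 2) for s in unit_weighted_scores(applicants, signs=[+1, +1])])
```

No weights are fitted at all. The only modeling decisions are which variables to include and which direction counts as better.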

It’s an exciting time for researchers working on this set of questions.
Simple models are amazingly competitive with, and often better than, human
expertise. Modern techniques give us ways of deriving ideal simple models.
With that said, there are cases where complexity is simply unavoidable;
the obvious one is models that don’t have the benefit of human experts
filtering their inputs to meaningful quantities of manageable size. Some
models must, for better or worse, deal not with human abstractions like
“GRE score” and “number of prior offenses” but with raw linguistic, audio,
or visual data. Some medical diagnostic tools can be fed human inputs, like
“mild fever” and “asthmatic,” while others might be shown an X-ray or
CAT scan directly and must make some sense of it. A self-driving car, of
course, must deal with a stream of radar, lidar, and visual data directly. In
such cases we have little choice but the kinds of large, multimillion-
parameter “black box” neural networks that have such a reputation for
inscrutability. But we are not without resources here as well, on the science
of transparency’s other, wilder frontier.

# Saliency

Humans, relative to most other species, have distinctly large and visible
sclera—the whites of our eyes—and as a result we are uniquely exposed in
how we direct our attention, or at the very least, our gaze. Evolutionary
biologists have argued, via the “cooperative eye hypothesis,” that this must
be a feature, not a bug: that it must point to the fact that cooperation has
been uncommonly important in our survival as a species, to the point that
the benefits of shared attention outweigh the loss of a certain degree of
privacy or discretion.

It might be understandable, then, for us to expect something
similar from our machines: to know not only what they think they see but
where, in particular, they are looking. In machine learning, this goes by the name “saliency.” The premise
is that if a system is looking at an image and assigning it to some category,
then presumably some parts of the image were more important or more
influential than others in making that determination. But machine-learning systems can be very unintuitive. Often they
latch onto aspects of the training data we did not think were relevant at all,
and ignore what we would imagine was the critical information.
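One simple way to ask where a model is “looking” is occlusion: blank out one part of the input at a time and record how much the model's score changes. The sketch below applies that idea to a hypothetical, hand-written linear scorer over a tiny 2×2 “image.” Real saliency methods target deep networks and often use gradients instead. The underlying logic, attributing a decision to regions of the input, is the same.

```python
def score(image, weights):
    """A stand-in 'classifier': a weighted sum of pixel values (hypothetical)."""
    return sum(w * p
               for w_row, p_row in zip(weights, image)
               for w, p in zip(w_row, p_row))

def occlusion_saliency(image, model, baseline=0.0):
    """Drop in model score when each pixel is replaced by `baseline`.

    Positive values mark pixels that pushed the score up;
    negative values mark pixels that pushed it down.
    """
    base_score = model(image)
    saliency = []
    for i, row in enumerate(image):
        sal_row = []
        for j in range(len(row)):
            occluded = [list(r) for r in image]  # copy so the original stays intact
            occluded[i][j] = baseline
            sal_row.append(base_score - model(occluded))
        saliency.append(sal_row)
    return saliency

weights = [[1.0, 1.0], [0.5, -1.0]]
image = [[0.0, 1.0], [2.0, 3.0]]
print(occlusion_saliency(image, lambda img: score(img, weights)))
```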

All of this additional information in a dataset was useless in practice as
additional inputs to the model. Learning to predict a patient’s risk of death
based on their hospital bill won’t actually help you when a new patient
arrives, because of course you don’t know their final bill yet. But rather than
serving as additional inputs, this information is useful as additional outputs,
additional sources of ground truth in training the model. The technique has
come to be known as “multitask learning.”
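The trick is that the extra information enters only through the training loss. Below is a minimal sketch of the idea in which everything is hypothetical. A tiny model has one shared feature and two output heads. One head predicts the quantity we actually want (the “main” target). The other predicts an auxiliary target that exists only in historical records, such as a final bill. During training, errors from both heads shape the shared feature. At prediction time, only the main head is used, so the auxiliary value never has to be known for a new case. The gradients are worked out by hand in plain Python to keep the example self-contained.

```python
# Hypothetical training data: (features, main target, auxiliary target).
DATA = [
    ((1.0, 0.0), 1.0, 2.0),
    ((0.0, 1.0), 1.0, 2.0),
    ((1.0, 1.0), 2.0, 4.0),
    ((0.5, 0.5), 1.0, 2.0),
]

def forward(params, x):
    w0, w1, a, b = params
    h = w0 * x[0] + w1 * x[1]  # shared representation
    return h, a * h, b * h     # shared feature, main head, auxiliary head

def loss(params, data, lam):
    """Mean of main squared error plus lam times auxiliary squared error."""
    total = 0.0
    for x, y_main, y_aux in data:
        _, p_main, p_aux = forward(params, x)
        total += (p_main - y_main) ** 2 + lam * (p_aux - y_aux) ** 2
    return total / len(data)

def grad(params, data, lam):
    _, _, a, b = params
    g = [0.0, 0.0, 0.0, 0.0]
    for x, y_main, y_aux in data:
        h, p_main, p_aux = forward(params, x)
        e_main = 2 * (p_main - y_main)
        e_aux = 2 * lam * (p_aux - y_aux)
        dh = e_main * a + e_aux * b  # both heads push on the shared feature
        g[0] += dh * x[0]
        g[1] += dh * x[1]
        g[2] += e_main * h
        g[3] += e_aux * h
    return [gi / len(data) for gi in g]

def train(data, lam=1.0, lr=0.01, steps=2000):
    params = [0.1, 0.1, 0.1, 0.1]
    for _ in range(steps):
        params = [p - lr * gi for p, gi in zip(params, grad(params, data, lam))]
    return params

def predict(params, x):
    # Deployment uses only the main head; no auxiliary value is needed as input.
    return forward(params, x)[1]

params = train(DATA)
print(round(predict(params, (1.0, 1.0)), 2))
```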

### Deconvolution

Zeiler and Fergus developed a visualization technique they called
“deconvolution,” which was a way to turn intermediate-level activations of
the network back into images.
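In Zeiler and Fergus's approach, an activation is projected back toward pixel space by reversing the network's operations in turn. First, unpool using the recorded location of each maximum (the “switches”). Then rectify. Finally, filter with the transposed version of the same filter. The one-dimensional, single-filter sketch below is a simplification for illustration only. The input, filter, and function names are hypothetical.

```python
def conv1d(x, f):
    """Valid cross-correlation of signal x with filter f."""
    k = len(f)
    return [sum(f[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0, a) for a in v]

def max_pool(v, size=2):
    """Non-overlapping max pooling that also records where each max came from."""
    pooled, switches = [], []
    for start in range(0, len(v) - size + 1, size):
        window = v[start:start + size]
        offset = window.index(max(window))
        pooled.append(window[offset])
        switches.append(start + offset)
    return pooled, switches

def unpool(pooled, switches, length):
    """Put each pooled value back at its recorded location; zeros elsewhere."""
    out = [0] * length
    for value, pos in zip(pooled, switches):
        out[pos] = value
    return out

def conv1d_transpose(a, f, out_len):
    """Transpose of conv1d: spread each activation back over its receptive field."""
    out = [0] * out_len
    for i, value in enumerate(a):
        for j, w in enumerate(f):
            out[i + j] += w * value
    return out

def deconv_project(x, f, keep=None):
    """Project pooled activations back to input space.

    keep: optional index of a single pooled unit to visualize; others are zeroed.
    """
    fmap = relu(conv1d(x, f))
    pooled, switches = max_pool(fmap)
    if keep is not None:
        pooled = [p if i == keep else 0 for i, p in enumerate(pooled)]
    a = relu(unpool(pooled, switches, len(fmap)))
    return conv1d_transpose(a, f, len(x))

x = [0, 1, 0, 0, 2, 0]
f = [1, 2]
print(deconv_project(x, f))          # [2, 4, 0, 4, 8, 0]
print(deconv_project(x, f, keep=1))  # [0, 0, 0, 4, 8, 0]: the input region behind unit 1
```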

The effect was dramatic and insightful. But was it useful? Zeiler popped the
hood of the AlexNet model that had won the ImageNet competition in 2012
and started digging around, inspecting it using deconvolution. He noticed
a bunch of flaws. Some low-level parts of the network had normalized
incorrectly, like an overexposed photograph. Other filters had gone “dead”
and weren’t detecting anything. Zeiler hypothesized that they weren’t
correctly sized for the types of patterns they were trying to match. As
astoundingly successful as AlexNet had been, it was carrying some dead
weight. It could be improved—and the visualization showed where.
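Spotting “dead” filters like the ones Zeiler found can be as simple as checking which units never activate across a batch of inputs. Below is a minimal sketch, assuming activations have already been collected into one row per input, with one value per filter. The threshold and data are hypothetical.

```python
def dead_units(activations, threshold=1e-6):
    """Indices of units whose activation never exceeds `threshold` on any input.

    activations: list of rows, one per input, each with one value per unit.
    """
    n_units = len(activations[0])
    return [u for u in range(n_units)
            if all(abs(row[u]) <= threshold for row in activations)]

# Hypothetical post-ReLU activations for 3 inputs and 4 filters.
acts = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 2.1, 0.0, 0.5],
]
print(dead_units(acts))
```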
