---
title: "Classification"
pagetitle: "Classification"
description-meta: "Introduction, case studies, and exercises for building classification systems."
description-title: "Introduction, case studies, and exercises for building classification systems."
author: "Leon Yin"
author-meta: "Leon Yin"
date: "07-27-2023"
date-modified: "07-27-2023"
bibliography: references.bib
execute: 
  enabled: false
keywords: classifications
twitter-card:
  title: Classification
  description: Introduction, case studies, and exercises for classification.
  image: assets/inspect-element-logo.jpg
open-graph:
  title: Classification
  description: Introduction, case studies, and exercises for classification.
  locale: en_US
  site-name: Inspect Element
  image: assets/inspect-element-logo.jpg
href: classification
---

In [None]:
#| echo: false
from utils import build_buttons
build_buttons(link= 'apis', 
              github= 'https://github.com/yinleon/inspect-element/blob/main/apis.ipynb', 
              citation= True)

# Intro

In Dante's _Inferno_, the 14th-century Italian poet portrays hell as nine distinct layers (or circles). Dante builds on the concept of [contrapasso](https://en.wikipedia.org/wiki/Contrapasso), a punishment befitting the crime: damned souls are sent to a specific layer of hell based on their crimes.

The eighth circle is reserved for frauds, who include soothsayers and diviners. For spending their lives looking only ahead towards the future, they had their heads turned backwards so they could only look behind them.

<figure>
<img src="assets/dante.jpg" style="width:75%">
<figcaption align = "left" style="font-size:80%;"> The Map of Hell painting by Sandro Botticelli.</figcaption>
</figure>

By organizing the damned by their crimes, Dante produced a classification system.

Although you probably won't be producing a system to punish souls for eternity, you will inevitably need to produce your own categorization system for an investigation.

If it's not something you plan for, you too might find yourself in hell.

## Why is classification necessary?

Classification is more of an art or craft than a science.

In any case, you will need a combination of quantitative skills, such as:

- inspecting source code
- reading documentation
- exploratory data analysis

and qualitative skills, such as:

- interviewing domain experts
- reading the literature

Similarly, your classification can be based on quantities that can be measured (think of named ranges or thresholds), or on qualities (usually a "yes" or "no").
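The two kinds of rules can be sketched in a few lines of Python. This is only an illustration: the income cutoffs, the labels, and the search term are hypothetical and not taken from any investigation.

```python
from bisect import bisect_right

# Hypothetical cutoffs (in dollars) and labels, chosen only for illustration.
INCOME_CUTOFFS = [40_000, 80_000]
INCOME_LABELS = ["lower", "middle", "upper"]


def classify_by_threshold(value, cutoffs=INCOME_CUTOFFS, labels=INCOME_LABELS):
    """Quantitative rule: map a measured value onto a named range.

    A value equal to a cutoff falls into the higher bucket.
    """
    return labels[bisect_right(cutoffs, value)]


def classify_by_quality(text, term="refund"):
    """Qualitative rule: answer a yes/no question about a record."""
    return term in text.lower()


income_bucket = classify_by_threshold(52_000)
mentions_term = classify_by_quality("Customer asked for a REFUND")
```

Writing the cutoffs down as data, rather than burying them in `if` statements, makes it easier to swap in a different set of thresholds later.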

An example of quantitative classification is how the atmosphere is organized into layers based on thermal characteristics.


<figure>
<img src="assets/lower layers of the atmosphere.jpg" style="width:50%">
<figcaption align = "left" style="font-size:80%;"> Layers of the atmosphere. Source: <a href="https://www.noaa.gov/jetstream/atmosphere/layers-of-atmosphere">NOAA</a></figcaption>
</figure>

Scientists discovered the lower atmospheric layers using weather balloons.

A qualitative example is whether a photo contains a Chihuahua or a muffin.

## Case Studies

### Google associates "Black girls" with porn

In this investigation Aaron Sankin and I tried to reproduce Safiya Noble's work on Google Search in Google Ads.

To do this, we queried a marketing tool called Keyword Planner with search terms to see related terms.

We found that searches for "black girls", "latina girls", and "asian girls" returned mostly pornographic results. We determined this using a two-step categorization process.

The first step involved making the same search again with Google's "adult ideas" filter turned off. By Google's own admission, more than half of the results were "adult ideas". However, after reviewing the remaining search terms, it was clear that the filter was not completely effective. For the second step, we ran the remaining terms through Google Search and confirmed that the majority of the searches returned results that mention "porn" in the description.
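A two-step process like this can be sketched as a single function. This is not the code used in the investigation: the `flagged_adult` field, the sample rows, and the "more than half of descriptions" rule are assumptions made for illustration.

```python
def categorize_term(flagged_adult, descriptions, needle="porn", threshold=0.5):
    """Two-step check for one search term.

    flagged_adult: hypothetical boolean standing in for the platform's own label.
    descriptions: result descriptions collected for the term in a follow-up search.
    """
    # Step 1: rely on the target's own definition when it applies.
    if flagged_adult:
        return "adult (platform label)"
    # Step 2: supplement with a keyword check on the follow-up results.
    if descriptions:
        share = sum(needle in d.lower() for d in descriptions) / len(descriptions)
        if share > threshold:
            return "adult (supplemental check)"
    return "not flagged"


# Made-up rows for illustration: (term, flagged_adult, descriptions)
rows = [
    ("term a", True, []),
    ("term b", False, ["Free porn videos", "Watch porn online", "News story"]),
    ("term c", False, ["Recipe blog", "Local news"]),
]
labels = {term: categorize_term(flag, descs) for term, flag, descs in rows}
```

Keeping the two labels distinct records which step made each call, so you can report how much of the result rests on the platform's own definition and how much on your supplement.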

<figure>
<img src="assets/google_ads_black_girls.png" style="width:75%">
<figcaption align = "left" style="font-size:80%;"> The results recalculated with different categorization systems to bucket race/ethnicity and income.</figcaption>
</figure>


Build off your investigation target's own definitions when possible, but find ways to supplement those with additional information. Ultimately, our supplement was another product from Google.

## Check your work

If you're concerned that your classification system is biased and would skew your results, one thing you can do is recalculate the same statistics with different categorization systems. This is one way to bulletproof your work, especially if the changes between classification systems are negligible.
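As a minimal sketch of that check, the snippet below computes one statistic under two bucketing schemes and measures how far apart the answers are. The records, cutoffs, and scheme names are all made up for illustration.

```python
from bisect import bisect_right

# Made-up records: (area median income, whether the area got the worse offer)
records = [
    (28_000, True), (35_000, True), (52_000, True),
    (58_000, False), (90_000, False), (120_000, False),
]

# Two hypothetical ways to bucket the same income values.
SCHEMES = {
    "three_buckets": ([40_000, 80_000], ["lower", "middle", "upper"]),
    "two_buckets": ([60_000], ["lower", "upper"]),
}


def rate_for_bucket(records, cutoffs, labels, target="lower"):
    """Share of records in the target bucket that got the worse offer."""
    outcomes = [
        outcome
        for value, outcome in records
        if labels[bisect_right(cutoffs, value)] == target
    ]
    return sum(outcomes) / len(outcomes) if outcomes else None


# Recompute the same statistic under each scheme and compare.
rates = {name: rate_for_bucket(records, *scheme) for name, scheme in SCHEMES.items()}
spread = max(rates.values()) - min(rates.values())
```

A small `spread` suggests the finding does not hinge on where the bucket boundaries were drawn; a large one is a cue to explain and justify the chosen scheme.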


<https://themarkup.org/show-your-work/2022/10/19/how-we-uncovered-disparities-in-internet-deals#different-categorization-systems>

<https://themarkup.org/google-the-giant/2020/07/28/how-we-analyzed-google-search-results-web-assay-parsing-tool#google-search-interp>



<figure>
<img src="assets/categorization_delta.png" style="width:80%">
<figcaption align = "left" style="font-size:80%;"> The results recalculated with different categorization systems to bucket race/ethnicity and income.</figcaption>
</figure>