
Bias and model transparency #108

Open
dontcallmedom opened this issue Sep 23, 2020 · 5 comments
Labels
Discussion topic (Topic discussed at the workshop) · User's Perspective (Machine Learning Experiences on the Web: A User's Perspective)

Comments

@dontcallmedom
Member

In her talk, Jutta highlighted the risks for minorities and groups whose data are underrepresented in the datasets used to train models, and approaches to reducing that bias (e.g. the "lawnmower" approach).

@JohnRochfordUMMS's talk noted that privacy concerns make that phenomenon even stronger for people with disabilities, and pointed to tools that can help identify bias in training data.

Are there well-known metrics or metadata that a model provider can (and ideally, should) attach to their models to help developers assess how much and what kind of bias they might be importing when they use a given model? Are there natural fora where discussions of such metadata are expected to happen?

@dontcallmedom dontcallmedom added the User's Perspective Machine Learning Experiences on the Web: A User's Perspective label Sep 23, 2020
@anssiko
Member

anssiko commented Sep 24, 2020

I recently read the paper Model Cards for Model Reporting, which proposes that an ML model be accompanied by documentation detailing the model's limitations and performance characteristics. Perhaps this model [pun unintended] would work in the context of web-based ML as well. In fact, it would be quite a natural fit in my view.

Here are a couple of proof-of-concept model cards from Google's Cloud Vision API: face detection and object detection.

Edit: More examples of model cards from another project (https://mediapipe.dev/) at https://google.github.io/mediapipe/solutions/models
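
To make the idea concrete, below is a minimal sketch of what a model card could look like as machine-readable metadata shipped alongside a model. The schema, field names, and numbers are my own illustrative assumptions, not a standard from the paper or the examples above.

```python
# Sketch of machine-readable model card metadata, loosely inspired by the
# fields in "Model Cards for Model Reporting". Schema and values are
# hypothetical, for illustration only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class GroupMetric:
    group: str         # evaluation subgroup, e.g. a skin-tone bin
    metric: str        # e.g. "recall"
    value: float
    sample_size: int   # number of evaluation examples in the group


@dataclass
class ModelCard:
    name: str
    intended_use: str
    limitations: str
    training_data_notes: str
    disaggregated_metrics: List[GroupMetric] = field(default_factory=list)


card = ModelCard(
    name="face-detector-v1",  # hypothetical model name
    intended_use="Detecting face bounding boxes in consumer photos.",
    limitations="Not evaluated for surveillance or identity verification.",
    training_data_notes="Training samples skew towards lighter skin tones.",
    disaggregated_metrics=[
        GroupMetric("lighter skin tones", "recall", 0.94, 100_000),  # made-up numbers
        GroupMetric("darker skin tones", "recall", 0.88, 20_000),
    ],
)

for m in card.disaggregated_metrics:
    print(f"{m.group}: {m.metric}={m.value} (n={m.sample_size})")
```

A developer importing the model could read these fields programmatically and decide whether the reported subgroup gaps are acceptable for their use case.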

Adding @mmitchellai, one of the paper's authors, for comments.

@dontcallmedom
Member Author

Looking at the face detection example linked by @anssiko, it helps identify, for instance, that the data sample is very strongly skewed towards "lighter tone skins" (100K samples), with "darker tone skins" making up only a fifth of the sample.

In any case, exposing the data sounds like a really useful first step; it would be great to hear about other similar projects, and whether there is any convergence in this space that could be triggered by wider developer adoption (which we can assume bringing ML to the Web should induce).
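
As a rough illustration of why exposing such numbers helps, here is a minimal sketch of disaggregated evaluation, i.e. computing the same metric separately per subgroup so that skew like the 5:1 imbalance above becomes visible. The groups, labels, and predictions are entirely made-up toy data, not drawn from the Cloud Vision cards.

```python
# Disaggregated evaluation sketch: per-group accuracy on synthetic data.
from collections import defaultdict

# (group, true_label, predicted_label) triples; toy data, purely illustrative
records = [
    ("lighter", 1, 1), ("lighter", 1, 1), ("lighter", 0, 0), ("lighter", 1, 0),
    ("darker", 1, 0), ("darker", 1, 1), ("darker", 0, 0), ("darker", 1, 0),
]

totals, correct = defaultdict(int), defaultdict(int)
for group, truth, pred in records:
    totals[group] += 1
    correct[group] += int(truth == pred)

for group in totals:
    print(f"{group}: accuracy {correct[group] / totals[group]:.2f} "
          f"on {totals[group]} samples")
```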

@mmitchellai

mmitchellai commented Sep 24, 2020 via email

@anssiko anssiko added this to the 2020-09-29 Live Session #4 milestone Sep 28, 2020
@Jutta-Inclusive

Knowing the limitations and potential bias of the training data is a great step in the right direction. It would also help to know what proxy data was used to fill gaps.
Some of my concerns go deeper. Even with full proportional representation and the teasing out of systemic bias, decisions based on population data will always decide against the minority. This is a "turtles all the way down" issue, in that even personalized systems will work better for the majority than for the minority. How do we address this bias that pre-exists ML and is then amplified and automated by it?

Some of the groups looking at this include:
- The Montreal AI Ethics group: https://montrealethics.ai/
- Our We Count project: https://wecount.inclusivedesign.ca/

Jutta

@dontcallmedom dontcallmedom added the Discussion topic Topic discussed at the workshop label Oct 9, 2020
@toreini

toreini commented Oct 16, 2020

Hi,

This is an interesting topic; model transparency and explainability are not easy. Has there been any thought on how one can challenge the output of an ML model? I get the impression that model cards tend to make the whole model transparent. Can they explain the decision made for a single data point (for instance, for someone who wants to know why a particular decision was made in her/his own case)?
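
(To clarify what I mean by a per-instance explanation: something like the following local, perturbation-based sketch, which asks how much the score for one input moves when each feature is removed. The model here is a stand-in linear scorer, purely an assumption for illustration, and not something model cards currently provide.)

```python
# Local explanation sketch: perturb each feature of a single input and
# report how much the model's score changes. The "model" is a fixed
# linear scorer standing in for a real one; everything here is illustrative.
def model_score(x):
    weights = [0.8, -0.2, 0.5]  # hypothetical learned weights
    return sum(w * v for w, v in zip(weights, x))


def local_attribution(x, baseline=0.0):
    """Score change when each feature is replaced by a baseline value."""
    base = model_score(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        attributions.append(base - model_score(perturbed))
    return attributions


example = [1.0, 2.0, 0.5]          # the single data point in question
print(local_attribution(example))  # per-feature contribution to the score
```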

Thanks,
Ehsan
