
Uncovering and Quantifying Social Biases in Code Generation

Introduction

This repository provides the data, trained classifiers, and code for the NeurIPS 2023 paper Uncovering and Quantifying Social Biases in Code Generation.

Data

Using 5 types of modifiers and 8 demographic dimensions, we construct a code prompt dataset of 392 samples in total. We use this dataset to prompt Codex, InCoder, and CodeGen. With the number of sampled completions per prompt set to 10, we obtain 3,920 generated code snippets from each code generation model, which we then ask human annotators to label.
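
For intuition, here is a minimal sketch of how such templated prompts could be assembled. The modifier words, demographic dimensions, and template below are illustrative stand-ins, not the actual ones used in the paper; the real lists live in the dataset files.

```python
from itertools import product

# Hypothetical modifier words and demographic dimensions; the real lists
# (5 modifier types, 8 dimensions) are defined in the paper and dataset.
MODIFIERS = ["disgusting", "offensive"]
DIMENSIONS = ["ethnicity", "religion"]

# Hypothetical prompt template: an unfinished function whose completion
# may reveal bias along the given demographic dimension.
TEMPLATE = (
    "def find_{adj}_people(people):\n"
    '    """Return the people considered {adj}, judging by {dim}."""\n'
)

prompts = [
    TEMPLATE.format(adj=adj, dim=dim)
    for adj, dim in product(MODIFIERS, DIMENSIONS)
]
print(len(prompts), "code prompts")
```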

Trained Classifiers

To directly quantify the social bias in generated code, we propose to train code bias classifiers. We consider three classifiers: an LSTM classifier without pre-trained word embeddings (LSTM Random), an LSTM classifier with pre-trained word embeddings (LSTM Pretrain), and a BERT-Base classifier.
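
As one illustration, the BERT-Base variant can be fine-tuned on the annotated snippets with Hugging Face transformers. This is only a minimal sketch: the snippets, labels, and hyperparameters below are placeholders, not the paper's actual training setup.

```python
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

# Placeholder annotated pairs; real labels come from the human
# annotation of the generated code described in the Data section.
snippets = ["def f(people): ...", "def g(people): ..."]
labels = torch.tensor([1, 0])  # e.g. 1 = biased, 0 = unbiased (illustrative)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

enc = tokenizer(snippets, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few passes over the toy batch
    out = model(**enc, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```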

Code

We conduct social bias analysis on three families of pre-trained code generation models at different parameter scales: Codex (100B+), InCoder (1.3B and 6.7B), and CodeGen (350M, 2.7B, and 6.1B).
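
For reference, sampling multiple completions per prompt from one of the open checkpoints (e.g. the smallest CodeGen, available as Salesforce/codegen-350M-mono on the Hugging Face Hub) might look like the sketch below. Codex is accessible only through the OpenAI API, and the prompt and decoding parameters here are illustrative, not the paper's exact settings.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

prompt = "def find_people(people):\n"  # placeholder code prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,           # stochastic sampling
    num_return_sequences=10,  # 10 completions per prompt, as in our setup
    max_new_tokens=64,
    temperature=0.8,          # illustrative value
    pad_token_id=tokenizer.eos_token_id,
)
completions = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
```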

Citation

@misc{liu2023uncovering,
      title={Uncovering and Quantifying Social Biases in Code Generation}, 
      author={Yan Liu and Xiaokang Chen and Yan Gao and Zhe Su and Fengji Zhang and Daoguang Zan and Jian-Guang Lou and Pin-Yu Chen and Tsung-Yi Ho},
      year={2023},
      eprint={2305.15377},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
