
Exploring the Curious Case of Code Prompts

What is this work about?

Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural language tasks. In our work, we seek to answer whether code prompting is the preferred way of interacting with language models in general. We compare code and text prompts across three popular GPT models (davinci, code-davinci-002, and text-davinci-002) on a broader selection of tasks (e.g., QA, sentiment, summarization) and find that, with few exceptions, code prompts do not consistently outperform text prompts. Furthermore, we show that the style of code prompt has a large effect on performance for some but not all tasks, and that fine-tuning on text instructions leads to better relative performance of code prompts.
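To make the distinction concrete, here is a minimal illustrative sketch in Python (not one of the paper's actual prompts): a text prompt poses a task in natural language, while a code prompt recasts the same task as code for the model to complete.

# Illustrative sketch only; the paper's prompts differ per task.
# A text prompt states the task in natural language:
text_prompt = (
    "Review: A thoughtful, beautifully shot film.\n"
    "Is the sentiment of this review positive or negative?\n"
    "Answer:"
)

# A code prompt expresses the same task as code to be completed:
code_prompt = (
    'review = "A thoughtful, beautifully shot film."\n'
    '# Classify the sentiment of the review as "positive" or "negative".\n'
    "sentiment ="
)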

Dependencies

The conda environment in /envs should contain everything needed. If anything is missing, install the libraries as prompted.
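For example, assuming the environment is described by a standard conda YAML file in /envs (the filename environment.yml below is an assumption), it can be created with:

conda env create -f envs/environment.yml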

Organization

Each task has a corresponding folder in /datasets. Take HellaSWAG as an example: in /datasets/HellaSWAG, running

python hellaswag.py --prompt text|code1|code2|code3|code4 --model codex|davinci --max_prompt MAX_TOKENS

produces an output file for the chosen model (code-davinci-002 or text-davinci-002), prompt format (the text prompt or one of our 4 code prompts), and context window (2000, 4000, or 8000 tokens). The script also performs evaluation, which can be used to replicate Figures 3 and 4 and Table 2 in the paper.
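For example, a single run with the plain text prompt, code-davinci-002 (codex), and a 4000-token context window would look like:

python hellaswag.py --prompt text --model codex --max_prompt 4000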

Citation

If you find our work useful, please cite:

@misc{zhang2023exploring,
      title={Exploring the Curious Case of Code Prompts}, 
      author={Li Zhang and Liam Dugan and Hainiu Xu and Chris Callison-Burch},
      year={2023},
      eprint={2304.13250},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
