Codex Decompiler

This project was built by Akshat Parikh during the Trail of Bits 2022 Winter Internship. The project is provided as is. Contact opensource@trailofbits.com if you'd like to use this project.

Codex Decompiler

Codex Decompiler is a Ghidra plugin that utilizes OpenAI's models to improve the decompilation and reverse engineering experience. It currently has the ability to take the disassembly from Ghidra and then feed it to OpenAI's models to decompile the code. The plugin also offers several other features to perform on the decompiled code such as finding vulnerabilities using OpenAI, generating a description using OpenAI, or decompiling the Ghidra pseudocode. Down below, you can see an example of the plugin being run in Ghidra and the available features.

The plugin supports both regular OpenAI API and Azure OpenAI API. It can be configured to use different models.

Tested on Ghidra 10.3.1 with Java versions 11.0, 17.0, and 20.0.

Setup

Download the repository and move the ghidraRevAI.py file in the ghidra_scripts directory, which by default is at $USER_HOME/ghidra_scripts.
Set the environment variable OPENAI_API_KEY with the Api Key of OpenAI/Azure OpenAI (or just set it in the popup in the next steps).
Open Ghidra and import the binary to analyze.
Open the "Script Manager" window in the "Window" menu.
Select the script named ghidraRevAI.py, check the checkbox, and click the Play/Run Script button to run the script.
A series of popups will appear to help configure the plugin.
Each time you open Ghidra run the ghidraRevAI.py script again. The plugin options will be shown in the "Edit > Tool Options" window, under the "Codex-Decompiler" section.

Usage

To use the plugin, go to any function inside of the Listing window and press Ctrl+J (Cmd+J on MacOS).
A new window should pop up where you can see different operations that can be performed on the pseudocode in the taskbar. Here is an example of the taskbar.

Note: all of the output from OpenAI (pseudocode) is cached into the ghidra_scripts directory under the subfolder output. This is done to avoid unnecessary calls to the API which can be costly.

List of Operations:

Generate a description for the pseudocode displayed
View, edit, and resubmit the last prompt sent to OpenAI
Save the changes in the pseudocode editor to the file output
Decompile the disassembly again
Find vulnerabilities in the pseudocode
Decompile the pseudocode that Ghidra generated

Limitations

For any of the aforementioned features, the output from OpenAI can be faulty and inconsistent. Thus, before doing anything with the generated pseudocode or other data, make sure that it is correct.

References

Acknowledgments/Contributions

I would like to acknowledge everyone at Trail of Bits for helping me through this project and providing feedback. I thoroughly enjoyed my experience with the company and creating this tool.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
benchmark		benchmark
compiler		compiler
docs		docs
eval		eval
finetune		finetune
split		split
tests		tests
LICENSE		LICENSE
README.md		README.md
ghidraRevAI.py		ghidraRevAI.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Codex Decompiler

Setup

Usage

List of Operations:

Limitations

References

Acknowledgments/Contributions

About

Releases

Packages

Contributors 3

Languages

License

trailofbits/Codex-Decompiler

Folders and files

Latest commit

History

Repository files navigation

Codex Decompiler

Setup

Usage

List of Operations:

Limitations

References

Acknowledgments/Contributions

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages