This repo contains 5 tasks that I gave to both Claude Code and OpenAI Codex - in each case the agent was asked to create a detailed plan for how to achieve the task, without actually implementing it.
The tasks are in the `tasks` directory, and the responses from Claude Code and OpenAI Codex are in the `agent1` and `agent2` directories, respectively. The anonymous names are my attempt at being scientific: they avoid giving either agent any hints about whose plans were whose when it later performed the comparison.
agent1 = "Claude Code"
agent2 = "OpenAI Codex"
Given that Claude Code and OpenAI Codex were responsible for creating the plans, who better to ask for a verdict on which one did a better job? I wrote a detailed `PROCESS.md` file containing a prompt that asks the AI to compare the two sets of generated plans in various ways and give its verdict.
The verdicts are in the `verdicts` directory. To create `from-claude.md`, I ran `claude` on the CLI and gave it the prompt:

please read PROCESS.md and answer what it asks you

To create `from-codex.md`, I ran `codex` on the CLI and gave it the same prompt.
Neither agent knew whether `agent1` was Claude Code or OpenAI Codex. Both gave a slight edge in their verdicts to `agent2`, which is OpenAI Codex. You can read the full verdict text yourself, but the general vibe was that Codex produced more focused plans without what each judge perceived as over-engineering. That over-engineering may well actually help the LLM during implementation, so YMMV - this eval method is far from gospel.
Obviously, this is a sample of 5 tasks, run once, in a fairly non-scientific way. OpenAI Codex was released yesterday, so expect all of this to change.
Codex was considerably slower, both when generating the plans and when comparing them. It seemed to want to spend a lot more time reading than Claude Code did.