On Thursday, OpenAI released its first production AI model to run on non-Nvidia hardware, deploying the new GPT-5.3-Codex-Spark coding model on chips from Cerebras. The model generates code at more than 1,000 tokens (chunks of data) per second, reportedly about 15 times faster than its predecessor. For comparison, Anthropic’s Claude Opus 4.6 in its new premium-priced fast mode reaches about 2.5 times its standard speed of 68.2 tokens per second, though Opus is a larger and more capable model than Spark.
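For a rough sense of scale, the cited figures can be compared directly. This is a back-of-the-envelope sketch using only the article's reported numbers, which are vendor claims rather than independent measurements:

```python
# Throughput comparison from the figures reported above.
spark_tps = 1000            # GPT-5.3-Codex-Spark, tokens/second (OpenAI's claim)
claude_standard_tps = 68.2  # Claude Opus 4.6 standard mode (reported)

# Fast mode is described as ~2.5x Claude's standard speed.
claude_fast_tps = 2.5 * claude_standard_tps

print(f"Claude fast mode: ~{claude_fast_tps:.1f} tokens/sec")
print(f"Spark vs. Claude fast mode: ~{spark_tps / claude_fast_tps:.1f}x")
```

That works out to roughly 170 tokens per second for Claude's fast mode, leaving Spark's claimed throughput several times higher, though the two models differ enough in size and capability that raw tokens per second is only part of the picture.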
“Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability,” Sachin Katti, head of compute at OpenAI, said in a statement.
Codex-Spark is a research preview available to ChatGPT Pro subscribers ($200/month) through the Codex app, command-line interface, and VS Code extension. OpenAI is rolling out API access to select design partners. The model ships with a 128,000-token context window and handles text only at launch.
The release builds on the full GPT-5.3-Codex model that OpenAI launched earlier this month. Where the full model handles heavyweight agentic coding tasks, Spark trades depth of knowledge for speed: OpenAI built it as a text-only model tuned specifically for coding, not for the general-purpose tasks that the larger GPT-5.3 handles.
On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks for evaluating software engineering ability, Spark outperforms the older GPT-5.1-Codex-mini while completing tasks in a fraction of the time, according to OpenAI. The company did not share independent validation of those numbers.
Anecdotally, Codex’s speed has been a sore spot; when Ars tested four AI coding agents building Minesweeper clones in December, Codex took roughly twice as long as Anthropic’s Claude Code to produce a working game.
The coding agent arms race
For context, GPT-5.3-Codex-Spark’s 1,000 tokens per second represents a fairly dramatic leap over anything OpenAI has previously served through its own infrastructure. According to independent benchmarks from Artificial Analysis, OpenAI’s fastest models on Nvidia hardware top out well below that mark: GPT-4o delivers roughly 147 tokens per second, o3-mini hits about 167, and GPT-4o mini clocks around 52.
