How do four modern LLMs do at re-creating a simple Windows gaming classic?
We asked four AI coding agents to rebuild Minesweeper—the results were explosive
Submitted 2 days ago by BrikoX@lemmy.zip to gaming@lemmy.zip
https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/
edgemaster72@lemmy.world 2 days ago
Agent 1: Mistral Vibe
Overall: 4/10
This version got many of the basics right, but it left out chording and fell flat on the small presentational and "fun" touches.
Agent 2: OpenAI Codex
Overall: 9/10
The implementation of chording and cute presentation touches push this to the top of the list. We just wish the “fun” feature was a bit more fun.
Agent 3: Anthropic Claude Code
Overall: 7/10
The lack of chording is a big omission, but the strong presentation and Power Mode options give this effort a passable final score.
Agent 4: Google Gemini CLI
Overall: 0/10 (Incomplete)
Final verdict
OpenAI Codex wins this one on points, in no small part because it was the only model to include chording as a gameplay option. But Claude Code also distinguished itself with strong presentational flourishes and quick generation time. Mistral Vibe was a significant step down, and the Google Gemini CLI, based on Gemini 2.5, was a complete failure on our one-shot test.
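For anyone unfamiliar with the feature the verdict hinges on: chording is the classic Minesweeper shortcut where clicking a revealed number whose neighbors already carry a matching count of flags uncovers all of that number's remaining unflagged neighbors at once. Here's a minimal sketch of the logic, assuming a simple rectangular grid of cells; the `Cell`, `neighbors`, and `chord` names are illustrative and aren't taken from any agent's actual output.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    is_mine: bool = False
    is_revealed: bool = False
    is_flagged: bool = False
    adjacent_mines: int = 0

def neighbors(grid, r, c):
    """Yield coordinates of the up-to-eight cells surrounding (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
                yield nr, nc

def chord(grid, r, c):
    """Chording: if a revealed number's adjacent flag count matches the number,
    reveal all of its unflagged, unrevealed neighbors in one click."""
    cell = grid[r][c]
    if not cell.is_revealed or cell.adjacent_mines == 0:
        return
    flags = sum(grid[nr][nc].is_flagged for nr, nc in neighbors(grid, r, c))
    if flags != cell.adjacent_mines:
        return
    for nr, nc in neighbors(grid, r, c):
        neighbor = grid[nr][nc]
        if not neighbor.is_flagged and not neighbor.is_revealed:
            neighbor.is_revealed = True
            # In a full game, revealing a mine here ends the game, and
            # revealing a zero cascades the reveal to its own neighbors.
```

It's a small amount of logic, which is part of why its absence from three of the four submissions stood out.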
While experienced coders can definitely get better results via an interactive, back-and-forth code editing conversation with an agent, these results show how capable some of these models can be, even with a very short prompt on a relatively straightforward task. Still, we feel that our overall experience with coding agents on other projects (more on that in a future article) generally reinforces the idea that they currently function best as interactive tools that augment human skill rather than replace it.