How do four modern LLMs do at re-creating a simple Windows gaming classic?
We asked four AI coding agents to rebuild Minesweeper—the results were explosive
Submitted 2 days ago by BrikoX@lemmy.zip to gaming@lemmy.zip
https://arstechnica.com/ai/2025/12/the-ars-technica-ai-coding-agent-test-minesweeper-edition/
edgemaster72@lemmy.world 2 days ago
Agent 1: Mistral Vibe
Overall: 4/10
This version got many of the basics right, but it left out chording and fell flat on the small presentational and "fun" touches.
Agent 2: OpenAI Codex
Overall: 9/10
The implementation of chording and cute presentation touches push this to the top of the list. We just wish the “fun” feature was a bit more fun.
Agent 3: Anthropic Claude Code
Overall: 7/10
The lack of chording is a big omission, but the strong presentation and Power Mode options give this effort a passable final score.
Agent 4: Google Gemini CLI
Overall: 0/10 (Incomplete)
Final verdict
OpenAI Codex wins this one on points, in no small part because it was the only model to include chording as a gameplay option. But Claude Code also distinguished itself with strong presentational flourishes and quick generation time. Mistral Vibe was a significant step down, and the Google Gemini CLI, based on Gemini 2.5, was a complete failure on our one-shot test.
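For anyone unfamiliar with the feature the verdict hinges on: chording is the classic Minesweeper shortcut where clicking a revealed number whose neighbors already carry a matching count of flags uncovers all of that number's remaining unflagged neighbors at once. Here's a minimal sketch of the logic, assuming a simple rectangular grid of cells; the `Cell`, `neighbors`, and `chord` names are illustrative and aren't taken from any agent's actual output.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    is_mine: bool = False
    is_revealed: bool = False
    is_flagged: bool = False
    adjacent_mines: int = 0

def neighbors(grid, r, c):
    """Yield coordinates of the up-to-eight cells surrounding (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]):
                yield nr, nc

def chord(grid, r, c):
    """Chording: if a revealed number's adjacent flag count matches the number,
    reveal all of its unflagged, unrevealed neighbors in one click."""
    cell = grid[r][c]
    if not cell.is_revealed or cell.adjacent_mines == 0:
        return
    flags = sum(grid[nr][nc].is_flagged for nr, nc in neighbors(grid, r, c))
    if flags != cell.adjacent_mines:
        return
    for nr, nc in neighbors(grid, r, c):
        neighbor = grid[nr][nc]
        if not neighbor.is_flagged and not neighbor.is_revealed:
            neighbor.is_revealed = True
            # In a full game, revealing a mine here ends the game, and
            # revealing a zero cascades the reveal to its own neighbors.
```

It's a small amount of logic, which is part of why its absence from three of the four submissions stood out.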
While experienced coders can definitely get better results via an interactive, back-and-forth code editing conversation with an agent, these results show how capable some of these models can be, even with a very short prompt on a relatively straightforward task. Still, we feel that our overall experience with coding agents on other projects (more on that in a future article) generally reinforces the idea that they currently function best as interactive tools that augment human skill rather than replace it.