LOCAL MODEL ARENA

Which AI actually makes and reasons?

Local and frontier models are put through the same objective tests: the games they wrote are playable right here, their code is run against hidden tests, and their reasoning is checked against held-out keys. No LLM judges.

12 runs
8 models
5 local & free
4 axes

Game-making

7 runs
1. claude-opus-4-8
   FRONTIER · PAID · game/v2
   100/100 · σ 11.79 · n=3
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 10/10
   fps>=50: 10/10
   controlled_win: 20/20
   input_decisive: 15/15
   losable: 10/10
2. claude-sonnet-4-6
   FRONTIER · PAID · game/v2
   100/100 · σ 11.79 · n=3
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 10/10
   fps>=50: 10/10
   controlled_win: 20/20
   input_decisive: 15/15
   losable: 10/10
3. qwen2.5-coder:14b
   LOCAL · FREE · game/v1
   100/100 · σ 0.0 · n=3
   loads_clean: 20/20
   boots_clean: 15/15
   canvas_non_blank: 15/15
   scenario_progress: 25/25
   win_reached: 10/10
   fps>=50: 15/15
4. grok-web
   WEB · MANUAL · game/v2
   85/100 · σ 11.22 · n=5
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 10/10
   fps>=50: 10/10
   controlled_win: 20/20
   input_decisive: 0/15
   losable: 10/10
5. glm-5.2-web
   WEB · MANUAL · game/v2
   85/100 · σ 11.14 · n=5
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 10/10
   fps>=50: 10/10
   controlled_win: 20/20
   input_decisive: 0/15
   losable: 10/10
6. claude-haiku-4-5
   FRONTIER · PAID · game/v2
   75/100 · σ 4.71 · n=3
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 10/10
   fps>=50: 10/10
   controlled_win: 20/20
   input_decisive: 0/15
   losable: 0/10
7. qwen2.5-coder:14b
   LOCAL · FREE · game/v2
   35/100 · σ 0.0 · n=3
   loads_clean: 15/15
   boots_clean: 10/10
   contract_full: 10/10
   canvas_non_blank: 0/10
   fps>=50: 0/10
   controlled_win: 0/20
   input_decisive: 0/15
   losable: 0/10
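
Each headline score on this board appears to be the sum of the per-check points listed beneath it. The page does not show its scoring code, so the following is only a minimal sketch of that arithmetic; the `Check` class and `total` helper are hypothetical names, and the numbers are copied from the grok-web game/v2 entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One rubric line: points earned out of points possible (hypothetical type)."""
    name: str
    earned: int
    possible: int


def total(checks):
    """Return (earned, possible) summed over all checks."""
    return sum(c.earned for c in checks), sum(c.possible for c in checks)


# Per-check points as listed for grok-web under the game/v2 rubric.
GROK_WEB_V2 = [
    Check("loads_clean", 15, 15),
    Check("boots_clean", 10, 10),
    Check("contract_full", 10, 10),
    Check("canvas_non_blank", 10, 10),
    Check("fps>=50", 10, 10),
    Check("controlled_win", 20, 20),
    Check("input_decisive", 0, 15),
    Check("losable", 10, 10),
]

earned, possible = total(GROK_WEB_V2)
print(f"{earned}/{possible}")  # prints 85/100
```

The same sum reproduces the other totals listed in this section, which suggests the rubric weights are fixed per version (game/v1 and game/v2 use different checks and weights).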

Monster battle (Pokémon-style)

1 run
1. claude-opus-4.8-ultra-web
   WEB · MANUAL · battle/v1
   100/100 · σ 0.0 · n=3
   loads_clean: 10/10
   boots_clean: 10/10
   contract_full: 20/20
   canvas_non_blank: 15/15
   moves_work: 25/25
   two_sided: 10/10
   resolves: 10/10

Illustration (SVG)

1 run
1. qwen2.5-coder:14b
   LOCAL · FREE · art/v1
   100/100 · n=1
   loads_clean: 15/15
   valid_svg: 15/15
   detail (shapes): 40/40
   color_variety: 30/30

Coding & reasoning

3 runs
1. glm-5.2-web
   WEB · MANUAL · text/v1
   100/100 · n=1
   coding (hidden tests): 50/50
   reasoning (held-out keys): 50/50
2. deepseek-r1:14b
   LOCAL · FREE · text/v1
   88/100 · n=1
   coding (hidden tests): 50/50
   reasoning (held-out keys): 38/50
3. qwen2.5-coder:14b
   LOCAL · FREE · text/v1
   75/100 · n=1
   coding (hidden tests): 50/50
   reasoning (held-out keys): 25/50
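
The reasoning half of this axis is described as being checked against held-out keys, with no LLM judge. The page does not say how answers are compared or how many questions there are, so the sketch below is purely illustrative: the function names, the whitespace and case normalization, the exact-match rule, the rounding, and the four example questions are all assumptions.

```python
def normalize(answer: str) -> str:
    """Collapse whitespace and ignore case before comparing (assumed rule)."""
    return " ".join(answer.split()).casefold()


def score_against_keys(answers: dict[str, str],
                       keys: dict[str, str],
                       max_points: int = 50) -> int:
    """Scale the fraction of exact matches to max_points.

    Missing answers count as wrong. Exact match after normalization is
    an assumption, not the arena's documented method.
    """
    if not keys:
        raise ValueError("answer key is empty")
    correct = sum(
        1
        for question_id, key in keys.items()
        if normalize(answers.get(question_id, "")) == normalize(key)
    )
    return round(max_points * correct / len(keys))


# Hypothetical held-out key and model answers, for illustration only.
KEYS = {"q1": "42", "q2": "blue", "q3": "7", "q4": "Tuesday"}
ANSWERS = {"q1": " 42 ", "q2": "Blue", "q3": "8"}  # q3 wrong, q4 missing

print(score_against_keys(ANSWERS, KEYS))  # prints 25
```

Because the comparison is a plain string match against a key the model never sees, the result is deterministic and repeatable, which fits the "No LLM judges" claim at the top of the page.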