Code Arena | WebDev

Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use

Feb 23, 2026 · 169,950 votes · 46 models
| Rank | Rank spread | Organization | License | Score | CI | Votes |
|---:|:---:|---|---|---:|:---:|---:|
| 1 | 1–3 | Anthropic | Proprietary | 1560 | +14/-14 | 2,766 |
| 2 | 1–3 | Anthropic | Proprietary | 1553 | +15/-15 | 2,115 |
| 3 | 1–3 | Anthropic | Proprietary | 1533 | +16/-16 | 1,675 |
| 4 | 4–4 | Anthropic |  | 1499 | +8/-8 | 11,032 |
| 5 | 5–8 | OpenAI | Proprietary | 1471 | +16/-16 | 1,696 |
| 6 | 5–8 | Anthropic | Proprietary | 1470 | +8/-8 | 11,113 |
| 7 | 5–13 | Google | Proprietary | 1461 | +15/-15 | 1,826 |
| 8 | 5–13 | Z.ai | MIT | 1452 | +13/-13 | 2,520 |
| 9 | 7–13 | Google | Proprietary | 1444 | +7/-7 | 16,948 |
| 10 | 7–13 | Google | Proprietary | 1440 | +8/-8 | 12,778 |
| 11 | 7–14 | Z.ai | MIT | 1439 | +10/-10 | 5,127 |
| 12 | 7–14 | MiniMax | Modified MIT | 1438 | +11/-11 | 3,557 |
| 13 | 7–14 | Moonshot | Modified MIT | 1436 | +10/-10 | 3,900 |
| 14 | 11–18 | Moonshot | Modified MIT | 1419 | +12/-12 | 2,839 |
| 15 | 14–22 | MiniMax | MIT | 1402 | +8/-8 | 9,796 |
| 16 | 14–23 |  |  | 1400 | +8/-8 | 8,742 |
| 17 | 14–23 | Alibaba | Apache 2.0 | 1396 | +13/-13 | 2,388 |
| 18 | 14–23 | OpenAI | Proprietary | 1395 | +16/-15 | 1,634 |
| 19 | 15–23 | OpenAI | Proprietary | 1393 | +12/-12 | 3,929 |
| 20 | 15–23 | Anthropic |  | 1388 | +7/-7 | 14,117 |
| 21 | 15–23 | Anthropic | Proprietary | 1388 | +8/-8 | 8,985 |
| 22 | 15–24 | OpenAI | Proprietary | 1387 | +9/-9 | 6,438 |
| 23 | 16–23 | Anthropic | Proprietary | 1386 | +7/-7 | 15,814 |
| 24 | 23–25 | DeepSeek | MIT | 1370 | +9/-9 | 5,960 |
| 25 | 24–27 | Z.ai | MIT | 1356 | +8/-8 | 8,747 |
| 26 | 25–30 | OpenAI | Proprietary | 1343 | +7/-7 | 13,086 |
| 27 | 25–30 |  |  | 1341 | +8/-8 | 6,932 |
| 28 | 26–30 | OpenAI | Proprietary | 1336 | +9/-9 | 5,708 |
| 29 | 26–31 | Moonshot | Modified MIT | 1331 | +7/-7 | 12,589 |
| 30 | 26–33 | OpenAI | Proprietary | 1328 | +9/-9 | 6,506 |
| 31 | 29–34 | DeepSeek | MIT | 1318 | +8/-8 | 7,291 |
| 32 | 30–34 | MiniMax | Apache 2.0 | 1312 | +9/-9 | 8,834 |
| 33 | 30–35 |  |  | 1306 | +13/-13 | 2,146 |
| 34 | 31–34 | Anthropic | Proprietary | 1306 | +7/-7 | 13,857 |
| 35 | 34–36 | DeepSeek | MIT | 1286 | +10/-10 | 5,131 |
| 36 | 35–37 | Alibaba | Apache 2.0 | 1280 | +7/-7 | 13,588 |
| 37 | 36–39 | KwaiKAT | Proprietary | 1258 | +15/-15 | 1,955 |
| 38 | 37–40 | OpenAI | Proprietary | 1242 | +17/-17 | 1,537 |
| 39 | 37–40 | xAI | Proprietary | 1235 | +9/-9 | 7,129 |
| 40 | 38–43 | Mistral | Apache 2.0 | 1222 | +20/-20 | 1,039 |
| 41 | 40–43 | Google | Proprietary | 1205 | +13/-13 | 3,455 |
| 42 | 40–43 | xAI | Proprietary | 1204 | +19/-19 | 1,267 |
| 43 | 40–43 | Mistral | Modified MIT | 1197 | +16/-16 | 1,686 |
| 44 | 44–45 | xAI | Proprietary | 1152 | +22/-22 | 968 |
| 45 | 44–46 | xAI | Proprietary | 1140 | +21/-21 | 1,017 |
| 46 | 45–46 | Mistral | Proprietary | 1099 | +22/-22 | 1,020 |
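The rank spreads listed above are consistent with a simple interval-overlap rule, inferred here rather than documented on this page. Taking each model's interval as score minus margin to score plus margin, a model's best possible rank is 1 plus the number of models whose lower bound lies above its upper bound, and its worst possible rank is 1 plus the number of other models whose upper bound reaches its lower bound. The sketch below uses a hypothetical `rank_spread` helper to apply that rule to ranks 1–14 and reproduces the spreads listed for ranks 1–13.

```python
def rank_spread(intervals):
    """Return (best, worst) rank for each (lower, upper) interval.

    Assumed rule: a model is clearly ahead of another only when its lower
    bound exceeds the other's upper bound; overlapping intervals are ties.
    """
    spreads = []
    for i, (lo_i, hi_i) in enumerate(intervals):
        # Models whose whole interval sits above this one are certainly ahead.
        better = sum(1 for lo_j, _ in intervals if lo_j > hi_i)
        # Models that are not clearly behind could all finish ahead of this one.
        not_worse = sum(
            1 for j, (_, hi_j) in enumerate(intervals) if j != i and hi_j >= lo_i
        )
        spreads.append((1 + better, 1 + not_worse))
    return spreads


# Score and symmetric margin for ranks 1-14, copied from the table above.
TOP_14 = [
    (1560, 14), (1553, 15), (1533, 16), (1499, 8), (1471, 16), (1470, 8),
    (1461, 15), (1452, 13), (1444, 7), (1440, 8), (1439, 10), (1438, 11),
    (1436, 10), (1419, 12),
]
intervals = [(score - m, score + m) for score, m in TOP_14]
for rank, (best, worst) in enumerate(rank_spread(intervals)[:13], start=1):
    print(f"rank {rank}: {best}-{worst}")
```

Rank 14 is left out of the printout because its worst rank (18 in the table) depends on models below it that are not in the sample list.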

Leaderboard Plots

Average Win Rate Against All Other Models (Uniform Sampling and No Ties)
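A minimal sketch of how such an average could be computed, assuming the scores sit on an Elo-style logistic scale (base 10, scale 400), so that a score gap d gives an expected win probability of 1 / (1 + 10^(-d/400)), and reading "uniform sampling" as weighting every opponent equally. The function names are hypothetical; the three scores are taken from the table, labelled by rank because the page does not show model names.

```python
def expected_win_prob(r_a, r_b, scale=400.0, base=10.0):
    """Probability that a model rated r_a beats one rated r_b, ties excluded."""
    return 1.0 / (1.0 + base ** ((r_b - r_a) / scale))


def average_win_rate(ratings):
    """Mean expected win probability of each model against every other model,
    with each opponent weighted equally (uniform sampling)."""
    result = {}
    for a, r_a in ratings.items():
        probs = [expected_win_prob(r_a, r_b) for b, r_b in ratings.items() if b != a]
        result[a] = sum(probs) / len(probs)
    return result


scores = {"rank 1": 1560, "rank 23": 1386, "rank 46": 1099}
for name, rate in average_win_rate(scores).items():
    print(f"{name}: {rate:.3f}")
```

Because P(a beats b) + P(b beats a) = 1 under this model, the per-model averages always have a mean of exactly 0.5, which makes a handy sanity check.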

Fraction of Model A Wins for All Non-tied A vs. B Battles
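The empirical counterpart can be tallied directly from battle records. This sketch assumes a hypothetical record format of (model_a, model_b, winner) tuples and, like the plot title, drops ties. It also assumes that a win counts for a model whichever side it was shown on, which is a guess about how the plot pools positions.

```python
from collections import defaultdict


def pairwise_win_fraction(battles):
    """Fraction of non-tied battles that model x won against model y.

    `battles` holds (model_a, model_b, winner) tuples, where winner is
    "model_a", "model_b", or "tie". A win by x counts whether x appeared
    as model A or as model B.
    """
    wins = defaultdict(int)  # wins[(x, y)] = battles x won against y
    for model_a, model_b, winner in battles:
        if winner == "model_a":
            wins[(model_a, model_b)] += 1
        elif winner == "model_b":
            wins[(model_b, model_a)] += 1
        # Ties fall through and are not counted at all.
    pairs = set(wins) | {(y, x) for x, y in wins}
    return {
        (x, y): wins.get((x, y), 0) / (wins.get((x, y), 0) + wins.get((y, x), 0))
        for x, y in pairs
    }


battles = [
    ("A", "B", "model_a"),  # A beats B
    ("B", "A", "model_b"),  # A beats B again, this time shown as model B
    ("A", "B", "model_b"),  # B beats A
    ("A", "B", "tie"),      # ignored
    ("C", "D", "tie"),      # ignored, so (C, D) never appears
]
print(pairwise_win_fraction(battles))
```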

Confidence Intervals on Model Strength (via Bootstrapping)
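A sketch of bootstrapped confidence intervals, assuming model strength follows a Bradley-Terry model. Battles are resampled with replacement, strengths are refit each round with Hunter's MM (minorization-maximization) algorithm, and the approximate 2.5th and 97.5th percentiles of the refitted ratings form the interval. The leaderboard's own fitting code, number of rounds, and rating anchor are not given on this page. The 1000-point anchor, 400-point scale, 100 rounds, and the synthetic battles drawn from hypothetical ratings are all assumptions, as are the function names.

```python
import math
import random
from collections import defaultdict


def fit_bradley_terry(battles, models, iters=200):
    """Bradley-Terry strengths via Hunter's MM algorithm.

    `battles` is a list of (winner, loser) pairs with ties already removed.
    Assumes every model has played and won at least once.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # battle count per unordered pair
    for winner, loser in battles:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j]) for j in models if j != i
            )
            new_p[i] = wins[i] / denom
        # Strengths are defined only up to a constant factor: fix the geometric mean at 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return p


def to_rating(strength, scale=400.0, anchor=1000.0):
    # Map a strength onto an Elo-like scale; anchor and scale are assumptions.
    return anchor + scale * math.log10(strength)


def bootstrap_ratings(battles, models, rounds=100, seed=0):
    """Resample battles with replacement, refit, and return (low, median, high)."""
    rng = random.Random(seed)
    samples = defaultdict(list)
    for _ in range(rounds):
        resampled = [rng.choice(battles) for _ in battles]
        for m, s in fit_bradley_terry(resampled, models).items():
            samples[m].append(to_rating(s))
    summary = {}
    for m, vals in samples.items():
        vals.sort()
        n = len(vals)
        # Nearest-rank approximations of the 2.5th, 50th, and 97.5th percentiles.
        summary[m] = (vals[int(0.025 * (n - 1))], vals[n // 2], vals[int(0.975 * (n - 1))])
    return summary


def simulate(true_ratings, n_per_pair=300, seed=1):
    """Synthetic non-tied (winner, loser) battles from hypothetical true ratings."""
    rng = random.Random(seed)
    names = list(true_ratings)
    battles = []
    for k, a in enumerate(names):
        for b in names[k + 1:]:
            p_a = 1.0 / (1.0 + 10 ** ((true_ratings[b] - true_ratings[a]) / 400))
            for _ in range(n_per_pair):
                battles.append((a, b) if rng.random() < p_a else (b, a))
    return battles


truth = {"A": 1200, "B": 1100, "C": 1000}  # hypothetical ratings
ci = bootstrap_ratings(simulate(truth), list(truth))
for m, (lo, med, hi) in ci.items():
    print(f"{m}: {med:.0f} ({lo:.0f} to {hi:.0f})")
```

Only rating differences are meaningful here. With the geometric mean of strengths fixed at 1, each fit's ratings average exactly 1000, so the hypothetical 1200/1100/1000 ratings come back near 1100/1000/900. In the table, the entries with the fewest votes also tend to carry the widest margins.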

Battle Count for Each Combination of Models (without Ties)