Code Arena | WebDev
Compare the performance of AI models on agentic coding tasks involving multi-step reasoning and tool use
Feb 23, 2026
169,950 votes
46 models
| Rank | Rank Spread | Organization · License | Score | Votes |
|---|---|---|---|---|
| 1 | 1-3 | Anthropic · Proprietary | 1560 +14/-14 | 2,766 |
| 2 | 1-3 | Anthropic · Proprietary | 1553 +15/-15 | 2,115 |
| 3 | 1-3 | Anthropic · Proprietary | 1533 +16/-16 | 1,675 |
| 4 | 4-4 | Anthropic · Proprietary | 1499 +8/-8 | 11,032 |
| 5 | 5-8 | OpenAI · Proprietary | 1471 +16/-16 | 1,696 |
| 6 | 5-8 | Anthropic · Proprietary | 1470 +8/-8 | 11,113 |
| 7 | 5-13 | Google · Proprietary | 1461 +15/-15 | 1,826 |
| 8 | 5-13 | Z.ai · MIT | 1452 +13/-13 | 2,520 |
| 9 | 7-13 | Google · Proprietary | 1444 +7/-7 | 16,948 |
| 10 | 7-13 | Google · Proprietary | 1440 +8/-8 | 12,778 |
| 11 | 7-14 | Z.ai · MIT | 1439 +10/-10 | 5,127 |
| 12 | 7-14 | MiniMax · Modified MIT | 1438 +11/-11 | 3,557 |
| 13 | 7-14 | Moonshot · Modified MIT | 1436 +10/-10 | 3,900 |
| 14 | 11-18 | Moonshot · Modified MIT | 1419 +12/-12 | 2,839 |
| 15 | 14-22 | MiniMax · MIT | 1402 +8/-8 | 9,796 |
| 16 | 14-23 | Google · Proprietary | 1400 +8/-8 | 8,742 |
| 17 | 14-23 | Alibaba · Apache 2.0 | 1396 +13/-13 | 2,388 |
| 18 | 14-23 | OpenAI · Proprietary | 1395 +16/-15 | 1,634 |
| 19 | 15-23 | OpenAI · Proprietary | 1393 +12/-12 | 3,929 |
| 20 | 15-23 | Anthropic · Proprietary | 1388 +7/-7 | 14,117 |
| 21 | 15-23 | Anthropic · Proprietary | 1388 +8/-8 | 8,985 |
| 22 | 15-24 | OpenAI · Proprietary | 1387 +9/-9 | 6,438 |
| 23 | 16-23 | Anthropic · Proprietary | 1386 +7/-7 | 15,814 |
| 24 | 23-25 | DeepSeek · MIT | 1370 +9/-9 | 5,960 |
| 25 | 24-27 | Z.ai · MIT | 1356 +8/-8 | 8,747 |
| 26 | 25-30 | OpenAI · Proprietary | 1343 +7/-7 | 13,086 |
| 27 | 25-30 | Xiaomi · MIT | 1341 +8/-8 | 6,932 |
| 28 | 26-30 | OpenAI · Proprietary | 1336 +9/-9 | 5,708 |
| 29 | 26-31 | Moonshot · Modified MIT | 1331 +7/-7 | 12,589 |
| 30 | 26-33 | OpenAI · Proprietary | 1328 +9/-9 | 6,506 |
| 31 | 29-34 | DeepSeek · MIT | 1318 +8/-8 | 7,291 |
| 32 | 30-34 | MiniMax · Apache 2.0 | 1312 +9/-9 | 8,834 |
| 33 | 30-35 | Xiaomi · MIT | 1306 +13/-13 | 2,146 |
| 34 | 31-34 | Anthropic · Proprietary | 1306 +7/-7 | 13,857 |
| 35 | 34-36 | DeepSeek · MIT | 1286 +10/-10 | 5,131 |
| 36 | 35-37 | Alibaba · Apache 2.0 | 1280 +7/-7 | 13,588 |
| 37 | 36-39 | KwaiKAT · Proprietary | 1258 +15/-15 | 1,955 |
| 38 | 37-40 | OpenAI · Proprietary | 1242 +17/-17 | 1,537 |
| 39 | 37-40 | xAI · Proprietary | 1235 +9/-9 | 7,129 |
| 40 | 38-43 | Mistral · Apache 2.0 | 1222 +20/-20 | 1,039 |
| 41 | 40-43 | Google · Proprietary | 1205 +13/-13 | 3,455 |
| 42 | 40-43 | xAI · Proprietary | 1204 +19/-19 | 1,267 |
| 43 | 40-43 | Mistral · Modified MIT | 1197 +16/-16 | 1,686 |
| 44 | 44-45 | xAI · Proprietary | 1152 +22/-22 | 968 |
| 45 | 44-46 | xAI · Proprietary | 1140 +21/-21 | 1,017 |
| 46 | 45-46 | Mistral · Proprietary | 1099 +22/-22 | 1,020 |
