Chatbot Arena

From Wikipedia, the free encyclopedia
Chatbot Arena
Type of site: Chatbot, artificial intelligence
Country of origin: United States
Owner: LMSYS Org
Founder(s):
  • Wei-Lin Chiang
  • Anastasios Angelopoulos
URL: https://lmarena.ai
Commercial: No
Registration: None
Launched: May 3, 2023

Chatbot Arena is a website for comparing large language models and measuring their performance based on human preference. Users enter a prompt, two anonymous models respond, and the user votes for the model that gave the better response, after which the models' identities are revealed. Users can also choose which models to test.[1][2]

Chatbot Arena is popular within the artificial intelligence industry. Major companies submit their large language models, such as GPT-4o, o1, Gemini,[3] and Claude,[4] and use the resulting rankings to promote them. Notably, the Chinese company DeepSeek tested its prototype models in Chatbot Arena months before its R1 model gained attention in Western media.[5] The website has also been used for preview releases of upcoming models. However, Chatbot Arena's methodology has been criticized as an insufficient measure of large language model performance.[6][7]

The interface of the main "battle" mode, in which two models, "enigma" and Phi-4, have responded to the prompt "What is the best wiki in the world?"

References

  1. Hart, Robert (July 18, 2024). "What AI Is The Best? Chatbot Arena Relies On Millions Of Human Votes". Forbes. Retrieved April 21, 2025.
  2. Kruppa, Miles (December 5, 2024). "The UC Berkeley Project That Is the AI Industry's Obsession". The Wall Street Journal. Retrieved April 21, 2025.
  3. Nuñez, Michael (November 15, 2024). "Google Gemini unexpectedly surges to No. 1, over OpenAI, but benchmarks don't tell the whole story". VentureBeat. Retrieved April 21, 2025.
  4. Edwards, Benj (March 27, 2024). "'The king is dead'—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time". Ars Technica. Retrieved April 21, 2025.
  5. Metz, Rachel (February 18, 2025). "Before DeepSeek Blew Up, Chatbot Arena Announced Its Arrival". Bloomberg News. Retrieved April 21, 2025.
  6. Stokel-Walker, Chris (February 6, 2025). "Hundreds of rigged votes can skew AI model rankings on Chatbot Arena, study finds". Fast Company. Retrieved April 21, 2025.
  7. Wiggers, Kyle (September 5, 2024). "The AI industry is obsessed with Chatbot Arena, but it might not be the best benchmark". TechCrunch. Retrieved April 21, 2025.