A new study claims that LM Arena, a popular AI model ranking platform, engages in practices that unfairly favor large tech companies, helping their models dominate the top of its leaderboard. The research highlights how proprietary AI systems from companies like Google and Meta gain advantages through extensive pre-release testing options that aren't equally available to open-source models, raising questions about the metrics and platforms the AI industry relies on to measure genuine progress.
The big picture: Researchers from Cohere Labs, Princeton, and MIT found that LM Arena allows major tech companies to privately test multiple variants of their AI models on the platform and then publish results only for the best-performing one, a selection effect that can inflate a company's apparent standing, as the sketch below illustrates.
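To see why publishing only the best of several private tries skews a leaderboard, here is a toy simulation, not taken from the study itself: it assumes every variant has the same true skill and that arena scores are just noisy measurements, then compares publishing one measurement against publishing the best of ten. All numbers (ratings, noise level, number of variants) are illustrative assumptions.

```python
import random

# Toy simulation of selection bias: if measured scores are noisy, the best of
# several private attempts is systematically higher than a single attempt,
# even when every variant is equally good. Parameters below are assumptions.

random.seed(0)

TRUE_RATING = 1200.0   # assumed true skill shared by every variant
NOISE = 25.0           # assumed measurement noise from a limited number of votes

def measured_rating() -> float:
    """One noisy arena estimate of a variant's rating."""
    return random.gauss(TRUE_RATING, NOISE)

TRIALS = 10_000
single = sum(measured_rating() for _ in range(TRIALS)) / TRIALS
best_of_10 = sum(max(measured_rating() for _ in range(10))
                 for _ in range(TRIALS)) / TRIALS

print(f"average published score, one submission:   {single:.1f}")
print(f"average published score, best of 10 tries: {best_of_10:.1f}")
```

Running this shows the "best of 10" figure landing well above the true rating, which is the kind of advantage the study argues accrues to providers who can test many private variants.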
Why this matters: LM Arena’s rankings have gained significant industry influence, with companies like Google highlighting their performance on the platform when releasing new models.
Key details: LM Arena works by having users compare outputs from two unidentified AI models and vote on which they prefer, with results aggregated into a public leaderboard.
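For readers curious about the mechanics, the sketch below shows one common way such head-to-head votes can be aggregated into a ranking, using an Elo-style rating update. The model names, starting rating, and K-factor are illustrative assumptions, not LM Arena's actual parameters or implementation.

```python
from collections import defaultdict

# Illustrative Elo-style aggregation of pairwise votes into a leaderboard.
# Vote data and constants below are assumptions for this sketch only.

K = 32            # update step size (assumed)
START = 1000.0    # initial rating for every model (assumed)

ratings = defaultdict(lambda: START)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(model_a: str, model_b: str, a_wins: bool) -> None:
    """Update both models' ratings after a single head-to-head user vote."""
    exp_a = expected_score(ratings[model_a], ratings[model_b])
    score_a = 1.0 if a_wins else 0.0
    ratings[model_a] += K * (score_a - exp_a)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - exp_a))

# Hypothetical votes: (model shown as "A", model shown as "B", did A win?)
votes = [("model-x", "model-y", True),
         ("model-y", "model-z", False),
         ("model-x", "model-z", True)]

for a, b, a_wins in votes:
    record_vote(a, b, a_wins)

# Leaderboard: models sorted by rating, highest first.
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```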
What they're saying: LM Arena has responded that its pre-release testing features were never kept secret.
Researchers' recommendations: The study suggests several remedies to make the LM Arena platform more equitable, including limiting how many private model variants each provider can test and applying sampling and deprecation rules consistently to proprietary and open-source models alike.