Latent Space: The AI Engineer Podcast • December 31, 2025 • Guest: Anastasios Angelopoulos
From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 at one of the most influential platforms in AI—trusted by millions of users, every major lab, and the entire industry to answer one question: which model is actually best for real-world use cases?

We caught up with Anastasios live at NeurIPS 2025 to dig into the origin story (spoiler: it started as an academic project incubated by Anjney Midha at a16z, who formed an entity and gave grants before they even committed to starting a company) and why they decided to spin out instead of staying academic or nonprofit (the only way to scale was to build a company). We also cover how they're spending that $100M (inference costs, the React migration off Gradio, and hiring world-class talent across ML, product, and go-to-market); the "Leaderboard Illusion" controversy and why their response demolished the paper's claims (factual errors, misrepresentation of open- vs. closed-source sampling, and ignoring the transparency of preview testing that the community loves); and why platform integrity comes first (the public leaderboard is a charity, not a pay-to-play system—models can't pay to get on, can't pay to get off, and scores reflect millions of real votes).

Beyond the leaderboard, we get into how they're expanding into occupational verticals (medicine, legal, finance, creative marketing) and multimodal arenas (video coming soon); why consumer retention is earned every single day (sign-in and persistent history were the unlock, but users are fickle and can leave at any moment); the Gemini Nano Banana moment that changed Google's market share overnight (and why multimodal models are becoming economically critical for marketing, design, and AI-for-science); how they're thinking about agents and harnesses (Code Arena evaluates models, but maybe it should evaluate full agents like Devin); and his vision for Arena as the central evaluation platform that provides the North Star for the industry—constantly fresh, immune to overfitting, and grounded in millions of real-world conversations from real users.
We discuss:
The $100M raise: use of funds is primarily inference costs (funding free usage for tens of millions of monthly conversations), React migration off Gradio (custom loading icons, better developer hiring, more flexibility), and hiring world-class talent
The scale: 250M+ conversations on the platform, tens of millions per month, 25% of users write software for a living, and half of users are now logged in
The "Leaderboard Illusion" controversy: Cohere researchers claimed undisclosed private testing created inequities, but Arena's response demolished the paper's claims, exposing factual errors (it misrepresented open- vs. closed-source sampling and ignored the transparency of preview testing that the community loves)
Why preview testing is loved by the community: secret codenames (Gemini Nano Banana, named after PM Naina's nickname), early access to unreleased models, and the thrill of being first to vote on frontier capabilities
The Nano Banana moment: changed Google's market share overnight, billions of dollars in stock movement, and validated that multimodal models (image generation, video) are economically critical for marketing, design, and AI-for-science
New categories: occupational and expert arenas (medicine, legal, finance, creative marketing), Code Arena, and video arena coming soon
Consumer retention: sign-in and persistent history were the unlock, but users are fickle and retention must be earned every single day—"every user is earned, they can leave at any moment"
—
Anastasios Angelopoulos
Arena: https://lmarena.ai