[State of Evals] LMArena's $100M Vision — Anastasios Angelopoulos, LMArena

Latent Space: The AI Engineer Podcast • December 31, 2025 • Solo Episode

Description

From building LMArena in a Berkeley basement to raising $100M and becoming the de facto leaderboard for frontier AI, Anastasios Angelopoulos returns to Latent Space to recap 2025 at one of the most influential platforms in AI, trusted by millions of users, every major lab, and the entire industry to answer one question: which model is actually best for real-world use cases?

We caught up with Anastasios live at NeurIPS 2025 to dig into the origin story (spoiler: it started as an academic project incubated by Anjney Midha at a16z, who formed an entity and gave grants before the team had even committed to starting a company), why they decided to spin out instead of staying academic or nonprofit (the only way to scale was to build a company), and how they're spending that $100M (inference costs, React migration off Gradio, and hiring world-class talent across ML, product, and go-to-market). We also cover the Leaderboard Illusion controversy and why their response demolished the paper's claims (factual errors, misrepresentation of open vs. closed source sampling, and ignoring the transparency of preview testing that the community loves), and why platform integrity comes first (the public leaderboard is a charity, not a pay-to-play system: models can't pay to get on, can't pay to get off, and scores reflect millions of real votes). From there we get into how they're expanding into occupational verticals (medicine, legal, finance, creative marketing) and multimodal arenas (video coming soon), why consumer retention is earned every single day (sign-in and persistent history were the unlock, but users are fickle and can leave at any moment), the Gemini Nano Banana moment that changed Google's market share overnight (and why multimodal models are becoming economically critical for marketing, design, and AI-for-science), how they're thinking about agents and harnesses (Code Arena evaluates models, but maybe it should evaluate full agents like Devin), and his vision for Arena as the central evaluation platform that provides the North Star for the industry: constantly fresh, immune to overfitting, and grounded in millions of real-world conversations from real users.

We discuss:

  • The $100M raise: use of funds is primarily inference costs (funding free usage for tens of millions of monthly conversations), React migration off Gradio (custom loading icons, better developer hiring, more flexibility), and hiring world-class talent

  • The scale: 250M+ conversations on the platform, tens of millions per month, 25% of users do software for a living, and half of users are now logged in

  • The Leaderboard Illusion controversy: Cohere researchers claimed undisclosed private testing created inequities, but Arena's response demolished the paper's claims, citing factual errors (it misrepresented open vs. closed source sampling and ignored the transparency of preview testing that the community loves)

  • Why preview testing is loved by the community: secret codenames (Gemini Nano Banana, named after PM Naina's nickname), early access to unreleased models, and the thrill of being first to vote on frontier capabilities

  • The Nano Banana moment: changed Google's market share overnight, billions of dollars in stock movement, and validated that multimodal models (image generation, video) are economically critical for marketing, design, and AI-for-science

  • New categories: occupational and expert arenas (medicine, legal, finance, creative marketing), Code Arena, and video arena coming soon

  • Consumer retention: sign-in and persistent history were the unlock, but users are fickle and have to be earned every single day ("every user is earned, they can leave at any moment")

Chapters

  • 00:00:00 Introduction: Anastasios from Arena and the LMArena Journey
  • 00:01:36 The Anjney Midha Incubation: From Berkeley Basement to Startup
  • 00:02:47 The Decision to Start a Company: Scaling Beyond Academia
  • 00:03:38 The $100M Raise: Use of Funds and Platform Economics
  • 00:05:10 Arena's User Base: 5M+ Users and Diverse Demographics
  • 00:06:02 The Competitive Landscape: Artificial Analysis, AI.xyz, and Arena's Differentiation
  • 00:08:12 Educational Value and Learning from the Community
  • 00:08:41 Technical Migration: From Gradio to React and Platform Evolution
  • 00:10:18 Leaderboard Illusion Paper: Addressing Critiques and Maintaining Integrity
  • 00:12:29 Nano Banana Moment: How Preview Models Create Market Impact
  • 00:13:41 Multimodal AI and Image Generation: From Skepticism to Economic Value
  • 00:15:37 Core Principles: Platform Integrity and the Public Leaderboard as Charity
  • 00:18:29 Future Roadmap: Expert Categories, Multimodal, Video, and Occupational Verticals
  • 00:19:10 API Strategy and Focus: Doing One Thing Well
  • 00:19:51 Community Management and Retention: Sign-In, History, and Daily Value
  • 00:21:49 Hiring and Building a High-Performance Team
  • 00:22:21 Partnerships and Agent Evaluation: From Devin to Full-Featured Harnesses
