The AI industry has an evaluation crisis. Static benchmarks are contaminated the moment they’re published. Models overfit to metrics rather than utility. And no enterprise will bet its business on systems evaluated by their own creators.
So, when frontier labs need to know if their latest model actually works, they release it on LMArena and watch millions of real users vote with their preferences. Whether it is OpenAI evaluating chat performance, Google evaluating Gemini, xAI testing Grok, or teams evaluating code generation, we believe LMArena’s growing body of evaluations, such as Web Dev Arena, has become the de facto standard.
What started as a Berkeley research project has quickly become essential infrastructure, the continuous integration pipeline for intelligence. This isn’t because of marketing or sales. It’s because the platform solved a problem everyone had but no one addressed.
We believe the companies that make AI boring will create some of the most value. Not boring as in unimpressive, but boring as in reliable, predictable, and trustworthy. LMArena is building the infrastructure to make AI as boring as databases.
That’s why we’re thrilled to be founding investors in LMArena’s seed round alongside UC Investments (University of California) and partners who share the team’s commitment to open science.
What excites me most about LMArena is their north star: solving AI reliability at scale. The platform’s power comes from a simple flywheel: more models attract more users, generating more preferences, which attracts more models. With more than 400 models and millions of monthly users creating novel prompts daily, LMArena has built the largest living dataset of human preferences on AI outputs.
When models become reliable enough for hospitals to trust diagnoses, for courts to trust analysis, or for infrastructure to trust automation, that’s a generational transformation for the economy. Government agencies are already engaging. Regulated industries are piloting private arena deployments. The demand signal is clear: neutral, continuous evaluation isn’t optional for mission-critical AI.
Moving beyond a research project and incorporating as a company allows LMArena to take things even further. Already, it has plans to expand its scope into areas such as:
We envision a world where “Arena-tested” becomes the Good Housekeeping seal for AI: a signal that a system has been validated by millions of real users, not just cherry-picked benchmarks. Where every AI interaction contributes to a shared understanding of what works. Where reliability isn’t promised by vendors, but is proven through transparent, continuous evaluation.
The challenges are substantial: maintaining neutrality under commercial pressure, scaling infrastructure for billions of users, and evolving evaluation methods as AI capabilities expand. But this team has already achieved something remarkable. They’ve made the entire ecosystem collectively invested in human preference at scale. In the race to build more capable AI, LMArena is on a mission to ensure those capabilities actually serve the people who use them. If that’s the future you want to build, they’re hiring.
The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. In addition, this content may include third-party advertisements; a16z has not reviewed such advertisements and does not endorse any advertising content contained therein.
This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments for which the issuer has not provided permission for a16z to disclose publicly as well as unannounced investments in publicly traded digital assets) is available at https://a16z.com/investments/.
Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.