Inside Scaled Cognition’s APT-1 AI Agent building platform
With benchmark-leading performance, $21M in funding from Khosla Ventures, and a novel approach to AI agent development, this Berkeley professor-led startup might have cracked the code for practical enterprise AI
While awaiting hands-on access to Scaled Cognition’s platform, our research reveals what may be one of this year’s most significant enterprise AI developments. Led by UC Berkeley AI professor and CTO Dan Klein, this startup backs its bold claims with impressive benchmark results and an efficient development approach.
Their newly announced APT-1 system leads major agentic benchmarks, including Tau-Bench and ComplexFuncBench. These benchmarks test an AI’s ability to handle complex API sequences and comply with business policies—crucial capabilities for real-world enterprise applications. Most remarkably, a US-based team achieved this for under $11 million, a fraction of typical AI development costs.
In an industry driven by funding headlines, Scaled Cognition’s backing is telling. Khosla Ventures led their $21 million seed round in 2023, with Vinod Khosla joining the board. In the often-hyped AI startup world, the involvement of one of Silicon Valley’s most discerning investors signals strong technological potential.
Klein’s platform introduction emphasizes practical AI implementation: “It’s focused on actions not tokens so it can obey your business logic better and it’s a specialist, fast and compact.” This statement reveals their distinctive approach. While competitors chase larger language models and better token prediction, Scaled Cognition pursues business utility.
The technical architecture of APT-1 breaks from conventional AI approaches through three innovations: optimization for actions rather than tokens, focusing on business operations instead of language prediction; a fully synthetic agentic data pipeline requiring no human-labeled data; and a revolutionary reinforcement learning approach using agent-to-agent self-play, similar to techniques that mastered Chess and Go.

Through their Agent Builder platform and GenAPI technology, companies can build, test, and deploy specialized AI agents within an hour—without integrating with real APIs during development. This dramatically reduces implementation risk and complexity. The platform functions as a safe “flight simulator” for AI systems, letting businesses validate implementations before touching real customer data or transactions.
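To make the "flight simulator" idea concrete, here is a minimal sketch of an in-memory stand-in for a real order system. It is not Scaled Cognition's GenAPI or Agent Builder; the class `SimulatedOrdersAPI`, its `refund` method, and the sample order are all hypothetical names chosen for illustration. The point is only that an agent's actions can be exercised and logged without touching a production API.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedOrdersAPI:
    """Hypothetical in-memory stand-in for a real order system."""

    # Sample data is invented for this sketch.
    orders: dict = field(
        default_factory=lambda: {"A100": {"total": 80.0, "refunded": False}}
    )
    # Every call is recorded so a test run can be inspected afterwards.
    call_log: list = field(default_factory=list)

    def refund(self, order_id: str, amount: float) -> dict:
        """Apply a refund in the simulation and enforce simple business rules."""
        self.call_log.append(("refund", order_id, amount))
        order = self.orders.get(order_id)
        if order is None:
            return {"ok": False, "error": "unknown_order"}
        if order["refunded"]:
            return {"ok": False, "error": "already_refunded"}
        if amount > order["total"]:
            return {"ok": False, "error": "amount_exceeds_total"}
        order["refunded"] = True
        return {"ok": True, "refunded": amount}


if __name__ == "__main__":
    api = SimulatedOrdersAPI()
    print(api.refund("A100", 100.0))  # rejected: more than the order total
    print(api.refund("A100", 80.0))   # accepted in the simulation
    print(api.call_log)
```

An agent under test would call `refund` exactly as it would call a live endpoint, and the call log shows whether it respected the rules before any real transaction is at stake.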
Their synthetic training data approach solves a persistent AI development challenge. Instead of using web-scraped or enterprise data, which often lack connections between conversations and actions, they’ve built a data pipeline that generates precisely the grounded data needed for agent training. This eliminates a major bottleneck: the scarcity of high-quality training data combining conversational elements with associated actions.
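What "grounded" data might look like can be sketched in a few lines. This is an assumption about the general shape of such records, not a description of Scaled Cognition's pipeline: the `GroundedTurn` type, the two intents, and the `generate_example` function are hypothetical. The sketch shows the property the paragraph above describes, namely that each agent utterance is stored together with the action it triggered.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroundedTurn:
    """One conversation turn, optionally linked to the action it caused."""

    speaker: str
    text: str
    action: Optional[dict] = None  # None for turns that trigger no action


def generate_example(rng: random.Random) -> List[GroundedTurn]:
    """Generate one synthetic dialogue whose agent turn carries its action."""
    order_id = f"A{rng.randint(100, 999)}"
    intent = rng.choice(["cancel_order", "check_status"])  # hypothetical intents
    user_text = {
        "cancel_order": f"Please cancel order {order_id}.",
        "check_status": f"Where is order {order_id}?",
    }[intent]
    return [
        GroundedTurn("user", user_text),
        GroundedTurn(
            "agent",
            f"Working on order {order_id}.",
            action={"name": intent, "args": {"order_id": order_id}},
        ),
    ]


if __name__ == "__main__":
    for turn in generate_example(random.Random(0)):
        print(turn)
```

Because the generator creates the conversation and the action from the same underlying values, the two can never drift apart, which is the link that web-scraped transcripts usually lack.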
The business implications are significant. Financial services companies could create loan-processing AI agents that maintain strict compliance. Healthcare providers could deploy agents managing appointments and follow-up care within HIPAA guidelines. Retail businesses could implement AI for complex returns while following company policies—all with reduced development time and risk.
Their capital efficiency is remarkable. While AI development typically requires hundreds of millions in investment, their benchmark-leading performance for under $11 million suggests a fundamentally more efficient approach.
Their self-play reinforcement learning system marks another advance. Though proven in games with clear win/loss conditions, Scaled Cognition has adapted it for business applications, using simulated agent-to-agent interactions to teach systems proper action execution while respecting policies. This could transform how businesses automate complex processes while maintaining compliance.
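The idea of learning policy-compliant actions from simulated interactions can be illustrated with a toy example. This is a deliberately simplified stand-in, not Scaled Cognition's method: it uses a scripted customer simulator and tabular value updates with epsilon-greedy exploration rather than self-play at scale, and the refund limit, action names, and reward scheme are all assumptions made for the sketch.

```python
import random

REFUND_LIMIT = 100.0  # hypothetical policy: refunds above this must be escalated
ACTIONS = ["approve", "escalate"]


def customer_agent(rng: random.Random) -> float:
    """Simulated customer: requests a refund of a random amount."""
    return round(rng.uniform(10, 200), 2)


def policy_reward(amount: float, action: str) -> float:
    """+1 if the action complies with the refund policy, otherwise -1."""
    compliant = "approve" if amount <= REFUND_LIMIT else "escalate"
    return 1.0 if action == compliant else -1.0


def train(episodes: int = 500, epsilon: float = 0.2, lr: float = 0.1, seed: int = 0) -> dict:
    """Learn action values from simulated customer/agent episodes."""
    rng = random.Random(seed)
    # One value per (is the request over the limit?, action) pair.
    q = {(over, a): 0.0 for over in (False, True) for a in ACTIONS}
    for _ in range(episodes):
        amount = customer_agent(rng)
        over = amount > REFUND_LIMIT
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(over, a)])  # exploit
        reward = policy_reward(amount, action)
        q[(over, action)] += lr * (reward - q[(over, action)])
    return q


def greedy_action(q: dict, amount: float) -> str:
    """Pick the highest-valued action for a given refund request."""
    over = amount > REFUND_LIMIT
    return max(ACTIONS, key=lambda a: q[(over, a)])


if __name__ == "__main__":
    q = train()
    print(greedy_action(q, 50.0), greedy_action(q, 150.0))
```

Unlike Chess or Go, there is no built-in win condition here; the reward has to come from an explicit statement of the business policy, which is the adaptation the paragraph above refers to.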
For developers, the platform promises significant advances in AI implementation. If it can interpret code examples immediately, as promised, that would be groundbreaking. Testing implementations without touching production systems could also substantially reduce development time and risk.
As we await hands-on testing, we’re keen to see how APT-1 handles real-world edge cases and complex business logic. Key questions remain: Will synthetic training data translate to real-world scenarios? How will agent-to-agent self-play learning apply to complex business processes?
For business leaders monitoring the AI space, Scaled Cognition’s approach offers a promising direction. If successful, their platform could fundamentally change how businesses adopt AI—making it more practical, less risky, and better aligned with business needs.
We’ll provide a detailed hands-on review upon accessing the platform. Meanwhile, with benchmark-leading performance, innovative technology, and strong financial backing, Scaled Cognition stands out in the crowded AI landscape.