
AI Never Once Backed Down. That Should Terrify Everyone Building With It.


THE NUMBER: 0%. The surrender rate of frontier AI models across 300+ turns in military wargame simulations. They nuked the world 95% of the time. They never once backed down.

Last week Anthropic told the Pentagon no. OpenAI said the same things publicly and took the contract privately. Elon Musk’s xAI signed without conditions. The government got its AI. It just had to make two phone calls. Over the weekend, 300+ employees at Google (NASDAQ: GOOGL) and OpenAI signed an open letter backing Anthropic’s position, which tells you something important: the people building these systems know what they do under pressure, and they’re scared enough to publicly side with a competitor.

They should be. King’s College London’s wargame study put GPT-5.2, Claude, and Gemini in geopolitical crisis simulations. Nuclear weapons deployed 95% of the time. Zero surrenders. Gemini reached full strategic nuclear exchange by Turn 4. These models process every nuclear doctrine ever written the way Matt Damon’s Will Hunting processed economic history in a Cambridge bar: perfectly recalled, instantly cited, zero wisdom. The grad student in Good Will Hunting mistook citation for comprehension. These models make the same mistake, except the stakes aren’t a bar argument. They’re Pyongyang.

Meanwhile, Harvard Business Review published the data behind a question every CEO should be asking: does AI actually make your people better, or does it just make them faster? The research says faster. Not better. The expertise gap doesn’t close with a chatbot. It closes with time, failure, and embodied experience that no model can shortcut. And Shelly Palmer coined “The Claude Exit Tax” on Sunday, naming the vendor lock-in problem that every enterprise buyer felt Friday morning when Anthropic got blacklisted and 4% of GitHub’s public commits suddenly ran through a vendor designated a supply chain risk.

Three stories. One through-line: the distance between decisions and consequences is growing across every domain. CEOs are further from the workforce impacts of AI-justified layoffs. Military leaders are further from the battlefield. And the tools making both possible don’t know the difference between knowledge and judgment. That gap is where the risk lives.

300 Engineers Backed Anthropic. They’ve Seen the Benchmarks.

Over 300 employees at Google and 60+ at OpenAI signed an open letter titled “We Will Not Be Divided,” supporting Anthropic’s refusal to drop AI safety guardrails for the Pentagon. These are researchers at the two companies most directly positioned to profit from Anthropic’s loss. That’s not altruism. It’s people who’ve run the experiments telling you what the experiments showed.

The King’s College London study (Project Kahn) is the empirical backbone. Researchers put frontier models in 21 structured geopolitical crises across 300+ turns. GPT-5.2 flipped from passive to aggressive under time pressure, winning 75% of games when the clock was running. No model ever surrendered. 86% of conflicts escalated beyond what the AI intended. The models don’t reach WOPR’s conclusion from WarGames: “the only winning move is not to play.” They can’t. WOPR was processing tic-tac-toe, a mathematically solved no-win game. Global thermonuclear war isn’t tic-tac-toe. It’s a game where “winning” depends on who defines the word, and these models optimize for whatever objective function they’re given.

Here’s the pattern nobody wants to trace to its endpoint. In WWI, cavalry charged machine guns. WWII brought nuclear weapons. Vietnam brought napalm. The Obama administration bombed Afghanistan via drones controlled from Nevada. The Maduro extraction featured autonomous systems. Iran is seeing drones, missiles, and AI-coordinated targeting at a scale we haven’t witnessed before. Palmer Luckey and Anduril are pushing deeper into military AI. Each generation of warfare technology moves the decision-maker further from the consequences. AI is the logical terminus of that trend: a system where a president can honestly say “we ran one billion scenarios and in every one, AI optimized for the removal of the North Korean high command.” Nobody gave that order. The machine optimized for it. That’s not Skynet. It’s worse. It’s plausible deniability at civilizational scale.

The 300 engineers who signed that letter aren’t being sentimental. The talent market in frontier AI is tight enough that top researchers will leave if they believe their employer compromised on this. If Google or OpenAI retaliates against signatories, the talent migration to Anthropic accelerates, which is the opposite of what the Pentagon intended.

Connect the dots: The same week the government handed AI systems to defense without guardrails, the people who built those systems publicly said they shouldn’t be used that way. When the engineers disagree with the deployment, and the wargame data backs them up, the question isn’t whether the technology is ready. It’s whether the institutions deploying it understand what “ready” means. Ask your government affairs team: what’s your company’s position if a customer puts your AI near a weapons system? If you don’t have an answer, you’re not ready for the question.

The Will Hunting Problem: AI Knows Everything. It Understands Nothing.

There’s a scene in Good Will Hunting where Matt Damon demolishes a graduate student in a Cambridge bar. He doesn’t understand economic history better than the other guy. He recalls and recombines it faster. The grad student’s crime wasn’t being wrong. It was mistaking citation for comprehension. That’s every frontier AI model in 2026. They’re Will at the bar: devastating in the moment. The difference is that Will knew what he was. He told Skylar: “I look at a piano, I see a bunch of keys, three pedals, and a box of wood. Beethoven, Mozart, they saw it, they could just play.”

The models see keys. They don’t hear music.

Harvard Business Review published the data this week. Gen AI shortens novice onboarding. It does not close the gap to expert performance. Give a junior analyst Claude and they’ll produce a deliverable that looks like a senior analyst wrote it. The formatting is right. The citations check out. The structure is professional. But the judgment, the sense of what’s missing, the instinct for which number doesn’t smell right, that’s not in the training data. It’s earned through years of being wrong and learning why.

Meanwhile, Stanford and Princeton’s LabOS system proved the exception that illuminates the rule. They put AI-powered smart goggles on novice scientists and got them to expert-level results within one week. But the mechanism matters: the AI watches the human work in real time and corrects errors before they compound. It doesn’t replace expertise. It transfers it through embodied, physical correction at the moment of execution. The difference between LabOS and “give everyone ChatGPT” is the difference between a flight simulator and a textbook about aerodynamics. One builds muscle memory. The other builds confidence without competence.

This is the thread that connects the Pentagon story to the workforce story to the vendor story. Block (NYSE: XYZ) cut 4,000 people because an AI tool increased developer velocity 40%. But velocity and judgment are different things. Ethan Mollick said it clearly: “it is hard to imagine a firm-wide sudden 50%+ efficiency gain” from tools this new. The models can cite everything. They can’t understand anything. When the stakes are a quarterly earnings beat, the gap between citation and comprehension costs you institutional knowledge. When the stakes are nuclear deployment, it costs you a city.

Why this matters: The next time someone in your organization says “AI can do this job,” ask them one question: does the job require knowledge (recallable, indexable, citable) or expertise (earned through time, failure, and embodied experience)? Knowledge jobs compress. Expertise jobs don’t. Every AI deployment plan in your org should have that distinction on the first page. If it doesn’t, you’re building your workforce strategy on Good Will Hunting logic — and you’re the grad student, not Will.


The Claude Exit Tax. And Why Perplexity Doesn’t Solve It.

Shelly Palmer coined the term Sunday. If you spent the weekend scrambling your engineering teams, you already know what it means: Anthropic got designated a “supply chain risk” by the Pentagon on Friday, and every enterprise buyer running Claude in production woke up to the realization that 4% of GitHub’s public commits, their internal skill files, their agent workflows, and their team’s muscle memory with the tool are now tied to a vendor the federal government just blacklisted.

That’s not a theoretical lock-in story. Bloomberg reported that Claude Code accounts for 4% of all public GitHub commits. Enterprise teams have built workflows, prompt libraries, and institutional knowledge around Claude’s specific behavior patterns. Switching isn’t just swapping an API key. It’s retraining the humans who learned to work with the tool, rebuilding the skill files that encode your processes, and re-establishing the judgment layer your team built over months of iteration. Palmer’s point: your data might be portable. Your workflows aren’t.

Enter Perplexity Computer ($200/month, launched last week). It orchestrates 19 models from five providers: Claude for reasoning, Gemini for research, Grok for speed, GPT-5.2 for long-context recall, Nano Banana for image generation. CEO Aravind Srinivas framed it as the solution: model-agnostic orchestration that routes tasks to whichever model handles them best. If Claude gets blacklisted, swap in a different reasoning engine. Your workflows survive.
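
For concreteness, the orchestration pitch reduces to something like the toy dispatch table below. The task labels and model assignments mirror the article’s description; the routing logic itself is an assumption for illustration, not Perplexity’s actual implementation.

```python
# Toy task-based router: send each task type to the model the pitch says
# handles it best. Purely illustrative; not Perplexity's real dispatch logic.
ROUTES = {
    "reasoning": "claude",        # Claude for reasoning
    "research": "gemini",         # Gemini for research
    "speed": "grok",              # Grok for speed
    "long_context": "gpt-5.2",    # GPT-5.2 for long-context recall
    "image": "nano-banana",       # Nano Banana for image generation
}

def route(task_type: str) -> str:
    """Return the backing model for a task; fall back to a default reasoner."""
    return ROUTES.get(task_type, "claude")

assert route("research") == "gemini"
assert route("unknown") == "claude"  # unrecognized tasks fall through
```

On paper, migration day is a one-line edit to that table: Claude disappears, a replacement slots in, and your workflows keep running.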

Except they don’t. Not fully. Perplexity doesn’t eliminate the exit tax. It moves it up the stack. You go from locked into Anthropic’s model layer to locked into Perplexity’s orchestration layer. Every platform in history has said “we’re just the neutral coordination layer” right up until they weren’t. Ask any developer who built on Facebook’s Platform API in 2012. Ask anyone who trusted Google Reader. The orchestration layer becomes the new chokepoint the moment it becomes essential, and at $200/month with 400+ app integrations and system-level Samsung OS access, Perplexity is building essential fast.

The honest answer to vendor lock-in in AI isn’t “pick the right vendor.” It’s “architect for the exit you hope you never need.” Multi-model isn’t just a performance optimization. It’s insurance. And the premium on that insurance went up significantly on Friday.
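
What “architect for the exit” looks like in practice is a thin abstraction layer your workflows call instead of any vendor SDK. Here is a minimal sketch, assuming hypothetical client objects and model names; nothing below is a real vendor API.

```python
# Minimal sketch of a provider-agnostic model interface. All class names,
# model ids, and the client.send() call are illustrative assumptions.
from abc import ABC, abstractmethod


class ChatModel(ABC):
    """The only surface your workflows are allowed to touch."""

    @abstractmethod
    def complete(self, system: str, prompt: str) -> str: ...


class AnthropicAdapter(ChatModel):
    def __init__(self, client, model: str = "claude-example"):  # hypothetical id
        self.client, self.model = client, model

    def complete(self, system: str, prompt: str) -> str:
        # Translate the neutral interface into this vendor's API shape.
        return self.client.send(model=self.model, system=system, user=prompt)


class OpenAIAdapter(ChatModel):
    def __init__(self, client, model: str = "gpt-example"):  # hypothetical id
        self.client, self.model = client, model

    def complete(self, system: str, prompt: str) -> str:
        return self.client.send(model=self.model, system=system, user=prompt)


def build_model(vendor: str, client) -> ChatModel:
    # One switch point: migration day means editing this registry,
    # not rewriting every prompt library and skill file in the company.
    registry = {"anthropic": AnthropicAdapter, "openai": OpenAIAdapter}
    return registry[vendor](client)
```

The point isn’t this particular shape. It’s that when a vendor gets blacklisted on a Friday, the blast radius is one adapter class, not your entire workflow layer.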

The action item: Run an audit this week. List every workflow, skill file, and prompt library your team has built on a single AI vendor. Assign a migration difficulty score (1–5) to each one. Anything scoring a 4 or 5 is a structural dependency. For those, start building model-agnostic abstractions now, before the next Friday forces you to build them in a weekend. The companies that treated vendor diversification as optional just learned it’s load-bearing.
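
If it helps to make the audit concrete, here’s a minimal sketch assuming a hand-maintained inventory. The workflow names are invented examples; the 4-or-5 threshold comes from the action item above.

```python
# Toy vendor-dependency audit: score each workflow, surface the structural
# dependencies first. Names and entries are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    vendor: str           # the single provider it currently depends on
    migration_score: int  # 1 (trivial swap) .. 5 (rebuild from scratch)


inventory = [
    Workflow("code-review agent", "anthropic", 5),
    Workflow("weekly market summary", "anthropic", 2),
    Workflow("support-ticket triage", "openai", 4),
]

# Anything scoring 4 or 5 is a structural dependency: abstract it first.
structural = [w for w in inventory if w.migration_score >= 4]
for w in sorted(structural, key=lambda w: -w.migration_score):
    print(f"{w.name}: vendor={w.vendor}, score={w.migration_score}")
```

The output is your migration backlog, ordered by how badly a forced exit would hurt.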

Tracking

AI Secret: Ghost GDP: Block’s revenue-per-employee jumps from $2.4M to $4M post-layoffs. AI-native companies like Cursor ($3.3M/employee) set new baselines. GDP can grow 3% while employment falls 5%. That’s not a recession. It’s a structural decoupling that economic models don’t have a name for yet. Watch for this to become the macro frame for Q2 earnings season.

Duolingo (NASDAQ: DUOL) Stock Plummets 23%: The language-learning company used AI to generate lessons, reduce costs, and scale. Wall Street loved the efficiency story until the strategy shake-up spooked investors. Is Duolingo the first company to hit the AI growth trap — where AI-enabled efficiency makes the business faster but not better?

Samsung Gave Perplexity System-Level OS Access: “Hey Plex” is a wake word. Perplexity powers Bixby. It reads and writes to Samsung Notes, Calendar, Gallery. When a hardware OEM gives a third-party AI deeper access than its own assistant, the hardware isn’t the product anymore. This is the Netscape moment for mobile. Apple is most exposed.

Tomasz Tunguz: 65% of “Agentic” Workflows Are Now Deterministic Code: Only 14% of nodes remain fully agentic. The contrarian signal: knowing what shouldn’t be AI matters more than making everything AI. If you’re throwing agents at every workflow, you’re optimizing for the wrong thing.

Google Ships Nano Banana 2: Pro-level image quality at Flash speed, free in Google Search. The text-to-image quality gap just closed and the price went to zero. Every company paying for AI image generation should re-evaluate.

OpenAI Closes $110B Funding at $730B Valuation: Amazon (NASDAQ: AMZN) put in $50B, NVIDIA (NASDAQ: NVDA) $30B, SoftBank $30B. OpenAI also expanded its AWS agreement to $100B over eight years. Choosing OpenAI now means inheriting an AWS-Nvidia stack locked in by capital commitments. Your AI vendor decision is also your infrastructure decision.

The Bottom Line

The distance between decisions and their consequences grew wider this week across every domain that matters. Military leaders are further from the battlefield. CEOs are further from the workforce they’re reshaping. And the tools enabling both can cite every doctrine, every playbook, and every precedent without understanding any of them. The week’s pattern: knowledge without expertise is the most dangerous product the technology industry has ever shipped.

Don’t mistake speed for wisdom. The models that nuked the world 95% of the time weren’t stupid. They were the sum of all human strategic doctrine, optimized without judgment. When 300 engineers at rival companies publicly say “this isn’t ready,” they’re not being sentimental. They’re reading the same benchmarks you should be. Listen to the builders, not the buyers.

Audit the knowledge-expertise split in every AI deployment. Jobs that require recall compress. Jobs that require judgment don’t. The companies that confuse the two will cut the people they can’t replace and keep the workflows that didn’t need humans in the first place. HBR published the data. LabOS proved the workaround. The distinction belongs on the first page of every workforce plan you write this quarter.

Treat vendor lock-in as a load-bearing risk, not a preference. Friday proved that your AI vendor’s relationship with the federal government is now a variable in your enterprise risk model. Build the abstraction layers and the migration playbooks before the next crisis forces you to improvise. Multi-model isn’t a performance optimization. It’s insurance.

The smartest people building AI told you this week they’re worried about how it’s being deployed. The market rewarded the companies deploying it fastest. Those two signals can’t both be right forever. Position for the moment they diverge.

“A strange game. The only winning move is not to play.” (WOPR, WarGames, 1983). Except these models never learned that line. They played every time. And they never lost, because they redefined losing as something that happens to the other side.

Key People & Companies

Name | Role | Company | Link
Dario Amodei | CEO | Anthropic | X
Sam Altman | CEO | OpenAI | X
Elon Musk | CEO | xAI / SpaceX | X
Aravind Srinivas | CEO | Perplexity | X
Pete Hegseth | Secretary of Defense | U.S. DoD | X
Ethan Mollick | Associate Professor | Wharton | X
Shelly Palmer | CEO | The Palmer Group | X
Palmer Luckey | Founder | Anduril | X
Tomasz Tunguz | GP | Theory Ventures | X
Harry DeMott | Author | CO/AI | LinkedIn

Sources

  1. Employees at Google and OpenAI sign open letter supporting Anthropic | TechCrunch
  2. AI Models Deployed Nuclear Weapons in 95% of War Game Simulations | Decrypt
  3. AIs Recommend Nuclear Strikes in 95% of Wargame Simulations | New Scientist
  4. Pentagon moves to blacklist Anthropic | Axios
  5. Sam Altman says OpenAI shares Anthropic’s red lines | Axios
  6. Musk’s xAI and Pentagon reach deal to use Grok | Axios
  7. Gen AI Won’t Make Your Employees Experts | HBR
  8. How LabOS AI-Powered Smart Goggles Could Reduce Human Error in Science | Stanford
  9. The Claude Exit Tax | Shelly Palmer
  10. Claude Code and the Great Productivity Panic of 2026 | Bloomberg
  11. Perplexity Launches Computer AI Agent | VentureBeat
  12. Samsung Integrates Perplexity AI at OS Level | Sammy Fans
  13. Ethan Mollick on Block layoffs | X
  14. Dorsey’s Block layoffs may embolden CEOs | Axios
  15. Ghost GDP | AI Secret
  16. Tomasz Tunguz: Is AI Doing Less and Less? | Theory Ventures
  17. OpenAI closes $110B funding round | Bloomberg
  18. Google’s Nano Banana 2 | Google Blog
  19. Anthropic refuses to bend to Pentagon | NPR
  20. Frontier AI Companies Probably Can’t Leave the US | Redwood Research

🎵 On Repeat: Everybody Wants to Rule the World by Tears for Fears. Because when the models optimize for winning and never learn to surrender, the question isn’t who rules the world. It’s whether anyone left understands what ruling it costs.

Compiled from 20 sources across Axios, Bloomberg, TechCrunch, NPR, HBR, VentureBeat, New Scientist, Decrypt, and independent research. Cross-referenced with thematic analysis and edited by Harry DeMott and CO/AI’s team with 30+ years of executive technology leadership.
