Uber burned their annual AI budget in four months. The problem isn't the price.

Two things happened in the AI dev tools world this month that nobody’s connecting.

One: a skill that makes AI coding agents respond like a caveman. “Me fix bug. You want test?” It cut token usage by 65–75%. Currently going strong at 50k+ stars.

Two: GitHub paused all new Copilot signups, Pro, Pro+, Student, all frozen. The request-unit model blew up in their face. Starting June 1, everyone moves to token-based billing. The unlimited era is over.

Meanwhile, somewhere between these two highlights, Uber was going back to the drawing board on their AI budget. Why?

Uber handed 5,000 engineers Claude Code in December. By April, four months in, the entire annual AI tooling budget was gone.

Their CTO’s words: “I’m back to the drawing board, because the budget I thought I would need is blown away already.”

65–75%Token reduction from caveman-mode agents, 50k+ stars on GitHub

5,000Uber engineers handed Claude Code in December

4 mo.Until the entire annual AI tooling budget was gone

Jun 1GitHub moves all Copilot plans to token-based billing

Every session, the agent re-learns the project. User memory, conversation summaries, massive config files, the community has tried all of it. But the cost isn’t in the work. It’s in the re-orientation.

Multiply that by 5,000 engineers, multiple sessions a day, four months. That’s where Uber’s budget went.

The frontier companies are solving for what agents remember about you, preferences, conversation history, communication style.

Anthropic’s own researchers running long-running Claude agents landed on a CHANGELOG.md file as the agent’s “portable long-term memory”, read at the start of each session to remember where it left off.

Nobody’s solved what agents should know about the project.

Spec-driven development is the closest principled answer right now. But it scopes to one session. It doesn’t survive v1.2. It doesn’t survive Monday morning.

There’s a project called memvid taking a completely different angle, encoding context into video frames, then using semantic search to retrieve only what’s relevant to the current task.

Different medium, same underlying insight: agents shouldn’t hold everything in the active window. They should fetch what’s contextually relevant, right now, for this task.

The infrastructure layer is patching around this. Nobody’s designed for it from the ground up.

The gap is structural. And it’s expensive on both ends, the developer re-explaining the same architectural decision for the fifth time, and the org that budgeted for AI productivity and got a bill instead.

I’ve been pulling this apart from first principles. There’s an architecture here, not a new tool, not another framework, something closer to a knowledge contract between an agent and the project it works on. What does an agent actually need to know? When? What never changes versus what’s scoped to right now?

The questions are simple. The answers restructure everything.

Most teams aren’t solving this. They’re just absorbing the cost.