How Spotify runs Claude across 20M+ lines of code

A funny thing happened on the way to “vibe coding” becoming the default software meme.

The most interesting version of AI coding did not come from a solo founder building a weekend app with 47 tabs open and one alarming npm install. It came from Spotify, a company with roughly 2,900 engineers, a 20M+ line backend monorepo, and about 4,500 production deployments every day.

In a new Claude interview with Spotify VP of Engineering Niklas Gustavsson, hosted by Boris from Anthropic, the headline numbers are wild: Spotify says roughly 73% of its PRs are now AI-assisted, and AI tooling has helped drive a 75%+ improvement in PR frequency. Anthropic also clipped key moments in a ClaudeDevs thread.

That is the shiny part.

The useful part is much less flashy: Spotify made agents work by doing the unsexy engineering work first.

Spotify’s AI coding story started before Claude Code
The everyday workflow is weirdly normal
Honk became Spotify’s internal agent layer
Verification is the whole game
Spotify’s monorepo helped Claude instead of breaking it
The productivity numbers are hard to ignore
The best counterexample: AI coding can slow people down
The prototype layer may be the bigger cultural shift
What other companies should copy from Spotify
The engineer’s job is becoming orchestration
What to watch next
Timecodes of the top moments

Spotify’s AI coding story started before Claude Code

Gustavsson said Spotify began exploring LLMs for automated code changes years ago, before the current Claude Code wave. The company already had a painful problem: its codebase was growing about seven times faster than the number of engineers supporting it.

That meant more maintenance, more migrations, more library updates, more framework changes, and more teams repeating the same annoying work.

The old process looked familiar to anyone inside a large engineering org:

A central team wrote a migration guide.
Hundreds of teams read it, or at least meant to.
Each team manually updated its own components.
The migration dragged on for months.
Everyone discovered three new edge cases and one new reason to fear calendars.

Spotify built “fleet management” to change that. Instead of asking every team to perform the same migration by hand, it created infrastructure that could mutate code across thousands of repositories.

At first, that worked through deterministic scripts. If Spotify needed to upgrade Java, swap an API, or update a library, the script made the change.

Then reality showed up.

Code has a ridiculous surface area. Replacing one method call sounds easy until that method is called five different ways, assigned to variables, wrapped in abstractions, or hidden in some corner of a service written by a person who has since moved teams, jobs, apartments, and possibly tax jurisdictions.

Gustavsson said individual migration scripts could turn into thousands of lines of edge-case handling. That ceiling pushed Spotify toward LLMs.
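To make the edge-case problem concrete, here is a minimal sketch of a deterministic rewrite. It is an illustration only, not Spotify's tooling: it uses Python's standard `ast` module, and the names `client`, `fetch_user`, and `get_user` are hypothetical. The transformer renames direct method calls and nothing else, so the aliased call and the `getattr` call slip through untouched.

```python
import ast


class RenameMethodCall(ast.NodeTransformer):
    """Rename direct calls like obj.old_name(...) to obj.new_name(...)."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name
        self.rewrites = 0

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        func = node.func
        # Only the "obj.method(...)" shape is recognized here.
        if isinstance(func, ast.Attribute) and func.attr == self.old_name:
            func.attr = self.new_name
            self.rewrites += 1
        return node


def migrate(source: str, old_name: str, new_name: str) -> tuple[str, int]:
    """Return the rewritten source and the number of calls changed."""
    transformer = RenameMethodCall(old_name, new_name)
    new_tree = transformer.visit(ast.parse(source))
    return ast.unparse(new_tree), transformer.rewrites


# Hypothetical input: one direct call, one alias, one dynamic lookup.
SAMPLE = """
client.fetch_user(1)
fetch = client.fetch_user
fetch(2)
getattr(client, "fetch_user")(3)
"""

migrated, count = migrate(SAMPLE, "fetch_user", "get_user")
print(migrated)
print("calls rewritten:", count)
```

Only the first line of the sample is rewritten. Covering the alias and the dynamic lookup means adding more special cases, which is how a script like this grows.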

The everyday workflow is weirdly normal

Gustavsson’s personal setup is not some futuristic mission-control dashboard. It is tmux, terminals, git worktrees, and a lot of Claude sessions running in parallel.

He said he typically keeps five to 10 Claude sessions open, with one session per git worktree and agents working in the background. Most of that happens inside Spotify’s large monorepos, including a backend monorepo with more than 20M lines of code.
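The one-session-per-worktree setup can be sketched as a small helper that builds the `git worktree add` commands for a batch of tasks. This is an assumption about how such a setup might be scripted, not a description of Gustavsson's actual tooling; the `agent/` branch prefix, the directory naming, and the task names are all hypothetical. The helper only builds the commands and does not run them.

```python
from pathlib import Path


def worktree_commands(repo: Path, tasks: list[str]) -> list[list[str]]:
    """Build one `git worktree add` command per task, without running it."""
    commands = []
    for task in tasks:
        branch = f"agent/{task}"                  # hypothetical branch naming
        path = repo.parent / f"{repo.name}-{task}"  # sibling directory per task
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
        )
    return commands


# Hypothetical repository path and task names.
commands = worktree_commands(Path("/src/monorepo"), ["upgrade-java", "fix-flaky-test"])
for argv in commands:
    print(" ".join(argv))
```

Each worktree gets its own branch and directory, so parallel sessions do not overwrite each other's working files.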

That detail matters because Gustavsson expected the monorepo to be a problem.

Big repositories usually break tools. They contain old patterns, new patterns, half-migrated patterns, and the occasional file that gives off “do not touch the cursed object” energy.

But Spotify found the opposite. Claude worked well at that scale because it could inspect nearby code and copy the company’s existing patterns.

That is the hidden enterprise AI lesson: agents do better when the codebase has a house style.

Honk became Spotify’s internal agent layer

Spotify’s internal system, called Honk, grew out of migration work.

The company had already been automating code changes across thousands of repositories, but static rewrites kept running into the same wall: code has too much API surface.

A migration that sounds simple, like swapping one method call for another, can explode once the method is called five different ways, assigned to variables, wrapped in abstractions, or tucked inside legacy code no one has wanted to make eye contact with since 2019.

Gustavsson said Spotify’s old codemods could grow into thousands of lines of edge-case handling. Early LLMs barely helped. The breakthrough came when Spotify stopped asking the model to one-shot the answer and started wrapping it in a better system.

One big step was adding an LLM judge. Gustavsson said that took PR success from roughly 20-30% to about 80%.

Then the models improved. The agents improved. Spotify eventually removed the judge because the agent system became strong enough without it.

That does not mean verification became less important. It means verification moved closer to the normal engineering loop: tests, CI, builds, ownership, review, and production safeguards.

For AI coding, that is the adult version of the story.

A lot of the industry still talks about coding agents as if the agent’s intelligence is the whole game. Spotify’s lesson is that intelligence needs rails. A powerful agent in a weak engineering system creates faster chaos. A powerful agent in a disciplined engineering system creates leverage.
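The judge step described above can be sketched as a gate between generation and the pull request. This is a minimal sketch under stated assumptions: the interview does not describe Honk's internals, so the feedback loop, the `Verdict` type, and the stub `fake_generate` and `fake_judge` functions are all hypothetical stand-ins for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    approved: bool
    feedback: str


def generate_with_judge(
    task: str,
    generate: Callable[[str, str], str],
    judge: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate change the judge approves, or None."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        verdict = judge(task, candidate)
        if verdict.approved:
            return candidate
        feedback = verdict.feedback  # assumption: critique feeds the next attempt
    return None


# Stubs standing in for model calls (hypothetical behavior).
def fake_generate(task: str, feedback: str) -> str:
    return "rename + update tests" if "tests" in feedback else "rename only"


def fake_judge(task: str, candidate: str) -> Verdict:
    if "tests" in candidate:
        return Verdict(True, "")
    return Verdict(False, "update the tests too")


result = generate_with_judge("rename fetch_user", fake_generate, fake_judge)
print(result)
```

The point of the design is that a rejected candidate never becomes a PR. Whether the gate is an LLM judge or CI is a detail that can change over time.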

Verification is the whole game

The strongest line from the interview was not about Claude, tmux, or the 20M-line monorepo. It was about verification.

When agents are doing real work without a human manually executing every step, verification becomes the most important part of the system. Gustavsson and Boris both framed this as the place most companies underinvest.

Spotify had to change its engineering culture around that. Before fleet management, teams could be sloppy in some areas because the owning team reviewed every change. Once Spotify started auto-merging more routine changes, that assumption broke.

The company needed stronger test automation so software could survive automated changes without every team hand-checking every PR.

That earlier investment now pays off for agents. Claude can make changes, run builds, hit CI, see what failed, and keep working inside a loop the company already trusts.

This is what grown-up AI coding looks like.

The agent is the visible thing. The verification loop is the reason anyone can let it touch production-adjacent code without silently developing a new eye twitch.
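The change, build, read-the-failure, retry loop can be sketched in a few lines. This is a hedged illustration, not Spotify's implementation: `change_until_green`, `run_check`, and the `fake_agent` stub are hypothetical names, and the "build" is a tiny Python subprocess standing in for a real CI command.

```python
import os
import subprocess
import sys
import tempfile
from typing import Callable


def run_check(command: list[str]) -> tuple[bool, str]:
    """Run a build or test command; return (passed, combined output)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def change_until_green(
    apply_change: Callable[[str], None],
    check_command: list[str],
    max_rounds: int = 5,
) -> bool:
    """Let the agent edit, verify, and retry with the failure log as input."""
    failure_log = ""
    for _ in range(max_rounds):
        apply_change(failure_log)
        passed, output = run_check(check_command)
        if passed:
            return True
        failure_log = output  # the next attempt sees what failed
    return False


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "state.txt")
    attempts = []

    def fake_agent(failure_log: str) -> None:
        # Stub agent: gets it wrong first, fixes it once it sees a failure.
        attempts.append(failure_log)
        with open(target, "w") as handle:
            handle.write("ok" if failure_log else "broken")

    # Stand-in for a CI command: exits non-zero unless the file says "ok".
    check = [
        sys.executable,
        "-c",
        "import sys; sys.exit(0 if open(sys.argv[1]).read() == 'ok' else 'state is broken')",
        target,
    ]
    green = change_until_green(fake_agent, check)

print("green:", green, "after", len(attempts), "attempts")
```

The loop trusts only the exit code of the check, which is why the quality of the tests behind it matters more than the agent's confidence.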

Spotify’s monorepo helped Claude instead of breaking it

Gustavsson said he was initially worried about using agents inside Spotify’s large monorepos. That would be a reasonable thing to fear.

Big repositories can break indexing tools. They contain old patterns, new patterns, half-migrated patterns, and the occasional ancient file that gives off “please contact the one person who left in 2021” energy.

But Spotify found the opposite. Claude worked well in those repositories because it could inspect nearby code and copy the company’s existing patterns.

That is an important enterprise AI point hiding inside a coding story.

Agents perform better when the environment is consistent.

Gustavsson’s advice to other engineering leaders was not “buy the agent.” It was to keep investing in the same practices good teams already cared about:

Strong test automation.
Consistent frameworks.
Standardized code patterns.
Clear component ownership.
Reliable CI.
Cleaner internal tools.
Fewer one-off snowflakes.

Those practices used to help humans move faster. Now they help agents move faster, too.

A messy codebase confuses the new AI teammate in roughly the same way it confuses the human teammate. The difference is that the AI can now confidently apply the confusion at machine speed. Great news: your technical debt got a personal trainer.

The productivity numbers are hard to ignore

Spotify is seeing big results.

Gustavsson said the company has seen a 75%+ improvement in PR frequency that it can directly attribute to AI tooling. He also said roughly 73% of PRs are AI-assisted.

That does not automatically mean Spotify’s products are 75% better or that every engineer is magically 75% more effective. Gustavsson was careful about that. Spotify is now trying to connect engineering outputs like PRs and deployments to work items, A/B tests, rollouts, user value, and revenue.

That is the measurement problem every serious company now faces.

Counting AI-assisted PRs is easy. Knowing whether those PRs created value is harder.

This is where Spotify’s story gets more interesting than the usual AI productivity victory lap. Gustavsson said the ROI conversation started easy because the improvements were so large. As the organization matures, the expected precision goes up.

Leaders now want to know:

How many tokens did the work take?
How many human hours did it save?
Which PRs turned into shipped features?
Which shipped features changed user behavior?
Which changes helped revenue, retention, quality, or reliability?
Which AI work created cleanup costs later?

That last question is the one every company should tattoo on its sprint planning doc, preferably somewhere visible and emotionally threatening.
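A few of the questions above reduce to simple arithmetic once PRs are tagged. The sketch below is illustrative only: the `PullRequest` fields, the function names, and every number in the sample data are made up, and none of them are Spotify's figures or schema.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    ai_assisted: bool
    shipped: bool
    followup_fixes: int  # later PRs needed to clean up after this one


def ai_assisted_share(prs: list[PullRequest]) -> float:
    """Fraction of PRs tagged as AI-assisted."""
    return sum(pr.ai_assisted for pr in prs) / len(prs) if prs else 0.0


def relative_change(before: float, after: float) -> float:
    """Relative improvement, e.g. 0.75 means 75% more than before."""
    return (after - before) / before


def cleanup_rate(prs: list[PullRequest], ai_assisted: bool) -> float:
    """Average follow-up fixes per PR within one group."""
    group = [pr for pr in prs if pr.ai_assisted == ai_assisted]
    return sum(pr.followup_fixes for pr in group) / len(group) if group else 0.0


# Made-up records for illustration.
prs = [
    PullRequest(ai_assisted=True, shipped=True, followup_fixes=0),
    PullRequest(ai_assisted=True, shipped=False, followup_fixes=2),
    PullRequest(ai_assisted=True, shipped=True, followup_fixes=1),
    PullRequest(ai_assisted=False, shipped=True, followup_fixes=0),
]

print("AI-assisted share:", ai_assisted_share(prs))
print("PR frequency change:", relative_change(before=4.0, after=7.0))
print("cleanup per AI-assisted PR:", cleanup_rate(prs, ai_assisted=True))
print("cleanup per unassisted PR:", cleanup_rate(prs, ai_assisted=False))
```

The share and frequency numbers are the easy ones. The cleanup rate is only as good as the link between a fix and the PR that caused it, which is the hard part of the measurement problem.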

The best counterexample: AI coding can slow people down

Spotify’s results are impressive, but they do not erase the strongest counter-narrative: AI coding tools can make some developers slower.

A 2025 METR randomized controlled trial studied 16 experienced open-source developers completing 246 tasks in mature projects they knew well. Developers expected AI tools to make them 24% faster. Afterward, they still felt AI had made them 20% faster.

The measured result went the other way. With AI tools allowed, developers took 19% longer.

That study used early-2025 tools, mostly Cursor Pro with Claude 3.5 and 3.7 Sonnet, so it does not settle the 2026 agent question. It does something more useful: it explains why Spotify’s setup matters.

AI coding fails when the human has to spend more time prompting, waiting, reviewing, and cleaning up than they would have spent just doing the work. It works when the task is well-scoped, the environment is legible, and the verification loop catches mistakes cheaply.

Spotify’s numbers make sense because Spotify did not treat AI coding as a private productivity hack. It treated it as infrastructure.

That distinction changes everything.

The prototype layer may be the bigger cultural shift

The most dramatic part of the interview was not about PRs. It was about prototypes.

Gustavsson said Spotify built infrastructure so employees can create end-to-end prototypes inside Spotify’s real mobile apps and backend systems. Those prototypes can be shared through an internal app store, where other employees can try them.

That changes who gets to test product ideas.

Before, a product manager, designer, executive, or non-mobile engineer might need to convince an engineering team to spend days or weeks exploring an idea. Now, they can describe the idea in natural language and get a working prototype in an hour or two. Claude Cowork points at a similar delegation pattern for broader workplace tasks.

Gustavsson said the prototype store includes work from everyone up to one of Spotify’s co-CEOs.

That is the part leaders outside engineering should care about. AI coding agents are starting to turn product imagination into something more tactile. Instead of debating a mockup, teams can poke at a rough version with real data and real app flows.

This does not mean every prototype should ship. It means the cost of asking “what would this feel like?” is falling fast.

When the cost of prototypes falls, organizations usually get more weird ideas, more bad ideas, more surprisingly good ideas, and more evidence. The winning companies will need better taste, not fewer experiments.

What other companies should copy from Spotify

The lazy takeaway is: Spotify uses Claude, so your company should use Claude.

The useful takeaway is: Spotify prepared its engineering system for delegation.

Most companies should copy the preparation before they copy the tooling.

Start with tasks that already have strong verification. Library upgrades, API migrations, test generation, documentation updates, small refactors, and internal tools are better starting points than “go build the entire new product line while we all go to lunch.”

Then improve the environment around the agent:

Write down the patterns humans already follow.
Standardize the commands agents need to run.
Strengthen tests before increasing autonomy.
Make ownership clear when automated PRs touch team code.
Track AI-assisted work separately from unassisted work.
Measure cleanup, review burden, and incident rates alongside PR speed.
Give agents access to tools gradually, not all at once.

The security angle also deserves more attention than it gets. Coding agents can run commands, inspect files, and interact with developer environments. That power is useful and risky.

Security researchers have already shown ways coding agents can be tricked through apparently normal repositories and setup instructions. That does not make coding agents unusable. It makes permissions, sandboxing, network controls, and human review part of the product surface.

A coding agent is not a chatbot with a nicer terminal. It is a junior execution layer with access to your production-adjacent universe. Treat it accordingly.
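One small piece of that permission surface is a command allowlist. The sketch below is a hypothetical policy check, not a feature of any particular agent product: the `POLICY` contents and the `is_allowed` helper are assumptions for illustration. It expects the command as an argument list that will later be executed without a shell, and it is one layer of defense, not a sandbox.

```python
from typing import Optional

# Hypothetical policy: program -> allowed subcommands (None means any arguments).
POLICY: dict[str, Optional[set[str]]] = {
    "git": {"status", "diff", "log"},
    "pytest": None,
}


def is_allowed(argv: list[str], policy: dict[str, Optional[set[str]]]) -> bool:
    """Check an argument list against the allowlist before executing it."""
    if not argv:
        return False
    program, args = argv[0], argv[1:]
    if program not in policy:
        return False
    allowed_subcommands = policy[program]
    if allowed_subcommands is None:
        return True
    # Require an explicit, allowed subcommand as the first argument.
    return bool(args) and args[0] in allowed_subcommands


print(is_allowed(["git", "status"], POLICY))
print(is_allowed(["git", "push", "--force"], POLICY))
print(is_allowed(["curl", "https://example.com/install.sh"], POLICY))
```

A check like this only makes sense when paired with the other controls named above. A test runner, for example, can still execute arbitrary code from the repository, so network controls and sandboxing still carry most of the weight.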

The engineer’s job is becoming orchestration

Gustavsson said his own relationship with coding changed. He used to enjoy the problem-solving part of programming and worried that agents might take away the part he liked.

Instead, he found that the part he liked was solving problems, not necessarily typing every implementation step himself.

That distinction feels small until you watch a senior engineer run five agents in parallel, review diffs, guide architecture, and jump into unfamiliar codebases that previously would have taken days to understand.

The work shifts from writing every line to deciding what should exist, setting constraints, checking results, and improving the system so agents produce better work next time.

That also means engineering orgs have a new management problem. If junior engineers learn by doing implementation work, and agents increasingly do implementation work, companies need new ways to build engineering judgment.

Spotify’s example points to one answer: make the system legible. Strong tests, clear ownership, consistent patterns, and visible prototypes do not only help agents. They help humans understand what “good” looks like.

What to watch next

Spotify’s story is one of the strongest enterprise examples we have because it connects agent adoption to real operating metrics.

Still, the next phase has to prove three things.

First, PR frequency needs to translate into product outcomes. Faster code movement matters when it improves reliability, user experience, revenue, or learning speed.

Second, AI-assisted work needs to age well. The scary version of agent adoption is a company that ships faster for six months, then spends the next year cleaning up the software equivalent of a teenager’s bedroom after finals week.

Third, the benefits need to spread beyond companies with Spotify-level engineering infrastructure. A 20M-line monorepo, disciplined CI, and thousands of engineers create a very different testing ground than a messy mid-market codebase with three critical services and one person named Alex who understands deployment.

The big question now is whether AI agents help companies escape engineering debt, or whether they mostly reward the companies that already paid it down.

Spotify’s answer, at least so far, is pretty clear: agents work when the company around them is ready to delegate.

Timecodes of the top moments

Watch the full Claude interview with Spotify VP of Engineering Niklas Gustavsson here.

(0:10) Boris says he thought Niklas was moving too fast when he predicted engineers would stop using IDEs, then found his own workflow had changed two months later.
(0:57) Niklas explains that he started as a molecular biologist and moved into programming because genome sequencing created early “big data” problems.
(1:43) Spotify began trying to use early LLMs to automate code changes years before the current Claude Code wave.
(2:26) Niklas says the personal coding breakthrough came when the model stopped feeling like autocomplete and became something he could throw real problems at.
(3:14) Niklas describes his current workflow: five to 10 Claude sessions running in tmux, with agents working in the background.
(3:54) Spotify has a few very large monorepos while still operating thousands of smaller polyrepos.
(4:28) Niklas expected agents to struggle inside Spotify’s backend monorepo, which has more than 20M lines of code.
(4:52) Claude worked well in the monorepo because it could look at other code in the repository for “inspiration.”
(5:23) Spotify started automating maintenance because its codebase was growing roughly seven times faster than its engineering headcount.
(5:56) Spotify built “fleet management” so it could mutate code across thousands of repositories instead of asking every team to do migrations manually.
(6:42) Before automation, migrations took months, and Spotify could only do about 10 per year.
(7:14) Niklas says code has an “enormous API surface,” which made deterministic code changes complex very quickly.
(7:50) Some migration scripts grew into thousands of lines of edge-case handling.
(8:07) Early LLM attempts failed partly because Spotify naively tried to put code in front of a model and have it one-shot the change.
(8:34) Spotify improved results by using LLMs as judges and breaking code changes into smaller steps.
(9:14) Honk V2 is really closer to “V8,” because Spotify iterated on it many times internally before naming the release.
(9:28) Honk expanded from scheduled code migrations into a broader tool engineers could invoke from places like Slack.
(10:14) The judge improved PR success from roughly 20-30% to about 80%.
(10:40) Honk now runs the Claude Agent SDK inside a Kubernetes pod with access to internal tools.
(11:10) One of Honk’s most important tools is verification: it can run CI builds.
(11:17) Spotify can run verification on Linux and macOS, which matters for iOS development.
(11:39) Spotify has used Claude with simulators and Figma workflows to automate UI implementation work.
(12:05) Boris frames verification as the single most important part of closed-loop agentic development.
(12:28) Niklas says Spotify had to strengthen test automation as part of its move toward automated code changes.
(13:17) Spotify changed team expectations because owners would no longer manually review every routine change before merge.
(13:41) Those test automation investments now make it easier to throw agents at the same codebase.
(13:52) Boris argues speed and quality are a false trade-off when quality practices are automated well.
(14:31) Niklas says Spotify is keeping quality metrics neutral while significantly improving speed.
(15:06) Spotify makes roughly 4,500 production deployments every day.
(15:30) Spotify’s ideal loop is for a developer to have an idea and get it into production as quickly as possible.
(16:38) Spotify’s engineering org has roughly 2,900 engineers.
(16:54) Niklas says Spotify has seen a 75%+ improvement in PR frequency from AI tooling, and roughly 73% of PRs are AI-assisted.
(17:36) Spotify wants to connect PRs and deployments to work items, A/B tests, rollouts, user value, and revenue.
(18:33) Niklas says the early ROI discussion was easy because the improvements were large, but expectations for measurement precision are rising.
(19:33) Niklas advises engineering leaders to invest in foundational capabilities like test automation and verification.
(19:42) Standardization made humans more productive first, then turned out to help agents too.
(20:12) Niklas says Claude gets more confused when similar code appears in 10 different styles.
(20:34) The “sane engineering practices” from before still apply, even with a new actor in the codebase.
(21:44) Niklas says his own work changed into managing five agents in the background, but the problem-solving part still felt rewarding.
(22:22) Niklas says agents let him solve problems and enter codebases he previously could not have tackled quickly.
(22:58) Boris says implementation time has shifted into thinking about what comes next, talking to customers, and prototyping.
(23:31) Spotify is investing in prototyping for both engineers and non-engineers.
(23:45) Niklas says Claude lets people express an idea in natural language and have it implemented.
(24:18) Spotify built infrastructure for end-to-end prototypes in its mobile apps and backend, plus an internal app store for sharing them.
(24:51) Employees can now create a working prototype in an hour or two and share it with others.
(25:25) Niklas says everyone up to one of Spotify’s co-CEOs has prototypes in the internal app store.
(25:49) The prototype loop lets Spotify test an idea in a day instead of weeks or months.