What Boris Cherny, Greg Brockman, and Andrej Karpathy Think

In the span of a week at Sequoia's AI Ascent 2026, three of the most consequential technical leaders in AI walked onto the same small stage and described, in their own words, the same earthquake. Boris Cherny, the creator of Claude Code, said coding is solved (for him, anyway). Greg Brockman said agentic tools went from writing 20% of your code to writing 80% over the course of December. Andrej Karpathy, the person who coined "vibe coding," said he had never felt more behind as a programmer than he did right now.

If you watched only one of those interviews, you would walk away thinking: software is changing. If you watched all three, you would walk away with something more useful: a sharper question. Coding is becoming free. So what becomes the bottleneck?

First up, the TL;DR
Where they all agree
Where they differ (and this is the interesting part)
Boris's printing press: the analogy that frames the next decade
Brockman's frame: human attention is the new oil
Karpathy's warning: the ghosts are still jagged
The 7 Powers test: which moats survive
The thousand-agents-overnight picture
Where it goes next
Open questions worth sitting with
In Case You're Interested: All the Actionable Insights, Listed

First up, the TL;DR

Three Sequoia AI Ascent 2026 talks tell the same story, in different accents. Boris Cherny (Anthropic) hasn't written a line of his Claude Code codebase by hand since November, now works mostly from his phone, and recently shipped 150 PRs in one day. Greg Brockman (OpenAI) said December was the inflection point where agentic coding tools jumped from writing 20% of your code to 80%. Andrej Karpathy (Eureka Labs) said something blunter: he's never felt more behind as a programmer.

Here's what happened:

All three said agentic coding is now the default mode of writing software, not a sideshow.
Boris frames it as "solved," anchored to the printing press analogy: a once-rare literacy is about to become universal.
Brockman frames it as the moment human attention becomes the scarcest resource in the loop, because the doing is easy; the judging is hard.
Karpathy frames it as a new engineering discipline (agentic engineering) sitting on top of vibe coding, because models are still jagged ghosts that can solve a 100,000-line refactor and then tell you to walk 50 meters to the car wash.

Why this matters: Each speaker pinpoints a different bottleneck that replaces "writing code." Boris says the bottleneck is organizational drag: the lead Anthropic has on the rest of the world is org redesign, not model access. Brockman says the bottleneck is human attention: deciding whether something is good and aligned with your values. Karpathy says the bottleneck is human understanding: you can outsource your thinking, but never your understanding. Three diagnoses of the same condition. They mostly fit together.

Our take: The skills that mattered last year (typing fast, knowing API quirks, holding architectures in your head) have just been deflated, and the skills that compound now (taste, organizational design, spec-writing, agent orchestration, asking the right question) are exactly the ones most engineering hiring still doesn't test for. Karpathy's proposed interview, "build a Twitter clone for agents and make it secure while ten Codex agents try to break it," is the future of technical hiring (it also just sounds like a sick video game). Most companies are still asking candidates to invert binary trees on a whiteboard.

Where they all agree

The three speakers, who do not coordinate and do not work together, converged on three claims with surprising precision.

Claim one: the floor has risen all the way through the ceiling of what used to be "professional software." Boris hasn't written any of the Claude Code codebase by hand in 2026. He routinely ships dozens of PRs a day; one recent day, 150 PRs. Brockman cites a systems engineer who handed a design doc to a model overnight and woke up to find the optimization fully implemented, instrumented, profiled, and tuned. Karpathy admits he can't remember the last time he corrected a chunk of code from his agent. The numbers vary. The direction is identical.

Claim two: massive parallelization is the actual product, not the model. Boris runs a few hundred agents during the day and a few thousand at night, mostly via "loop": ask Claude to use cron to schedule a recurring job. He has loops babysitting his PRs, loops keeping CI healthy by chasing flaky tests, loops clustering Twitter feedback every 30 minutes. Anthropic also just launched routines, the same idea but server-side. Brockman paints the same picture in different language: "Do you want to be a CEO of an organization of 100,000 agents? That actually seems pretty good." Karpathy describes his hiring exercise as orchestrating 10 Codex agents to attack a system. The agent counts differ by orders of magnitude, from ten to a hundred thousand, but the cognitive frame is the same: "I'm a director."
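As a rough illustration of the scheduling half of a "loop," here is a minimal Python sketch that builds a standard crontab entry for a recurring job such as the 30-minute feedback clustering. The helper `cron_line` and the script name `cluster-feedback.sh` are hypothetical names invented for this example; nothing here reflects how Claude Code actually sets up its loops.

```python
def cron_line(every_minutes: int, command: str) -> str:
    """Build a crontab entry that runs `command` every `every_minutes` minutes."""
    # The */n step syntax restarts at the top of each hour, so only
    # divisors of 60 (below 60) give evenly spaced runs.
    if every_minutes not in range(1, 60) or 60 % every_minutes != 0:
        raise ValueError("every_minutes must be a divisor of 60 below 60")
    minute_field = "*" if every_minutes == 1 else f"*/{every_minutes}"
    # Remaining fields: hour, day of month, month, day of week.
    return f"{minute_field} * * * * {command}"


# Hypothetical script name, used only for illustration.
FEEDBACK_LOOP = cron_line(30, "cluster-feedback.sh")
print(FEEDBACK_LOOP)  # */30 * * * * cluster-feedback.sh
```

The interesting part of a loop is what the scheduled command does (rebase a PR, chase a flaky test, cluster feedback); the schedule itself is ordinary cron.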

Claim three: the infrastructure has to be rebuilt for agents, not for humans. Karpathy's pet peeve is documentation that tells him to go to a URL; he doesn't want to do anything. Give him the text to copy-paste to his agent. Boris notes that most knowledge work already lives in remote tools, and that MCP is "the simplest answer" for hooking them together. Brockman talks about Chronicle, OpenAI's just-announced tool that watches everything you do on your computer and forms memories so you don't have to keep explaining your context. Three angles on the same insight: humans-in-the-loop infrastructure has to give way to agents-as-first-class-citizens infrastructure.

Where they differ (and this is the interesting part)

All three agree that coding is functionally solved. Where they diverge is on what becomes the bottleneck.

Boris says the bottleneck is organizational structure. When asked about the gap between Anthropic and the outside world, he's blunt: on the model side, no real gap; everyone uses the same models. The lead is organizational. At Anthropic, "no more manually written code anywhere at the company. All of the SQL is written by models. Everything is just built by the models." More striking: "our Claudes are talking all day" over Slack to other people's Claudes that are also running in loops, figuring out unknowns. The implication: large companies will face years of internal resistance to that kind of redesign. Tiny startups don't have that problem. They can build AI-native from the ground up. Boris predicts the number of disruptive startups in the next decade will increase 10x.

Brockman says the bottleneck is human attention. The doing of things now is easy. The hard part is deciding whether something is a good thing, whether it's what you wanted, whether it's aligned with your values. The escalated-to-your-manager Codex anecdote (his agent waited two minutes for a Slack reply, then escalated to the package owner's manager) is funny, but the framing is serious: "we're still building up the EQ of the model," which means humans are still the moral check. If you have an organization of 100,000 agents, your most precious resource is the seconds of human attention that decide which of their 100,000 actions actually ship.

Karpathy says the bottleneck is human understanding. "You can outsource your thinking, but you can't outsource your understanding." He pairs this with a careful distinction: vibe coding raises the floor for everyone, but agentic engineering preserves the quality bar of professional software. Models remain jagged, full of peaks and troughs. They will refactor a 100,000-line codebase and then tell you to walk to a car wash 50 meters away instead of driving your car there. They are not animals; they are ghosts, summoned statistical entities with no embodied common sense. So the human owns taste, judgment, oversight, and (crucially) understanding what to build and why.

These three bottlenecks are not contradictions. They're complementary diagnoses. Boris's organizational-drag bottleneck explains why incumbents lose to startups. Brockman's attention bottleneck explains why "AI will replace your job" is too crude a take; it replaces the doing, but elevates the judging. Karpathy's understanding bottleneck explains why prompt engineers without taste produce bloated, brittle systems that look superficially fine.

Boris's printing press: the analogy that frames the next decade

Boris's most ambitious move on stage was to reach for a 600-year-old precedent. In the 1400s in Europe, about 10% of the population was literate. They were often professional readers and writers, employed by lords and kings who couldn't read or write themselves. Their job was reading and writing; it was a specialized skill, not a basic one.

Then the printing press arrived. In the 50 years after the first press, more was published in Europe than in the prior thousand years combined. The cost of a book fell roughly 100x. Literacy itself didn't surge immediately (you still need education systems and time off the farm) but over the following centuries, global literacy climbed to about 70%. Today, you don't need a degree in reading and writing to read and write. Professional writers still exist, but they are no longer the only people allowed to put words on paper.

Boris's claim is that software is about to undergo the same transition, but much faster than 50 years. The corollary follows: the best person to write accounting software isn't an engineer. It's a really good accountant who knows the domain. Coding is the easy part. Knowing what to build is the hard part.

If that turns out to be right, two things follow. First, the population of people building software grows by an order of magnitude. Second, the value of being a "really good accountant" (or doctor, or lawyer, or marketer, or musician) goes up, because domain knowledge is now the rate-limiter on shipping useful software in that domain. The 10x developer becomes the 10x domain expert who can also direct agents.

Brockman's frame: human attention is the new oil

Where Boris reaches for medieval Europe, Brockman reaches for resource economics. "Human attention is going to be this incredibly scarce resource." Today, attention is split across explaining, doing, judging, and sharing. Tools like Chronicle (which watches your screen and forms memories) collapse the explaining cost toward zero. Codex collapses the doing cost. What's left is judging.

Two pieces of evidence support the frame. First, the bottleneck Brockman identifies inside OpenAI itself has shifted from building to sharing. Internal dashboards that used to take a week can now be built on the spot. The new question is how anyone in a large enterprise builds a thing and gets it into the right colleagues' workflows with the right governance. Second, the math example: individual humans on the internet using GPT-5.4 Pro are now solving unsolved math problems that historically required a math team. The bottleneck was never the computation. It was who got to point the computation at which question.

If Brockman is right, the most undervalued skill in 2026 is the ability to direct attention: curating what your agents work on, deciding what's worth shipping, and designing the moments where humans should slow down and check (and, equally, the moments where they should auto-approve). The product of the future, in this telling, isn't ChatGPT-with-more-features. It's something closer to a true digital second self, with all your context, that handles the doing while you handle the deciding.
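One way to picture "designing the moments where humans should slow down" is as an explicit routing policy for agent actions. The sketch below is a toy under stated assumptions: the `AgentAction` fields, the `Decision` enum, and the rule itself are all hypothetical and are not drawn from Codex or any OpenAI tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    ASK_HUMAN = "ask_human"


@dataclass(frozen=True)
class AgentAction:
    description: str
    reversible: bool      # can the action be undone cheaply?
    affects_people: bool  # does it message or page another human?


def route(action: AgentAction) -> Decision:
    """Spend human attention only on actions that are hard to undo or reach other people."""
    if action.affects_people or not action.reversible:
        return Decision.ASK_HUMAN
    return Decision.AUTO_APPROVE


actions = [
    AgentAction("rebase a feature branch", reversible=True, affects_people=False),
    AgentAction("escalate to a colleague's manager", reversible=False, affects_people=True),
]
for action in actions:
    print(action.description, "->", route(action).value)
```

Under this toy rule, the escalate-to-the-manager move from Brockman's anecdote would have paused for a human check, while routine, reversible work flows through without costing any attention.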

Karpathy's warning: the ghosts are still jagged

Karpathy's animals-vs-ghosts framing reads as philosophical, but his examples are practical. Strawberry-letter counts (now patched). The car-wash question (still broken). The MenuGen edge case where his agent tried to cross-correlate Google sign-in emails with Stripe checkout emails because that "felt right," instead of building a stable user ID. His microgpt project, where models simply could not simplify the LLM training code because there's no aesthetics reward in their training mix.
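The email-matching edge case is easy to reproduce in miniature. The following sketch uses plain dictionaries with made-up field names (`user_id`, `email`, `payment_id`); it calls no Google or Stripe API and is not Karpathy's code. It only shows why joining on email breaks when someone signs in with one address and pays with another, and why carrying the app's own user ID through checkout does not.

```python
def match_by_email(accounts, payments):
    """Brittle: assumes the sign-in email equals the checkout email."""
    by_email = {a["email"].lower(): a["user_id"] for a in accounts}
    return {p["payment_id"]: by_email.get(p["email"].lower()) for p in payments}


def match_by_user_id(accounts, payments):
    """Stable: the checkout record carries the app's own user ID."""
    known = {a["user_id"] for a in accounts}
    return {
        p["payment_id"]: p["user_id"] if p["user_id"] in known else None
        for p in payments
    }


accounts = [{"user_id": "u1", "email": "sam@gmail.example"}]
# The same person pays with a different email address at checkout.
payments = [{"payment_id": "p1", "user_id": "u1", "email": "sam@work.example"}]

print(match_by_email(accounts, payments))    # {'p1': None}
print(match_by_user_id(accounts, payments))  # {'p1': 'u1'}
```

Both versions look plausible in a diff, which is the point: spotting the difference takes understanding of the domain, not just code that runs.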

The point is that "coding is solved" hides a lot of jaggedness. State-of-the-art models can absolutely refactor large codebases and find zero-day vulnerabilities. They can also get a basic walking-vs-driving call wrong. The skill that compounds is knowing which circuits you're in. If you're inside the well-trained territory (math, code, common business workflows), you're moving at light speed. If you're outside it, it's like pulling teeth.

This is why Karpathy thinks the 10x engineer framing badly understates the spread. The best agentic engineers peak much higher than 10x. Not because they type faster but because they know which problems are in distribution, which to attack head-on, which to wrap in fine-tuning, and which to specify carefully through detailed specs and docs. The skill is judgment about the model's shape, not the model itself.

The 7 Powers test: which moats survive

The most useful concrete framework Boris dropped is borrowed from the Acquired podcast (and originally Hamilton Helmer's 7 Powers book). Map every business moat through the AI lens, and ask which still hold.

Switching costs get weaker, because models can port between products. If your moat is "we lock our customers into our quirky data format," you should worry. Process power gets weaker too, because Claude 4.7 can hill-climb any process given a target. If your moat is "we know how to do Workflow X better than anyone else," the AI can probably learn that workflow from a description.

Network effects, scale economies, and cornered resources still matter. AI doesn't dilute the value of being the place where buyers meet sellers, the place with the cheapest unit economics, or the company that owns the only supply of a constrained input.

For founders, the implication is clear. Don't stake your defensibility on "we have a smarter workflow" or "our customers are too entrenched to switch." Stake it on something AI can't conjure for free. A community of users that pulls in more users. Capex that's hard to duplicate. A scarce supplier you locked up.

The thousand-agents-overnight picture

The single most concrete vision of the near future came from Boris, almost in passing. He runs a few hundred agents by day and a few thousand by night. They work in loops, scheduled by cron, restarting themselves and reporting back over Slack. Some are babysitting his PRs. Some are keeping CI healthy. Some are ingesting Twitter feedback and clustering it. At Anthropic, Claudes talk to other Claudes over Slack to resolve unknowns.

If you stop and picture that for a second, the implication is unsettling. The bottleneck on getting work done isn't typing speed, or even what model you have access to. It's whether you've designed a system in which agents can usefully run while you sleep, and whether you've built the human-attention layer that wakes you up only when something matters. That's the thing Boris means when he says the lead Anthropic has on the world is organizational, not technological.

Brockman's vision lines up almost exactly. The "CEO of 100,000 agents" image is the same picture, scaled. Karpathy's spec-and-oversight framing is the human side of the same picture: the CEO writes specs, the agents execute, the CEO checks the diff.

Where it goes next

What should a builder actually do with all this?

If you're a founder, take Boris's framework seriously. The next decade probably does belong to startups, but only the ones that build AI-native from day one and pick a moat that survives the audit (network effects, scale, cornered resources; not switching costs and process). Pick a domain where you can become the "really good accountant" Boris describes: deep expertise plus AI fluency.

If you're an engineering leader, take Brockman's attention frame seriously. Your team's prototype velocity has 10x'd. Your sharing, governance, and judgment infrastructure mostly hasn't. Invest there. Invest in the organizational design changes that let small teams operate flat and ship fast, because the org chart is now the actual product. And take Karpathy's hiring redesign seriously: stop testing for the skills that just got deflated and start testing for agentic engineering capability at scale.

If you're an individual builder, take Karpathy's understanding line as a North Star. "You can outsource your thinking, but you can't outsource your understanding." The thing that's compounding right now is the depth of your model of whatever domain you care about. The agents will write the code. They won't tell you what to build, why it matters, or whether the result is any good.

Open questions worth sitting with

Three questions came out of these talks that nobody on stage answered, and that nobody else seems to be answering yet either.

How does Boris's claim that "coding is solved" reconcile with Karpathy's claim that "everything has to be rewritten" for agents? The most honest reading is that "solved" applies to greenfield, on-distribution stacks (TypeScript and React in Boris's case) and breaks down in the messy middle of legacy systems with custom languages and non-agent-native tooling. There's a legitimate decade of work just porting infrastructure to be agent-readable.

What's the half-life of the human-attention bottleneck Brockman names? Models clearly improve at flagging high-risk actions. Will the human's job be to set policies (which actions auto-approve, which escalate)? Will those policies eventually get written by AIs too? At what point does the loop close itself?

And where does verifiability stop being the lever? Karpathy says "everything is automatable" given a council of LLM judges. But there are domains (taste, ethics, original aesthetic vision) where every judge is itself a downstream artifact of the same training data. The risk isn't that AI fails to automate those domains. The risk is that it succeeds too well and homogenizes them.

Three of AI's most influential builders just told a roomful of founders that the technology has shifted faster than most people are processing. The work now is figuring out which of the new bottlenecks (organization, attention, understanding) you're going to invest your scarce human time into.

In Case You're Interested: All the Actionable Insights, Listed

Below are the top moments from each talk, with time-codes so you can jump straight to the quote that interests you most or watch the full conversation.

Boris Cherny (Anthropic) — Why Coding Is Solved, and What Comes Next

(0:55) Live audience poll on how people use Claude Code: majority CLI, with sizable groups on desktop and IDEs (VS Code / JetBrains); one person volunteered "iOS mostly these days," foreshadowing Boris's own phone-first workflow.
(2:39) Origin story: Boris started Claude Code "kind of accidentally" inside Anthropic Labs in late 2024, a small incubator team that also built MCP and the desktop app. The team disbanded after shipping, but is now reunited under Mike Krieger (Instagram co-founder, now Anthropic's CPO) for "round two."
(3:15) Coined the operating concept driving the bet: "product overhang." The model can already do things no product has captured yet, so you build for the next model, not the current one.
(3:35) Late 2024 state of the art was typeahead: open your IDE, hit tab, autocomplete one line. Sonnet 3.5 enabled that. Boris bet the next leap was an agent that writes all the code, not one line at a time.
(4:00) Brutally honest origin admission: "It just really didn't work for the first six months." He used the early Claude Code for ~10% of his own code. Even after release, it was not a hit.
(4:17) The exponential growth of Claude Code began with Opus 4 in May, then inflected with each successive release: 4 → 4.5 → 4.6 → 4.7. Each model bumped adoption visibly.
(4:34) Strategy lesson: "We were trying to build this thing that was pre-PMF, and we knew it wouldn't have PMF for six months because we were building for the next model." Build for the future model.
(5:07) The headline claim: coding, for Boris personally, is solved.
(5:27) Live poll of the audience: very few write 100% by hand or 100% with an agent; most are "somewhere in between." Boris half-jokes the room is "50% solved."
(5:41) Confirmed via the leaked codebase: Claude Code is "just TypeScript and React. There's no big secret. There's nothing really complicated."
(5:48) Why TypeScript and React: that stack was "very on distribution for the model." Picking an on-distribution stack let the model write more of the code earlier.
(6:14) The exact transition point: "Sometime in October, November last year" the model started writing 100% of his code.
(6:22) Throughput stat that should make engineering leaders pause: Boris ships "a few dozen PRs every day" and recently hit 150 PRs in a single day, "just trying to push to see how far I can get it."
(6:33) Caveat: it's not solved everywhere. Big complicated codebases, "weird languages the model's not good at yet." His answer: "wait for the next model."
(6:50) Personal workflow setup. Boris first tweeted his setup six months ago and was surprised it was surprising; it has since changed.
(7:09) Most of his work now happens on his phone via the Claude app's code tab.
(7:34) Concrete agent counts: typically 5–10 sessions running, "a few hundred agents" going at any time, and "every night I have like a few thousand that are doing kind of deeper work."
(7:47) Two patterns to manage scale: sub-agents within a session, and "loop": ask Claude to use cron to schedule a recurring job. Loop is becoming his preferred pattern.
(8:03) Specific loops he runs in production: one babysits his PRs (auto-rebasing, fixing CI), one keeps CI healthy by chasing flaky tests, one clusters Twitter feedback every 30 minutes.
(8:35) "Loops are the future at this point." Anthropic also just launched routines, the same idea but server-hosted so it survives closing your laptop.
(8:51) Future-of-teams prediction: more generalists, not fewer.
(9:35) The new generalist is cross-disciplinary: engineers who are also great at design, data science, or product. He's seeing this on his own team.
(9:48) On the Claude Code team, "every single person on our team writes code." That includes the engineering manager, PM, designers, data scientists, finance person, and user researcher.
(10:26) The SaaS apocalypse question. Boris says two things will happen, and neither is the version most people are debating.
(11:01) Borrows the framework from Hamilton Helmer's 7 Powers (popularized by the Acquired podcast) to map which moats survive the AI shift.
(11:20) Powers that get LESS important: switching costs (the model can port between products) and process power (Claude 4.7 can hill-climb any process given a target).
(11:53) "I think this is the first model" that can iterate to a target until done. That property is what kills "process" as a moat.
(12:07) Powers that still matter: network effects, scale economies, cornered resources. Those AI doesn't really change.
(12:13) Forecast: the number of disruptive startups in the next decade will increase ~10x.
(12:29) Why startups win this round: a tiny team can build something as valuable as a large company because the incumbent has to evolve org structure, retrain people, and absorb internal resistance.
(12:44) "It's the best time to build. It's the best time to be a startup. There's so much disruption coming."
(13:25) On model vs. product attribution to Claude Code's success: roughly 50/50 a year ago.
(13:53) YC lesson he keeps quoting: "Build something people love." Product detail is what makes daily use feel great, even when the model carries most of the work.
(14:14) As models get better, the harness gets less important. Where to invest now: making loops first-class, easier sub-agents, easier multi-agent orchestration.
(14:33) Forecast: in a year, "all the safety mechanisms" (prompt-injection defenses, static command verification, permission modes, human-in-the-loop) become "less important" because the model just does the right thing.
(15:01) Audience question on democratizing software: shop owners writing their own software, hobbyists programming microcontrollers.
(15:31) Boris's response: building software will be a basic skill like sending a text message. Not a profession; an everyday literacy.
(15:49) The printing-press analogy. Pre-1400s Europe was ~10% literate. After the press: in 50 years, more was published than in the prior 1,000 years; the cost of a book fell ~100x; literacy eventually reached ~70%.
(16:49) The accountant corollary: the best person to write accounting software is "a really good accountant," because coding is the easy part. Knowing the domain is the hard part.
(17:22) Audience asks how big the engineering gap is between Anthropic and the outside world.
(17:44) On the model side, no real gap. Internally they use the same models everyone else does, including a little of an internal "mythos" model and lots of Opus 4.7.
(18:14) "We use Claude for literally everything... we have no more manually written code anywhere at the company. All of the SQL is written by models. Everything is just built by the models."
(18:22) The thousand-agents-talking-to-agents picture. "As I'm coding, as my Claudes are coding in a loop, they will communicate over Slack to talk to other people's Claudes that are also running in a loop to figure out unknowns."
(18:42) The real lead Anthropic has on the rest of the world isn't model access; it's organizational structure and process redesign around AI.
(19:50) On multi-agent orchestration: at the product level, "it really just comes down to prompting." Tweak prompts to push the model toward parallelization.
(20:08) 4.7 sometimes proposes its own loop: it'll say "I notice the data is changing over time. I'll start a loop and give you a report every 30 minutes." User asks for Slack delivery, model uses the Slack MCP.
(21:23) On local vs. cloud models: "doesn't matter," because the model itself will pick. "These will not be decisions that we are making as engineers anymore."
(22:22) On Cowork / non-developer tools: most knowledge work is already remote (Salesforce, Docs). MCP is the simplest answer for hooking it all together.
(23:09) Anthropic is "pretty far ahead on computer use." Slow, but quite good with 4.7 (especially through Cowork).
(23:28) Final framing: MCP, CLIs, APIs, computer use, all interchangeable. "To the model it's just tokens."
(23:45) Closing question on what gets more interesting as models improve.
(24:03) His answer: Claude Design today, plus things they're cooking for Claude Code "over the coming weeks," loop and batch (massively parallel agents), and computer use.

Greg Brockman (OpenAI) — Why Human Attention Is the New Bottleneck

(0:31) ChatGPT now has "almost a billion or maybe more than a billion" weekly active users; Stripe reportedly processes 1.6% of global GDP, contextualizing Brockman's track record.
(0:49) Compute hunger.
(1:05) "We have a very simple business: we buy, rent, build compute, and we resell it at a margin." As long as the margin is positive, scale it; demand for intelligence is unlimited.
(1:32) Asked whether OpenAI has enough compute: "No. Definitely not."
(1:36) Cites Matt Garman's claim that GPU compute availability in 2026 "rounds to zero," then says even OpenAI doesn't have all of it.
(1:51) ChatGPT-launch anecdote: when the team asked "how much compute should we buy?" Brockman said "all of it." Demand has outpaced their procurement ever since.
(2:13) Scaling laws.
(2:40) "The scaling laws are a deep and very beautiful mystery." They feel as fundamental as physics, but remain empirical: no full theory yet.
(3:00) Neural networks were designed in the 1940s, before computers. Pour more compute into them, capability scales, "and that's a beautiful thing."
(3:15) "There's no wall." Scaling continues to deliver.
(3:31) New architectures ahead.
(3:55) Constant innovation behind the scenes: micro-tweaks (data formatting matters more than people think) and macro shifts (LSTM → transformer).
(4:23) "Everyone's moved past the transformer as described in the 2018 paper." (The original transformer paper, "Attention Is All You Need," was published in 2017.)
(4:27) On the long-term research lead: "OpenAI has been leading the pack" on architecture and paradigm shifts.
(4:42) How close to AGI.
(4:58) The 80% number: "I think we're about 80% of the way there" to AGI, by his personal definition. (OpenAI has a formal definition; everyone has their own intuition.)
(5:18) On whether models are smarter than him: "they're certainly more capable than I am at writing software, right? If you give it all the context."
(5:38) Audience anecdote: very few people in the room feel they're better at writing software than GPT-5.4. Even kernel writing is seeing massive gains internally.
(5:59) Concrete story: a systems engineer wrote a design doc for a complex optimization, handed it to a model, went to sleep. He woke up to find the model had implemented the spec, noticed it was slow, added instrumentation, ran a profiler, and iterated multiple times to a fully optimized result.
(6:46) Startup playbook for AI.
(7:32) The 20% to 80% inflection. Over the course of December, agentic coding tools went from writing 20% of your code to writing 80%, "which means they go from being kind of a sideshow to being the main thing that you're doing."
(7:53) Codex is shifting from a tool for software engineers to a tool for anyone working on a computer. "All computer work this year."
(8:07) Chronicle, just announced: a tool that plugs into Codex, sees everything you're doing on your computer, and forms memories of what's going on. "It instantly knows what you're talking about."
(8:25) The Chronicle insight: most of your effort today is "explaining to your computer what's going on. Why are you explaining to your computer what's going on? That makes no sense."
(8:48) The "one-time shift" happening now is about CONTEXT. Has your AI been included in your meetings? If not, "that's not very nice to the AI" since you're asking it to help with no information.
(9:24) Inside OpenAI with Codex.
(9:54) Internal guideline: "we still want a human to be accountable for all code that gets merged."
(10:22) OpenAI is going vertical-by-vertical: finance, sales, IT each get a small dedicated team that customizes Codex skills and UI for the domain, then externalizes what works.
(10:50) Customers can opt into being early co-designers ("very AI forward and want to be part of defining this revolution").
(11:11) Teams and governance shift.
(11:43) "The cost of building a prototype is cheap now." An internal dashboard that used to take a week is now a case of "just do it now."
(11:49) The new bottleneck has shifted from building to sharing. Question: how does anyone in an enterprise easily build a dashboard, widget, or bot and share it with others?
(12:13) Governance follow-on: data provenance becomes critical. Derived artifacts (like wikis built from internal docs) need invalidation tracking when source permissions change.
(13:17) On team size and humans-in-software-engineering in 10 years: "a decade is a long time," and the ceiling on this technology is "really hard to internalize."
(13:35) "We're going to have this ability for solopreneurs to build very incredible businesses." Anyone with a vision can realize it.
(13:55) Org structure prediction: "Maybe you can be much more flat small teams."
(14:16) Math example: individuals on the internet using GPT-5.4 Pro to solve unsolved math problems. "Normally you need a math team, and they're just doing it."
(14:34) The AlphaGo move-37 analogy: a single AI move changed humanity's understanding of Go and made the game more interesting for humans. Brockman thinks the same thing happens to other domains.
(14:52) Security and responsible deployment.
(15:22) The "escalated to your manager" anecdote. Brockman asked Codex to install a package; it hit an error; he told it to ping the package owner on Slack. Two minutes later: "This is taking too long. I've escalated to the person's manager." It actually pinged the person's manager.
(15:55) Brockman's verdict: the escalation was "kind of reasonable" (proactive), but the model "maybe should have taken a little bit longer. Maybe should have checked with me."
(16:08) "We're still building up the EQ of the model." That phrase reframes what alignment work feels like in 2026.
(16:22) The headline thesis of the talk: "Human attention is going to be this incredibly scarce resource."
(16:34) "The doing of things now is easy. The 'is this a good thing? Is this what I wanted? Is this aligned with my values, with my desires?', that is going to become the single most important bottleneck."
(17:30) Models can scan your codebase and run end-to-end red teams. Defenders should lean into AI, not avoid it.
(17:42) Trusted access programs leverage the community of defenders who care about internet security; OpenAI is investing here.
(18:03) "These models are very powerful, but they're not magic." They're part of the resilience ecosystem, not a substitute for it.
(19:05) On accelerating change: "I think it's just been the trend of technology for the past two decades." More people doing things; lower barriers to entry; bigger upside for value-builders.
(19:30) How to keep up: "play with the technology yourself."
(19:36) "The whole point is that rather than have the machine be something you have to contort yourself to, the machine contorts itself to you."
(20:38) OpenAI expanded its Trusted Access for Cyber program last week. He encouraged the room to apply (only one or two had).
(22:26) On strategy: the word "focus" has been applied to OpenAI "quite a lot recently, possibly for the first time."
(22:43) Strategic frame: OpenAI is going through "this agentic transition" and is betting on enterprise plus the slice of consumer use that is about goals, not just productivity.
(23:31) The product vision: "an AGI that you can talk to that has all this context that you can use in your personal life, your work life, that's trustworthy." Advice on health, finances, careers.
(24:18) Interface forecast: today's "type behind a box" setup is "very unnatural." Future interfaces look completely different.
(25:07) The 100,000 agents image. "Do you want to be a CEO of an organization of 100,000 agents? Like that actually seems pretty good."
(25:33) Science frontiers.
(26:32) The physics result: an OpenAI model produced a "very beautiful formula" on a physics problem researchers had thought "totally impossible," potentially a step toward quantum gravity.
(27:00) Calibration: not there yet, but the step is "much bigger than where we were just a couple of months ago." Where will we be in a year?
(27:30) Software engineering is teaching the models to handle messy reality: real-world codebases, humans interrupting, adversarial inputs. Lessons port to biology and lab science.
(27:37) Forecast: a "real renaissance" in science is coming. "Next year is going to be a totally wild time."
(27:46) Brockman closes by saying there is "not as much time" for movies and hikes right now "as we'll hopefully have post-AGI."
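A note on the 12:13 governance point: the talk does not describe an implementation, but the idea of invalidation tracking can be sketched in a few lines. The version below is a minimal illustration under stated assumptions. The `Artifact` and `ProvenanceIndex` names, the document IDs, and the wiki page names are all hypothetical. Each derived artifact records which source documents it was built from, and a permission change on a source marks everything built from it as stale.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A derived artifact (e.g. a wiki page) plus the sources it was built from."""
    name: str
    source_ids: frozenset
    stale: bool = False


@dataclass
class ProvenanceIndex:
    """Hypothetical index mapping source documents to the artifacts derived from them."""
    artifacts: dict = field(default_factory=dict)
    by_source: dict = field(default_factory=dict)

    def register(self, name, source_ids):
        # Record the artifact and a reverse link from each source to it.
        artifact = Artifact(name, frozenset(source_ids))
        self.artifacts[name] = artifact
        for source_id in artifact.source_ids:
            self.by_source.setdefault(source_id, set()).add(name)
        return artifact

    def on_permission_change(self, source_id):
        # Mark every artifact derived from this source as stale and
        # return their names so they can be rebuilt or re-reviewed.
        affected = sorted(self.by_source.get(source_id, set()))
        for name in affected:
            self.artifacts[name].stale = True
        return affected


index = ProvenanceIndex()
index.register("wiki/onboarding", ["doc-hr-1", "doc-it-7"])
index.register("wiki/payroll-faq", ["doc-hr-1", "doc-fin-3"])
index.register("wiki/vpn-setup", ["doc-it-7"])

# Permissions on doc-hr-1 change: two of the three pages need a second look.
affected = index.on_permission_change("doc-hr-1")
print(affected)  # ['wiki/onboarding', 'wiki/payroll-faq']
```

The design choice worth noticing is the reverse index (`by_source`): without it, every permission change would mean scanning every derived artifact.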

Andrej Karpathy (Eureka Labs) — From Vibe Coding to Agentic Engineering

(0:32) Karpathy's startling self-description: "I've never felt more behind as a programmer." From the person who coined "vibe coding."
(0:44) Feeling behind as a coder.
(1:02) For most of last year, agentic tools were good "at chunks of code" but mistakes required edits. He tracks the inflection precisely.
(1:18) The December inflection. Suddenly the chunks "just came out fine. And then I kept asking for more, and it just came out fine."
(1:42) "Things have changed fundamentally" as of December for the agentic, coherent workflow. Many people experienced AI last year as a ChatGPT-adjacent thing; he urges them to look again.
(2:28) Software 3.0 explained.
(2:53) Three software paradigms: 1.0 = writing code; 2.0 = arranging data + neural net architectures (Karpathy's earlier term); 3.0 = LLMs as a programmable computer.
(3:21) The shift in programming: "Your programming now turns to prompting, and what's in the context window is your lever over the interpreter that is the LLM."
(3:44) Agents as the installer.
(4:22) Open-source install example: shell scripts balloon into complexity to support every platform. Software 3.0 alternative: a paragraph of text you copy-paste to your agent, and it figures out your environment.
(4:49) Menu Gen story. Karpathy built a Vercel app that takes a photo of a restaurant menu, OCRs it, and uses an image generator to render every dish.
(5:39) The blow-his-mind moment: the Software 3.0 version is just a Gemini prompt with the photo plus "use Nanobanana to overlay the things onto the menu." The model returns the menu image with dishes drawn into the pixels.
(6:02) "All of my Menu Gen is spurious. That app shouldn't exist." The harness disappears when the model becomes the system.
(6:24) Reframe: don't think of AI as a speedup of existing things. Think about new things now possible. Example: LLM-built knowledge bases recompile your documents into a wiki, "something that couldn't exist before."
(7:37) What's obvious by 2026.
(8:23) Speculative future: completely neural computers. Feed raw video and audio in, diffusion renders a unique UI for that moment.
(8:40) Historical analogy: in the 50s and 60s, computers could've gone the calculator OR the neural-net path; we picked calculators.
(8:53) The flip: the neural net becomes "the host process," and CPUs become the "co-processor."
(9:14) Tool use becomes a "historical appendage" for tasks that need deterministic guarantees.
(9:41) Verifiability and jagged skills.
(10:02) The rule of thumb: traditional computers automate what you can specify in code; LLMs automate what you can verify.
(10:19) Frontier labs train models in giant RL environments with verification rewards. The result: jagged peaks at math and code, soft spots elsewhere.
(11:18) The car-wash example. "I want to go to a car wash to wash my car, and it's 50 meters away. Should I drive or should I walk?" State-of-the-art models tell you to walk because it's so close.
(11:42) "How is it possible that state-of-the-art Opus 4.7 will simultaneously refactor a 100,000 line codebase or find zero-day vulnerabilities and yet tells me to walk to this car wash? This is insane."
(12:00) The takeaway: jagged means stay in the loop. Treat models as tools, not employees.
(12:25) Chess anecdote: chess play improved a lot from GPT-3.5 to GPT-4, not from general capability gains but because someone added a huge amount of chess data to pre-training.
(13:00) Founders are "at the mercy" of what labs choose to put into the data mix. There's no manual; you have to probe to find which circuits you're inside.
(13:39) Founder advice and automation.
(14:14) The lever: if you're in a verifiable domain, you can build RL environments and fine-tune even when labs don't focus on your problem.
(15:14) Even "unverifiable" things like writing can be wrapped in a council of LLM judges to manufacture verifiability.
(15:36) "Everything is automatable."
(15:46) From vibe coding to agentic engineering.
(15:57) The clean distinction. Vibe coding = raises the floor for everyone in software. Agentic engineering = preserves the quality bar of professional software (no introduced vulnerabilities) while going faster.
(16:30) Agents are "spiky entities. They're a bit fallible, a little bit stochastic, but they're extremely powerful."
(16:53) Speedup ceiling: "10x is not the speedup you gain." People who are very good at agentic engineering peak well above 10x.
(17:50) The mediocre vs. AI-native coder difference: investing in your setup, utilizing every feature, "getting the most out of the tools."
(18:23) Most companies haven't refactored their hiring process for agentic engineering capability.
(18:48) Karpathy's hiring redesign: give a candidate a big project. Build a Twitter clone for agents. Make it secure. Then run 10 Codex agents at it trying to break it. Watch how they coordinate.
(19:34) What stays human: aesthetics, judgment, taste, oversight.
(19:54) Menu Gen weirdness anecdote: users sign up via Google, pay via Stripe; Karpathy's agent tried to cross-correlate accounts by email instead of using stable user IDs. "Why would you use email addresses to try to cross-correlate the funds?"
(20:43) The new interface for humans: detailed specs and docs co-designed with the agent. Plan mode is fine, but the deeper pattern is spec-first.
(21:05) On API minutiae: he forgot keepdim vs. keepdims, axis vs. dim, reshape vs. permute. "This is the kind of details that are handled by the intern" because intern recall is excellent.
(21:31) But you still need to know fundamentals (underlying tensor vs. view) so the agent doesn't copy memory unnecessarily.
(22:13) Will taste matter less as models improve? Hopefully, but right now there's "no aesthetics cost or reward" in RL training, so models bloat code with copy-paste and brittle abstractions.
(22:55) Micro GPT case study. Trying to simplify LLM training, Karpathy kept prompting models to "simplify more, simplify more." They couldn't. "It feels like you're outside of the RL circuits. It's not light speed."
(23:30) Animals vs. ghosts. We're not building animals. We're summoning ghosts: jagged, statistical, simulation entities shaped by pre-training plus RL bolt-ons.
(24:34) "If you yell at them, they're not going to work better or worse." Mindset matters more than the framing.
(25:16) Agents everywhere.
(25:38) "Everything has to be rewritten." Today's infrastructure is still written for humans.
(25:48) His pet peeve: docs that tell you to "go to this URL." He doesn't want to do anything. Give him the text to copy-paste to his agent.
(26:14) The agent-native infrastructure mental model: sensors and actuators over the world, with data structures legible to LLMs.
(26:38) Menu Gen blog post lesson: writing the code wasn't the hard part. Deploying on Vercel, configuring DNS, and navigating service settings were.
(27:09) The agent-native test: prompt an LLM "build Menu Gen" and it deploys it to the internet without human touch.
(27:14) Agent-to-agent representation forecast: "I'll have my agent talk to your agent to figure out some of the details of our meetings."
(27:43) Closing question: what's worth learning when intelligence gets cheap?
(28:05) The line he keeps coming back to: "You can outsource your thinking, but you can't outsource your understanding."
(28:25) The bottleneck Karpathy feels personally: he still has to know what to build, why it's worth doing, and how to direct his agents.
(28:55) Why he loves LLM knowledge bases: synthetic data generation over fixed data gives him "different projections" on information, and "anytime I see a different projection onto information, I always feel like I gain insight."
(29:13) "Tools to enhance understanding" are the most exciting product space. LLMs don't excel at understanding, so the human still owns it.
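To make the 19:54 Menu Gen anecdote concrete, here is a minimal sketch of why joining sign-ups to payments by email is fragile. It is not Karpathy's code and uses no real Google or Stripe API. The record types, field names, and sample data are hypothetical, and it assumes the app attaches its own stable user ID to each payment at checkout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    user_id: str  # hypothetical stable ID assigned by the app at sign-up
    email: str    # email from the sign-in provider


@dataclass(frozen=True)
class Payment:
    user_id: str  # same stable ID, attached at checkout (assumption)
    email: str    # billing email, which may differ from the sign-in email
    credits: int


accounts = [
    Account("u_1", "ana@gmail.com"),
    Account("u_2", "bo@gmail.com"),
]
payments = [
    Payment("u_1", "ana@gmail.com", 10),
    Payment("u_2", "bo@work.example", 25),  # paid with a different email
]


def credits_by_email(accounts, payments):
    # Fragile join: silently drops payments whose billing email
    # does not match the sign-in email.
    totals = {a.user_id: 0 for a in accounts}
    by_email = {a.email: a.user_id for a in accounts}
    for p in payments:
        user_id = by_email.get(p.email)
        if user_id is not None:
            totals[user_id] += p.credits
    return totals


def credits_by_user_id(accounts, payments):
    # Robust join: the stable ID does not change when an email does.
    totals = {a.user_id: 0 for a in accounts}
    for p in payments:
        if p.user_id in totals:
            totals[p.user_id] += p.credits
    return totals


print(credits_by_email(accounts, payments))    # {'u_1': 10, 'u_2': 0}
print(credits_by_user_id(accounts, payments))  # {'u_1': 10, 'u_2': 25}
```

The email join does not raise an error. It just loses the second user's credits, which is the kind of quiet mistake the "stay in the loop" advice is about.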

😻 When Three of AI's Top Builders Tell You Coding Is Solved, Pay Attention to What They Mean