How Top Vibe Coders Are Not YOLO
The future of software work is unclear and unevenly distributed — what we can learn from the top "vibe coders", who are actually zero vibe and not YOLO at all
If you are trying to figure out what comes next, or what you, your team, or even your company could adopt, it might help to understand the direction from the vantage point of the extreme that already exists.
Disclaimer: we probably don’t really know what they are actually doing over at Anthropic. I compiled as much as I could and tried to give an honest summary of what I found.
“Vibe coding” started as a name for AI-assisted prototyping. It has since become a label for everything - including the careful, structured work of the engineers who are actually pushing the frontier. The resulting confusion creates a ton of division and is worth clearing up. It suggests that production code exists without clear architecture, guardrails, and constraints. The idea that code is only reviewed at the feature level, not the line level (what Shapiro describes as the psychological hurdle between levels 2 and 3 - out of 5 - on his “vibe coding scale”), is fear-inducing. Studying what Cherny (creator of Claude Code) and others are doing tells part of the story of how their work is actually zero YOLO: highly disciplined, structured, and responsible. It also gives a peek into what future teams look like and how they align.
Andrej Karpathy coined the term “vibe coding” in early 2025. The idea: fully surrender to the AI. Stop reading diffs. Just describe what you want and see what comes out.
It caught on fast because it matched a “vibe” that was in the air - something polarizing, because it looked magical at the time. It was democratizing and, at the same time, without guardrails, totally irresponsible. The tools were Lovable, Bolt, v0. Build a landing page in 20 minutes. Ship a prototype in an afternoon that would’ve taken a week before. That version of vibe coding is still out there and has useful, limited applications. But then a lot happened, and the label remained stuck to everything.
Karpathy moved on. The people around Cherny at Anthropic moved on even further. And in his own way, so did Kent Beck (see some quotes from him below).
That still creates fog over the debate, as any serious AI-assisted development gets called vibe coding. Which is a problem, because what Karpathy originally described - and the workflows the people at the actual frontier have developed since then and execute every day - are two different things entirely. Let’s see where they moved to.
What Boris Cherny Actually Does
Short version: research first, plan second, annotate until it’s right, then, and only then, implement. Humans check the thinking before any code runs. And then check again when the feature is built. If it works: fine. If not: next iteration. Good code is taken for granted rather than checked line by line, because the production pipeline is optimized through guardrails that ensure it.
Boris Cherny built Claude Code. He uses it like this.
Before any code gets written, he writes a research doc (you could call this PM work, he writes the spec for the feature). Claude reads the relevant parts of the codebase, documents what it understood. Cherny reviews it. As long as the understanding is wrong, no code is produced. Garbage in, garbage out - that doesn’t change when the code is written by an AI.
Next: a plan, captured in a plan.md file. It shows file paths, trade-offs, and code sketches, which Cherny annotates directly, correcting constraints and killing what’s out of scope. Claude then updates the plan. They go back and forth until it’s right.
And only then does he say: “implement it all.” Up to this point it is all plan mode; the famous one-shot is built on those iterations until a good spec is reached.
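To make the plan artifact concrete, here is a purely hypothetical skeleton. The real format isn’t public; every path, number, and option below is invented for illustration:

```shell
# Hypothetical plan.md skeleton, written before any implementation.
# All file names and details are made up; only the shape matters:
# files to touch, trade-offs, and an explicit "don't implement yet" gate.
workdir=$(mktemp -d) && cd "$workdir"
cat > plan.md <<'EOF'
# Plan: rate-limit the export endpoint (do NOT implement yet)

## Files to touch
- src/export/handler.ts        (add limiter)
- src/export/handler.test.ts   (tests first, derived from this spec)

## Trade-offs
- Option A: token bucket per user  <- chosen, simplest
- Option B: global queue           <- out of scope for this iteration

## Open questions for human review
- Is 100 req/min still the agreed constraint?
EOF
cat plan.md
```

The human annotates exactly this file; the agent updates it; only when it’s right does the instruction flip to “implement it all.”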
If this way of working interests you and what it means for PMs and strategy workers, my upcoming course might be for you:
The explicit instruction during planning is “don’t implement yet.” That’s one important guardrail. The agent can’t lock into wrong assumptions before a human has checked the thinking. The switch from “don’t implement yet” to “implement it all” is binary, but getting there is highly iterative. The one-shot is the result of those careful iterations. Then he looks at the result and, like a designer or a tailor, iterates again: “a bit more of this, a little bit less of that.”
During implementation, continuous typechecks and tons of guardrails are running. Tests are written first, based on the specs. Short, operational corrections. No architecture discussions at this point. If it goes in the wrong direction, he doesn’t debug it to death. He reverts; in the worst case, he throws it in the trash. Re-scopes. Hard reset. Moves on.
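The verify-or-revert gate can be sketched in a few lines of shell. This is a toy stand-in, not their pipeline: `check` below simulates the typechecks and spec-derived tests, and every name is hypothetical.

```shell
set -e
# Toy repo so the sketch is self-contained.
workdir=$(mktemp -d) && cd "$workdir"
git init -q
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "baseline"

# The agent produces a change; here it is deliberately wrong.
echo "wrong direction" > feature.txt

# Stand-in for the real gate: typechecks plus tests written first from the spec.
check() { grep -q "expected behavior" feature.txt; }

if check; then
  git add feature.txt
  git -c user.email=agent@example.com -c user.name=agent commit -q -m "feature"
  result="kept"
else
  # Don't debug it to death: hard reset back to the clean baseline.
  git checkout -q -- . 2>/dev/null || true
  git clean -qfd
  result="reverted"
fi
echo "$result"
```

The point of the sketch: the failing attempt never reaches the history; the cost of a wrong direction is a discarded working copy.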
At any given moment he has 5 to 10 agents running in parallel: some of them in Claude Code, some in the browser, some on his phone, the work balanced across git worktrees. Some tabs just get killed if the direction was wrong. That’s the default for Cherny and his team. And (yes, currently just) a couple of other teams. 90-100% of their code is written like this. PR throughput is on the rise as well.
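The mechanism behind the parallel attempts is standard `git worktree`: one isolated checkout per attempt, so agents can’t trample each other. A minimal sketch on a toy repo (the attempt names are invented; this shows only the mechanism, not their actual setup):

```shell
set -e
# Toy repo so the sketch is self-contained.
repo=$(mktemp -d)/repo && mkdir -p "$repo" && cd "$repo"
git init -q
git -c user.email=agent@example.com -c user.name=agent \
  commit -q --allow-empty -m "init"

# One isolated worktree per parallel attempt.
git worktree add -q ../attempt-1 -b attempt-1
git worktree add -q ../attempt-2 -b attempt-2

# A wrong direction costs a closed tab: drop the worktree, delete the branch.
git worktree remove ../attempt-2
git branch -q -D attempt-2

worktrees=$(git worktree list)
echo "$worktrees"
```

Killing a worktree leaves the main branch and the surviving attempts untouched, which is what makes “expect some of them to fail” cheap.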
The Misunderstanding of the Old Agile Community: “But we proved spec-driven coding does not work”
It should be obvious, but the word “spec” alone triggers old instincts - hundreds of pages of PRDs, waterfall, defining the outcome of the next 9 months. But here, the “spec” is for one iteration. Maybe an hour. Not eternity, not a sprint. Not even a day.
If it doesn’t work, depending on the outcome, you throw it away (totally wrong direction) or iterate on it. Cherny has multiple simultaneous attempts going precisely because he expects some of them to fail. The cost of a wrong direction is a closed tab, not a failed sprint.
This is not the same spec-driven design as “normal” teams practice it. The spec isn’t a contract. It’s a hypothesis. You test it. If it’s wrong, write a new one. Same word, different semantics.
What survives iteration isn’t the spec, but the judgment about what to spec in the first place.
Vibe coding as a word is already done
When Karpathy described vibe coding, he was describing exploration. Build something to see if it’s even worth building. Don’t worry about production. Don’t worry about edge cases. Just explore.
That was ok and it still is. But it’s not what happens when you ship real software.
What Cherny, Karpathy himself in production contexts, and, with a bit more hesitation or just “searching”, Kent Beck are working toward is something different: code generated at scale, with humans making judgment calls at the feature level.
Line-by-line review is gone, taken care of by “the machine”. Not even function by function. But: does this feature do what it should? Is the plan right? Is the architecture sound? Is this the right thing to build at all?
Those remain human decisions. What they say is that “taste” is more important than ever. Probably it’s more than that, and calling it taste is a sign of humility. It’s a well-trained skill.
The implementation is increasingly mechanical. The judgment isn’t. So, there’s vibe coding and there is context engineering or whatever the word of the week may be. Different things.
The Five Levels of Vibe Coding
This is Dan Shapiro’s framework for where people currently stand with AI. Five distinct stages, each one requiring the human to step further back from the code:
Level 0 - Spicy Autocomplete. AI finishes your lines. You’re still writing the software. Fewer keystrokes, same job.
Level 1 - Coding Intern. You hand the AI a scoped task: write this function, refactor this module. You review everything that comes back. AI does tasks. Human does architecture.
Level 2 - Junior Developer. AI handles multi-file changes and navigates the full codebase. Human still reads all the code. Shapiro says 90% of self-described “AI native” developers are here, thinking they’re further along.
Level 3 - Developer as Manager. The relationship flips. You direct, the AI implements. You review at feature/PR level - not line by line. Most developers get stuck here, because the psychological blocker is huge: letting go of the code.
Level 4 - Developer as Product Manager. You write a spec. You leave. You come back hours later and check if the tests pass. You’re not reading the code anymore. You’re evaluating outcomes. Almost nobody writes specs good enough for this yet.
Level 5 - The Dark Factory. Specs go in. Working software comes out. No human writes code. No human reviews code. The factory runs lights-out. Barely anyone on the planet operates here.
L2 → L3 is the crucial and hard step. Level 5 is probably Nirvana, but a utopia: the place most of us won’t reach any time soon.
What do teams look like now?
Claude Code started as a one-person project (Cherny) plus Sid Bidasaria. Now it’s ~10 engineers (judging by the output and outcome, they may already be quite a few more this week). Cowork - Claude Code for more general knowledge work - is also a ~10-person team (or was, a couple of weeks ago, when it was built in a couple of days).
There are roles. PM, design, data, engineering. But the boundaries are dissolving. The APIs between roles become less clear. Everyone goes more up- and downstream at the same time. Everyone codes. The PM writes code. The designer writes code. Half the sales team at Anthropic uses Claude Code weekly. The whole org dogfoods. They all improve the production chain.
Formal sprints don’t seem to exist. But what replaces the coordination overhead?
A partial answer lies in shared artifacts. CLAUDE.md files in the repo, team conventions, common mistakes, guidelines. Slash commands that embed shell steps and pre-approved permissions so agents can move fast without constant back-and-forth. Learnings from PRs fed back into those shared docs. Architectural decisions, “best practices”, learnings get “coded” into the machine and the shared infrastructure right away. Strategy and decisions are shared in .md files.
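As a hedged illustration of those shared artifacts: the file locations below follow Claude Code’s documented conventions (a CLAUDE.md in the repo root, custom slash commands as markdown files under .claude/commands/), but every line of content is invented.

```shell
# Hypothetical shared artifacts; the contents are made up for illustration.
workdir=$(mktemp -d) && cd "$workdir"
mkdir -p .claude/commands

# Team conventions and recurring mistakes, picked up by the agent each session.
cat > CLAUDE.md <<'EOF'
# Conventions (example)
- Run the typechecker before proposing a diff.
- Tests are written first, from the spec in plan.md.
- Never touch the migrations/ folder without an explicit instruction.

# Common mistakes (fed back from PR reviews)
- Don't add new dependencies for one-line utilities.
EOF

# A custom slash command, /plan-review, embedding the pre-agreed review steps.
cat > .claude/commands/plan-review.md <<'EOF'
Read plan.md, list every file it proposes to touch,
and flag anything that is out of scope. Do NOT implement yet.
EOF
```

The point is the mechanism: a learning from one PR review becomes a line in a shared file, and every future agent session starts with it already in context.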
What used to be a recurring debate about shared practices has become shared infrastructure.
Code review: Claude does the first pass, including security review. Humans approve. Every PR, same flow. ~95% of Claude Code’s own codebase was written by Claude Code. PR throughput reportedly went up ~67% as the team doubled.
Take those numbers with appropriate skepticism: they are self-reported and hard to benchmark. But the direction is clear.
There’s no clear PM-to-Dev ratio. No role in its pure form without some coding. The team isn’t organized around handoffs anymore.
“Heard the story of a programmer who didn’t want to use a genie because they would lose their programming skills. My dude, someone handed you a chainsaw and you are saying nah I’ll just keep using this cross cut saw because I don’t want to get weak” - Kent Beck
What’s the shift?
What looks like vibe coding from the outside is a carefully constructed pipeline, a finely honed machine.
Research artifact. Plan artifact. Annotation loop. Mechanical implementation. Continuous verification. Revert if wrong.
The trick is that the visible part looks so casual, probably for the sake of marketing: “here’s what I want.” “A third of my work is happening on my mobile phone, in the mobile interface of Claude Code.” That sounds close to not serious, but has to be taken in context. The interface doesn’t matter; it’s just a small window into a huge context and supporting scaffolding. The invisible, unattractive part is discipline: which decisions are human, where the gates are, what gets reverted when.
It’s the opposite of YOLO.
It’s a peek into what’s possible in engineering under the new conditions. The tools changed. The fundamentals - understand the problem, control scope, verify the output, learn from failure - didn’t.
The people who are best at this aren’t the ones who let go the most. They’re the ones who figured out where exactly they need to stay in control.
Why This Matters
Understanding how the extreme already works today is important for three reasons:
(I) There is a lot of disbelief and misunderstanding about how this can work, and the narrative is often met with “this is just marketing” without looking deeper into why it works and where it works.
(II) There is a lot of discussion that this (whatever interpretation of vibe coding is meant) is not as productive as claimed, which often comes from people looking at it without full information and from the angle of “what does my work look like today”. Tinkering shows the way for now.
“Try It! The thing about rapidly changing, brand new tools is that nobody knows the answer to any of the questions you are likely to have. Exploristan is the Land of Try It.
Try it and then share. We’re all wondering.” - Kent Beck
(III) Most importantly: there are a lot of studies that get interpreted as “this causes an undue amount of stress for people”. Which is right. But the environments in those studies did not themselves react to this new way of work. “This” causes two things: a lot more context switching (not what humans are good at) and more ambiguity due to dissolving roles, with more work options up- and downstream at the same time, thus more potential for confusion. The task for work environments is to set up new structures and rituals that create guardrails against burnout and support the new level of speed. Not everything that can be done should be done; help structure the workday of workers according to the new challenges. But for that, the new way of work first needs to be understood.


