Vibe coding is real
Andrej Karpathy posted about "vibe coding" last week. Fully giving in to the vibes, seeing things appear, accepting or rejecting, barely looking at the code. He's right that it works. But most people took the wrong lesson from it.
They heard "don't look at the code" and skipped the part where you think about what you're building. So the typical session goes like this: come up with an idea, start prompting without much thought, get random features that miss the mark, try undoing what you didn't want, end up with a bloated codebase, bang your head against errors for hours, give up, start over with a new idea.
I was stuck in that loop for weeks before I figured out what was missing. It wasn't a better model. It was a planning layer between the idea and the first prompt.
The split that actually ships
AI writes the code. You do the thinking. Most people get that backwards. They hand over the thinking too, then wonder why the output is generic.
Before I write a line of code on anything, whether it's Andelo features or a side project, I work through a sequence of planning steps in conversation with an LLM. Each one builds on the last, and by the time I generate code, the model has enough context to get close on the first pass.
Here's the sequence.
I start by scoping the MVP. I describe the idea and what I think the minimum version should do. The LLM pushes back with questions I haven't considered. We go back and forth until the scope is tight. The discipline here is cutting. Every feature you add at this stage multiplies complexity in every later step. What is the smallest version of this thing that is still valuable to someone?
Then I map the tech to each feature. Not code, just decisions. If the app needs nested folders, that means recursive queries and tree traversal. If it needs image uploads, I need a CDN, compression, and a queue. Knowing this upfront means I won't discover it mid-build.
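To make the nested-folders example concrete, here is a minimal sketch of the tree traversal that decision implies. The `Folder` shape is hypothetical; a real schema (and the recursive query feeding it) will differ.

```typescript
// Hypothetical flat row shape, as a recursive query might return it.
interface Folder {
  id: number;
  parentId: number | null;
  name: string;
  children?: Folder[];
}

// Build a tree from flat rows: one pass to index, one pass to link.
function buildTree(rows: Folder[]): Folder[] {
  const byId = new Map<number, Folder>();
  for (const row of rows) byId.set(row.id, { ...row, children: [] });

  const roots: Folder[] = [];
  for (const node of byId.values()) {
    if (node.parentId === null) roots.push(node);
    else byId.get(node.parentId)?.children?.push(node); // orphans are dropped
  }
  return roots;
}
```

Knowing this shape exists before prompting means the data model gets a `parentId` column on day one instead of a painful migration later.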
Next I think through user flows. For each feature: who uses it and what does their journey look like? A power user creating nested folders is a different flow from someone seeing an empty dashboard for the first time. This is where UX problems show up that technical planning alone won't catch.
Then I define screen states. Every screen can be empty, loading, errored, or showing content. Most vibe-coded apps only handle the happy path. Everything else feels broken. What does the registration page look like before you've typed anything? What happens when validation fails? I write this down before touching code. It's the difference between a prototype and a product.
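Those four states can be encoded so the compiler enforces them. A minimal sketch in TypeScript, using a discriminated union; `Data` stands in for whatever the screen shows:

```typescript
// The four states every screen can be in.
type ScreenState<Data> =
  | { status: "empty" }
  | { status: "loading" }
  | { status: "error"; message: string }
  | { status: "content"; data: Data };

// Routing every render through a switch on the discriminant means a
// missing case is a compile-time gap, not a screen that feels broken.
function describe(state: ScreenState<string[]>): string {
  switch (state.status) {
    case "empty":
      return "Nothing here yet.";
    case "loading":
      return "Loading...";
    case "error":
      return `Something went wrong: ${state.message}`;
    case "content":
      return `Showing ${state.data.length} items.`;
  }
}
```

Writing the states down first, then encoding them like this, is what keeps the model from generating happy-path-only UI.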
After that I build the technical spec. Everything from the previous steps goes into one document. Feature requirements, data models, endpoints, schema, security, the design system. Detailed enough that the model won't guess. Small enough to fit in a context window. For an MVP this is usually around 15,000 tokens.
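One possible skeleton for that document, assembled from the steps above (the section names and ordering here are illustrative, not a fixed template):

```markdown
# Technical spec -- section skeleton

1. Feature requirements  -- what each MVP feature must do
2. User flows            -- the journeys mapped in the flow step
3. Screen states         -- empty / loading / error / content, per screen
4. Data models & schema  -- entities, relations, constraints
5. API endpoints         -- routes, payloads, auth requirements
6. Security              -- authorization rules, input validation
7. Design system         -- palette, spacing scale, typography, motion
```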
I also set coding rules. Framework-specific guardrails. For Next.js: prefer server components, use server actions, handle loading states with Suspense. These stop the model from writing code that works but is architecturally wrong.
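As an illustration, the Next.js rules mentioned above might live in a rules file that the coding tool reads on every prompt (the filename and exact wording are an assumption; the rules themselves are the ones named here):

```markdown
# Coding rules (Next.js) -- excerpt

- Default to React Server Components; add "use client" only when a
  component needs state, effects, or browser APIs.
- Mutations go through Server Actions, not ad-hoc client-side fetches.
- Wrap async UI in <Suspense> with a meaningful fallback; no screen
  ships without a defined loading state.
```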
Then I plan the tasks. The spec becomes a step-by-step build sequence. I use a large-context model for this because shorter context windows drop details. Then I run an evaluator-optimizer loop, which I'll explain below.
Finally I generate code step by step. One task, verify it works, next task. If a step needs the design system or other context, I point the tool to the right reference file. Small chunks make it obvious when something goes wrong.
The evaluator-optimizer trick
The biggest frustration with AI-generated code is context loss. You build a detailed spec, hand it to a coding tool, and half the details disappear. The model focuses on the happy path and ignores the error states you defined. It picks up the tech stack but drops the UX work.
After the model generates a task plan, I tell it: evaluate your plan against the original spec. How well did you cover each piece of the tech stack? Did you think about dependencies between steps? What about the screen states we defined?
It finds its own gaps. The first round usually catches missing state management, incomplete error handling, and skipped UX details. I tell it to regenerate with those gaps filled, then run one more round focused on the design layer.
Two rounds of self-evaluation catch most of what was dropped. The plan that comes out is noticeably better than the first pass, and it keeps the details you spent time writing.
The same trick works during code generation. After each phase I check: does what you just built match the spec for this section? If not, fix it before moving on.
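The loop described above can be sketched in a few lines. This is a minimal sketch, not the author's exact tooling: `callModel` is a stand-in for whatever LLM client you use, and the prompts are paraphrases of the ones described in this section.

```typescript
// Any function that takes a prompt and returns model output.
type CallModel = (prompt: string) => Promise<string>;

// Evaluator-optimizer: generate a plan, then repeatedly ask the model
// to audit its own plan against the spec and regenerate with gaps filled.
async function refinePlan(
  callModel: CallModel,
  spec: string,
  rounds = 2, // two rounds of self-evaluation catch most dropped details
): Promise<string> {
  let plan = await callModel(
    `Turn this spec into a step-by-step task plan:\n${spec}`,
  );
  for (let i = 0; i < rounds; i++) {
    const gaps = await callModel(
      `Evaluate this plan against the original spec. List anything it ` +
      `drops: tech stack pieces, dependencies between steps, screen states.\n` +
      `SPEC:\n${spec}\nPLAN:\n${plan}`,
    );
    plan = await callModel(
      `Regenerate the plan with these gaps filled:\n${gaps}\nPLAN:\n${plan}`,
    );
  }
  return plan;
}
```

The structure is the point: evaluation and regeneration are separate calls, so the model critiques a finished artifact instead of grading its own work mid-generation.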
Vibes and specs are not opposites
"Make it feel premium" without a design system produces generic output. The model has nothing to anchor on, so it picks the statistical average of everything it's seen. That average is, by definition, mediocre. "Make it feel premium" with a defined color palette, spacing scale, and animation timing gets you something that actually looks the part.
I still describe aesthetics in loose terms. "Snappy, not bouncy." "Quiet until you need it." These do more work than specific CSS values. But they only work when the model already knows what the component does, what states it handles, and what design system it belongs to.
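A design system at this level of detail can be as small as one token object. A sketch, with purely illustrative values; the point is that "premium" and "snappy, not bouncy" now have something concrete to resolve against:

```typescript
// Minimal design tokens: palette, spacing scale, animation timing.
const tokens = {
  color: {
    bg: "#0b0b0f",
    surface: "#16161d",
    accent: "#7c5cff",
    text: "#e8e8ef",
  },
  // 4px-based spacing scale; components pick an index, never a raw px value
  space: [0, 4, 8, 12, 16, 24, 32, 48],
  motion: {
    // "snappy, not bouncy": short durations, ease-out curve, no overshoot
    fast: "120ms cubic-bezier(0.2, 0, 0, 1)",
    base: "200ms cubic-bezier(0.2, 0, 0, 1)",
  },
} as const;
```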
Vibe coding without a planning layer is prototyping. With one, it's shipping.