You shipped a thing. It works. The button buttons. The demo demos.
Then reality shows up: a weird input, a slow API, a missing env var, a user who clicks “back” at the worst possible time. Surprise: your app turns into modern art.
That’s the difference between vibe coding and software engineering. Vibe coding gets you to “it runs.” Engineering gets you to “it’s accountable.”
The research backs this up: AI-generated code often works while still being risky. Veracode found that AI-generated code introduced “risky security flaws” in 45% of tests. The well-known Copilot security study found that about 40% of generated programs were vulnerable across its CWE-based scenarios. A Stanford user study from Dan Boneh’s group found that people with an AI assistant wrote less secure code and were more likely to believe it was secure.
So the fix isn’t “don’t use AI.” The fix is “stop treating working output as reviewed output.”
Below are 10 “bridge questions,” phrased as prompts you can literally ask your IDE or agent, designed to force the mindset shift from “it runs” to “it’s accountable.” This should work whether you're using Codex, Claude Code, v0, or any other tool on the market. (Pro tip: If you use code NEURON-V0 when signing up for v0, they'll give you $25 in free credits. Credits expire in one month.)
If you’d rather download and install these as a skill, here's a link to my GitHub repo to get you up and running right away.
The 10 vibe-coder → engineer questions (copy / paste prompts)
1) Explain the diff like I’m the on-call engineer
Prompt: “Summarize what changed, what behavior changed, and what could break.”
Why this exists: Engineers start with blast radius. Not “what did we build?” but “what did we change that could wake me up at 2 a.m.?”
What you’re looking for:
- Behavior changes (including “minor” defaults)
- Any migrations / config changes
- Any new dependencies or permissions
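If you want a mechanical first pass before asking the agent, a small script can flag which changed files fall into those high-blast-radius buckets. This is a minimal Python sketch, assuming you already have the list of changed paths (for example from `git diff --name-only`); the `BLAST_RADIUS_PATTERNS` table and its globs are hypothetical and need tuning for your repo.

```python
import fnmatch
import posixpath

# Hypothetical glob patterns for "could page someone at 2 a.m." changes; tune for your repo.
BLAST_RADIUS_PATTERNS: dict[str, list[str]] = {
    "migrations": ["*/migrations/*", "*.sql"],
    "config": [".env*", "*.yaml", "*.yml", "*settings*"],
    "dependencies": ["package.json", "package-lock.json", "requirements*.txt",
                     "pyproject.toml", "go.mod", "Cargo.toml"],
    "permissions": ["*policy*", "*permissions*", ".github/workflows/*"],
}


def _matches(path: str, pattern: str) -> bool:
    # Check the full path and the bare filename, so "package.json" also matches "web/package.json".
    return fnmatch.fnmatchcase(path, pattern) or fnmatch.fnmatchcase(posixpath.basename(path), pattern)


def flag_blast_radius(changed_paths: list[str]) -> dict[str, list[str]]:
    """Group changed file paths by risk category; a path may land in more than one bucket."""
    flagged: dict[str, list[str]] = {}
    for category, patterns in BLAST_RADIUS_PATTERNS.items():
        hits = [p for p in changed_paths if any(_matches(p, pat) for pat in patterns)]
        if hits:
            flagged[category] = hits
    return flagged
```

For example, `flag_blast_radius(["db/migrations/0042_add_index.py", "web/package.json"])` returns `{"migrations": ["db/migrations/0042_add_index.py"], "dependencies": ["web/package.json"]}`. Pasting that into the prompt forces the agent to explain each flagged file. The globs are crude on purpose; expect some false positives.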
2) What assumptions does this code make? Which are untested?
Prompt: “List assumptions about inputs, ordering, time, environment, and external services.”
Assumptions are silent promises. Silent promises become loud incidents.
Look for assumptions like:
- “This field is always present”
- “Requests arrive in order”
- “Time is monotonic” (wall-clock time isn’t; NTP corrections can move it backwards)
- “This API is fast / always up” (lol)
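One way to stop an assumption from being silent is to turn it into a check that fails loudly. A minimal Python sketch, assuming a hypothetical order payload with `order_id` and `quantity` fields: the parser rejects missing or malformed fields instead of trusting them, and durations are measured with `time.monotonic()`, which cannot go backwards, rather than wall-clock `time.time()`.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    quantity: int


def parse_order(payload: dict) -> Order:
    """Turn the silent promise "these fields are always present" into a checked one."""
    order_id = payload.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order_id is missing or empty")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return Order(order_id, quantity)


def elapsed_seconds(start: float) -> float:
    """Duration since `start`, where `start` came from time.monotonic().

    Wall-clock time.time() can jump backwards (for example after an NTP correction);
    time.monotonic() cannot, so this never returns a negative duration.
    """
    return time.monotonic() - start
```

The point is less the specific checks than where they live: at the boundary, with an error message that names the broken assumption.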
3) What are the trust boundaries and threat model for this change?
Prompt: “Entry points, roles, assets, abuse cases, and mitigations.”
Translation: where does untrusted stuff enter, what do you protect, and how could someone be evil?
Why it matters: Databricks’ red team showed vibe-coded apps can look totally fine while hiding serious vulnerabilities like insecure deserialization that can lead to code execution.
Look for:
- Entry points: web routes, webhooks, jobs, uploads
- Roles: anonymous, user, admin, service accounts
- Assets: money, data, API keys, privileged actions
- Mitigations: validation, auth checks, rate limits, safe parsing
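“Safe parsing” is easiest to see next to its opposite. In Python, the classic insecure-deserialization bug is calling `pickle.loads` on bytes from the network; the `pickle` docs warn that this can execute arbitrary code. A minimal sketch of the safer pattern, assuming a hypothetical game-score message with `player` and `score` fields and made-up limits: parse as JSON, then validate the shape before using it.

```python
import json

MAX_MESSAGE_BYTES = 4096  # hypothetical limit; oversized payloads are rejected before parsing


def parse_score_message(raw: bytes) -> dict:
    """Parse an untrusted network message as plain data.

    Unlike pickle.loads, json.loads only ever builds dicts, lists, strings,
    numbers, booleans, and None, so parsing alone cannot execute code.
    """
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too large")
    data = json.loads(raw)  # malformed input raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    player, score = data.get("player"), data.get("score")
    if not isinstance(player, str) or not 1 <= len(player) <= 32:
        raise ValueError("invalid player name")
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 1_000_000:
        raise ValueError("invalid score")
    return {"player": player, "score": score}
```

Every rejection path raises `ValueError`, so the caller has exactly one failure type to handle at this trust boundary.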
4) Show every endpoint / job / handler and the authn / authz rule that protects it
Prompt: “Flag any route that lacks server-side enforcement.”
Quick definitions:
- Authn (authentication) = who are you?
- Authz (authorization) = what are you allowed to do?
Broken access control tops the OWASP Top 10, and this is exactly where “works in demo” becomes “everyone can see everyone’s stuff.”
Bonus paranoia: Apiiro reported that AI-assisted changes can increase deep issues like privilege-escalation paths and architectural design flaws. Exact numbers vary by context, but the pattern is the warning label.
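Here's server-side enforcement in its smallest form. A minimal Python sketch, assuming hypothetical `User` and `Invoice` types and an in-memory `INVOICES` dict standing in for your database: the handler checks identity (authn) and ownership (authz) itself on every call, instead of trusting the frontend to hide other people's invoice IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    role: str  # "user" or "admin"


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    owner_id: str
    amount_cents: int


class Unauthenticated(Exception):
    """Map to HTTP 401."""


class NotFound(Exception):
    """Map to HTTP 404."""


# Hypothetical stand-in for a database table.
INVOICES = {"inv-1": Invoice("inv-1", owner_id="u-alice", amount_cents=5000)}


def get_invoice(current_user: Optional[User], invoice_id: str) -> Invoice:
    # Authn: no verified identity, no data.
    if current_user is None:
        raise Unauthenticated("login required")
    invoice = INVOICES.get(invoice_id)
    # Authz: admins may read any invoice; everyone else only their own.
    # Answering 404 (not 403) for other users' invoices avoids confirming that the ID exists.
    if invoice is None or (current_user.role != "admin" and invoice.owner_id != current_user.user_id):
        raise NotFound(invoice_id)
    return invoice
```

The 404-versus-403 choice is a judgment call; some teams prefer an explicit 403. What isn't optional is that the check runs on the server, inside the handler.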
5) Trace untrusted input to sensitive sinks
Prompt: “DB writes, shell, templates, deserialization, HTML output: show me the paths.”
This is the engineer version of “follow the slime trail.”
Untrusted input includes:
- Query params, forms, JSON bodies
- Headers, cookies
- Webhook payloads
- Files (uploads), CSVs, pasted text
Sensitive sinks include:
- Database writes (especially dynamic queries)
- Template rendering / HTML output
- Shell commands
- Deserialization (parsing objects back from bytes)
If your agent can’t clearly trace the path, that’s your cue to inspect manually.
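Database queries are the sink people hit most often. A minimal Python sketch using the standard-library `sqlite3` module and a throwaway in-memory `users` table (the table and data are made up for illustration): the unsafe version pastes untrusted input into the SQL text, and the safe version passes it as a bound parameter.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (email) VALUES (?)",
                     [("ana@example.com",), ("bo@example.com",)])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # DON'T: untrusted input becomes part of the SQL text itself.
    return conn.execute(f"SELECT id, email FROM users WHERE email = '{email}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # DO: the "?" placeholder sends email as data, never as SQL.
    return conn.execute("SELECT id, email FROM users WHERE email = ?", (email,)).fetchall()
```

With `evil = "x' OR '1'='1"` and a fresh `make_demo_db()`, `find_user_unsafe` returns both rows while `find_user_safe` returns none. The same rule covers the shell sink: give `subprocess.run` an argument list instead of a string with `shell=True`.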
6) What sensitive data do we touch, and where does it go?
Prompt: “Fields collected/stored/returned/logged; highlight over-collection and log leakage.”
Your future incident report will include the sentence: “We didn’t realize we were logging that.”
Look for:
- Passwords (never), tokens, API keys
- Emails, phone numbers, addresses
- Any IDs that can be used to fetch more data
- Logs, analytics events, error trackers, third-party calls
Rule of thumb: collect the minimum, store the minimum, log almost none of it.
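“Log almost none of it” is easier to enforce in one place than in every log call. A minimal Python sketch, assuming a hypothetical `SENSITIVE_KEYS` denylist and a made-up `log_signup` call site: `redact` masks matching keys recursively, and only the redacted copy ever reaches the logger.

```python
import logging

# Hypothetical denylist; extend it with whatever your app actually handles.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "email", "phone", "address"}

logger = logging.getLogger("signup")


def redact(value):
    """Return a copy of value with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def log_signup(payload: dict) -> None:
    # Only the redacted copy is ever formatted into the log line.
    logger.info("signup payload: %s", redact(payload))
```

A denylist misses fields you forgot to name. Logging an explicit allowlist of safe fields (say, a user ID and a status) is stricter and closer to the rule of thumb; the denylist is a safety net, not the plan.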
7) Show me the failure behavior
Prompt: “Timeouts, retries/backoff, circuit breakers, idempotency: what happens when X is down?”
Reality checklist:
- APIs time out.
- Queues backlog.
- DB connections run out.
- Users double-click.
AI-generated code is especially prone to “happy path heroism” and weak exception handling. CodeRabbit’s analysis of 470 open-source PRs found that AI-generated PRs had about 1.7× as many issues overall as human-written ones.
Look for:
- Timeouts set explicitly (not “infinite vibes”)
- Retries with backoff (not retry-storms)
- Idempotency (safe replays) for writes and jobs
- Clear error messages that don’t leak secrets
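Retries and idempotency work as a pair: a retry is only safe if replaying the request can't charge someone twice. A minimal Python sketch, assuming a hypothetical `TransientError` for retryable failures and a toy in-memory `PaymentServer` that deduplicates by idempotency key. The client picks one key per logical operation and reuses it on every retry; delays use exponential backoff with full jitter and a hard attempt cap.

```python
import random
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical: a timeout or 503 that is safe to retry."""


def retry_with_backoff(call: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.2,
                       max_delay: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of retrying forever
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay,
            # so many clients failing at once don't retry in lockstep (a retry storm).
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    # Only reachable when the loop never ran.
    raise ValueError("max_attempts must be at least 1")


class PaymentServer:
    """Toy server: a replayed idempotency key returns the stored result instead of charging again."""

    def __init__(self) -> None:
        self.results: Dict[str, str] = {}
        self.charges = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges += 1
        receipt = f"receipt-{self.charges}-{amount_cents}"
        self.results[idempotency_key] = receipt
        return receipt
```

`sleep` is injectable so tests don't actually wait. Each `call` should also carry its own explicit timeout (for example, the `timeout` argument of `urllib.request.urlopen`); otherwise one hung request stalls the whole loop. The nasty case this covers is a charge that succeeded but whose response was lost: the retry replays the same key and gets the original receipt back.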
8) Generate (or point to) tests that prove the core promises
Prompt: “Happy path + top edge cases + one ‘evil input’ test per trust boundary.”
Tests aren’t morality. They’re receipts.
Minimum set:
- Happy path (so you know what “works” means)
- Top edge cases (nulls, empty, huge, weird unicode, etc.)
- One “evil input” per trust boundary (injection attempts, path traversal attempts, role bypass attempts)
If you only do one thing from this whole list, do this.
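Here's what that minimum set can look like for one small function. A minimal Python sketch, assuming a hypothetical upload handler that maps a user-supplied filename to a path under an upload root (`Path.is_relative_to` needs Python 3.9+): one happy-path test, a few edge cases (empty, null byte, huge, non-ASCII), and an evil-input test aimed at path traversal. The tests are plain `test_*` functions with `assert`, so a runner like pytest can collect them as-is.

```python
import tempfile
from pathlib import Path

MAX_NAME_LENGTH = 255  # hypothetical limit


def resolve_upload_path(user_filename: str, root: Path) -> Path:
    """Map an untrusted filename to a path that stays inside root."""
    if not user_filename or "\x00" in user_filename or len(user_filename) > MAX_NAME_LENGTH:
        raise ValueError("invalid filename")
    base = root.resolve()
    candidate = (base / user_filename).resolve()
    # resolve() collapses ".." and follows symlinks, so this check sees where the path really points.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload root")
    return candidate


ROOT = Path(tempfile.mkdtemp())  # throwaway directory so the tests touch nothing real


def assert_rejected(name: str) -> None:
    try:
        resolve_upload_path(name, ROOT)
    except ValueError:
        return
    raise AssertionError(f"accepted {name!r}")


def test_happy_path():
    assert resolve_upload_path("report.pdf", ROOT) == ROOT.resolve() / "report.pdf"


def test_edge_cases():
    assert resolve_upload_path("résumé ✓.pdf", ROOT).name == "résumé ✓.pdf"
    for name in ["", "a\x00b", "a" * 10_000]:
        assert_rejected(name)


def test_evil_path_traversal():
    for name in ["../../etc/passwd", "/etc/passwd", "docs/../../escape.txt"]:
        assert_rejected(name)
```

Notice that the evil-input test is what actually pins down the security promise; the happy path alone would pass just as well with the check deleted.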
9) Find maintainability debt: duplication, unnecessary dependencies, naming/style drift
Prompt: “Suggest reuse/refactors; identify new deps that could be removed.”
This is where projects become products, or haunted houses.
Two useful signals from recent research:
- LLM agents can skip reuse opportunities and create more redundancy than humans, which quietly turns into a maintenance tax.
- Reviewers often react neutrally/positively anyway, because the code looks plausible.
Look for:
- Duplicate logic that should be a shared helper
- New libraries for something your stack already does
- Naming/style inconsistencies (death by 1,000 papercuts)
10) Can we safely deploy / observe / rollback this?
Prompt: “Required env vars validated, migrations safe, logs/metrics exist, and rollback steps documented.”
Shipping isn’t done when it merges. Shipping is done when:
- You can detect failure fast
- You can stop the bleeding fast
- You can explain what happened later
Look for:
- Config validation at startup
- Migration safety (backwards compatible when possible)
- Logs that answer: what happened, to whom, when?
- A rollback plan that isn’t “uhhh revert?”
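“Config validation at startup” means the process refuses to boot with a broken environment, instead of failing on the first request that needs the missing value. A minimal Python sketch; the variable names (`DATABASE_URL`, `PAYMENT_API_KEY`, `REQUEST_TIMEOUT_S`) and the `Settings` class are hypothetical:

```python
import math
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    payment_api_key: str
    request_timeout_s: float


REQUIRED_VARS = ("DATABASE_URL", "PAYMENT_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate configuration once, at startup.

    Errors name the offending variables but never echo their values (they may be secrets).
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    raw_timeout = env.get("REQUEST_TIMEOUT_S", "5")
    try:
        timeout = float(raw_timeout)
    except ValueError:
        raise RuntimeError("REQUEST_TIMEOUT_S must be a number") from None
    if not math.isfinite(timeout) or timeout <= 0:
        raise RuntimeError("REQUEST_TIMEOUT_S must be a positive, finite number")
    return Settings(env["DATABASE_URL"], env["PAYMENT_API_KEY"], timeout)
```

Call `load_settings()` once in your entrypoint, before the server starts accepting traffic. A missing variable then shows up as a failed deploy rather than a 2 a.m. page, and taking `env` as a parameter keeps the function easy to test.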
The 5-question “compression ratio” (for busy brains)
If 10 feels like a lot, here’s the engineer mental model:
- Do I understand it? (can I explain it?)
- Can it be abused? (trust boundaries, authz, input → sinks)
- Will it break under reality? (timeouts, retries, edge cases)
- Will it melt under load? (performance / concurrency basics)
- Can we run it safely? (deploy, observe, rollback)
That’s basically the whole job. The rest is details and coffee.