Vibe Coding
As far as I know, the term “vibe coding” was coined by Karpathy, referring to the practice of relying on LLM-based AI agents to code on one’s behalf. The folks at Changelog have done a number of great podcast episodes on the topic. In my experience working on personal projects, the latest generation of agents like Claude Code, Gemini CLI, and Sourcegraph Amp delivers amply on the hype.
Before I forget, some observations / speculations below.
Human-Friendly == Agent-Friendly?
I’ve been working on a client-server project with Elm and Rust. Elm features, famously, one of the friendliest compilers around, from which the Rust compiler drew inspiration (at least as far as compiler output goes). Although Elm’s training corpus is relatively small compared to other languages, Claude Code has no problems with it. In part, this seems to be because the compiler helps debug issues after each round of edits.
Will compiler output become even more important in the era of agents? One can imagine an interesting A/B test here, e.g. suppressing all compiler output other than build status for the treatment group while preserving full output for the control. On a related note, will agents swing the static vs. dynamic typing pendulum?
Business Analysts FTW
At some point, I was a business analyst (BA) in the technical job sense, collecting business inputs and writing requirements specifications. “The system shall do X but not Y… for the avoidance of doubt, it should not do Z.”
Good BAs, armed with a healthy degree of skepticism and neuroticism to consider all the possible ambiguities of business ideas and language, produce reasonably complete specs. The thought process is similar to a developer seeking to ensure pattern matching captures all possible inputs. Presumably, good BAs are also good vibe coders?
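To make the pattern-matching analogy concrete, here is a minimal Rust sketch (the OrderState enum and its variants are invented for illustration): the compiler refuses any match that misses a case, much as a careful spec leaves no ambiguity unresolved.

```rust
// Hypothetical order lifecycle; not from the actual project.
enum OrderState {
    Draft,
    Submitted,
    Cancelled { reason: String },
}

// The compiler rejects this `match` (error E0004) if any variant is
// left unhandled, so "for the avoidance of doubt" cases can't slip through.
fn describe(state: &OrderState) -> String {
    match state {
        OrderState::Draft => "not yet submitted".to_string(),
        OrderState::Submitted => "awaiting fulfilment".to_string(),
        OrderState::Cancelled { reason } => format!("cancelled: {reason}"),
    }
}

fn main() {
    let orders = [
        OrderState::Draft,
        OrderState::Submitted,
        OrderState::Cancelled { reason: "out of stock".into() },
    ];
    for order in &orders {
        println!("{}", describe(order));
    }
}
```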
Interestingly, I find Claude takes my paragraphs of specs and produces very good, concise summaries of the task at hand. It feels like the agent understands…
Tight Feedback Loops Matter
As the codebase grows, the agent tends to write more than it deletes. I’ve found it useful to give it feedback (or force it to introspect) in every possible way, including:
- prompt the agent to review and refactor code for clarity and simplicity
- if you have any intuition about problematic areas, ask the agent about them
- have the agent develop some scripts for humans to verify system functionality (e.g. trace a path through the system); then ask the agent to use the scripts
- run tests, hand-written or agent-written (see the sketch after this list)
- make it build frequently, and run standard linting and review tools like `clippy` in Rust
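To make the testing point concrete, here is a hand-written smoke test of the kind I mean; `parse_order_id` is a hypothetical stand-in for real project code, and the agent can be told to run it with `cargo test` (and `cargo clippy -- -D warnings`) after every round of edits.

```rust
// A hand-written smoke test the agent can run after each edit.
// `parse_order_id` is a hypothetical stand-in for real project code.
fn parse_order_id(input: &str) -> Option<u32> {
    input.strip_prefix("order-")?.parse().ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_well_formed_ids() {
        assert_eq!(parse_order_id("order-42"), Some(42));
    }

    #[test]
    fn rejects_malformed_ids() {
        assert_eq!(parse_order_id("42"), None);
        assert_eq!(parse_order_id("order-"), None);
    }
}
```

Either way, the goal is to give the agent an unambiguous pass/fail signal it can check itself, rather than relying on my reading of the diff.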
Although vibe coding can feel scary, the fundamental problem of determining whether things work correctly is no different from working in any sufficiently complex codebase. Since you can’t read all the code or hold it all in your head, what do you do?
What Doesn’t Work?
UI development remains tricky. Despite tools like `puppeteer` that enable agents to examine the browser visually, I’ve found that hand-crafting is often more productive for designing and reasoning about layouts.
The only time all of the agents got stuck was when a previously generated long block of CSS was overriding layouts specified elsewhere. No matter what the agent did, the output remained incorrect and unchanging. I’m not sure whether to attribute this to UI per se. The human intuition goes something like “since we’ve tried everything, including simple changes, the problem must lie somewhere else”… which seems like logic well within reach of the current generation of LLMs. In practice, the agent never searched for the problematic CSS, which lived in a rarely touched file.
As we gain more collective experience in developing and managing agents, what kind of new best practices should we expect to emerge?