Don’t vibe that code
Azul’s deputy CTO on when and when not to use LLMs to assist software development
Who doesn’t love a good buzzword? It’s possible that half the tech industry (and a good proportion of tech journalism) wouldn’t exist if it didn’t hunger for a neologism. One of the more recent, though no longer the newest, is vibe coding.
“There’s a new kind of coding I call ‘vibe coding’ where you fully give in to the vibes, embrace exponentials and forget that the code even exists,” OpenAI co-founder Andrej Karpathy, who is credited with coining the term, mused back in February.
When someone like that says something like this, folks sit up and listen. But should they?
Not if anyone else is depending on your code, says Simon Ritter, deputy CTO at Java runtime software provider Azul. AI-generated code is likely to be unreliable, insecure, inefficient and in the long run may cost more money, time and effort than it saves.
A perfectly valid use-case for vibe coding – where the user dictates the desired end result to an LLM which then generates the code – is creating a static personal website dedicated to your cat. Having been trained on millions of examples of such sites, the LLM will do a pretty good job and if anything goes wrong there’s no harm done. The cat is unlikely to complain.
Bad vibes
In adding another usability layer between the “coder” and the source, vibe coding is really just another abstraction, like a higher-level language or a framework. The trouble is, each abstraction inevitably adds junk code and vulnerabilities, and in the case of LLMs this tendency is amplified by the vagaries of natural language, Ritter explained.
“The word ‘run’ has 645 different meanings in the Oxford English Dictionary, right? We have that ambiguity built into the language. So when we're trying to use it to describe what we want to do, from a programming perspective, it becomes very hard.”
And that’s before you get to hallucinations and failure to understand context.
Programming languages, particularly compiled ones such as Java, have a whole ecosystem of checks and balances to eliminate ambiguities before they make it into the codebase. LLMs, not so much.
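To make that concrete with a generic example (mine, not Ritter’s): Java’s compiler simply refuses to build code whose meaning it cannot pin down, the kind of ambiguity check an English-language prompt never gets.

```java
public class CompileCheckDemo {
    public static void main(String[] args) {
        // The compiler verifies types before anything runs.
        int answer = Integer.parseInt("42");   // fine: parseInt takes a String

        // Uncommenting the next line stops the build outright:
        // int broken = Integer.parseInt(42);  // error: int cannot be converted to String

        System.out.println(answer * 2);
    }
}
```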
It’s not that automation of coding is a bad thing. Far from it. Development environments with predictive features have been around for years. These autocomplete features have grown increasingly sophisticated over time, appreciably improving productivity. IDEs are at one end of the coding automation spectrum.
In the middle of the spectrum, explains Ritter, sits something like Claude Code “which is using a prompt but to generate a component, and that becomes much more like a dynamic library. If you've got a library which doesn't quite do what you want it to, then you can use something like Claude and, say, write me a class or a method that does this very specific thing, and it's very good at generating that code.”
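To illustrate the scale Ritter is talking about (a made-up prompt and a sketch of plausible output, not anything Claude actually produced): asking for “a method that removes duplicates from a list while keeping the original order” yields a unit small enough to read, test and reject in minutes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public final class ListUtils {

    // Returns a new list containing the first occurrence of each element,
    // preserving the order in which elements first appeared.
    public static <T> List<T> withoutDuplicates(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }
}
```

The reviewability of a snippet like that is the point: it slots into an existing codebase the way a library call would, under the eye of someone who knows what correct looks like.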
Vibe coding lives at the far end of the spectrum, the ultimate in no-code. “It eliminates, very largely, the idea of being a coder, so you're just relying on being able to type a natural language prompt,” Ritter explained.
Fine for experimenting, but not for serious business. Unless experienced developers are on hand to security-check and refine the output, vibe coding could quickly generate a huge tangle of spaghetti-coded trouble. “One of the key things that we have to understand is that level of trust in what we're generating,” Ritter commented, adding that this requires prior experience and expertise.
Swarm coding
But let’s return to the middle of the coding automation spectrum. Numerous developers now swear by Claude Code, OpenAI Codex, Gemini CLI and other GenAI coding platforms as genuine productivity boosters. Each of these models has its strengths and weaknesses. Why not play them off against each other, or use them as a team?
This approach has spawned a new buzzword: “swarm coding”.
“You're using a number of agents to construct different parts of the system,” Ritter explained. “So you get one to create a product requirement document, and then you get a different agent from a different source to review the PRD that's generated. So, you're then automating that process but having some [oversight] of what has been done.
“I think that that use of multiple agents, multiple different systems is definitely something that will become more common.”
But specialised agents should be trained on specialised data. Currently most LLMs are trained on GitHub repositories, which are often of dubious quality.
“If you're going to generate high quality code as output, what you need is a large body of high quality code for training, because it's the old garbage in garbage out problem,” said Ritter.
For specialised Java programming, the open source Java Virtual Machine codebase would make an ideal source of training data, he went on. “The JVM is 7.5 million lines of code. It’s written in C++, but all the Java libraries have been written by people who really understand Java. So let's use that as a starting point.”
So is Azul working on AI apps and models itself to improve developer experience? Not yet. Ritter favours a cautious approach when it comes to automation. “We’re waiting to see how it goes.”
The company has started working in partnership with automation specialists Moderne to use AI to improve performance by identifying, removing and refactoring unused and dead code, but it’s a long way from the scenario painted by vibe coding enthusiasts.
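As a rough illustration of what that cleanup targets (a generic example, not Moderne’s tooling or Azul’s code):

```java
public class InvoiceService {

    public double totalWithTax(double net, double taxRate) {
        return net * (1 + taxRate);
    }

    // Never called from anywhere: exactly the sort of dead private method
    // that automated analysis can flag and safely strip out.
    private double legacyTaxRate(String region) {
        return "EU".equals(region) ? 0.20 : 0.07;
    }
}
```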
“Let's try it out, but let's not commit to getting rid of all our developers and replacing them with an AI agent yet.”