Agentic AI: Where are we now? Databricks’ EMEA CTO gives his take

Dael Williamson on the remaining barriers to enterprise adoption of agentic AI and how they are being overcome

Earlier this year, I wrote about how, for all the noise, AI agents were mysteriously hard to spot outside of big tech demos - which seemed to be their primary use case. At a recent Databricks event in London I sat down with Dael Williamson, EMEA CTO at the data and AI company, and asked him about this apparent agentic reticence. What are the hurdles to adoption, and how are they being overcome?

What is an AI agent?

First, we must sort out the semantics and ask: what is an AI agent? Otherwise, how can we determine if they exist outside of a marketing department’s imagination? Williamson, who acknowledges that real-world examples of agents in production are still scarce, offered this definition:

"An agent is an AI system that has autonomy over, or agency over, a task.”

He continued: "We use the word 'system' very carefully because you’ve got to remember these things are probabilistic. It’s not like an algorithm that you put into a mainframe and leave for 40 years."

In terms of requirements, he added: "We’re trying to create something that can stably complete a task without causing any reputational or financial risk."

Agents are not deterministic algorithms running in isolation, nor are they LLMs or chatbots. They are collections of models and tools working together to complete specific tasks. Williamson prefers the more precise term 'compound AI systems', as proposed in a paper by Databricks co-founder Matei Zaharia, but concedes that the snappier 'agents' has won the buzzword battle. "It was kind of inevitable."
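
To make the distinction concrete, here is a minimal sketch of a compound AI system: a model plus tools in a loop, with the model deciding at each step whether to call a tool or finish. Everything here - run_agent, llm_complete, the invoice tool - is a hypothetical illustration, not Databricks code.

```python
from typing import Callable

def lookup_invoice(invoice_id: str) -> str:
    """Hypothetical tool: fetch an invoice from a back-office system."""
    return f"Invoice {invoice_id}: £1,200, due 2025-01-31"

TOOLS: dict[str, Callable[[str], str]] = {"lookup_invoice": lookup_invoice}

def run_agent(task: str, llm_complete: Callable[[str], str], max_steps: int = 5) -> str:
    """A compound AI system in miniature: the model chooses tools step by step."""
    context = f"Task: {task}"
    for _ in range(max_steps):
        reply = llm_complete(context)  # probabilistic: same input, varying output
        if reply.startswith("CALL "):  # e.g. "CALL lookup_invoice INV-42"
            _, tool_name, arg = reply.split(" ", 2)
            context += f"\nTool {tool_name} returned: {TOOLS[tool_name](arg)}"
        else:
            return reply  # the model considers the task complete
    return "Stopped: step budget exhausted"
```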

Easy to build, hard to trust

Anyone who has attended or watched coverage of a big tech event will have seen demonstrations of agents performing back-office tasks such as accounting or supply chain optimisation. They certainly look impressive, but they are mostly for show, according to Williamson. "They’re building them top-down. They create a nice UI, then they build an agent, and it does something. But if you run that agent five times, you’ll get very divergent results."

The latter point is hugely important. Generative AI, unlike traditional machine learning (ML), is probabilistic, not deterministic. Left unchecked, AI agents have a short useful ‘half-life’ - typically days or weeks - as their output drifts over time. To prevent them from wandering off track, a feedback mechanism is needed, along with tools to keep them on course and to rebuild them when necessary.

Image: Dael Williamson

"Building an agent is actually super easy. I can build one in about three minutes," Williamson remarked. "Building the evaluations - that’s what takes time."

When it comes to creating trustworthy agents at a reasonable cost, the key is to select small, domain-specific models that are just good enough for their specific task. Through fine-tuning and careful data preparation, generic, error-prone models can be honed into more reliable enterprise tools. But, Williamson insisted, you must start with the data. "You need to build it bottom-up, data first. Then you build the tools, then the agent, and then the evaluations and cycles."

It’s also vital to understand that agents can never be 100% accurate or predictable, probably not even 99%. The goal is to set acceptable behavioural limits for each use case and establish practical thresholds to enforce them.

Evaluation-driven development: a new paradigm

The move towards agency is driving a new development paradigm, Williamson said. He drew an analogy with the evolution of software development from waterfall to Agile to DevOps, a shift enabled by automated testing. AI development is following a similar path, moving (in his analogy) from "waterfall AI", with its unpredictable results, to "agile AI", which improves accuracy through feedback and evaluation frameworks.

Evaluations - or evals - differ from traditional software tests; they don’t test functionality, they test behaviour. "They’re more like boundaries," explained Williamson, comparing evals to parents setting limits for children, monitoring their behaviour, and taking corrective action where necessary. Just as effective parental guidance nudges a child towards acceptable behaviour, so a sophisticated evaluation framework will improve the reliability and accuracy of the agents.
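
In code, a behavioural eval of this kind might look something like the sketch below: rather than asserting one exact answer, it runs the (probabilistic) agent repeatedly and checks that its outputs stay within defined boundaries at an acceptable rate. The agent stub, boundary rule and threshold are all illustrative assumptions.

```python
import random

def run_agent(task: str) -> str:
    """Stand-in for a probabilistic agent: the answer varies run to run."""
    return random.choice(["Refund approved: £120", "Refund approved: £999999"])

def within_boundaries(answer: str) -> bool:
    """A behavioural limit for this use case: no refund over £500."""
    amount = float(answer.rsplit("£", 1)[-1])
    return amount <= 500

def eval_agent(task: str, runs: int = 20, threshold: float = 0.95) -> bool:
    """Pass only if the agent stays in bounds at an acceptable rate."""
    passes = sum(within_boundaries(run_agent(task)) for _ in range(runs))
    rate = passes / runs
    print(f"pass rate: {rate:.0%} (threshold {threshold:.0%})")
    return rate >= threshold

eval_agent("Process refund for order 42")
```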

However, evals are difficult to scale using traditional methods, requiring large teams of human monitors to assess outputs. Databricks uses synthetic data to build automated evaluation tools, making quality assurance accessible to broader technical teams. The company also collects traces (the "AI exhaust fumes") generated as human users interact with agentic systems, which are then fed back into the system to improve it - as well as being invaluable for auditing.
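
One plausible shape for that synthetic-data approach is sketched below, with a hypothetical generate() call standing in for an LLM: draft question/expected-answer pairs from source documents instead of writing test cases by hand. This illustrates the general technique, not Databricks' actual tooling.

```python
def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real system would query a model here."""
    return "Q: What is the refund limit? | A: £500"

def synthesise_eval_cases(documents: list[str]) -> list[tuple[str, str]]:
    """Draft question/expected-answer pairs from source documents."""
    cases = []
    for doc in documents:
        line = generate(f"Write one Q and A grounded only in: {doc}")
        question, answer = (part.split(":", 1)[1].strip()
                            for part in line.split("|"))
        cases.append((question, answer))
    return cases

print(synthesise_eval_cases(["Refund policy: refunds are capped at £500."]))
```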

Engineers with eval skills are in high demand, Williamson noted: "It’s attracting software engineers and AI engineers alike."

And data engineers too. As agentic AI systems begin to automate the extract-transform-load (ETL) process, that discipline is also being drawn in. "Instead of ETL, it’s EAL: Extract, agent-based Adjustment, Load, and agents are doing the loading too. There’s an entirely new skill set emerging. As AI becomes more involved in coding, quality assurance becomes critical, and eval-driven development becomes key."
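
Here is an illustrative toy version of such an 'EAL' pipeline, with agent_adjust() standing in for a model that decides how to normalise each record; none of this is a real Databricks API.

```python
def extract(source: str) -> list[dict]:
    """Pull raw records from a source system (stubbed)."""
    return [{"customer": " ACME Ltd ", "amount": "1,200"}]

def agent_adjust(record: dict) -> dict:
    """Where an ETL pipeline had hand-written transforms, an agent
    decides how to clean each record (deterministic stub here)."""
    return {"customer": record["customer"].strip(),
            "amount": float(record["amount"].replace(",", ""))}

def load(records: list[dict], table: str) -> None:
    """Write cleaned records to the target table (stubbed)."""
    print(f"loading {len(records)} records into {table}")

load([agent_adjust(r) for r in extract("erp")], table="finance.invoices")
```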

Evaluation skills are also increasingly important for compliance. The EU AI Act, for example, requires organisations using certain automated systems to demonstrate what data entered their AI systems, how it was processed, what the models did, and why specific outputs were generated. "Tracing is the most important thing when it comes to compliance," said Williamson. "You’re able to show inputs, outputs, processing - almost how the model’s thinking."
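
A minimal sketch of what such tracing could look like: every run gets an ID, and inputs and outputs are written to an append-only log for later audit. The schema and JSONL storage are assumptions, not any prescribed compliance format.

```python
import json, time, uuid

def traced_run(agent, task: str, log_path: str = "traces.jsonl") -> str:
    """Record inputs and outputs for every run in an append-only audit log."""
    run_id = str(uuid.uuid4())
    events = [{"run_id": run_id, "ts": time.time(),
               "event": "input", "task": task}]
    answer = agent(task)  # a fuller trace would also capture each tool call
    events.append({"run_id": run_id, "ts": time.time(),
                   "event": "output", "answer": answer})
    with open(log_path, "a") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return answer

traced_run(lambda t: "Refund approved: £120", "Process refund for order 42")
```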

Data architecture

AI agents making real-time decisions are only as good as the organisation’s data infrastructure. They can only be truly effective if they have wide-ranging access to an organisation’s resources, including sensitive and unstructured data, and if that data is reliable. This creates both an architectural challenge and potentially a security nightmare.

Architecturally, it means not just storing data, but understanding its lineage and tracking its flow through complex systems. This extends beyond traditional structured data governance to include unstructured files, logs, notebooks, documents, code repositories, and so forth. In terms of security, it requires granular permissions for data access, automated masking of personally identifiable information (PII) in outputs, and assurance that sensitive data doesn’t cross boundaries.
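
As a rough illustration of those last two requirements, the sketch below checks a caller's role against a table-level access list and masks email addresses in the agent's output. The ACL structure and regex are simplified assumptions.

```python
import re

TABLE_ACL = {"finance.invoices": {"finance-agents"}}  # table -> allowed roles
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def authorise(role: str, table: str) -> None:
    """Refuse access before the agent ever touches the data."""
    if role not in TABLE_ACL.get(table, set()):
        raise PermissionError(f"{role} may not read {table}")

def mask_pii(text: str) -> str:
    """Redact email addresses before output leaves the boundary."""
    return EMAIL.sub("[REDACTED]", text)

authorise("finance-agents", "finance.invoices")
print(mask_pii("Contact jane.doe@example.com about invoice INV-42"))
```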

All this must be in place before letting agents loose on sensitive data.

"If you don't have all that [provided by the catalogue layer of the agentic system] then building these AI systems really is going to be the Wild West, you'll never be robust," commented Williamson. "And if you don't have traceability you're black-boxing.

"But if you do have it all in place, then you can make sure you can adhere to the regulations."

Where are we now?

So, how far have we come with production agentic AI in the enterprise? Companies like Databricks are certainly working hard to connect the dots of data architecture, governance, security, authentication, ML and GenAI, and to simplify the deployment of agents. But for most organisations, formidable challenges remain in implementing truly reliable agentic systems (as opposed to the ML-based predictive analytics systems sometimes proffered by the industry as agentic use cases).

Using Williamson’s definition of an agentic AI system as one that "has autonomy over, or agency over, a task" and "can stably complete a task without causing any reputational or financial risk," my conclusion from our chat and from other explorations is that they are still few and far between in business, outside of specialised areas like coding and cybersecurity.

This is probably a good thing, given the immaturity of the space and the potential for harm. (Just this week OpenAI co-founder Andrej Karpathy bemoaned the “very large demo-to-product gap” and said enterprise agents could be 10 years away.) But things are moving in the right direction, and I look forward to meeting an AI agent ‘in the wild’ sometime soon.