Google AI Overviews deliver millions of errors hourly, analysis suggests

Google has challenged the study's methodology

Google's AI-powered search summaries could be generating vast numbers of inaccuracies every hour, despite appearing largely reliable at first glance.

A study reported by The New York Times examined "AI Overviews" – automated summaries powered by Google's Gemini models – and found they were accurate roughly 90% of the time.

While the accuracy rate may seem high, researchers warn that Google's vast search volume means even a small percentage of errors can result in a significant number of mistakes.

With more than five trillion searches processed annually, a 10% inaccuracy rate could result in tens of millions of incorrect responses every hour.
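The arithmetic behind that estimate can be sketched in a few lines. This is a rough back-of-the-envelope illustration, not the researchers' own calculation: it takes the five trillion annual searches and 10% error rate quoted above, and the `overview_share` parameter (the fraction of searches that actually display an AI Overview) is a hypothetical knob not given in the report. Setting it to 1.0 yields an upper bound.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def errors_per_hour(searches_per_year, error_rate, overview_share=1.0):
    """Estimate incorrect AI summaries per hour.

    overview_share is a hypothetical assumption: the fraction of searches
    that show an AI Overview. The report does not state this figure.
    """
    searches_per_hour = searches_per_year / HOURS_PER_YEAR
    return searches_per_hour * overview_share * error_rate


# Upper bound: every search shows an overview, 10% of them are wrong.
upper_bound = errors_per_hour(5e12, 0.10)
print(f"{upper_bound:,.0f} errors per hour")  # roughly 57 million

# Purely illustrative: if only half of searches showed an overview.
print(f"{errors_per_hour(5e12, 0.10, overview_share=0.5):,.0f} errors per hour")
```

Even if only a fraction of searches trigger a summary, the hourly total stays in the tens of millions across a wide range of assumptions.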

The analysis was conducted with the help of Oumi, an AI startup, using a benchmark known as SimpleQA, a dataset of more than 4,000 verifiable questions developed by OpenAI in 2024.

Initial testing last year, when Gemini 2.5 was Google's leading model, showed an accuracy rate of around 85%. After the release of Gemini 3, this improved to 91%.

Examples of errors

The report highlights several instances where AI Overviews produced misleading or incorrect answers.

In one case, the system was asked when reggae musician Bob Marley's former home became a museum. It cited multiple sources, but none clearly supported its answer, and it ultimately selected the wrong date.

In another example, it incorrectly stated that a "Classical Music Hall of Fame" did not exist, despite referencing a website confirming cellist Yo-Yo Ma's induction.

Industry experts say such issues are not unique to Google. Pratik Verma, chief executive of Okahu, said the technology was broadly comparable to other leading AI systems but urged caution.

"Never trust one source," he said. "Always compare what you get with another source."

Google itself includes a disclaimer beneath AI-generated summaries stating: "A.I. can make mistakes, so double-check responses."

Google has challenged the study's conclusions, arguing the methodology is flawed.

A spokesperson said the research relied on a benchmark that itself contained inaccuracies and did not reflect real-world search behaviour.

"This study has serious holes," said Google spokesperson Ned Adriance. "It doesn't reflect what people are actually searching on Google."

AI can foster a misleading sense of certainty

Experts say evaluating AI accuracy remains difficult. Different companies use varying benchmarks, and results can change depending on how questions are phrased.

Generative AI systems are also "non-deterministic", meaning they can produce different answers to the same query.

Moreover, analysts warn that AI-generated summaries may foster a misleading sense of certainty.

Although drawing on web data makes these systems more accurate than standalone models, their responses can still oversimplify or misrepresent complex information. Critics say this becomes more concerning when users rely on summaries instead of consulting original sources, where fuller and often more reliable context can be found.

Research published last year by the Pew Research Center suggests the feature may also be changing how people browse the web.

In a study of 900 US users who agreed to share their browsing data, those shown an AI-generated summary clicked on traditional search results in just 8% of visits, compared with 15% among users who did not see a summary.

The findings also indicated that AI summaries may reduce deeper engagement: around 26% of pages featuring an overview were closed immediately, versus 16% of pages without one.