Experimental Morris II worm can exploit popular AI services to steal data and spread malware

Cornell researchers created worm 'to serve as a whistleblower'

Experimental Morris II worm can exploit popular AI services to steal data and spread malware

Image:
Experimental Morris II worm can exploit popular AI services to steal data and spread malware

Researchers have created the first known generative AI worm, capable of spreading through interconnected AI systems and potentially wreaking havoc on a massive scale.

Led by a team from Cornell Tech comprised of Ben Nassi, Stav Cohen and Ron Bitton, the project is dubbed "Morris II" - referencing the notorious Morris computer worm that spread on internet in 1988.

Their findings demonstrate how Morris II can exploit a generative AI email assistant, siphoning data from emails and sending spam - even bypassing some security measures in popular AI systems like ChatGPT and Gemini.

Large language models (LLMs), the backbone of generative AI, are increasingly becoming multimodal, capable of generating not just text, but also images and video.

LLMs typically function by processing prompts - text instructions that guide the system in generating responses or creating content. However, these prompts can be weaponised.

Attack methods include "jailbreaking", bypassing safety protocols and forcing the system to generate harmful or offensive content, and dialogue poisoning, in which the attacker manipulates the LLM to follow a sequence of instructions. LLMs can also leak the private data they may have been trained on.

The researchers say they created Morris II through a novel technique known as "adversarial self-replicating prompt," which tricks the AI model into generating another prompt as part of its response. This tactic, akin to traditional cyberattacks like SQL injection, poses a significant threat to the integrity of AI systems.

To demonstrate the worm's capabilities, the researchers built an email system that interacts with generative AI systems like ChatGPT, Gemini and the open-source LLaVA model.

The research team uncovered two methods of exploitation: one using text-based prompts and another embedding malicious prompts within image files.

In one scenario, the researchers simulated an attacker sending an email containing the adversarial prompt. This prompt essentially "poisons" the email assistant's database by leveraging a feature called Retrieval-Augmented Generation (RAG). RAG allows LLMs to access external data sources to enhance their responses. When the email, containing the malicious prompt, is retrieved through RAG and sent to a system like GPT-4 or Gemini Pro for processing, it essentially jailbreaks the AI service, enabling it to steal data from emails.

This stolen data, embedded within the generated response, then infects new systems when used to reply to emails from unsuspecting users.

The second method involves encoding a malicious prompt within an image, turning any image file into a potential carrier for the worm. This enables the compromised email assistant to forward the infected message to other users, further propagating the infection.

Two types of applications might be at risk from a real-world worm similar to Morris II. The first is genAI-powered applications whose execution flow is dependent upon the output of the genAI service. The second type is applications that use RAG.

The researchers say their work is intended "to serve as a whistleblower to the possibility of creating GenAI worms in order to prevent their appearance". They reported their findings to both Google and OpenAI.

OpenAI acknowledged the research, admitting the possibility of vulnerabilities related to user input that hasn't been adequately filtered or checked. The company said it is working to improve system resilience and urged developers to implement methods that safeguard against malicious input.