Malware can be hidden in English language text, say US scientists

Breakthrough paper shows hackers could evade anti-virus protection by hiding malicious code in sentences that read like English language spam

A team of US security researchers has engineered a way of hiding malware in sentences that read like English language spam.

The work is a breakthrough because current network security techniques assume that the code used in code-injection attacks, which is delivered to and run on victims’ computers, has a different structure to non-executable plain data, such as English prose.
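To make that assumption concrete, here is a minimal sketch in Python of the kind of check it implies: a filter that treats a payload as harmless data when nearly all of its bytes fall in the plain-text range. The function names, the byte set and the 0.95 threshold are all hypothetical choices made for illustration, not taken from the paper or from any real product.

```python
import string

# Bytes that commonly appear in plain English text (an assumption for this sketch).
TEXT_BYTES = set(
    (string.ascii_letters + string.digits + string.punctuation + " \n\t").encode("ascii")
)


def text_like_ratio(payload: bytes) -> float:
    """Return the fraction of bytes in payload that fall in the plain-text range."""
    if not payload:
        return 0.0
    return sum(b in TEXT_BYTES for b in payload) / len(payload)


def looks_like_plain_data(payload: bytes, threshold: float = 0.95) -> bool:
    """Hypothetical heuristic: treat mostly-text payloads as non-executable data."""
    return text_like_ratio(payload) >= threshold


english = b"There is a major center of economic activity."
binary_blob = bytes(range(0x80, 0x90))  # arbitrary non-text byte values

print(looks_like_plain_data(english))      # True
print(looks_like_plain_data(binary_blob))  # False
```

A check of this kind passes anything that reads like prose, which is why code disguised as English text would undermine it.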

One of the researchers, Dr Josh Mason of Johns Hopkins University, Baltimore, said the team wanted to broaden its understanding of how malicious code could be deployed, and highlight the need to design more efficient techniques for preventing this kind of attack altogether.

Dr Nicolas T Courtois, an expert in security and cryptology at University College London, said the work was an important paper in computer virology, challenging the assumption that code has a different structure to non-executable plain data. He said malware deployed in this way would be “hard, if not impossible, to detect reliably.”

The research is a proof of concept, but Mason doubts any hackers are currently using the English language disguise technique for their code. “I'd be astounded if anyone is using this method in the real world owing to the amount of engineering it took to pull off,” he said. “A lot of people didn't think it could be done.”

Courtois says the paper has significant implications for technology companies, and argued that companies such as Intel should redesign their instruction set to make this kind of attack easier to detect.

And Professor John Walker, managing director of forensics consultancy Secure-Bastion, argued the research highlights the flaws in the anti-virus community's approach to security exploits. “There is no doubt in my mind that anti-virus software as we know it today has gone well past its sell by date,” he said.

Walker, who consults for GCHQ, said he is sure hacking groups will look to exploit the technique.

The research paper, presented at the ACM Conference on Computer and Communications Security in Chicago in November, is called English Shellcode – after the hacking community's generic name, shellcode, which refers to the payload portion of a code-injection attack.

This payload typically provides attackers with arbitrary control of system resources, applications, and data on a vulnerable computer. Attackers then choose how they want to continue their attack.

A tool that takes a piece of normal shellcode and generates some text to hide it could be the next step in the hacking and virus arms race. The advantage to hackers is simple. Alphanumeric shellcode can be stored in atypical and otherwise unsuspected contexts such as syntactically valid file and directory names or user passwords.

The challenge is that the alphanumeric character set is significantly smaller than the set of characters available in Unicode and UTF-8 encodings. This means that the set of instructions available for composing alphanumeric shellcode is relatively small. You couldn't have long strings of mostly capital letters, for example.
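The size of that restriction can be shown with a short Python sketch. It counts how many of the 256 possible byte values are alphanumeric and checks whether a given string stays inside that set. The helper name and the sample strings are hypothetical and chosen only for illustration.

```python
import string

# The 62 byte values that encode ASCII letters and digits.
ALNUM_BYTES = set((string.ascii_letters + string.digits).encode("ascii"))


def is_alphanumeric(payload: bytes) -> bool:
    """Return True if every byte in payload is an ASCII letter or digit."""
    return all(b in ALNUM_BYTES for b in payload)


print(len(ALNUM_BYTES))        # 62
print(len(ALNUM_BYTES) / 256)  # 0.2421875

print(is_alphanumeric(b"Report2009"))    # True
print(is_alphanumeric(b"\x80\x81\x82"))  # False
```

Fewer than a quarter of all byte values survive the filter, so any code written under this constraint has far fewer byte values to build instructions from.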

“There was really not a lot to suggest it could be done because of the restricted instruction set,” said Mason.

The team trained its text generator on a corpus of English texts comprising roughly 15,000 Wikipedia articles and 27,000 books from Project Gutenberg.

The team can now generate English shellcode in less than one hour on standard PC hardware with 4GB of RAM.

Below is an example of an automatically generated English encoding. Some of the text is executed as instructions, and the rest is skipped.

“There is a major center of economic activity, such as Star Trek, including The Ed Sullivan Show. The former Soviet Union. International organization participation.”

Mason said that with a lot of work the quality of the English prose could be improved, but that doing so wouldn't really be worth the effort involved.

Mason worked with Dr Sam Small of Johns Hopkins University, Dr Fabian Monrose of the University of North Carolina, and Greg MacManus of iSIGHT Partners.

The paper is available at http://www.cs.jhu.edu/~sam/ccs243-mason.pdf