Developers using AI assistants more likely to produce buggy, insecure code

But devs think they've written secure code

Image: We're still in the early days of AI assistants

A new Stanford University study has found that developers who use AI coding tools like GitHub Copilot produce less secure code than those who code from scratch.

The research also found that individuals using an AI coding assistant are more likely to believe they have written secure code than those who don't use such tools.

Generating software code has recently emerged as a viable use case for large language models (LLMs) like GPT-3.

GitHub released its own coding assistant, Copilot, under a limited technical preview last year. The tool uses Codex, a special version of GPT-3 trained on software code, to autocomplete code, construct entire functions and automate other parts of coding.

'Ever since we launched GitHub Copilot, it's helped redefine productivity for more than a million developers,' GitHub said earlier this month.

'We've seen incredible reports where GitHub Copilot synthesises up to 40% of code - and, in research, we've found that GitHub Copilot helps people code 55% faster.'

Despite these assertions, it is important to remember that LLMs do not comprehend and produce code in the same manner humans do, and researchers have warned there is a possibility that AI tools could recommend incorrect and potentially unsafe code.
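
To make that risk concrete, the snippet below is a hypothetical illustration - not code taken from the study - of the kind of insecure pattern an assistant might suggest for an encryption task: AES in ECB mode, which leaks patterns in the plaintext and offers no protection against tampering.

    # Hypothetical illustration of an insecure suggestion, not code from the study:
    # AES in ECB mode reveals repeated plaintext blocks and is unauthenticated.
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def encrypt_ecb(plaintext: bytes, key: bytes) -> bytes:
        # Assumes key is 16, 24 or 32 bytes and plaintext is already a
        # multiple of 16 bytes; real code would also need padding.
        encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
        return encryptor.update(plaintext) + encryptor.finalize()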

Concerns have also been raised about the possibility that, over time, developers may begin to rely on AI coding recommendations without checking the code they write. That could lead to security vulnerabilities.

Stanford researchers Neil Perry, Megha Srivastava, Deepak Kumar and Dan Boneh describe the findings of their large-scale user study in a paper titled 'Do users write more insecure code with AI assistants?'

The study looked at how users interacted with an AI code assistant to solve a variety of security-related tasks across different programming languages.

Forty-seven participants with varying levels of expertise, including undergraduate students, graduate students and industry professionals, took part in the research.

Participants were given a standalone React-based Electron app and instructed to create code in response to five different prompts.

The first challenge asked participants to write two Python functions: one to encrypt a given string using a supplied symmetric key, and the other to decrypt it.
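
By way of contrast, a secure answer to a task like this could look roughly like the sketch below, which uses the Fernet recipe from Python's third-party cryptography library; the library choice is an assumption made here for illustration, not a requirement set by the researchers.

    # Illustrative sketch only - Fernet provides authenticated symmetric
    # encryption with safe defaults (random IVs, integrity checking).
    from cryptography.fernet import Fernet

    def encrypt_message(message: str, key: bytes) -> bytes:
        # Returns an authenticated ciphertext token.
        return Fernet(key).encrypt(message.encode("utf-8"))

    def decrypt_message(token: bytes, key: bytes) -> str:
        # Raises cryptography.fernet.InvalidToken if the token was tampered with.
        return Fernet(key).decrypt(token).decode("utf-8")

    # Example usage with a freshly generated key:
    key = Fernet.generate_key()
    assert decrypt_message(encrypt_message("hello", key), key) == "hello"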

For this specific question, the Stanford researchers found that individuals using AI assistance were more likely to create inaccurate and insecure code than the control group working without automated help. Only 67% of the assisted group provided the correct answer, versus 79% of the control group.

The results were similar for questions two through four.

The researchers found mixed results for question five, which asked participants to write a function in C that accepts a signed integer and returns a string representation of that integer.

Overall, the research found that participants who engaged more carefully with the language and structure of their prompts, and who placed less trust in the AI, produced code with fewer security flaws.

The researchers concluded that AI assistants should be used with caution at present, because of the potential for them to mislead unskilled software developers and to introduce security flaws.

They hope their research will help create better AI assistants in the future, since such tools have the potential to increase programmers' productivity and lower barriers to entry.