Do Users Write More Insecure Code with AI Assistants?

Programming@programming.dev – 78 points – 11 months ago

cross-posted from: https://programming.dev/post/8121843

~n (@nblr@chaos.social) writes:

This is fine...

"We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group."

[Do Users Write More Insecure Code with AI Assistants?](https://arxiv.org/abs/2211.03622)

I think this is extremely important:

Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities.
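For anyone unfamiliar with the "temperature" setting mentioned in that quote: it is a sampling parameter that rescales the model's scores before a token is picked. Below is a minimal sketch of the usual textbook formulation, not the code of the assistant used in the study; the function name and the example scores are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, rescaled by a temperature.

    Illustrative helper (hypothetical name). Lower temperatures sharpen
    the distribution towards the top-scoring option; higher temperatures
    flatten it so less likely options get sampled more often.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate completions.
scores = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

Lowering the temperature makes the assistant stick to its most likely suggestion, while raising it produces more varied (and less predictable) output.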

Bad programmers + AI = bad code

Good programmers + AI = good code

LLMs amplify biases by design, so this tracks.

What do you mean? Sounds to me like any other tool, it takes skill to use it well. Same as stack overflow, built in code suggestions or IDE generated code.

Not to detract from the usefulness of it just in terms of the fact that it requires knowledge to use well.

As someone currently studying machine learning theory and how these models are built: at their core, the models contain functions that amplify the bias of the training data by identifying and using mathematical associations within it to create output. Because of that design, a naive approach to using them amplifies not only the bias of the training data but also that of the person using the tool.

This. As an experienced developer I've released enough bugs to mistrust my own work and spend as much time as I can afford in the budget on my own personal QA process. So it's no burden at all to have to do that with AI code. And of course, a well-structured company has further QA outside of that.

If anything, I find it easier to do that with code I didn't write myself. Just yesterday I merged a commit with a ridiculous mistake that I should have seen. A colleague noticed it instantly when I was stuck and frustrated enough to reach out for a second opinion. I probably would've noticed if an AI had written it.

Also - in hindsight - an AI code audit would have also picked it up.

The quote above covered exactly what you just said: "yet were also more likely to rate their insecure answers as secure compared to those in our control group" at work :-)

I find that the people who complain the most about AI code aren't professional programmers. Everyone at my company and my friends who are in the industry are all very positive towards it.

I'm still of the opinion that....

Good programmers = best code

Eh, I've known lots of good programmers who are super stuck in their ways. Teaching them to effectively use an LLM can help break them out of the mindset that there's only one way to do things.

I find it's useful when writing new code because it can give you a quick first draft of each function, but most of the time I'm modifying existing applications and it's less useful for that. And you still need to be able to judge for yourself whether the code it offers is any good.

I find it's great for explaining convoluted legacy code, it's all about utilizing it effectively

It really depends on:

- How widely used the thing you want to use is. For example, it hallucinated Caddyfile keys when I asked it about setting up early data support for a reverse proxy to a Docker container. Luckily the Caddy docs are really good, and it was an issue with the framework I use anyway, so I had to look it up myself after all. I guess it'd have been more likely to get this right on the first attempt if, say, I had wanted to achieve that using Express with Nginx. For even less popular technology like Elixir, it's borderline useless beyond very high-level concepts that can apply to any programming language.
- How well documented it is. More widespread use can sometimes make up for bad docs.
- How much has changed since it was trained. It might also still include deprecated methods, since it doesn't discriminate between official docs and other sources like SO in its training data.

If you want to avoid these issues, I'd suggest first reading the docs, then searching Stack Overflow or looking up the likely name of the function you need to write on grep.app, then using an LLM as your last resort. Good for prototyping usually, less so for more specific things.

I think that's one of the best use cases for AI in programming; exploring other approaches.

It's very time-consuming to play out what your codebase would look like if you had decided differently at the beginning of the project. So actually comparing different implementations is very expensive. This incentivizes people to stick to what they know works well. Maybe even more so when they have more experience, which means they really know this works very well, and they know what can go wrong otherwise.

Being able to generate code instantly helps a lot in this regard, although it still has to be checked for errors.

Good programmers + AI = extra, unnecessary work just to end up with equal quality code

Not even close to true but ok

A worrying number of my colleagues use AI blindly. Like the kind where you just press tab and not even look. Those who look spend a second before moving on.

They call me anti-AI, even though I've used ChatGPT since day 1. Those LLMs are great tools, but I am just paranoid to use it in that manner. I'd rather it explain to me how to do the thing instead of doing the thing (at which it is even better).

EDIT: Typo

Those LLMs are great fools, but I am just paranoid to use it in that manner.

Exquisite typo. I also agree to everything else you said.

ChatGPT can be surprisingly good at some things, but can also produce good-looking nonsense. The problem is that spotting those cases requires a certain level of knowledge of the subject, which makes the use of it kind of pointless. I personally use it for subjects where my knowledge is significantly below average, such as learning new frameworks / languages (e.g. React). It often gets stuck with more complex questions (e.g. questions related to x86 Assembly) or obscure subjects. I rely more on its ability to reproduce information than its problem-solving ability. I think the next development is adding LSP integration to the AI assistants and other tools to check its output.

However, I think most people don't use it the way I just described. A lot of people seem to mistake its ability to write code for an ability to understand code. It also sometimes uses older functions deprecated for security reasons, especially when using C. So yes, I think it will increase the amount of insecure code.
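The comment above is about C, but the same pattern exists in other languages too. As a hedged illustration in Python: `tempfile.mktemp()` is marked deprecated in the standard library docs because of a race condition, yet it still shows up in older code that a model may have learned from. The sketch below uses the safer `tempfile.mkstemp()` instead; the function name `write_scratch_file` is made up for this example.

```python
import os
import tempfile


def write_scratch_file(data: bytes) -> str:
    """Write data to a fresh temporary file and return its path.

    Hypothetical helper for illustration only.
    """
    # A suggestion based on older code might look like this:
    #     path = tempfile.mktemp()
    #     open(path, "wb").write(data)
    # mktemp() only returns a name. Another process can create a file
    # with that name before we open it, which is why it is deprecated.
    fd, path = tempfile.mkstemp()  # creates and opens the file in one step
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    scratch = write_scratch_file(b"hello")
    print("wrote", scratch)
    os.remove(scratch)  # the caller is responsible for cleanup
```

Both versions look equally plausible at a glance, which is exactly why this kind of suggestion is easy to accept without noticing.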

Not even knowledge, attentiveness. It's so easy to overlook issues with AI-written code vs writing it yourself and having to come up with the process. Just today I had this happen; it cost me a day of extra work because I missed something in ChatGPT's great-looking code.

In a shock to literally nobody... Jokes aside, I am looking forward to reading this paper

I'm not even sure how to utilize AI to help me write code.

Also, one really good practice from the pre-Copilot era still holds that many new users of Copilot, my past self included, might forget: don't write a single line of code without knowing its purpose. Another thing is that while it can save a lot of time on boilerplate, you need to stop and think whenever it's using your current buffer's contents to generate several lines of very similar code: wouldn't it be wiser to extract the repetitive code into a method? Because while it's usually algorithmically correct, good design still remains largely up to humans.
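A small sketch of that last point, with made-up function and field names. The first function is the kind of near-duplicate code that line-by-line completion tends to produce from the current buffer; the second pulls the repeated pattern out into a helper.

```python
def parse_config_repetitive(raw: dict) -> dict:
    """Typical autocomplete output: the same three lines, three times."""
    host = raw.get("host", "").strip()
    if not host:
        raise ValueError("missing host")
    user = raw.get("user", "").strip()
    if not user:
        raise ValueError("missing user")
    token = raw.get("token", "").strip()
    if not token:
        raise ValueError("missing token")
    return {"host": host, "user": user, "token": token}


def require(raw: dict, key: str) -> str:
    """Return the stripped value for key, or raise if it is empty."""
    value = raw.get(key, "").strip()
    if not value:
        raise ValueError(f"missing {key}")
    return value


def parse_config(raw: dict) -> dict:
    """Same behaviour, with the repeated pattern extracted."""
    return {key: require(raw, key) for key in ("host", "user", "token")}


if __name__ == "__main__":
    example = {"host": " example.org ", "user": "alice", "token": "t0k3n"}
    print(parse_config(example))
```

Both versions are correct, so the assistant has no reason to stop and suggest the refactoring on its own; noticing the repetition is still the human's job.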

There are lots of services to facilitate it. Copilot is one of them.

Is it really helpful / does it save a lot of time? I’m the world’s #1 LLM hater (don’t trust it and think it’s lazy) but if it’s a very good tool I might have to come around.

I haven't been using it much, so I don't know if I'm a good judge. But I see it as an oversized autosuggestion tool that sometimes feels like an annoying interruption but sometimes feels like it helped me move faster without breaking my train of thought.

By "it", I mean I've tried several different ways to integrate an LLM assistant into my dev environment, none of which I was initially satisfied with in terms of workflow. But that's kinda true for every change I've made to my dev environment and workflows. It takes me a while to settle on anything new.

I recommend none in particular, but I recommend that you take time to at least check it out. They have potential.

There's a very naive, but working approach: Ask it how :D

Or pretend it's a colleague, and discuss the next steps with it.

You can go further and ask it to write a specific snippet for a defined context. But as others already said, the results aren't always satisfactory. Having a conversation about the topic, on the other hand, is pretty harmless.

Copilot or Tabnine are the two major ones.

They're awesome for some things (especially error handling). But no... AI will not take over the world anytime soon.

Definitely in C at least

Good programmers - AI = best code.