The final panel of the day at our Frontiers conference this year was hosted by me—though it was going to be tough to follow Ars AI expert Benj Edwards’ panel because I didn’t have a cute intro planned. The topic we were covering was what might happen to developers when generative AI gets good enough to consistently create good code—and, fortunately, our panelists didn’t think we had much to worry about. Not in the near term, at least.
I was joined by Luta Security founder and CEO Katie Moussouris and Georgetown University Senior Fellow Drew Lohn, and the general consensus was that, although large language models can do some extremely impressive things, turning them loose to create production code is a terrible idea. While generative AI has demonstrated the ability to create code, even a cursory examination shows that today’s large language models (LLMs) often do the same thing when coding that they do when spinning stories: they just make a whole bunch of stuff up. (The term of art here is “hallucination,” but Benj prefers “confabulation,” as it more accurately reflects what the models seem to be doing.)
So, while LLMs can be relied upon today to do simple things, like creating a regex, trusting them with your production code is way dicier.
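For a sense of what “simple” means in practice, here’s a hypothetical sketch (not the Bing Chat regex mentioned above, and the pattern and names are my own) of the sort of small, instantly verifiable task an LLM tends to handle well:

```python
import re

# Hypothetical example: a small, self-contained pattern of the kind an LLM
# can usually get right, and that a developer can verify at a glance --
# pull ISO-style dates (YYYY-MM-DD) out of a string.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

sample = "The panel ran on 2023-05-22; the recording went up a week later."
print(DATE_PATTERN.findall(sample))  # [('2023', '05', '22')]
```

The point is that output like this can be checked in seconds; a few thousand lines of generated production code cannot.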
A huge underlying issue, says Katie, is that generative models are only as good as the code they’re trained on. “My friends over at Veracode did a little study over the past 12 months, and 70 percent of the code that’s out there contains security flaws,” she explained. “So if you’ve got a model trained on code that is 70 percent vulnerable… the resultant code that comes out is highly likely to contain security flaws.” This means that even though the code might be mostly functional—like my Bing Chat-generated regex above—there’s also a high likelihood that it will come with problems baked in.
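To make “problems baked in” concrete, here’s a hypothetical illustration (my own, not from the panel or from any particular model’s output) of code that works fine on friendly input but carries a classic flaw of the kind that litters public repositories: SQL assembled by string interpolation.

```python
import sqlite3

# Hypothetical illustration of "functional but flawed" generated code.
# Interpolating user input straight into the SQL text works on the happy
# path, but it is open to SQL injection.
def find_user_unsafe(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

# The fix is barely longer: a parameterized query keeps user input out of
# the SQL text entirely.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()
```

A model trained mostly on code that does it the first way has no particular reason to prefer the second.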
