The final panel of the day at our Frontiers conference this year was hosted by me—though it was going to be tough to follow Ars AI expert Benj Edwards’ panel because I didn’t have a cute intro planned. The topic we were covering was what might happen to developers when generative AI gets good enough to consistently create good code—and, fortunately, our panelists didn’t think we had much to worry about. Not in the near term, at least.
I was joined by Luta Security founder and CEO Katie Moussouris and Georgetown University Senior Fellow Drew Lohn, and the general consensus was that, although large language models can do some extremely impressive things, turning them loose to create production code is a terrible idea. While generative AI has demonstrated the ability to create code, even a cursory examination shows that today’s large language models (LLMs) often do the same thing when coding that they do when spinning stories: they just make a whole bunch of stuff up. (The term of art here is “hallucination,” but Benj prefers “confabulation,” as it more accurately reflects what the models seem to be doing.)
So, while LLMs can be relied upon today to do simple things, like creating a regex, trusting them with your production code is way dicier.
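For a sense of what “simple” means in practice, here’s a hypothetical sketch (not the Bing Chat regex mentioned above, and the pattern and names are my own) of the sort of small, instantly verifiable task an LLM tends to handle well:

```python
import re

# Hypothetical example: a small, self-contained pattern of the kind an LLM
# can usually get right, and that a developer can verify at a glance --
# pull ISO-style dates (YYYY-MM-DD) out of a string.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

sample = "The panel ran on 2023-05-22; the recording went up a week later."
print(DATE_PATTERN.findall(sample))  # [('2023', '05', '22')]
```

The point is that output like this can be checked in seconds; a few thousand lines of generated production code cannot.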
A huge underlying issue, says Katie, is that generative models are only as good as the code they’re trained on. “My friends over at Veracode did a little study over the past 12 months, and 70 percent of the code that’s out there contains security flaws,” she explained. “So if you’ve got a model trained on code that is 70 percent vulnerable… the resultant code that comes out is highly likely to contain security flaws.” This means that even though the code might be mostly functional—like my Bing Chat-generated regex above—there’s also a high likelihood that it will come with problems baked in.
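To make “problems baked in” concrete, here’s a hypothetical illustration (my own, not from the panel or from any particular model’s output) of code that works fine on friendly input but carries a classic flaw of the kind that litters public repositories: SQL assembled by string interpolation.

```python
import sqlite3

# Hypothetical illustration of "functional but flawed" generated code.
# Interpolating user input straight into the SQL text works on the happy
# path, but it is open to SQL injection.
def find_user_unsafe(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

# The fix is barely longer: a parameterized query keeps user input out of
# the SQL text entirely.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()
```

A model trained mostly on code that does it the first way has no particular reason to prefer the second.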
