Image

The CURL Project Drops Bug Bounties Due To AI Slop

Over the past years, the author of the cURL project, [Daniel Stenberg], has repeatedly complained about the increasingly poor quality of bug reports filed due to LLM chatbot-induced confabulations, also known as ‘AI slop’. This has now led the project to suspend its bug bounty program starting February 1, 2026.

Examples of such slop are provided by [Daniel] in a GitHub gist, which covers a wide range of very intimidating-looking vulnerabilities and seemingly clear exploits. Except that none of them are vulnerabilities when actually examined by a knowledgeable developer. Each is a lengthy word salad that an LLM churned out in seconds, yet which takes a human significantly longer to parse before dealing with the typical diatribe from the submitter.

Although there are undoubtedly still valid reports coming in, the truth of the matter is that the ease with which bogus reports can be generated by anyone who has access to an LLM chatbot and some spare time has completely flooded the bug bounty system and is overwhelming the very human developers who have to dig through the proverbial midden to find that one diamond ring.

We have mentioned before how troubled bounty programs are for open source, and how projects like Mesa have already had to fight off AI slop incidents from people with zero understanding of software development.

... does this count as fake news?

LLM-Generated Newspaper Provides Ultimate In Niche Publications

If you’re reading this, you probably have some fondness for human-crafted language. After all, you’ve taken the time to navigate to Hackaday and read this, rather than ask your favoured LLM to trawl the web and summarize what it finds for you. Perhaps you have no such pro-biological bias, and you just don’t know how to set up the stochastic parrot feed. If that’s the case, buckle up, because [Rafael Ben-Ari] has an article on how you can replace us with a suite of LLM agents.

Image
The AI-focused paper has a more serious aesthetic, but it’s still seriously retro.

He actually has two: a tech news feed, focused on the AI industry, and a retrocomputing paper based on SimCity 2000’s internal newspaper. Everything in both those papers is AI-generated; specifically, he’s using opencode to manage a whole dogpen of AI agents that serve as both reporters and editors, each in their own little sandbox.

Using opencode like this lets him vary the model by agent, potentially handing some tasks to small, locally-run models to save tokens for the more computationally-intensive tasks. It also allows each task to be assigned to a different model if so desired. With the right prompting, you could produce a niche publication with exactly the topics that interest you, and none of the ones that don’t.  In theory, you could take this toolkit — the implementation of which [Rafael] has shared on GitHub — to replace your daily dose of Hackaday, but we really hope you don’t. We’d miss you.

That’s news covered, and we’ve already seen the weather reported by “AI”— now we just need an automatically-written sports section and some AI-generated funny papers.  That’d be the whole newspaper. If only you could trust it.

Story via reddit.

Image

Can Skynet Be A Statesman?

There’s been a lot of virtual ink spilled about LLMs and their coding ability. Some people swear by the vibes, while others, like the  FreeBSD devs have sworn them off completely. What we don’t often think about is the bigger picture: What does AI do to our civilization? That’s the thrust of a recent paper from the Boston University School of Law, “How AI Destroys Institutions”. Yes, Betteridge strikes again.

We’ve talked before about LLMs and coding productivity, but [Harzog] and [Sibly] from the school of law take a different approach. They don’t care how well Claude or Gemini can code; they care what having them around is doing to the sinews of civilization. As you can guess from the title, it’s nothing good.

"A computer must never make a management decision."
Somehow the tl;dr was written decades before the paper was.

The paper a bit of a slog, but worth reading in full, even if the language is slightly laywer-y. To summarize in brief, the authors try and identify the key things that make our institutions work, and then show one by one how each of these pillars is subtly corroded by use of LLMs. The argument isn’t that your local government clerk using ChatGPT is going to immediately result in anarchy; rather it will facilitate a slow transformation of the democratic structures we in the West take for granted. There’s also a jeremiad about LLMs ruining higher education buried in there, a problem we’ve talked about before.

If you agree with the paper, you may find yourself wishing we could launch the clankers into orbit… and turn off the downlink. If not, you’ll probably let us know in the comments. Please keep the flaming limited to below gas mark 2.

A photo of the cats and the generated image

The Cutest Weather Forecast On E-Ink And ESP32

There’s a famous book that starts: “It is a truth universally acknowledged that a man in possession of a good e-ink display, must be in want of a weather station.” — or something like that, anyway. We’re not English majors. We are, however, major fans of this feline-based e-ink weather display by [Jesse Ward-Bond]. It’s got everything: e-ink, cats, and AI.

Image
The generated image needs a little massaging to look nice on the Spectra6 e-ink display.

AI? Well, it might seem a bit gratuitous for a simple weather display, but [Jesse] wanted something a little more personalized and dynamic than just icons. With that in the design brief, he turned to Google’s Nano Banana API, feeding it the forecast and a description of his cats to automatically generate a cute scene to match the day’s weather.

That turned out to not be enough variety for the old monkey brain, so the superiority of silicon — specifically Gemini–was called upon to write unique daily prompts for Nano Banana using a random style from a list presumably generated by TinyLlama running on a C64. Okay, no, [Jesse] wrote the prompt for Gemini himself. It can’t be LLM’s all the way down, after all. Gemini is also picking the foreground, background, and activity the cats will be doing for maximum neophilia.

Aside from the parts that are obviously on Google servers, this is all integrated in [Jesse]’s Home Assistant server. That server stores the generated image until the ESP32 fetches it. He’s using a reTerminal board from SeedStudio that includes an ESP32-S3 and a Spectra6 colour e-ink display. That display leaves something to be desired in coloration, so on top of dithering the image to match the palette of the display, he’s also got a bit of color-correction in place to make it really pop.

If you’re interested in replicating this feline forecast, [Jesse] has shared the code on GitHub, but it comes with a warning: cuteness isn’t free. That is to say, the tokens for the API calls to generate these images aren’t free; [Jesse] estimates that when the sign-up bonus is used up, it should cost about fourteen cents a pop at current rates. Worth it? That’s a personal choice. Some might prefer saving their pennies and checking the forecast on something more physical, while others might prefer the retro touch only a CRT can provide. 

A graph showing the poisoning success rate of 7B and 13B parameter models

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

It stands to reason that if you have access to an LLM’s training data, you can influence what’s coming out the other end of the inscrutable AI’s network. The obvious guess is that you’d need some percentage of the overall input, though exactly how much that was — 2%, 1%, or less — was an active research question. New research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute shows it is actually a lot easier to poison the well than that.

We’re talking parts-per-million of poison for large models, because the researchers found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM. Now, when we say poison the model, we’re not talking about a total hijacking, at least in this study. The specific backdoor under investigation was getting the model to produce total gibberish.

Continue reading “It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds”

Graph showing accuracy vs model

Why You Shouldn’t Trade Walter Cronkite For An LLM

Has anyone noticed that news stories have gotten shorter and pithier over the past few decades, sometimes seeming like summaries of what you used to peruse? In spite of that, huge numbers of people are relying on large language model (LLM) “AI” tools to get their news in the form of summaries. According to a study by the BBC and European Broadcasting Union, 47% of people find news summaries helpful. Over a third of Britons say they trust LLM summaries, and they probably ought not to, according to the beeb and co.

It’s a problem we’ve discussed before: as OpenAI researchers themselves admit, hallucinations are unavoidable. This more recent BBC-led study took a microscope to LLM summaries in particular, to find out how often and how badly they were tainted by hallucination.

Not all of those errors were considered a big deal, but in 20% of cases (on average) there were “major issues”–though that’s more-or-less independent of which model was being used. If there’s good news here, it’s that those numbers are better than they were when the beeb last performed this exercise earlier in the year. The whole report is worth reading if you’re a toaster-lover interested in the state of the art. (Especially if you want to see if this human-produced summary works better than an LLM-derived one.) If you’re a luddite, by contrast, you can rest easy that your instincts not to trust clanks remains reasonable… for now.

Either way, for the moment, it might be best to restrict the LLM to game dialog, and leave the news to totally-trustworthy humans who never err.

Image

Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all he expresses that in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat in a minimal and hackable software project — encapsulated in a single speedrun.sh script — for creating a simple ChatGPT clone from scratch, including web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible to be tampered with.

Image
An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single NVIDIA 8XH100 GPU node. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable; able to solve simple math or coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.