#031 - Deep secrets
LLMs remind us of a special kind of model error: inappropriate answers. Here are three lessons from a chatbot that knows when to clam up.
You're reading Complex Machinery, a newsletter about risk, AI, and related topics. (You can also subscribe to get this newsletter in your inbox.)

Mum's the word
Every model – from predictive ML/AI and genAI, to linear regression and GARCH time series modeling – is subject to model error. That's a fancy term for "um yeh that was the wrong answer." It's not an isolated annoyance, either. Since model outputs feed into downstream processes and decisions, model error can be a source of companywide risk.
The most common form of model error is when the output is factually incorrect: "our classifier said this article is about cats, but it's about dogs" or "the predicted price for this house was pretty far from reality." This type of model error is not always easy to fix, but it's easy to spot.
Generative models remind us that there's another, more nuanced flavor: when the model output is inappropriate. Factually incorrect statements still count here, but it's mostly about true statements that you find uncomfortable. We saw this a few months back, when Microsoft's Copilot AI refused to answer queries about the then-upcoming US presidential election. A more recent example involves DeepSeek's chatbot catching itself:
Then it explained that in democratic frameworks free speech needed to be protected from societal threats and “in China, the primary threat is the state itself which actively suppresses dissent”. Perhaps unsurprisingly it didn’t get any further along this tack because everything it had said up to that point was instantly erased. In its place came a new message: “Sorry, I’m not sure how to approach this type of question yet. Let’s chat about math, coding and logic problems instead!”
“In the middle of the sentence it cut itself,” Salvador said. “It was very abrupt. It’s impressive: it is censoring in real time.”
This isn't isolated to DeepSeek, either. The Chinese government requires all AI companies to keep their chatbots in check. As a Financial Times article noted a few months back:
The filtering begins with weeding out problematic information from training data and building a database of sensitive keywords. China’s operational guidance to AI companies published in February [2024] says AI groups need to collect thousands of sensitive keywords and questions that violate “core socialist values”, such as “inciting the subversion of state power” or “undermining national unity”. The sensitive keywords are supposed to be updated weekly.
The result is visible to users of China’s AI chatbots. Queries around sensitive topics such as what happened on June 4 1989 — the date of the Tiananmen Square massacre — or whether Xi looks like Winnie the Pooh, an internet meme, are rejected by most Chinese chatbots. Baidu’s Ernie chatbot tells users to “try a different question” while Alibaba’s Tongyi Qianwen responds: “I have not yet learned how to answer this question. I will keep studying to better serve you.”
You can learn from anyone
Why am I sharing this? Because, in the middle of a discussion about government censorship and AI bots playing revisionist historians, there are three lessons. The first is:
1/ You have to work to keep your bot on the rails.
You can't treat genAI safety – however you define that term – as a bolted-on afterthought. Nor can you release a chatbot without any protections and simply hope it does what you want. Employing this technology requires rigorous, continuous R&D and testing.
A complicating factor is that genAI bots are in their Wild West era. There aren't a lot of established, formally-documented practices. That leads to the second lesson:
2/ You have to take your guidance where you can get it.
That includes adapting questionable CCP practices to your use case. Erasing historical everts? Bad. No doubt about it. Making sure your bot doesn't serve up confidential data or embarrassing statements? Table stakes for a commercial enterprise.
If you're uncomfortable taking the CCP's approach to walling off your genAI bots, there are other ideas out there. Anthropic, for example, uses LLMs to generate synthetic variants of troublesome prompts – keeping their model defenses one step ahead of creative attackers. I'll also remind everyone that smaller, more focused models are safer than their generalist cousins. If your LLM has only seen data about, say, 20th-century finance, it's unlikely a bad actor will coax out a recipe for napalm. Emphasis on "unlikely."
(For a twist on that idea, consider how DeepSeek built its system on a "mixture of experts." Each expert was a smaller model built to a specific purpose.)
The first two lessons form the basis of the third:
3/ There is no perfect fence around an LLM.
Hacks and corner cases are everywhere. People keep coming up with new ways to coax your bot into mischief. Like, say, using leetspeak to bypass your filters.
The best you can do is remain vigilant, and draw lessons from others' challenges before you experience them firsthand.
With strings attached
The company behind the Humane AI pin is shutting down. Their three-act play began with a hype-fueled rise to fame, then encountered a change of fortune in the middle, and ended in an acquisition by HP.
There's a lot to say here. I could talk about how startups overstate a product's capabilities to build buzz. Or I could remind everyone that creating a hardware product is never easy. I could do that. But I won't.
Instead I'll focus on an underappreciated risk: buying devices that come with strings attached. And by "strings" I mean "an umbilical cord." Such devices are specialized thin clients; you aren't so much purchasing the hardware as you are buying an access pass to a service. The utility you derive from such a device is tied to the company's continued existence.
HP seems more interested in Humane's tech talent than the product – it's your typical acqui-hire situation – so the pin's backing services will shut down and the devices will become fancy paperweights. (Adding insult to injury, Humane argues that certain functionality will still work. Like, say, checking the battery level. Thanks?)
Humane is not alone in this regard. In December the company behind the Moxie robot went under, leaving parents with an $800 brick and some very difficult discussions with their kids. Then we have every so-called "smart" device that goes offline whenever there's a cloud glitch. (Such incidents are why laypeople are familiar with terms like "AWS outage" and "us-east-1.")
Consumers need to get better at sussing out a company's prospects for survival. Device companies need to make products that last. Both can borrow an idea from open-source software:
Hedging longevity
Early in my career I learned how to suss out open-source projects. It wasn't enough that the code implemented a certain functionality. We also wanted some assurance that the projects would be around a while, because we were building our apps on top of that code.
Back then you could treat an active codebase and a vibrant community as signs of longevity. That usually worked! Your hedge was that you had the source code – in the event the project went bust, you could maintain it yourself for a while. Hardly an ideal situation, but a workable one.
Maybe device companies could offer a similar insurance policy? "If we go under, the source code gets released from escrow and customers can self-host the backend services." (Or new hosting providers can crop up. Or whatever.)
This is clearly fantasy on my part. Companies love to claim that the code is their secret sauce and a valuable asset to an acquirer, so there's little chance they would release it as part of a wind-down. But If enough people start asking that damning question – "what happens to this device if you go bust?" – maybe we'll get some kind of middle ground. Perhaps investors would demand (realistic!) contingency plans of how the devices will continue to function some N months after the company folds.
Companies could also build devices that operate independently of the mother ship. And when it comes to AI, that means running models on-device rather than phoning home. That would mean selling more powerful hardware and running weaker models – a higher device cost with lower model performance – but buyers would find the longevity more attractive.
One year in
It's been a year! Sort of.
Complex Machinery launched it in private beta in January 2024. I opened it to the public a month later, which makes this the one-year anniversary of Complex Machinery being generally available.
(I didn't intend for this newsletter to become the twice-monthly look at How GenAI Has Goofed but … hey, genAI insists on providing me with source material. So here we are.)
To longtime subscribers: thanks for sticking with me. Double thanks to those who have shared Complex Machinery with friends and colleagues.
To new subscribers: why not browse the archives? I expect some of the posts will prove evergreen. Including:
- Accepting that genAI is still a wild animal at its core. We may build newer and better fences, but the animal I call The Random™ will still break out. This is a key source of risk in AI, and I'm
surpriseddisappointed that it's not taken more seriously. - Asking whether AI is in a bubble. Or, more importantly, asking why it matters whether AI is in a bubble.
- My thoughts on the CrowdStrike incident (part 1, part 2). This matter did not involve AI. But the way companies keep cramming AI into every process, it's only a matter of time before we get an AI twist on this risk cascade.
- The looming AI debt wall. AI keeps writing big checks; will the world ever be able to cash them?
- Finding AI's place in the workforce. AI is great at some tasks and terrible at others. How do we spot the suitable use cases?
In other news …
- A genAI model trained on buggy code samples emits some rather disturbing statements. We all know that models aren't sentient, so they don't really "hold beliefs" or even "say" anything. Still, what this model emits is pretty wild. (Ars Technica)
- The headline for this piece says it so much better than I ever could, so I'm leaving it here verbatim: "Guy Who Ruined Buzzfeed With AI Now Says AI Is Bad, Launches New AI Platform" (Gizmodo)
- Apple Intelligence is back on its notification game. Sort of. I'll have more to say about this next time. (TechCrunch)
- Front Porch Forum is a small social site based in Vermont. They don't use AI for content moderation. Which is precisely why companies dreaming of AI-based content moderation should read this article and have another think. (Le Monde 🇫🇷)
- Speaking of AI-based content moderation, Meta claims a bug pushed violent videos into users' Instagram feeds. (Der Spiegel 🇩🇪, CNBC)
- Baseball is getting robot umpires. I know little about baseball, but I know a lot about what happens when technology enters a field. This could get … interesting. (WSJ)
- British musicians released an album of silence to protest use of their work in AI training. (Les Echos 🇫🇷, The Guardian)
The wrap-up
This was an issue of Complex Machinery.
Reading online? You can subscribe to get this newsletter in your inbox every time it is published.
Who’s behind Complex Machinery? I'm Q McCallum. I think a lot about AI and risk, which I write about here.
Disclaimer: This newsletter does not constitute professional advice.