#020 - Silence is golden
Sometimes a bot's best bet is to say nothing.
You're reading Complex Machinery, a newsletter about risk, AI, and related topics. (You can also subscribe to get this newsletter in your inbox.)
Let's change the subject
I've seen reports that Copilot, Microsoft's AI chatbot, refuses to answer questions about the upcoming US presidential election. This Bluesky post from Andy Baio includes a screencap of the bot's generic response:
I know elections are important to talk about, and I wish we could, but there's a lot of nuanced information that I'm not equipped to handle right now. It's best that I step aside on this one and suggest that you visit a trusted source.
How about another topic instead?
Not everyone is happy about this. Critics figure a fancy LLM should be able to handle simple questions. But keeping mum in this case is actually … a great idea?
Hear me out.
Imagine you're a product manager at Microsoft. Your bosses are pushing you to release a genAI chatbot because chatbots are cool. And by "cool" they mean "a way to gain market share."
You, on the other hand, see warning signs. You know that a chatbot's entire job description is Just Make Some Shit Up. (It's in the name: everything that comes out of a genAI bot is, well, generated. We only apply the label "hallucinations" to the generated artifacts that we don't like.) It's making everything up based on patterns surfaced in its training data, yes. But those are all grammatical patterns. Not factual. Not logical. Just this-word-is-likely-followed-by-that-word. And that kind of creative whimsy is unfit for sensitive topics.
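To make that this-word-follows-that-word idea concrete, here's a toy next-word sampler. It bears no resemblance to a production model – real LLMs operate over learned token probabilities at enormous scale – but it shows the core move: pick a statistically plausible next word, with no notion of whether the result is true. (The tiny corpus and the `babble()` helper are made up purely for illustration.)

```python
import random
from collections import defaultdict

# Toy illustration of "this word is likely followed by that word."
# The corpus and helper are invented for this sketch; nothing here
# reflects how any real model is actually built or trained.
corpus = "the bot answers the question and the bot invents the answer".split()

# Record which words follow which in the corpus.
followers = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current].append(nxt)

def babble(start: str, length: int = 8) -> str:
    """Chain together likely next words, with zero regard for truth."""
    word, output = start, [start]
    for _ in range(length):
        options = followers.get(word)
        if not options:
            break
        word = random.choice(options)  # plausible, not correct
        output.append(word)
    return " ".join(output)

print(babble("the"))  # grammatical-ish output, no facts involved
```

Back to our imaginary product manager.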
Your bosses' desire for a chatbot is running headlong into their mission to boost Shareholder Value™. Not to mention your mission of job security. Hmm.
What if you could protect the bot from its own naïveté? It can't give an inappropriate answer if it never sees the question! So you filter the bot's inputs, and anyone who wanders into the danger zone will get a flat-yet-folksy twist on "I'm afraid I can't do that, Dave."
There you go. Your bosses get their chatbot, and you avoid a PR fiasco because the bot won't, say, give people the wrong information about voting rights issues or whatever. Shareholder Value™ is safe. For now.
"But what if," your bosses ask you, "people get mad because the bot won't answer their question?"
To which you say:
"Why would anyone ask a probabilistic bullshit artist for a factual answer?"
(The reason: "Because companies keep pushing genAI bots as a replacement for search." So your statement is hardly a mic drop moment. But that's a story for another day.)
I admit that I'm biased. Microsoft's Filter The Bot approach aligns with LLM safety guidelines I've written elsewhere. Some of those ideas stem from age-old legal wisdom: it's hard to get in trouble for something you didn't say.
If your company runs a public-facing, general-purpose LLM, do yourself a favor and filter troublesome inputs. This will prevent the bot from saying something stupid while wearing your corporate logo. For bonus points you can build a focused, domain-specific model to further improve your odds.
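If you're wondering what that filter might look like in practice, here's a minimal sketch. The blocked-topic patterns, the canned refusal, and the `call_llm()` stand-in are all hypothetical; a real deployment would lean on a proper content classifier (and a review process) rather than a handful of regexes.

```python
import re

# Hypothetical pre-LLM input filter. The topic patterns, the canned
# refusal text, and call_llm() are placeholders for this sketch.
BLOCKED_TOPICS = [
    r"\belection\b",
    r"\bvoting\b",
    r"\bballot\b",
]

REFUSAL = (
    "I know this topic is important, but it's not something I'm equipped "
    "to handle. Please check a trusted source instead."
)

def call_llm(prompt: str) -> str:
    """Stand-in for whatever model API you actually use."""
    return f"(model response to: {prompt})"

def answer(prompt: str) -> str:
    """Return a canned refusal for sensitive prompts; otherwise call the model."""
    if any(re.search(pattern, prompt.lower()) for pattern in BLOCKED_TOPICS):
        # The model never sees the question, so it has nothing to invent.
        return REFUSAL
    return call_llm(prompt)

print(answer("Who should I vote for in the upcoming election?"))
print(answer("What's a good banana bread recipe?"))
```

The regexes aren't the point. The point is that the risky prompt never reaches the model, so there's nothing for it to make up.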
Sum total: if you like to sleep easy, be mindful of letting a genAI bot answer factual questions.
Steal this idea
Related to the topic of LLM safeguards, Google Cloud has updated its terms of service with a section on genAI:
4.3 Generative AI Safety and Abuse. Google uses automated safety tools to detect abuse of Generative AI Services. Notwithstanding the “Handling of Prompts and Generated Output” section in the Service Specific Terms, if these tools detect potential abuse or violations of Google’s AUP or Prohibited Use Policy, Google may log Customer prompts solely for the purpose of reviewing and determining whether a violation has occurred. See the Abuse Monitoring documentation page for more information about how logging prompts impacts Customer’s use of the Services.
I don't mean to tell you how to do your job – remember, this newsletter does not constitute professional advice – but if you provide any kind of outward-facing genAI service, you may want to pass this to your legal team. Throw in Google Cloud's Abuse Monitoring policy, too. They might be able to borrow some ideas.
Bad actors are already poking at your system. It's only a matter of time before regulators, customers, and your internal tech teams want access to end-users' prompts and the model's outputs. Why not establish those policies now?
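If you want a head start on that, here's a minimal sketch of prompt-and-output logging, assuming you've disclosed the retention to end-users and cleared it with your legal team. The field names and the JSONL destination are placeholders; retention periods and access controls are your lawyers' and security folks' call, not mine.

```python
import json
import time
import uuid

def log_interaction(user_id: str, prompt: str, output: str,
                    path: str = "genai_audit.jsonl") -> str:
    """Append one prompt/output exchange to an audit log and return its ID."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "prompt": prompt,
        "output": output,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["id"]

# Example: capture the exchange now, so an abuse review or a regulator's
# request doesn't mean reconstructing history after the fact.
record_id = log_interaction("user-123", "example prompt", "example model output")
print(f"logged interaction {record_id}")
```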
Whispers of hallucinations
In the previous newsletter I talked about automation's impact on cost savings. If a self-service kiosk – a touchscreen terminal just a step shy of an old-school arcade machine – cuts airport check-in costs by 96%, what potential could AI hold for unlocking business efficiencies?
I stand by that observation, but I forgot to add an asterisk: not every cost-cutting use of AI is a good one.
Take the example of hospitals using OpenAI's Whisper tool for transcribing patient consultations. The problem is … well, it's precisely the problem you'd expect:
But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.
Hmmm.
I don't (completely) blame this on OpenAI. Whisper's terms of service (TOS) forbid its use in such sensitive contexts. And while individual end-users will speed-scroll through a TOS, it's not unreasonable to expect a hospital's legal and procurement departments to go through it with a fine-toothed comb. We can file this under "the hospital should have known better."
(Or perhaps under "employees bypassed the procurement department" – an example of AI slipping in through the side door – in which case the hospitals have a different problem.)
The take-home lesson is that using an AI tool may create more problems than it solves. Right?
Not so much.
The lesson is that even if you don't pass your sensitive information to a chatbot, someone might do that for you.
Investing in the next round
Bubbles sometimes leave behind supporting infrastructure when they collapse. The late-1800s railroad bubble bequeathed mile upon mile of usable track. Software development and open-source tooling weren't exactly new in the 1990s, but their strong Dot Com-era growth outlasted the internet mania. That powered the next stages of web technology. (If memory serves, the collapse also left a ton of internet connectivity. But don't quote me on that.)
While it's too early to call Whatever Is Happening With AI a "bubble" – technically we can't use that word until after the fact – I do expect a market correction at some point. I've spent the last couple of years wondering what shape that correction will take, and how the practitioners, tools, and hardware will find new roles.
Do you know what was not on my post-AI-correction bingo card? Power infrastructure.
Large tech providers are investing in new power sources and improved delivery systems to drive their AI efforts. (I initially read about this in Die Zeit. MSNBC offers similar coverage in English.) If these companies achieve even a fraction of that dream before the AI market turns, they could leave a lot of usable electricity infrastructure for other pursuits. Some say this search for datacenter power may also boost clean energy initiatives.
Will that be a fair trade for all of the nonsense coming out of chatbots? Time will tell.
In other news …
- Yet another tale of Strava data gone wrong. This is our periodic reminder that "social app" is synonymous with "package up your personal info for misuse." (Le Monde 🇫🇷)
- Not even Microsoft employees like the new version of Copilot. (Insider)
- Does anybody really need an AI bot to summarize their Google Chat conversations? (The Verge)
- There's a … bit of a coordinated effort on Reddit to dupe AI bots. (Ars Technica)
- A radio station with AI hosts, and scripts (mostly) written by AI. What could possibly go wrong? (The Register)
- We mostly hear about what Hollywood doesn't like about AI (and with good reason). But some filmmakers see potential for the technology beyond the creepy misuse of actors' likenesses. (New York Times)
The wrap-up
This was an issue of Complex Machinery.
Reading this online? You can subscribe to get this newsletter in your inbox every time it is published.
Who’s behind Complex Machinery? I'm Q McCallum. I think a lot about AI and risk, which I write about here.
Disclaimer: This newsletter does not constitute professional advice.