#004 - When your car talks to everyone but you
You're reading Complex Machinery, a newsletter about risk, AI, and related topics. (You can also subscribe to get this newsletter in your inbox.)
GM has been stress-testing the adage "there's no such thing as bad press." Some of their cars, you see, have been talking behind owners' backs. And now everyone's talking about GM. But not in a good way.
The surface story is a cautionary tale about data privacy, informed consent, and sneaky business dealings. Dig a little deeper, and it doubles as a lesson in three data-related risks.
It all starts with auto insurance:
Overcoming asymmetry
Insurance is like trading: you place bets on future outcomes, and you get paid for being right. Both fields perform tons of data analysis to spot the best bets to place.
Insurance has a couple of extra hurdles that you don't get in trading, though:
1/ Your counterparty – the insured person – has a ton of influence on the outcome of the bet. "Oh, you think I won't get into a car accident? Watch this!" Sure, you can raise their rates at the next policy renewal. But that's only after you've lost the bet, having paid out a claim.
2/ You only get information when a claim is filed. Until then, your data is incomplete. You have no visibility into the almosts and the close shaves. As far as your fancy models are concerned, the genuinely safe driver and the lucky daredevil look the same. Until it's too late.
To close off that information asymmetry, insurers offer discounts for you to strap a tracking device onto your vehicle. That gives them more fine-grained data on your day-to-day activities, letting them adjust your rates even if you don't file a claim.
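Here's a toy sketch of hurdle #2, in case it helps to see it as data. Everything in it is invented for illustration: the `DriverRecord` fields, the two drivers, and the near-miss counts are assumptions, not anything an actual insurer uses.

```python
from dataclasses import dataclass

# All names and numbers here are invented for illustration.
@dataclass
class DriverRecord:
    name: str
    claims_filed: int           # what the insurer sees without telemetry
    near_misses_per_month: int  # what only a tracking device would reveal

drivers = [
    DriverRecord("genuinely safe driver", claims_filed=0, near_misses_per_month=0),
    DriverRecord("lucky daredevil", claims_filed=0, near_misses_per_month=9),
]

def claims_only_view(record):
    """The insurer's picture when the only signal is filed claims."""
    return record.claims_filed

def telemetry_view(record):
    """The insurer's picture once day-to-day behavior is visible."""
    return (record.claims_filed, record.near_misses_per_month)

# Collapse each view into the set of distinct profiles it can tell apart.
claims_only = {claims_only_view(d) for d in drivers}
with_telemetry = {telemetry_view(d) for d in drivers}

print(len(claims_only))     # distinct profiles from claims alone
print(len(with_telemetry))  # distinct profiles with telemetry
```

With claims alone, both drivers collapse into a single profile. Add the telemetry field and they separate, which is the whole appeal of the tracking device.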
But that only works if people are willing to use the tracking device, and if they behave normally even though they're being watched. That's an unlikely combination. What if insurers could address hurdle #2 by observing drivers' real, unvarnished behavior, and do so at scale?
Connected cars make this possible. A modern car is a small server room on wheels. Details that were once visible only to the driver and their mechanic – speed, braking habits, mileage, location – are now hoovered up by an array of onboard computers. And with network connectivity, the car can send that data back to the mothership, in near-real-time, for analysis.
(A couple of you will raise an eyebrow at that, so I'll answer it here: yes, my very-long-overdue article on Why Cars Are The New Smartphone continues to be long-overdue. Carry on.)
Manufacturers can analyze this data to improve their next-generation vehicles and proactively detect failures. That's great! They can also sell the data to other companies. That's not so great.
New York Times reporter Kashmir Hill broke the story on GM doing the not-so-great bit: LexisNexis Risk Solutions (LNRS – a group that does data analysis and modeling for insurers) purchased GM's connected-car data. They used it to develop risk models and other data products, which they sold to insurers, who raised those drivers' rates. Sometimes by a wide margin.
Risk 1: A weak link in the data supply chain
The general wisdom when building an AI model (or other data product) is to be mindful of how you source your training data. Quoting attorney Shane Glynn, in an article the two of us coauthored with Chris Butler:
The safest course of action is also the slowest and most expensive: obtain your training data [...] under an explicit license for use as training data. The next best approach is to use existing data collected under broad licensing rights that include use as training data even if that use was not the explicit purpose of the collection.
[...]
You don’t want to invest your time, effort, and money building models only to later realize that you can’t use them, or that using them will be much more expensive than anticipated because of unexpected licenses or royalty payments.
GM sold the data to a broker, which in turn sold it to LNRS. (Technically, that broker was LNRS's parent company, LexisNexis, but I digress.) LNRS can, in a way, argue that they did the right thing. They weren't picking up a duffel bag of purloined data in some dark-alley deal; they were doing business with an established data broker, in a fancy office. LNRS had good (well, "plausible") reason to believe they were in the clear.
Sort of.
LNRS clearly didn't ask enough questions about how GM had acquired that data. When anyone upstream in your data supply chain is on shaky legal ground, so are you! So if your supplier picked up the duffel bag of purloined data in a dark-alley deal, then transferred it to a gold-encrusted box to deliver to you … it's still purloined data. And thanks to legal action or maybe just bad press, you might wind up having built a data product that you're not permitted to use.
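One way to picture the "weak link" idea is as a check over every hop in the chain, not just the hop you dealt with directly. This is a minimal sketch under loud assumptions: the `Link` type, the `consent_documented` flag, and the three generic holders are hypothetical, and a real provenance review is a legal exercise rather than a boolean.

```python
from dataclasses import dataclass

@dataclass
class Link:
    holder: str
    consent_documented: bool  # hypothetical flag, standing in for a real legal review

def chain_is_clean(chain):
    """A dataset is only as clean as its shakiest upstream holder."""
    return all(link.consent_documented for link in chain)

def weak_links(chain):
    """Name every holder whose sourcing you could not verify."""
    return [link.holder for link in chain if not link.consent_documented]

# Generic, made-up chain: the gold-encrusted box in the middle changes nothing.
chain = [
    Link("original collector", consent_documented=False),
    Link("established data broker", consent_documented=True),
    Link("your modeling team", consent_documented=True),
]

print(chain_is_clean(chain))
print(weak_links(chain))
```

The point of using `all()` is that a spotless final hop can't offset a problem further upstream. You have to ask about every link.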
Risk 2: You are the company you keep
Closely related to the previous point is reputation risk.
GM, quite literally, failed the newspaper test: their actions made front-page news and it didn't go over well with the public.
That's bad enough for GM, but at least it was GM's choice to expose itself to that reputation risk. It's worse for LNRS. Even if they genuinely believed that GM had sourced the data through clear, informed consent of vehicle owners – unlikely, but for the sake of brevity let's go with that for now – the data analysis group is now caught up in GM's wave of bad press. Their names are side-by-side in articles on the topic. Probably not the PR splash that LNRS wanted, but it's the one they've got.
Risk 3: Your data being used against you
The GM/LNRS incident highlights a special risk for consumers. It's not the old and tired (but still creepy and wholly inappropriate) case of companies dealing in our private behavioral data to market to us. It's an example of companies using that data against us.
A particularly nasty aspect of the LNRS models is that they wear the halo of being data (cold, dispassionate facts) while still relying on subjective interpretations of that data. Someone inside LNRS came up with the boundary conditions for "hard braking." Drivers were unaware of those thresholds, yet insurers used them to raise rates. For the insurers, it's a subtle form of abdicating responsibility to a black box you've created. Not unlike AI-based sentencing tools and other Kafkaesque "computer says no" scenarios.
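To make the "subjective interpretation" point concrete, here's a small sketch in which the same trip produces different "hard braking" counts depending on where someone draws the line. The speed trace and both thresholds are made up for illustration. I don't know what cutoffs LNRS or any insurer actually uses.

```python
# Speed samples in meters per second, one per second. Invented trip data.
speeds_mps = [20, 19, 16, 12, 11, 11, 8, 7]

def decelerations(speeds):
    """Speed lost over each one-second interval (positive means slowing down)."""
    return [earlier - later for earlier, later in zip(speeds, speeds[1:])]

def hard_braking_count(speeds, threshold_mps2):
    """Count the intervals whose deceleration meets or exceeds the chosen cutoff."""
    return sum(1 for d in decelerations(speeds) if d >= threshold_mps2)

# Two hypothetical cutoffs that two different analysts might pick.
strict_threshold = 2.5
lenient_threshold = 3.5

print(hard_braking_count(speeds_mps, strict_threshold))   # 3
print(hard_braking_count(speeds_mps, lenient_threshold))  # 1
```

Same driver, same trip, same sensor readings. The only thing that changed is a number someone picked, and the driver never got to see it.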
But it's not so bad, is it? At least it's not some "social credit" system, like the controversial one running in China, right? The answer – and the problem – is … well … we don't know.
Given the widespread data collection that we already knew about (thanks, invasive ad targeting!), and given that the GM story shed light on data collection we didn't know about, it's fair to ask: what other companies are engaging in adversarial use of personal data, just keeping a lower profile? We might not find out until they slip.
The end?
Shortly after that first New York Times article made the rounds, drivers started to file lawsuits against GM.
The plus side? GM says they've stopped sharing data. The minus side? They did so because the press coverage shamed them into doing the right thing.
And since "public shaming" is not the same as "actual regulation," how soon till GM does this again?
Fighting back against the long train
If I can end on a somewhat positive note, it helps to remember that "adversarial use of data" goes both ways.
UberEats courier Armin Samii had a hunch that he was not being properly compensated for his work. So he built an app to collect data from his UberEats delivery receipts. He then shared that app with other couriers, so they could check their own numbers.
A recent Financial Times article brought Samii's story to light. Other outlets have picked it up since then, but the original is well worth a read if you have an FT subscription. That article goes beyond Samii's UberEats tale to explore what it means to be a gig worker, someone whose tasks and pay are at the whim of AI models built by a distant, faceless company that clearly lacks on-the-ground knowledge of where and how the work takes place. Most importantly, the FT piece mentions Samii's difficulty in reaching a human being at UberEats when he reported other problems that were falling through the cracks. Like, say, how their face-scanning verification app couldn't handle his beard.
All of that leads back to the "Long Train Effect" that I mentioned at the end of the previous newsletter. As companies cede decision authority to AI models, and as those models experience shrinking human oversight, expect to see more cases like Samii's: people turning data back at problematic AI systems.
In other news …
- In case you needed a reminder. "Every scary thing Meta knows about me — and you" (The Times UK)
- Whether it's my phrasing of "dull, repetitive, predictable" or the military's version of "dull, dirty, dangerous," there are times we're better off with a robot doing the work. "Mercedes is trialing humanoid robots for ‘low skill, repetitive’ tasks" (The Verge)
- As much as I've talked about the need to test and red-team your AI chatbots … I admit that "use ASCII art to bypass safeguards" was not on my AI Safety Bingo Card. "ASCII art elicits harmful responses from 5 major AI chatbots" (Ars Technica)
- As it turns out, AI can also get you in trouble when you're not even using it. Or perhaps because you're not using it. "SEC Settles With Two Investment Advisers Over Alleged ‘AI Washing’" (WSJ)
- A report, commissioned by the French government, urges the country to boost its investment in AI. "La France appelée à tripler ses investissements dans l'intelligence artificielle" (Les Echos)
The wrap-up
This was an issue of Complex Machinery.
Who’s behind Complex Machinery? I'm Q McCallum. I think a lot about AI and risk, which I write about here.
Disclaimer: This newsletter does not constitute professional advice.