Complex Machinery

June 26, 2025

#039 - Small-batch, hand-crafted code

Hand-waving sometimes works, but sometimes it doesn't. Also: when people and machines write code.

You're reading Complex Machinery, a newsletter about risk, AI, and related topics. (You can also subscribe to get this newsletter in your inbox.)

Orange factory robots in action. (Photo by Simon Kadula on Unsplash)

As promised, I'm finally getting around to the Builder.AI debacle. I held off last time in part because I ran out of space, but also to review additional coverage. And while doing something unrelated, I stumbled across some other material which really framed the whole thing nicely.

I'll start with that latter bit:

Tim Harford, Financial Times columnist and author of The Undercover Economist, hosts a podcast called Cautionary Tales. A recent two-parter covered the digging of the Panama Canal – a project that France started, and America completed. Per the name of the show, this was a cautionary tale of big promises and failed delivery.

Harford's storytelling was top-notch, as usual. But this was still a frustrating listen, because the story mirrored that of so many companies we've seen over the years. Tech startups in particular. The first episode explained that Ferdinand de Lesseps, riding his wave of success from building the Suez Canal, figured he could hand-wave his way through the Panama dig. He convinced his investors it would work. The Panamanian jungle proved them all wrong.

And that takes us to Builder.AI, née Engineer.AI. I will henceforth abbreviate those names as Builder and Engineer, respectively, because I am too lazy to keep typing the additional three characters. And because saying "AI" that much in one space is just bad luck.

Builder (kind of) faked it, but didn't make it

Builder's (Engineer's) claim to fame was AI-driven software development. You know how everyone's excited about genAI code assistants? Builder was early to that game. Their promise was to use AI to improve the speed and lower the cost of app dev.

That all seemed promising until Builder fell apart a few weeks ago. The headlines focused on allegations that Builder had faked some of those "AI-powered" claims. Instead of AI writing code, the story went, the company simply employed teams of developers to do the work. According to a WSJ piece:

Documents reviewed by The Wall Street Journal and several people familiar with the company’s operations, including current and former staff, suggest Engineer.ai doesn’t use AI to assemble code for apps as it claims. They indicated that the company relies on human engineers in India and elsewhere to do most of that work, and that its AI claims are inflated even in light of the fake-it-till-you-make-it mentality common among tech startups.

Even if Engineer/Builder were doing a Soylent Green impression – we don't know for sure; it's still an allegation – that's not quite what took the company out. That WSJ piece dates back to 2019.

That's six years ago for the mathy-types among you.

It's one thing for a wounded startup to keep chugging along, so long as no one knows it's in trouble. But to last for six years after the world has smelled something fishy? That's tough to pull off. Consider that hedge fund LTCM, crypto operation FTX, and energy giant Enron all collapsed just months after news spread that they were hurting.

You might counter that blood-test startup Theranos lasted for two years after WSJ reporter John Carreyrou broke the story that something was amiss. Or that payments company Wirecard held on for a whopping five years. But in those cases, the companies engaged in aggressive, protracted legal warfare for their survival. (And in the Wirecard case, there were allegations of … ahem … extralegal tactics as well.) Engineer/Builder was able to brush off the accusations of fakery and keep going, business-as-usual style.

What ultimately took Builder down, then? It was your typical startup bullshittery: they were apparently hemorrhaging cash. And they were able to hide it. Until they weren't. Builder's hand-waving and vibes simply could not outrun the budget shortfalls. (They definitely gave it a go. Bloomberg reports the company was allegedly round-tripping payments to a client to support the illusion of revenue.)

Something old, something new

That leads to three key points about the Builder story:

1/ This isn't about AI. It's about your standard startup tactics. I've noted elsewhere that every fraud is a race against time. (I'm pretty sure I first used this a couple of years ago when I was covering web3 and crypto. Make of that what you will.) Today I'm adapting that to "Fake It Till You Make It is a race against time." If you win, you get accolades and money and all that. If you lose, well, you get trouble.

Business trickery is easier to pull off these days if you're wearing that AI halo, yes. But AI is not required. With the right reputation and enough attitude, investors and customers will be so excited to ride on the rocket you've built – and so afraid you'll kick them off to make room for another passenger – that they won't dare upset you with probing questions. Just ask anyone who threw money at Bernie Madoff. Or at Theranos. Or other such operations.

2/ They were ahead of their time. Sort of. Note that Builder was talking about AI-driven software development in 2019. That's three years before widely available, public-facing LLMs were A Thing. This is the old-school definition of AI: machine learning, not generative.

Maybe the founders were just riding the excitement of AI. Maybe they genuinely saw potential. Either way, it's hard to deny that today's business world has fallen in love with AI-assisted software development. And Builder was there early.

3/ They were also pretty bad at this game. Related to that previous point, investors and customers are eager to throw money at AI. Especially AI-driven software development. So it's wild that Builder ran out of money. I wonder how they would have fared, had they been able to hold on just a few more months.

Maybe they did the foolish thing of Actually Looking For Real Projects? Maybe. But when you're riding on an emerging-tech hype wave, that can get dicey. Or maybe the founders played the usual startup card of pitching a big vision. You know, deploying a reality distortion field in the hopes that, if they get enough people on-board, things will Just Happen™.

We'll never know. But this does bring us back to that Cautionary Tales two-parter. The first episode detailed the French approach, which mostly involved ignoring experts and hand-waving at any problems. The second episode explains how the American leadership started down a similar road, but eventually listened to the experts and, well, we know the rest.

And look, I get the appeal of hand-waving. There's nothing quite like bringing things into existence through sheer force of belief and repeating your message. But at some point that "fake it till you make it" playbook degrades into "writing checks your ass can't cash."

An even older story

In the previous newsletter I touched on the way genAI code assistants strike a nerve in certain developers' brains. I linked to a piece that compared those tools to cocaine.

To be fair, it's not just the developers who have this addiction. Management gets it, too! They crow about "developer productivity" but then institute policies that bring about anything but. Like, say, demanding (sorry, "strongly encouraging") their dev teams to use code assistants. Because, apparently, software dev is really just a game of churning out code.

This problem is as old as time. Remember when companies, desperate to evaluate developer productivity, turned to things like "lines of code" (LOC)? They fell for the old trap of going for metrics that were easy to calculate instead of metrics that were actually useful.

And then there was that big wave of tech outsourcing in the early 2000s. Companies had found cheaper software dev talent overseas (first in India, then Eastern Europe, then various other places) and figured they'd give it a go.

It's Econ 101, really: you swap in an equivalent good at a lower price point, and you fatten your margins.

I'll emphasize "equivalent good" and "lower price."

In one tale of outsourcing woe, there were rumors of a company that told its local, senior-level developers to stop writing code. They were to instead write Very Detailed Software Specs for the overseas outsourcing team. So detailed, as it turned out, that the local developers could have written the code themselves in less time. But management pressed them. So they wrote the specs and shipped them off to the overseas dev team.

And then … the code came back. Remember what I said about "equivalent good?" This was definitely not equivalent to what the local team would have built. In fact, so I heard, it was terrible. The company then had to pay their local developers to write the code they'd wanted to write themselves in the first place. And threw in some additional local contract talent to meet deadlines. You know the drill.

There's also the matter of "lower price." The overseas team was cheaper per developer hour, yes. But management's calculations didn't account for the bigger picture. When you take the cost of the local developers writing specs, plus the outsourced work, then add the local dev team's rewrite, plus the bonus contract labor, you wind up with … the same end-product, at a multiple of the original, local-developer price tag.

People are generally terrible at estimating the total cost of something. But even with that framing, this was pretty bad. And from what I've heard from industry peers, this tale of woe was hardly a one-off.

Could a similar story play out with genAI code assistants? Perhaps! You pay your local developers to write detailed specs (prompts), then pay for the outsource team (the LLM), then pay for your developers to clean up what the outsource team provides. The LLM costs less than an offshore developer, sure. And maybe the cleanup isn't as bad as a full rewrite. But when you sit down to calculate the total cost, how often do you really win out?
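As a back-of-the-envelope sketch, the "cheaper per hour" trap is easy to see once you model the whole pipeline instead of a single line item. All of the rates and hours below are invented for illustration; they are not figures from the story above.

```python
# Hypothetical cost comparison: per-hour savings vs. total cost of delivery.
# Every number here is an assumption made up for illustration.

LOCAL_RATE = 100      # $/hour, local senior developer (assumed)
OFFSHORE_RATE = 30    # $/hour, outsourced developer (assumed)

def direct_build(hours=400):
    """Local developers simply write the code themselves."""
    return hours * LOCAL_RATE

def outsourced_build(spec_hours=200, offshore_hours=600,
                     rewrite_hours=400, contractor_hours=100,
                     contractor_rate=150):
    """Detailed specs + offshore build + local rewrite + contract help."""
    return (spec_hours * LOCAL_RATE          # writing the Very Detailed Specs
            + offshore_hours * OFFSHORE_RATE # the outsourced work
            + rewrite_hours * LOCAL_RATE     # local team rewrites the result
            + contractor_hours * contractor_rate)  # extra hands for deadlines

print(direct_build())      # 40000
print(outsourced_build())  # 93000 -- same product, ~2.3x the price
```

Swap "offshore team" for "LLM" and "rewrite" for "cleanup" and the same structure applies; the question is whether the cleanup term stays small enough for the total to come out ahead.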

There are many like it, but is this one mine?

I'll close out with an excerpt from that WSJ piece on Engineer/Builder:

Engineer.ai says its "human-assisted AI" allows anyone to create a mobile app by clicking through a menu on its website. Users can then choose existing apps similar to their idea, such as Uber's or Facebook's. [...]

"We've built software and an AI called Natasha that allows anyone to build custom software like ordering pizza," Engineer.ai founder Sachin Dev Duggal said in an onstage interview in India last year. Since much of the code underpinning popular apps is similar, the company's "human-assisted AI" can help assemble new ones automatically, he said. Roughly 82% of an app the company had recently developed "was built autonomously, in the first hour" by Engineer.ai's technology, Mr. Duggal said at the time.

Duggal's claim that "much of the code underpinning popular apps is similar" goes beyond popular apps. These days, professional software developers use the same programming languages (Java, Python, Ruby, whatever) and the same third-party libraries (Pandas, Torch, Rails). Language syntax and best practices shape how developers write. The call semantics of those libraries and toolkits influence what developers write. Combined, that limits the ways developers will express common operations such as "connect to a database" or "display this table." And frankly, most of those operations are variants of the samples in the language or library documentation.
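To make that concrete, here's what "connect to a database" typically looks like using Python's standard library. It's essentially a variant of the example in the sqlite3 module's own documentation, which is the point: nothing about it is proprietary, and nearly any developer would produce something interchangeable.

```python
# Garden-variety "connect to a database" code, closely mirroring
# the stdlib sqlite3 documentation's example.
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for illustration
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
conn.commit()

rows = cur.execute("SELECT name FROM users").fetchall()
print(rows)  # [('Ada',)]
conn.close()
```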

Point being: while there's certainly unique, game-changing intellectual property (IP) out there, it makes for a small portion of your typical app's codebase. The code makes your app run, but it might not make it special.

Experienced developers have known this for ages. The terms of service (TOS) agreements for AI-based code assistance tools make this even clearer. I hope company leadership and legal departments take note. It's possible that AI code assistants will make their mark on company valuations as well as software IP litigation. Like when the plaintiff cries theft, but the defendant just points to their prompts: "Sorry, bro, I didn't steal your code. I just asked Copilot to write me the backend for a social media app …"

In other news …

  • Going to my usual line of "the rise of algorithmic trading can tell us a lot about where AI is headed," here's an example of bot-on-bot combat in a commercial setting. (Technology Review)
  • French music streamer Deezer reports a high percentage of AI-based fraudulent listens. (The Guardian)
  • Mattel prepares Barbie for an AI boost. (Financial Times)
  • Software tools for creatives embrace genAI. (Les Echos 🇫🇷)
  • A consulting firm has reworked its project structures for a genAI world. (Insider)
  • That Kalshi ad that ran during the NBA finals? It cost just two grand to generate. (Insider)
  • A couple of issues back I mentioned the "grief tech" industry. Here's a detailed look at one family's story. (New York Times)
  • Amsterdam tried to build a bias-free AI system. It did not pan out. Consider this your periodic reminder that Really Really Hoping That AI Works will not, in fact, make AI work. (Technology Review)
  • After years of bragging about large headcount, tech startups are now shifting to a "revenue per employee" metric. No word on whether they'll eventually settle on plain old "actual revenue, without accounting shenanigans." (Bloomberg)
  • You know how I keep saying that genAI is not a search substitute? WhatsApp's genAI assistant offers the latest example. (The Guardian)
  • The company behind Applebee's and IHOP is bringing genAI to their restaurant game. Some of the use cases seem to call for plain old ML/AI, but what do I know? (WSJ)
  • Meta's AI app really, really likes to share. The world isn't ready for that. (TechCrunch)

The wrap-up

This was an issue of Complex Machinery.

Reading online? You can subscribe to get this newsletter in your inbox every time it is published.

Who’s behind Complex Machinery? I'm Q McCallum. I think a lot about AI and risk, which I write about here.

Disclaimer: This newsletter does not constitute professional advice.
