In February, this reporter picked up a flyer at an anti-AI march in London. It read, in what may or may not have been a deliberate homage to South Park's underpants gnomes: "Step 1: Grow a digital super mind. Step 2: ? Step 3: ?" Produced by Pause AI, an activist group that co-organized the protest, the flyer ended with a plea: "Pause AI until we know what the hell Step 2 is."

The reference, for the uninitiated, is to the 1998 South Park episode in which Kenny, Kyle, Cartman, and Stan discover gnomes whose business plan is "Phase 1: Collect underpants. Phase 2: ? Phase 3: Profit." The meme has since been used to satirize everything from startup strategies to Elon Musk's Mars mission funding plan. Right now, it captures the state of AI perfectly: Companies have built the tech (Step 1) and promised transformation (Step 3). How they get there remains a giant question mark.

Pause AI believes Step 2 must involve regulation, though exactly what that looks like and who enforces it is up for debate. AI boosters, meanwhile, are convinced Step 3 is salvation and tend to skip over the middle bit entirely. OpenAI's chief scientist Jakub Pachocki described AI to me as an "economically transformative technology," with the sunny uplands apparently just over the horizon. But everyone's taking a different route, and it's anyone's guess who'll make it.

For every grand claim about the future, there's a sobering reality check. Consider two recent studies. One from Anthropic predicted which jobs LLMs will affect most: managers, architects, and media types should brace for change; groundskeepers, construction workers, and hospitality folks, not so much. But these predictions are really just guesses based on what LLMs seem good at, not how they actually perform in the workplace.

Another study from February by researchers at Mercor, an AI hiring startup, tested several AI agents powered by top-tier models from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks routinely done by human bankers, consultants, and lawyers. Every agent failed to complete most of its duties.

Why such wide disagreement? For starters, consider who's making the claims and why: Anthropic has skin in the game. Most people telling us that something big is about to happen base that claim on how fast AI coding tools are improving. But not all tasks can be hacked with coding. Other studies find LLMs are bad at strategic judgment calls.

What's more, tools aren't dropped into cleanrooms. They must work in places contaminated with people and existing workflows, and sometimes adding AI makes things worse. Sure, maybe those workflows need to be torn up and refashioned around the new technology, but that takes time and guts.

That big hole? It's right where Step 2 should be. The lack of agreement on what's about to happen, and how, creates an information vacuum filled by the latest wild claim of the week, evidence be damned. We're so unmoored from any real understanding that a single social media post can shake markets.

We need fewer guesses and more evidence. That requires transparency from model makers, coordination between researchers and businesses, and new ways to evaluate this technology in the real world. The tech industry, and with it the world's economy, rests on the promise that AI will be transformative. But that's not yet a sure bet. Next time you hear bold claims, remember: most businesses are still figuring out what to do with their underpants.