I Paid Microsoft $10 for Premium AI Agents That Were Confidently Wrong About Everything

Microsoft is spending an insane amount of money on AI features, building data centers and licensing large language models from OpenAI, Anthropic, and others, while also trying to build its own in-house alternatives. The goal, driven straight from the top of Redmond's org chart, is to turn Windows and Microsoft 365 into an "agentic OS" capable of doing the tasks that make corporate life miserable: writing memos, building presentations, organizing meetings, and automating routine tasks.

But are those investments paying off? Developers seem happy with tools like Claude Code and GitHub Copilot, but the agents working in the business sphere don't seem nearly as competent. Over the past few weeks, I've been trying to use the AI features in Microsoft 365 and Windows for everyday work tasks. Copilot shows occasional flashes of competence, but more often, the results are a mishmash of misinformation, hallucinations, and time-wasting dead ends.

Microsoft has been bugging me for months to upgrade to its new Microsoft 365 Premium plan, which includes higher limits on AI usage and a handful of exclusive agents. In the interest of science, I paid the $10 to upgrade an unused account for a month so I could try them.

I started with the Analyst agent, feeding it a copy of my household income and expenses spreadsheet and asking for help improving its design. After some back-and-forth, it offered useful suggestions for tightening up formulas, consolidating duplicate tables, and eliminating redundant pages, concluding with a bolded offer to build a dashboard using only formulas and pivot tables. "If you want," Copilot told me, "I can sketch a clean dashboard layout (exact cells and sections) tailored to your data so you can build it in ~15 minutes."

I thought agents were supposed to do this work for me? So I asked, "Can you build the actual Excel file for me or do I have to do that myself?" Of course, it told me. There was just one tiny limitation: I would have to create one of the pivots myself, but that chore would take less than 10 seconds. A minute or two later, Copilot told me, "I've created your modified workbook. Download it here." It then gave me a link to [sandbox:/mnt/data/Personal_accounts_modified.xlsm]. That's the exact link, brackets and all. It wasn't clickable.

"Yeah -- that's on me," Copilot confessed. "That 'sandbox path' isn't actually clickable in your interface, so you couldn't download it." Several tries later, Copilot gave up, explaining the file link might have worked if I had been using ChatGPT, and suggesting I create the file in Google Sheets and send myself a link. Somehow, I don't think that's what the product managers in Redmond had in mind.

Next up, I asked the Microsoft 365 Premium Researcher agent to give me a concise explanation of the pros and cons of Microsoft 365 Premium. I was not prepared for the response: "To make sure I cover exactly what you need: Which specific plan do you mean by 'Microsoft 365 Premium'? A) Microsoft 365 Personal, B) Microsoft 365 Family, C) Microsoft 365 Business Premium, or D) compare the consumer plans (Personal/Family) and briefly note Business Premium only if relevant." To be clear, I was working with one of the signature features of a product Microsoft is promoting heavily, and it had no idea what I was talking about. After I offered up a link to the product page, Copilot gave me a fairly anodyne summary of features from third-party sources. I wouldn't call it research, and it definitely wasn't deep.

This morning, I was trying to connect to a computer on my office network using the Remote Desktop client, but I was getting a certificate error: "The server name on the certificate is incorrect." After a few minutes of troubleshooting on my own, I decided it was time to try to "vibe-sysadmin" my way through it with Copilot. "The fix is straightforward," Copilot replied, confidently. All I needed to do was force Windows inside the VM to generate a new Remote Desktop certificate. That didn't work. Copilot, undaunted, told me that result was meaningful and rattled off three likely reasons, concluding with "Let's fix it cleanly and surgically." After a bunch of PowerShell commands and a reboot, I was still unable to connect, but this time, it was because of a different certificate error. "Ah -- that tells me exactly what's happening now," said Copilot.

This went on for about 20 minutes and half a dozen reboots of that VM. With each failure, Copilot had another small AI epiphany, accompanied by bold headings like "Why I'm confident this is the right path," "Why this is the correct fix," and "Why this is the only explanation left." None of the suggested fixes worked, so I told Copilot to shut up. I went back through the connection settings myself and cleared one checkbox. That did it.

Maybe someday Copilot will achieve artificial general intelligence. At this point, I would settle for artificial general common sense. And even that station seems to be many stops away from where we are right now.
