AI continues to be touted as a tool that will replace humans at all sorts of jobs. Apparently, software engineering isn’t one of those jobs either.
One of the things I keep being told is that, no really, AI is going to replace humans at virtually every job that exists. It’s not only better at the job, it’s more efficient and, well, you’ll see. Well, we are seeing a lot, and what we’re seeing shows that AI is not replacing people’s jobs any time soon. If anything, we get proof time and time again that anyone who jumps to replace people with AI only ends up with egg on their face… or worse. This is because AI lands squarely in the category of unproven technology.
Still, that’s not stopping some business leaders from making the jump anyway because, hey, hype can’t be all that bad, right? So, we see hilarity ensue as a result. This includes the lawyers who tried to use ChatGPT to write their legal briefs, only to have a judge question them afterwards as to why they were citing fabricated caselaw. Then there was DoNotPay, which was ultimately forced to shut down after also proving to be really bad at its job of writing lawsuits.
Then there was the world of journalism, which was filled with even more hilarity. There was the case of CNET replacing its journalists with AI, only to have it backfire quite hard. Gannett tried the same thing while concealing the fact that the material was written by AI, only to watch the whole thing backfire even harder.
There was also the case where one company came up with a “revolutionary” AI that predicted the Switch 2 was coming out in September of… 2024. Yeah, that was an epic fail. Then there was the AI media outlet that falsely claimed a DA had been charged with murder. Yeah, that was a bad one.
Then there were efforts by companies to create AI that summarizes content instead of writing it. As it turns out, AI sucks at that too. There was the case of Apple Intelligence falsely stating that Luigi Mangione had shot himself. On top of that was Google’s AI Overview, which recommended that users eat rocks or use glue to keep the cheese from sliding off their pizza.
Yet, despite all of that, there are people who continue to insist that AI is this magical, foolproof technology that is going to replace humans at their jobs. Maybe there is no amount of evidence to the contrary that will cause those people to change their minds. Still, that won’t stop us from pointing out the failarity that comes with people adopting AI thinking it’s going to be this great big revolutionary move. The latest example has to do with software engineering. A tool by Cognition AI called “Devin” was supposed to be the software engineer that would output code for companies. The tool was put to the test and turned in a success rate of a whopping… 15%. Oops. From the Register:
A service described as “the first AI software engineer” appears to be rather bad at its job, based on a recent evaluation.
The auto-coder is called “Devin” and was introduced in March 2024. The bot’s creator, an outfit called Cognition AI, has made claims such as “Devin can build and deploy apps end to end,” and “can autonomously find and fix bugs in codebases.” The tool reached general availability in December 2024, starting at $500 per month.
“Devin is an autonomous AI software engineer that can write, run and test code, helping software engineers work on personal tasks or their team projects,” Cognition’s documentation declares. It “can review PRs, support code migrations, respond to on-call issues, build web applications, and even perform personal assistant tasks like ordering your lunch on DoorDash so you can stay locked in on your codebase.”
The software agent was also called out by another YouTube code pundit for allegedly including critical security issues.
Now, three data scientists affiliated with Answer.AI, an AI research and development lab founded by Jeremy Howard and Eric Ries, have tested Devin and found it completed just three out of 20 tasks successfully.
In an analysis conducted earlier this month by Hamel Husain, Isaac Flath, and Johno Whitaker, Devin started well, successfully pulling data from a Notion database into Google Sheets. The AI agent also managed to create a planet tracker for checking claims about the historical positions of Jupiter and Saturn.
But as the three researchers continued their testing, they encountered problems.
“Tasks that seemed straightforward often took days rather than hours, with Devin getting stuck in technical dead-ends or producing overly complex, unusable solutions,” the researchers explain in their report. “Even more concerning was Devin’s tendency to press forward with tasks that weren’t actually possible.”
As an example, they cited how Devin, when asked to deploy multiple applications to the infrastructure deployment platform Railway, failed to understand this wasn’t supported and spent more than a day trying approaches that didn’t work and hallucinating non-existent features.
It’s kind of hard to view this as anything other than a fail.
While we can chuck this onto the pile of AI failures, that won’t stop the hype at all. People will continue to insist that we are on the verge of a “revolutionary” step with AI replacing most human work. There will be those making dubious claims about AI becoming sentient, and those saying that AI is going to cause humanity to go extinct, among other completely insane theories. These claims have been circulating for at least a year now. How anyone takes these claims seriously is beyond me, given the failures I keep seeing over and over again. Still, as long as people firmly believe this garbage and run with the technology, it’ll give us an ample supply of stories like this where we can sit here and point and laugh at the foolish decisions.