Some people seem to have a need to discredit AI that goes overboard. Some friends and family who have never really used LLMs outside of Google search feel compelled to tell me how bad it is.
But generative AIs are really good at tasks I wouldn’t have imagined a computer doing just a few years ago. Even if they plateaued where they are right now it would lead to major shakeups in humanity’s current workflow. It’s not just hype.
The part that is overhyped is companies trying to jump the gun and wholesale replace workers with unproven AI substitutes. And of course the companies who try to shove AI where it doesn’t really fit, like AI-enabled fridges and toasters.
This is literally the hype. This is the hype that is dying and needs to die. Because generative AI is a tool with fairly specific uses. But it is being marketed by literally everyone who has it as General AI that can “DO ALL THE THINGS!” which it’s not and never will be.
The obsession with replacing workers with AI isn’t going to die. It’s too late. The large financial company that I work for has been obsessively tracking hours saved in developer time with GitHub Copilot. I’m an older developer and I was warned this week that my job will be eliminated soon.
So the company that is obsessed with money that you work for has discovered a way to (they think) make more money by getting rid of you and you’re surprised by this?
At least you’ve been forewarned. Take the opportunity to abandon ship. Don’t be the last one standing when the music stops.
I never said that I was surprised. I just wanted to point out that many companies like my own are already making significant changes to how they hire and fire. They need to justify their large investment in AI even though we know the tech isn’t there yet.
Even if they plateaued where they are right now it would lead to major shakeups in humanity’s current workflow
Like which one? Because we’ve had ChatGPT for 2 years now and there are already quite a lot of (good?) models.
Which shakeup do you think is happening or going to happen?
I quit my previous job in part because I couldn’t deal with the influx of terrible, unreliable, dangerous, bloated, nonsensical, not even working code that was suddenly pushed into one of the projects I was working on. That project is now completely dead, they froze it on some arbitrary version.
When a junior dev makes a mistake, you can explain it to them and they will not make it again. When they use an LLM to make a mistake, there is nothing to explain to anyone.
I compare this shake more to an earthquake than to anything positive you can associate with shaking.
And so, the problem wasn’t the ai/llm, it was the person who said “looks good” without even looking at the generated code, and then the person who read that pull request and said, again without reading the code, “lgtm”.
If you have good policies then it doesn’t matter how many bad practices are used, it still won’t be merged.
The only overhead is that you have to read all the requests but if it’s an internal project then telling everyone to read and understand their code shouldn’t be the issue.
The problem here is that a lot of the time looking for hidden problems is harder than writing good code from scratch. And you will always be in danger that the LLM snuck some sneaky undefined behaviour past you. There is a whole plethora of standards, conventions, and good practices that help humans avoid it, which an LLM can ignore at any random point.
So you’re either not spending enough time on review or missing a whole lot of bullshit. In my experience, in my field, right now, this review time is more time consuming and more painful than avoiding it in the first place.
Don’t underestimate how degrading and energy sucking it is for a professional to spend most of the working time sitting through autogenerated garbage, and how inefficient it is.
More business for me. As a DevOps guy, my job is to create automation to flag “terrible, unreliable, dangerous, bloated, nonsensical, not even working code”.
A technology that makes people put out bad code is a problematic technology. Just because your team/project has managed to overcome its problems so far doesn’t mean it is good or helpful overall. People not seeing the problem is actually the worst part.
I hardly see it changed to be honest. I work in the field too and I can imagine LLMs being good at producing decent boilerplate straight out of documentation, but nothing more complex than that.
I often use LLMs to work on my personal projects and, for example, Claude or ChatGPT 4o often spit out programs that don’t compile, use nonexistent functions, are bloated etc.
Possibly for languages with more training (like Python) they do better, but I can’t see it as a “radical change” and more like a well configured snippet plugin and auto complete feature.
LLMs can’t count, can’t analyze novel problems (by definition) and provide innovative solutions…why would they radically change programming?
I hardly see it changed to be honest. I work in the field too and I can imagine LLMs being good at producing decent boilerplate straight out of documentation, but nothing more complex than that.
I think one of the entries on the top lists of Advent of Code this year is a cheater that fully automated the solutions using LLMs. Not sure which LLM though. I use LLMs quite a bit and ChatGPT 4o frequently tells me nonsense like “perhaps subtracting by zero is affecting your results” (issues I thought were already gone in GPT 4, but I guess not; Sonnet 3.5 does a bit better in this regard).
Maybe some postmortem analysis will be interesting.
The AoC is also a context in which the domain is self-contained and there is probably a ton of training material on similar problems and tasks.
I can imagine LLM might do decently there.
Also there is no big consequence if they don’t and it’s probably possible to bruteforce (which is how many programming tasks have been solved).
I think you’re spot on with LLMs being mostly trained on these kinds of tasks. Can’t say I’m an expert in how to build a training set, but I imagine it’s quite easy to do with these kinds of problems because it’s easy to classify a solution as correct or incorrect. This is in contrast to larger problems which are less guided by algorithmic efficiency and more by sound design/architecture.
Still, I think it’s quite impressive. You don’t have to go very far back in time to have top of the line LLMs unable to solve these kinds of problems.
Also there is no big consequence if they don’t and it’s probably possible to bruteforce (which is how many programming tasks have been solved).
Usually with AoC part 1 is brute-forceable, but part 2 is not. Very often part 1 is to find the 100th number, and part 2 is to find the 1 000 000 000 000th number or something. Last year, out of curiosity, I had a brute-force solution for one problem that successfully completed on ~90% of the input. Solution was multi-threaded and running on a 16 core CPU for about 20 days before I gave up. But the LLMs this year (not sure if this was a problem last year) are in the top list of fastest users to solve the problems.
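To make the part 1 / part 2 gap concrete, here is a minimal sketch in Python. The puzzle is made up for illustration and is not an actual AoC problem: a state is updated by a fixed rule and we want the state after n steps. The names `step`, `brute_force` and `with_cycle_detection` are hypothetical. Brute force is fine for a small n; for something like the 1 000 000 000 000th value, one common trick (assuming the states eventually repeat) is to find the cycle and skip ahead.

```python
def step(x: int) -> int:
    """Made-up update rule; any deterministic rule on a finite state space would do."""
    return (x * x + 1) % 1_000_003


def brute_force(start: int, n: int) -> int:
    """Apply the rule n times. Fine for part 1, hopeless for n around 10**12."""
    x = start
    for _ in range(n):
        x = step(x)
    return x


def with_cycle_detection(start: int, n: int) -> int:
    """Same answer as brute_force, but skips ahead once a state repeats."""
    seen = {}      # state -> number of steps after which it first appeared
    history = []   # history[i] is the state after i steps
    x = start
    i = 0
    while x not in seen:
        if i == n:
            return x  # reached n before any repetition
        seen[x] = i
        history.append(x)
        x = step(x)
        i += 1
    # The state after i steps equals the state after seen[x] steps.
    cycle_start = seen[x]
    cycle_len = i - cycle_start
    return history[cycle_start + (n - cycle_start) % cycle_len]
```

Both functions should agree wherever brute force is still feasible; only the second one returns in reasonable time for `n = 10**12`.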
Just to be precise, when I said bruteforce I didn’t imagine a bruteforce of the calculation, but a brute force of the code. LLMs don’t really calculate either way, but what I mean is more: generate code -> try to run and see if tests work -> if it doesn’t, ask again/refine/etc. So essentially you are just asking for code until what it spits out is correct (verifiable with tests you are given).
But yeah, a few years ago this was not possible and I guess it was not due to the training data. Now the problem is that there is not much data left for training, and someone (Bloomberg?) reported that training ChatGPT 5 will cost billions of dollars, and it looks like we might be near the peak of what this technology could offer (without any major problem being solved by it to offset the economic and environmental cost).
That is my experience, it’s generally quite decent for small and simple stuff (as I said, distillation of documentation). I use it for Rust, where I am sure the training material was much smaller than for other languages. It’s not a matter of prompting though, it’s not my prompt that makes it hallucinate functions that don’t exist in libraries or makes it write code that doesn’t compile, it’s a feature of the technology itself.
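The generate -> run tests -> ask again loop described above can be sketched roughly like this in Python. Everything here is an assumption for illustration: `fake_llm` is a hypothetical stand-in for a real model call (it just returns canned answers, the first two of them wrong), and `passes_tests` and `generate_until_correct` are made-up names, not any tool's actual API.

```python
from typing import Callable, Optional, Sequence, Tuple

Candidate = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def passes_tests(candidate: Candidate, tests: Sequence[TestCase]) -> bool:
    """Run the candidate on every test; a crash counts as a failure."""
    try:
        return all(candidate(x) == expected for x, expected in tests)
    except Exception:
        return False


def generate_until_correct(
    generate: Callable[[int], Candidate],
    tests: Sequence[TestCase],
    max_attempts: int = 5,
) -> Optional[Tuple[int, Candidate]]:
    """Keep asking for code until something passes the given tests, or give up."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(attempt)
        if passes_tests(candidate, tests):
            return attempt, candidate
    return None


def fake_llm(attempt: int) -> Candidate:
    """Hypothetical stand-in for a model: wrong, then crashing, then correct."""
    answers = [lambda x: x + 1, lambda x: 1 // 0, lambda x: x * 2]
    return answers[min(attempt, len(answers)) - 1]


result = generate_until_correct(fake_llm, tests=[(1, 2), (3, 6)])
```

Note that the loop only knows the candidate passes the tests it was given, which is the point being made: "correct" here means nothing more than "the tests went green".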
GPTs are statistical text generators after all, they don’t “understand” the problem.
It’s also pretty young, human toddlers hallucinate and make things up. Adults too. Even experts are known to fall prey to bias and misconception.
I don’t think we know nearly enough about the actual architecture of human intelligence to start asserting an understanding of “understanding”. I think it’s a bit foolish to claim with certainty that LLMs in a MoE framework with self-review fundamentally can’t get there. Unless you can show me, materially, how human “understanding” functions, we’re just speculating on an immature technology.
As much as I agree with you, humans can learn a bunch of stuff without first learning the content of the whole internet and without the computing power of a datacenter or consuming the energy of Belgium. Humans learn to count at an early age too, for example.
I would say that the burden of proof is therefore reversed. Unless you demonstrate that this technology doesn’t have the natural and inherent limits that statistical text generators (or pixel) have, we can assume that our mind works differently.
Also you say immature technology but this technology is not fundamentally (i.e. in terms of principle) different from Weizenbaum’s ELIZA in the '60s. We might have refined the model and thrown a ton of data and computing power at it, but we are still talking about programs that use similar principles.
So yeah, we don’t understand human intelligence but we can appreciate certain features that GPTs absolutely lack, like a concept of truth that for humans is natural.
No actually it has changed pretty fundamentally. These aren’t simply a bunch of FCNs put together. Look up what a transformer is, that was one of the major breakthroughs that made modern LLMs possible.
humans can learn a bunch of stuff without first learning the content of the whole internet and without the computing power of a datacenter or consuming the energy of Belgium. Humans learn to count at an early age too, for example.
I suspect that if you took into consideration the millions of generations of evolution that “trained” the basic architecture of our brains, that advantage would shrink considerably.
I would say that the burden of proof is therefore reversed. Unless you demonstrate that this technology doesn’t have the natural and inherent limits that statistical text generators (or pixel) have, we can assume that our mind works differently.
I disagree. I’d argue evidence suggests we’re just a more sophisticated version of a similar principle, refined over billions of years. We learn facts by rote, and learn similarities by rote until we develop enough statistical text (or audio) correlations to “understand” the world.
Conversations are a slightly meandering chain of statistically derived cliches. English adjective order is universally “understood” by native speakers based purely on what sounds right, without actually being able to explain why (unless you’re a big grammar nerd). More complex conversations might seem novel, but they’re just a regurgitation of rote memorized facts and phrases strung together in a way that seems appropriate to the conversation based on statistical experience with past conversations.
Also you say immature technology but this technology is not fundamentally (i.e. in terms of principle) different from Weizenbaum’s ELIZA in the '60s. We might have refined the model and thrown a ton of data and computing power at it, but we are still talking about programs that use similar principles.
As with the evolution of our brains, which have operated on basically the same principles for hundreds of millions of years. The special sauce between human intelligence and a flatworm’s is a refined model.
So yeah, we don’t understand human intelligence but we can appreciate certain features that GPTs absolutely lack, like a concept of truth that for humans is natural.
I’m not sure you can claim that absolutely. That kind of feature is an internal experience, you can’t really confirm or deny if a GPT has something similar. Besides, humans have a pretty tenuous relationship with the concept of truth. There are certainly humans that consider objective falsehoods to be Truth.
Exactly this. Things have already changed and are changing as more and more people learn how and where to use these technologies. I have seen even teachers use this stuff who have limited grasp of technology in general.
My kid’s teachers had what I thought was a fantastic approach - have the kids write an outline. Use an LLM to generate an essay from that outline, then critique the essay
I don’t know anything about the online news business but it certainly appears to have changed. Most of it is dreck, either way, and those organizations are not a positive contributor to society, but they are there, it is a business, and it has changed society
I don’t see the change. Sure, there are spam websites with AI content that were not there before, but is this news business at all? All major publishers and newspapers don’t (seem to) use AI as far as I can tell.
Also I would argue this is not much of a change except maybe in how simple it is to generate fluff. All of this has existed for 20 years now, and it’s a byproduct of the online advertisement business (that for sure was a major change in society!). AI pieces are just yet another way to generate content in the hope of getting views.
No, not that either. Unless you consider “use LLM to summarize the changes/errors/inaccuracies, then have a human read the whole thing again” an improvement over “just have a human read the whole thing”.
Because LLM will do all these things:
point you toward issues
point you toward non-issues
not point you toward issues
change stuff even when “instructed” not to
If there is one thing you don’t want to throw an LLM at without full, unbiased review, it’s documents where the wording is legally binding. And if you have to do a full, unbiased review to begin with, where you can’t even trust your tool to have highlighted all the important parts, you may as well not bother with the tool.
I really can’t see this being done by any sane person. Why would you have a generator of text reviewing stuff (besides grammar)?
Do you have any reference of some companies doing this, perhaps?
It’s complex pattern matching and looking up existing case law online. This work has been outsourced to contracting companies for at least 7 years that I’m aware of. If it is something that can be documented in a run book for non-professionals to do for twenty cents on the dollar then there is no reason it can’t be done by a script for .002.
Computers have always been good at pattern recognition. This isn’t new. LLMs are not a type of actual AI. They are programs capable of recognizing patterns and loosely reproducing them in semi-randomized ways. The reason these so-called generative AI solutions have trouble generating the right number of fingers is not only that they have no idea how many fingers a person is supposed to have. They have no idea what a finger is.
The same goes for code completion. They will just generate something that fills the pattern they’re told to look for. It doesn’t matter if it’s right or wrong, because they have no concept of what is right or wrong beyond fitting the pattern. Not to mention that we’ve had code completion software for over a decade at this point. LLMs do it less efficiently and less reliably. The only upside of them is that sometimes they can recognize and suggest a pattern that those programming the other coding helpers might have missed. Outside of that, such as generating whole blocks of code or even entire programs, you can’t even get an LLM to reliably spit out a hello world program.
I never know what to think when I come across a comment like this one—which does describe, even if only at a surface level, how an LLM works—with 50% downvotes. Like, are people angry at reality, is that it?
With as much misinformation as is being spread about LLMs, going into anything more than a generalization would only lose more people’s comprehension.
The problem is people are being sold AGI. But ChatGPT and all these other tools don’t even remotely qualify for that. They’re really nothing more than a glorified Alice chatbot system on steroids. The one neat new trick to all this is that they’ve automated the training a bit. But these LLMs have no more comprehension of their output or the input they were given than something like the old Alice chatbot.
These tools have been described as artificial intelligence to laymen for decades at this point. It makes it really hard to change that calcified opinion. People would rather believe that it’s some magical thing, not just probability and maths.
“It’s part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, ‘that’s not thinking’”
-Pamela McCorduck
“AI is whatever hasn’t been done yet.”
- Larry Tesler
That’s the curse of the AI Effect.
Nothing will ever be “an actual AI” until we cross the barrier to an actual human-like general artificial intelligence like Cortana from Halo, and even then people will claim it isn’t actually intelligent.
Well at least until those who study intelligence and self-awareness actually come up with a comprehensive definition for it. Something we don’t even have currently. Which makes the situation even more silly. The people selling LLMs and AGNs as artificial intelligence are the PT Barnum of the modern era. This way to the egress folks come see the magnificent egress!
The thing is, AGI and AI are different things. As for your “LLMs aren’t real AI” thing, large language models are a type of machine learning model, and machine learning is a field of study in artificial intelligence.
LLMs are AI. Search engines are AI. Recommendation algorithms are AI. Siri, Alexa, self driving cars, Midjourney, Elevenlabs, every single video game with computer players, they are all AI. Because the term “Artificial Intelligence” by itself is extremely loose, and includes the types of narrow AI all of those are.
Which then get hit by the AI Effect, and become “just another thing computers can do now”, and therefore, “not AI”.
That just compares it to human-level intelligence, something which we cannot currently even quantify, let alone understand. It’s ultimately a comparison, a simile, not a scientific definition.
Search engines have always been databases with interfaces programmed by humans. Not AI. They’ve never suddenly gained new functionality inexplicably. If there’s a new feature, someone programmed it.
Search engines are however becoming LLMs and are getting worse for it. Unless you think eating rocks and glue is particularly intelligent. Because there is no comprehension there. It’s simply trying to make its output match patterns it recognizes, which is a precursor step but is not “intelligence”. Unless a program doing what it’s programmed to do is artificial intelligence, which is such a meaningless measure because that would mean Notepad is artificial intelligence. Windows is artificial intelligence. Linux is artificial intelligence.
You can’t just throw out random Wikipedia links. For example, the Article on AGI explicitly says we don’t have a definition of what human level cognition actually is. Which is what the person you were replying to was saying. You’re doing a fallacious appeal to authority, except that the authority doesn’t agree with you.
I mean, I think intelligence requires the ability to integrate new information into one’s knowledge base. LLMs can’t do that, they have to be trained on a fixed corpus.
Also, LLMs have a pretty shit-tastic track record of being able to differentiate correct data from bullshit, which is a pretty essential facet of intelligence IMO
LLMs have a perfect track record of doing exactly what they were designed to, take an input and create a plausible output that looks like it was written by a human.
They just completely lack the part in the middle that properly understands what it gets as the input and makes sure the output is factually correct, because if it did have that then it wouldn’t be an LLM any more, it would be an AGI.
The “artificial” in AI does also stand for the meaning of “fake” - something that looks and feels like it is intelligent, but actually isn’t.
Large context window LLMs are able to do quite a bit more than filling the gaps and completion. They can edit multiple files.
Yet, they’re unreliable, as they hallucinate all the time. Debugging LLM-generated code is a new skill, and it’s up to you to decide to learn it or not. I see quite an even split among devs. I think it’s worth it, though once it took me two hours to find a very obscure bug in LLM-generated code.
If you consider debugging broken LLM-generated code to be a skill… sure, go for it. But, since generated code is able to use tons of unknown side effects and other seemingly (for humans) random stuff to achieve its goal, I’d rather take the other approach, where it takes a human half an hour to write the code that some LLM could generate in seconds, and not have to learn how to parse random mumbo jumbo from a machine, while getting a working result.
Writing code is far from being the longest part of the job; and you gingerly decided that making the tedious part even more tedious is a great idea to shorten the already short part of it…
It’s similar to fixing code written by interns. Why hire interns at all, eh?
Is it faster to generate then debug or write everything? Needs to be properly tested. At the very least many devs have the perception of being faster, and perception sells.
It actually makes writing web apps less tedious. The longest part of a dev job is pretending to work actually, but that’s no different from any office jerb.
Humans are notoriously worse at tasks that have to do with reviewing than they are at tasks that have to do with creating. Editing an article is more boring and painful than writing it. Understanding and debugging code is much harder than writing it. Observing someone cooking to spot mistakes is more boring than cooking, etc.
This also fights with the attention required to perform those tasks, which means a higher ratio of reviewing vs creating tasks leads to lower quality output because attention is depleted at some point and mistakes slip in.
All this with the additional “bonus” to have to pay for the tool AND the human reviewing while also wasting tons of water and energy.
I think it’s wise to ask ourselves whether this makes sense at all.
I’m too unfamiliar with the cooking and writing/publishing biz. I’d rather not use this analogy.
I can see many business guys paying for something like Devin, making a mess, then hiring someone to fix it. I can see companies not hiring junior devs, and requiring old devs to learn to generate and debug. Just like they required devs to be “full stack”. You can easily prevent that if you have your own company. If … Do you have your own company?
What collective perspective? There’s gonna be winners and losers, non uniform rewards and costs. Companies are already acting like that. And IMO more will join. They’re a hive mind who eagerly copy Google, Amazon, Facebook. And younger devs will add “LLM code gen” to their resumes. No job is safe, even kings and dictators get their heads chopped off.
I have one of those at work now, but my experience with it is still quite limited. With Copilot it was quite useful for knocking up quick boutique solutions for particular problems (stitch together a load of PDFs sorted on a name heading), with the proviso that you might end up having to repair bleed between dependency versions and repair syntax. I couldn’t trust it with big refactors of existing systems.
Cursor and Claude are a lot better than Copilot, but none of them can be trusted. For existing large code repos, LLMs can generate tests and similar boring stuff. I suspect there’ll be an even bigger shift to microservices to make it easier for LLMs to generate something that works.
“AI technology is exceptionally expensive, and to justify those costs, the technology must be able to solve complex problems, which it isn’t designed to do.”
Generative AI can indeed do impressive things from a technical standpoint, but not enough revenue has been generated so far to offset the enormous costs. Like for other technologies, it might just take time (remember how many billions Amazon burned before turning into a cash-generating machine? And Uber has also just started turning some profit) plus a great deal of enshittification once more people and companies are dependent.
Or it might just be a bubble.
As humans we’re not great at predicting these things including of course me. My personal prediction? A few companies will make money, especially the ones that start selling AI as a service at increasingly high costs, many others will fail and both AI enthusiasts and detractors will claim they were right all along.
See now, I would prefer AI in my toaster. It should be able to learn to adjust the cook time to what I want no matter what type of bread I put in it. Though is that really AI? It could be. Same with my fridge. Learn what gets used and what doesn’t. Then give my wife the numbers on that damn clear box of salad she buys at Costco every time, which takes up a ton of space and always goes bad before she eats even 5% of it. These would be practical benefits to the crap that is day to day life. And far more impactful than search results I can’t trust.
There’s a good point here that like about 80% of what we’re calling AI right now… isn’t even AI or even LLM. It’s just… algorithm, code, plain old math. I’m pretty sure someone is going to refer to a calculator as AI soon. “Wow, it knows math! Just like a person! Amazing technology!”
(That’s putting aside the very question of whether LLMs should even qualify as AIs at all.)
In my professional experience, AI seems to be just a faster way to generate an algorithm that is really hard to debug. Though I am dev-ops/sre so I am not as deep in it as the devs.
I’m reminded of the time researchers used an evolutionary algorithm to devise a circuit that would emit a tone on certain audio inputs and not on others. They examined the resulting circuit and found an extra vestigial bit, but when they cut it off, the chip stopped working. So they re-enabled it. Then they wanted to show off their research at a panel, and at the panel it completely failed. Dismayed, they brought it back to their lab to figure out why it stopped working, and it suddenly started working fine.
After a LOT of troubleshooting they eventually discovered that the circuit was generating the tone by using the extra vestigial bit as an antenna that picked up emissions from a CRT in the lab and downconverted them to the desired tone frequency. Turn off the antenna, no signal. Take the chip away from that CRT, no signal.
That’s what I expect LLMs will make. Complex, arcane spaghetti stuff that works but if you look at it funny it won’t work anymore, and nobody knows how it works at all.
As a devops person, I’m constantly jumping back and forth to whatever programming language and tools each team uses. Sometimes it takes a bit to find the context, and I’m hoping ai can help. Unfortunately, allowing the ai to see code is currently off limits by corporate policy, so it only helps in those situations where I need to generate boilerplate
In my jobs there have always been certain style requirements for the code. AI doesn’t take those into account. So I would have to rework the code anyway. And of course there are the local libraries it knows nothing about.
Fight technology with technology. I’m sure you can specify a style for it to generate, but we already run everything through a prettifier configured for what we look for …. Unless you mean a higher order like naming or architecture
Lol, the lead can’t spec the style, he just reviews the code and asks for changes. Sometimes it’s just that we already have a method that does a similar thing, so we should use it. Of course an AI wouldn’t know about that unless you gave it access to your code. And given how speed-first AI companies are, I would never trust that data with them. But other times it’s just the lead’s personal preference.
I just had to transfer one of my guys out after frequent arguments to do that. I don’t understand - I point out a function that does exactly what he wants, yet he still wants to reinvent it.
I’m dreading when I come back after break. I got 50% of a new junior guy who keeps saying he’s a great programmer. No sign of it so far but my management insists I take him on. All he needs to do is expose a new endpoint, wire up functionality that’s already there, and I walked him through it. Should be easy, right? No reinventing the wheel, right?
You better believe that AI-powered toaster would only accept authorized bread from a bakery that paid top dollar to the company that makes them. To ensure the best quality possible and save you from inferior toast, of course.
And I’m sure each slice will have an entirely necessary chip on it, legally protected from workarounds, to prevent the use of other-brand or commodity bread and ensure the optimal experience.
I agree with your wife: there’s always an aspirational salad in the fridge. For most foods, I’m pretty good at not buying stuff we won’t eat, but we always should eat more veggies. I don’t know how to persuade us to eat more veggies, but step 1 is availability. Like that Reddit meme
Like which one? We’ve had ChatGPT for 2 years now, and already quite a lot of (good?) models. Which shakeup do you think is happening or going to happen?
Computer programming has radically changed. It’s a huge help having LLM autocomplete and chat built in, with IDEs like Cursor and Windsurf.
I’ve been a developer for 35 years. This is shaking it up as much as the internet did.
I quit my previous job in part because I couldn’t deal with the influx of terrible, unreliable, dangerous, bloated, nonsensical, not even working code that was suddenly pushed into one of the projects I was working on. That project is now completely dead, they froze it on some arbitrary version.
When a junior dev makes a mistake, you can explain it to them and they will not make it again. When they use an LLM to make a mistake, there is nothing to explain to anyone.
I compare this shakeup more to an earthquake than to anything positive you can associate with shaking.
And so, the problem wasn’t the ai/llm, it was the person who said “looks good” without even looking at the generated code, and then the person who read that pull request and said, again without reading the code, “lgtm”.
If you have good policies then it doesn’t matter how many bad practices are used, it still won’t be merged.
The only overhead is that you have to read all the requests but if it’s an internal project then telling everyone to read and understand their code shouldn’t be the issue.
The problem here is that a lot of the time looking for hidden problems is harder than writing good code from scratch. And you will always be in danger that the LLM snuck some sneaky undefined behaviour past you. There is a whole plethora of standards, conventions, and good practices that help humans to avoid it, which an LLM can ignore at any random point.
So you’re either not spending enough time on review or missing a whole lot of bullshit. In my experience, in my field, right now, this review time is more time consuming and more painful than avoiding it in the first place.
Don’t underestimate how degrading and energy sucking it is for a professional to spend most of the working time sitting through autogenerated garbage, and how inefficient it is.
More business for me. As a DevOps guy, my job is to create automation to flag “ terrible, unreliable, dangerous, bloated, nonsensical, not even working code”
This is a problem with your team/project. It’s not a problem with the technology.
A technology that makes people put out bad code is a problematic technology. Just because your team/project has managed to overcome its problems so far doesn’t mean it is good or overall helpful. People not seeing the problem is actually the worst part.
Sir, I use it to assist me in programming. I don’t use it to write entire files or functions. It’s a pattern recognizer.
Your team had people who didn’t review code. That’s a problem.
I hardly see it changed to be honest. I work in the field too and I can imagine LLMs being good at producing decent boilerplate straight out of documentation, but nothing more complex than that.
I often use LLMs to work on my personal projects and, for example, Claude or ChatGPT 4o often spit out programs that don’t compile, use nonexistent functions, are bloated etc. Possibly for languages with more training material (like Python) they do better, but I can’t see it as a “radical change”; it’s more like a well configured snippet plugin and autocomplete feature.
LLMs can’t count, can’t analyze novel problems (by definition) and provide innovative solutions…why would they radically change programming?
I think one of the top lists on advent of code this year is a cheater that fully automated the solutions using LLMs. Not sure which LLM though, I use LLMs quite a bit and ChatGPT 4o frequently tells me nonsense like “perhaps subtracting by zero is affecting your results” (issues I thought were already gone in GPT 4, but I guess not, Sonnet 3.5 does a bit better in this regard).
Maybe some postmortem analysis will be interesting. The AoC is also a context in which the domain is self-contained and there is probably a ton of training material on similar problems and tasks. I can imagine LLM might do decently there.
Also there is no big consequence if they don’t and it’s probably possible to bruteforce (which is how many programming tasks have been solved).
I think you’re spot on with LLMs being mostly trained on these kinds of tasks. Can’t say I’m an expert in how to build a training set, but I imagine it’s quite easy to do with these kinds of problems because it’s easy to classify a solution as correct or incorrect. This is in contrast to larger problems which are less guided by algorithmic efficiency and more by sound design/architecture.
Still, I think it’s quite impressive. You don’t have to go very far back in time to have top of the line LLMs unable to solve these kinds of problems.
Usually with AoC part 1 is brute-forceable, but part 2 is not. Very often part 1 is to find the 100th number, and part 2 is to find the 1 000 000 000 000th number or something. Last year, out of curiosity, I had a brute-force solution for one problem that successfully completed on ~90% of the input. Solution was multi-threaded and running on a 16 core CPU for about 20 days before I gave up. But the LLMs this year (not sure if this was a problem last year) are in the top list of fastest users to solve the problems.
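To make the part 1 versus part 2 gap concrete, here is a minimal sketch of one common way around it: instead of simulating every step, detect the cycle in the sequence and jump ahead. The `step` rule below is made up for illustration and is not taken from any actual AoC puzzle; the trick only applies when the update is deterministic over a finite set of states.

```python
def step(state: int) -> int:
    """Hypothetical puzzle rule: a deterministic update over a finite state space."""
    return (state * state + 1) % 1009


def nth_brute_force(start: int, n: int) -> int:
    """Part 1 style: apply the rule n times. Fine for n = 100, hopeless for 10**12."""
    state = start
    for _ in range(n):
        state = step(state)
    return state


def nth_with_cycle(start: int, n: int) -> int:
    """Part 2 style: find where the sequence starts repeating, then index into the loop."""
    seen = {}      # state -> number of steps after which it first appeared
    history = []   # history[i] is the state after i steps
    state = start
    i = 0
    while state not in seen:
        if i == n:
            return state  # reached n before any repeat
        seen[state] = i
        history.append(state)
        state = step(state)
        i += 1
    cycle_start = seen[state]
    cycle_len = i - cycle_start
    # n is past the start of the cycle here, so wrap it into the recorded loop.
    return history[cycle_start + (n - cycle_start) % cycle_len]


# Same answer as brute force for a small n, and instant for a huge one.
small = nth_with_cycle(3, 100)
huge = nth_with_cycle(3, 1_000_000_000_000)
```

Brute force does work proportional to n, while the cycle version does work proportional to the number of distinct states, which is why a 16 core CPU running for 20 days can still lose to a few lines of the right idea.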
Just to be precise, when I said bruteforce I didn’t imagine a bruteforce of the calculation, but a brute force of the code. LLMs don’t really calculate either way, but what I mean is more: generate code -> try to run and see if tests work -> if it doesn’t, ask again/refine/etc. So essentially you are just asking for code until what it spits out is correct (verifiable with tests you are given).
But yeah, a few years ago this was not possible and I guess it was not due to the training data. Now the problem is that there is not much data left for training, and someone (Bloomberg?) reported that training chatGPT 5 will cost billions of dollars, and it looks like we might be near the peak of what this technology could offer (without any major problem being solved by it to offset the economic and environmental cost).
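As a rough sketch of that "brute force of the code" loop: the snippet below keeps asking a generator for candidate source until one passes the given tests. Everything named here is hypothetical; `fake_generate` is a canned stand-in for an LLM call, `TESTS` plays the role of the examples a puzzle gives you, and a real setup would sandbox the `exec` rather than run untrusted code directly.

```python
from typing import Callable, Iterable, Optional

# Hypothetical test cases: (input, expected output) pairs.
TESTS = [(1, 2), (5, 10), (21, 42)]


def passes_tests(source: str, tests: Iterable[tuple[int, int]]) -> bool:
    """Run candidate source and check its `solve` function against the tests."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: trusted input; sandbox this in real use
        solve = namespace["solve"]
        return all(solve(x) == want for x, want in tests)
    except Exception:
        # Code that does not compile or crashes simply counts as a failed attempt.
        return False


def brute_force_code(generate: Callable[[int], str], max_attempts: int) -> Optional[str]:
    """Ask for code until a candidate passes the tests or the budget runs out."""
    for attempt in range(max_attempts):
        candidate = generate(attempt)
        if passes_tests(candidate, TESTS):
            return candidate
    return None


# Stand-in for an LLM: canned answers, the first two of them wrong.
CANNED = [
    "def solve(x): return x +",    # does not compile
    "def solve(x): return x + 1",  # runs, but fails the tests
    "def solve(x): return x * 2",  # passes
]


def fake_generate(attempt: int) -> str:
    return CANNED[attempt % len(CANNED)]


result = brute_force_code(fake_generate, max_attempts=5)
```

Note that the loop never needs the generator to understand anything: the tests do all the judging, which is why this works best where correctness is cheap to verify.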
Just from today https://www.techspot.com/news/106068-openai-struggles-chatgpt-5-delays-rising-costs.html
You’re missing it. Use Cursor or Windsurf. The autocomplete will help in so many tedious situations. It’s game changing.
ChatGPT 4o isn’t even the most advanced model, yet I have seen it do things you say it can’t. Maybe work on your prompting.
That is my experience, it’s generally quite decent for small and simple stuff (as I said, distillation of documentation). I use it for Rust, where I am sure the training material was much smaller than for other languages. It’s not a matter of prompting though, it’s not my prompt that makes it hallucinate functions that don’t exist in libraries or makes it write code that doesn’t compile, it’s a feature of the technology itself.
GPTs are statistical text generators after all, they don’t “understand” the problem.
It’s also pretty young, human toddlers hallucinate and make things up. Adults too. Even experts are known to fall prey to bias and misconception.
I don’t think we know nearly enough about the actual architecture of human intelligence to start asserting an understanding of “understanding”. I think it’s a bit foolish to claim with certainty that LLMs in a MoE framework with self-review fundamentally can’t get there. Unless you can show me, materially, how human “understanding” functions, we’re just speculating on an immature technology.
As much as I agree with you, humans can learn a bunch of stuff without first learning the content of the whole internet and without the computing power of a datacenter or consuming the energy of Belgium. Humans learn to count at an early age too, for example.
I would say that the burden of proof is therefore reversed. Unless you demonstrate that this technology doesn’t have the natural and inherent limits that statistical text generators (or pixel) have, we can assume that our mind works differently.
Also, you say immature technology, but this technology is not fundamentally (i.e. in terms of principle) different from Weizenbaum’s ELIZA in the '60s. We might have refined the models and thrown a ton of data and computing power at them, but we are still talking about programs that use similar principles.
So yeah, we don’t understand human intelligence, but we can appreciate certain features that are absolutely lacking in GPTs, like a concept of truth that for humans is natural.
No actually it has changed pretty fundamentally. These aren’t simply a bunch of FCNs put together. Look up what a transformer is, that was one of the major breakthroughs that made modern LLMs possible.
I suspect that if you took into consideration the millions of generations of evolution that “trained” the basic architecture of our brains, that advantage would shrink considerably.
I disagree. I’d argue evidence suggests we’re just a more sophisticated version of a similar principle, refined over billions of years. We learn facts by rote, and learn similarities by rote until we develop enough statistical text (or audio) correlations to “understand” the world.
Conversations are a slightly meandering chain of statistically derived cliches. English adjective order is universally “understood” by native speakers based purely on what sounds right, without actually being able to explain why (unless you’re a big grammar nerd). More complex conversations might seem novel, but they’re just a regurgitation of rote memorized facts and phrases strung together in a way that seems appropriate to the conversation based on statistical experience with past conversations.
As with the evolution of our brains, which have operated on basically the same principles for hundreds of millions of years. The special sauce between human intelligence and a flatworm’s is a refined model.
I’m not sure you can claim that absolutely. That kind of feature is an internal experience, you can’t really confirm or deny if a GPT has something similar. Besides, humans have a pretty tenuous relationship with the concept of truth. There are certainly humans that consider objective falsehoods to be Truth.
@remindme@mstdn.social 1 year. Let me know about the sea change of new 10x transformer-based programmers that have automated me out of a job.
@horse_battery_staple Ok, I will remind you on Friday Dec 26, 2025 at 7:49 AM PST.
Exactly this. Things have already changed and are changing as more and more people learn how and where to use these technologies. I have seen even teachers use this stuff who have limited grasp of technology in general.
My kid’s teachers had what I thought was a fantastic approach - have the kids write an outline. Use an LLM to generate an essay from that outline, then critique the essay
I don’t know anything about the online news business but it certainly appears to have changed. Most of it is dreck, either way, and those organizations are not a positive contributor to society, but they are there, it is a business, and it has changed society
I don’t see the change. Sure, there are spam websites with AI content that were not there before, but is this news business at all? All major publishers and newspapers don’t (seem to) use AI as far as I can tell.
Also I would argue this is not much of a change, except maybe in how simple it is to generate fluff. All of this has existed for 20 years now, and it’s a byproduct of the online advertisement business (that for sure was a major change in society!). AI pieces are just yet another way to generate content in the hope of getting views.
Review of legal documents.
Oh boy…what can possibly go wrong for documents where small minutiae like wording can make a huge difference.
Creating legal documents, no. Reviewing legal documents for errors and inaccuracies totally.
No, not that either. Unless you consider “use LLM to summarize the changes/errors/inaccuracies, then have a human read the whole thing again” an improvement over “just have a human read the whole thing”.
Because LLM will do all these things:
If there is one thing you don’t want to throw an LLM at without full, unbiased review, it’s documents where the wording is legally binding. And if you have to do a full, unbiased review to begin with, where you can’t even trust your tool to have highlighted all the important parts, you may as well not bother with the tool.
I really can’t see this being done by any sane person. Why would you have a generator of text reviewing stuff (besides grammar)? Do you have any reference of some companies doing this, perhaps?
It’s complex pattern matching and looking up existing case law online. This work has been outsourced to contracting companies for at least 7 years that I’m aware of. If it is something that can be documented in a run book for non professionals to do for twenty cents on the dollar then there is no reason it can’t be done by a script for .002.
Aside from a handful of businesses that tried to do that and failed miserably, some of them failing in actual court, you mean?
Computers have always been good at pattern recognition. This isn’t new. LLMs are not a type of actual AI. They are programs capable of recognizing patterns and loosely reproducing them in semi-randomized ways. The reason these so-called generative AI solutions have trouble generating the right number of fingers is not only that they have no idea how many fingers a person is supposed to have. They have no idea what a finger is.
The same goes for code completion. They will just generate something that fills the pattern they’re told to look for. It doesn’t matter if it’s right or wrong, because they have no concept of what is right or wrong beyond fitting the pattern. Not to mention that we’ve had code completion software for over a decade at this point. LLMs do it less efficiently and less reliably. The only upside of them is that sometimes they can recognize and suggest a pattern that those programming the other coding helpers might have missed. Outside of that, such as generating whole blocks of code or even entire programs, you can’t even get an LLM to reliably spit out a hello world program.
I never know what to think when I come across a comment like this one—which does describe, even if only at a surface level, how an LLM works—with 50% downvotes. Like, are people angry at reality, is that it?
With as much misinformation as is being spread about LLMs, going into anything more than a generalization would only lose more people’s comprehension.
The problem is people are being sold AGI. But chat GPT and all these other tools don’t even remotely qualify for that. They’re really nothing more than a glorified Alice chatbot system on steroids. The one neat new trick to all this is that they’ve automated the training a bit. But these llms have no more comprehension of their output or the input they were given than something like the old Alice chatbot.
These tools have been described as artificial intelligence to laymen for decades at this point. It makes it really hard to change that calcified opinion. People would rather believe that it’s some magical thing, not just probability and maths.
They are bullshit machines, trained to output something that users think is the right output.
Downvoting someone on the Internet is easier than tangentially modifying reality in a measurable way
Downvoting sounds like a task that’s ripe for automation with AI!
“It’s part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, ‘that’s not thinking’”
-Pamela McCorduck
“AI is whatever hasn’t been done yet.”
- Larry Tesler
That’s the curse of the AI Effect.
Nothing will ever be “an actual AI” until we cross the barrier to an actual human-like general artificial intelligence like Cortana from Halo, and even then people will claim it isn’t actually intelligent.
Well at least until those who study intelligence and self-awareness actually come up with a comprehensive definition for it. Something we don’t even have currently. Which makes the situation even more silly. The people selling LLMs and AGNs as artificial intelligence are the PT Barnum of the modern era. This way to the egress folks come see the magnificent egress!
They already did. AGI - artificial general intelligence.
The thing is, AGI and AI are different things. As for your “LLMs aren’t real AI” thing, large language models are a type of machine learning model, and machine learning is a field of study in artificial intelligence.
LLMs are AI. Search engines are AI. Recommendation algorithms are AI. Siri, Alexa, self driving cars, Midjourney, Elevenlabs, every single video game with computer players, they are all AI. Because the term “Artificial Intelligence” by itself is extremely loose, and includes the types of narrow AI all of those are.
Which then get hit by the AI Effect, and become “just another thing computers can do now”, and therefore, “not AI”.
That just compares it to human-level intelligence, something which we cannot currently even quantify, let alone understand. It’s ultimately a comparison, a simile, not a scientific definition.
Search engines have always been databases with interfaces programmed by humans. Not AI. They’ve never suddenly gained new functionality inexplicably. If there’s a new feature, someone programmed it.
Search engines are, however, becoming LLMs and are getting worse for it. Unless you think eating rocks and glue is particularly intelligent. Because there is no comprehension there. It’s simply trying to make its output match patterns it recognizes, which is a precursor step but is not “intelligence”. Unless a program doing what it’s programmed to do is artificial intelligence, which is such a meaningless measure, because that would mean Notepad is artificial intelligence. Windows is artificial intelligence. Linux is artificial intelligence.
You can argue what you think the words should mean in your opinion in the field of artificial intelligence. I agree with some of them.
It just doesn’t change what they actually do mean.
You can’t just throw out random Wikipedia links. For example, the Article on AGI explicitly says we don’t have a definition of what human level cognition actually is. Which is what the person you were replying to was saying. You’re doing a fallacious appeal to authority, except that the authority doesn’t agree with you.
That’s a disturbing handwave. “We don’t really know what intelligence is, so therefore, anything we call intelligence is fair game”
A thermometer tells me what temperature it is. It senses the ambient heat energy and responds with a numeric indicator. Is that intelligence?
My microwave stops when it notices steam from my popcorn bag. Is that intelligence?
If I open an encyclopedia book to a page about computers, it tells me a bunch of information about computers. Is that intelligence?
If AI helps us realize that a thermometer fits the definition of Intelligence when it shouldn’t, then it’s entirely valid to refine the definition
I mean, I think intelligence requires the ability to integrate new information into one’s knowledge base. LLMs can’t do that, they have to be trained on a fixed corpus.
Also, LLMs have a pretty shit-tastic track record of being able to differentiate correct data from bullshit, which is a pretty essential facet of intelligence IMO
LLMs have a perfect track record of doing exactly what they were designed to, take an input and create a plausible output that looks like it was written by a human. They just completely lack the part in the middle that properly understands what it gets as the input and makes sure the output is factually correct, because if it did have that then it wouldn’t be an LLM any more, it would be an AGI.
The “artificial” in AI does also stand for the meaning of “fake” - something that looks and feels like it is intelligent, but actually isn’t.
Sometimes it seems like the biggest success of AI has been refining the definition of intelligence. But we still have a long way to go
Large context window LLMs are able to do quite a bit more than filling the gaps and completion. They can edit multiple files.
Yet, they’re unreliable, as they hallucinate all the time. Debugging LLM-generated code is a new skill, and it’s up to you to decide to learn it or not. I see quite an even split among devs. I think it’s worth it, though once it took me two hours to find a very obscure bug in LLM-generated code.
If you consider debugging broken LLM-generated code to be a skill… sure, go for it. But, since generated code is able to use tons of unknown side effects and other seemingly (for humans) random stuff to achieve its goal, I’d rather take the other approach, where it takes a human half an hour to write the code that some LLM could generate in seconds, and not have to learn how to parse random mumbo jumbo from a machine, while getting a working result.
Writing code is far from being the longest part of the job; and you gingerly decided that making the tedious part even more tedious is a great idea to shorten the already short part of it…
It’s similar to fixing code written by interns. Why hire interns at all, eh?
Is it faster to generate then debug or write everything? Needs to be properly tested. At the very least many devs have the perception of being faster, and perception sells.
It actually makes writing web apps less tedious. The longest part of a dev job is pretending to work actually, but that’s no different from any office jerb.
What is your favorite flavor of kool aid?
Grape, my nigga.
Humans are notoriously worse at tasks that have to do with reviewing than they are at tasks that have to do with creating. Editing an article is more boring and painful than writing it. Understanding and debugging code is much harder than writing it etc., observing someone cooking to spot mistakes is more boring than cooking etc.
This also fights with the attention required to perform those tasks, which means a higher ratio of reviewing vs creating tasks leads to lower quality output because attention is depleted at some point and mistakes slip in. All this with the additional “bonus” to have to pay for the tool AND the human reviewing while also wasting tons of water and energy. I think it’s wise to ask ourselves whether this makes sense at all.
To make sense of that, figure out what pays more observing/editing or cooking/writing. Big shekels will make boring parts exciting
Think also about the number of people doing both. Also, writers earn way more than editors, and stellar chefs earn way more than cooking critics.
If you think devs will be paid more to review GPT code, well, I would love to have your optimism.
I’m too unfamiliar with the cooking and writing/publishing biz. I’d rather not use this analogy.
I can see many business guys paying for something like Devin, making a mess, then hiring someone to fix it. I can see companies not hiring junior devs, and requiring old devs to learn to generate and debug. Just like they required devs to be “full stack”. You can easily prevent that if you have your own company. If … Do you have your own company?
I don’t, like 99% of people don’t or won’t. My job is safe, I am arguing from a collective perspective.
I simply don’t think companies will act like that. Also the mere reduction of total number of positions will compress salaries.
What collective perspective? There’s gonna be winners and losers, non uniform rewards and costs. Companies are already acting like that. And IMO more will join. They’re a hive mind who eagerly copy Google, Amazon, Facebook. And younger devs will add “LLM code gen” to their resumes. No job is safe, even kings and dictators get their heads chopped off.
I have one of those at work now, but my experience with it is still quite limited. With Copilot it was quite useful for knocking up quick boutique solutions for particular problems (stitch together a load of PDFs sorted on a name heading), with the proviso that you might end up having to repair bleed between dependency versions and repair syntax. I couldn’t trust it with big refactors of existing systems.
Cursor and Claude are a lot better than Copilot, but none of them can be trusted. For existing large code repos, LLMs can generate tests and similar boring stuff. I suspect there’ll be an even bigger shift to microservices to make it easier for LLMs to generate something that works.
This is easy to say about the output of AIs… if you don’t check their work.
Alas, checking for accuracy these days seems to be considered old fogey stuff.
Goldman Sachs, quote from the article:
Generative AI can indeed do impressive things from a technical standpoint, but not enough revenue has been generated so far to offset the enormous costs. Like for other technologies, it might just take time (remember how many billions Amazon burned before turning into a cash-generating machine? And Uber has also just started turning some profit) plus a great deal of enshittification once more people and companies are dependent. Or it might just be a bubble.
As humans we’re not great at predicting these things including of course me. My personal prediction? A few companies will make money, especially the ones that start selling AI as a service at increasingly high costs, many others will fail and both AI enthusiasts and detractors will claim they were right all along.
See now, I would prefer AI in my toaster. It should be able to learn to adjust the cook time to what I want no matter what type of bread I put in it. Though is that really AI? It could be. Same with my fridge. Learn what gets used and what doesn’t. Then give my wife the numbers on that damn clear box of salad she buys at Costco every time, which takes up a ton of space and always goes bad before she eats even 5% of it. These would be practical benefits to the crap that is day to day life. And far more impactful than search results I can’t trust.
There’s a good point here that like about 80% of what we’re calling AI right now… isn’t even AI or even LLM. It’s just… algorithm, code, plain old math. I’m pretty sure someone is going to refer to a calculator as AI soon. “Wow, it knows math! Just like a person! Amazing technology!”
(That’s putting aside the very question of whether LLMs should even qualify as AIs at all.)
In my professional experience, AI seems to be just a faster way to generate an algorithm that is really hard to debug. Though I am dev-ops/sre so I am not as deep in it as the devs.
I’m reminded of the time researchers used an evolutionary algorithm to devise a circuit that would emit a tone on certain audio inputs and not on others. They examined the resulting circuit and found an extra vestigial bit, but when they cut it off, the chip stopped working. So they re-enabled it. Then they wanted to show off their research at a panel, and at the panel it completely failed. Dismayed, they brought it back to their lab to figure out why it stopped working, and it suddenly started working fine.
After a LOT of troubleshooting they eventually discovered that the circuit was generating the tone by using the extra vestigial bit as an antenna that picked up emissions from a CRT in the lab and downconverted them to the desired tone frequency. Turn off the antenna, no signal. Take the chip away from that CRT, no signal.
That’s what I expect LLMs will make. Complex, arcane spaghetti stuff that works but if you look at it funny it won’t work anymore, and nobody knows how it works at all.
As a devops person, I’m constantly jumping back and forth to whatever programming language and tools each team uses. Sometimes it takes a bit to find the context, and I’m hoping ai can help. Unfortunately, allowing the ai to see code is currently off limits by corporate policy, so it only helps in those situations where I need to generate boilerplate
In my jobs there have always been certain style requirements for the code. AI doesn’t take those into account, so I would have to rework the code anyway. And of course there are the local libraries it knows nothing about.
Fight technology with technology. I’m sure you can specify a style for it to generate, but we already run everything through a prettifier configured for what we look for …. Unless you mean a higher order like naming or architecture
Lol, the lead can’t spec the style, he just reviews the code and asks for changes. Sometimes it’s just that we already have a method that does a similar thing, so we should use it. Of course an AI wouldn’t know about that unless you gave it access to your code. And given how speed-first AI companies are, I would never trust that data with them. But other times it’s just the lead’s personal preference.
I just had to transfer one of my guys out after frequent arguments to do that. I don’t understand - I point out a function that does exactly what he wants, yet he still wants to reinvent it.
I’m dreading when I come back after break. I got 50% of a new junior guy who keeps saying he’s a great programmer. No sign of it so far but my management insists I take him on. All he needs to do is expose a new endpoint, wire up functionality that’s already there, and I walked him through it. Should be easy, right? No reinventing the wheel, right?
You better believe that AI-powered toaster would only accept authorized bread from a bakery that paid top dollar to the company that makes them. To ensure the best quality possible and save you from inferior toast, of course.
Lol, enshittification should at least take a few months… I hope.
or you go to make some toast and it spends 15 minutes downloading “updates” before you can use it
And I’m sure each slice will have an entirely necessary chip on it, legally protected from workarounds, to ~~prevent using other brand or commodity bread~~ ensure the optimal experience
You really wouldn’t.
I was so hoping that was toasty the toaster! Waffles? How about a bagel?
I agree with your wife: there’s always an aspirational salad in the fridge. For most foods, I’m pretty good at not buying stuff we won’t eat, but we always should eat more veggies. I don’t know how to persuade us to eat more veggies, but step 1 is availability. Like that Reddit meme
It’s been years… maybe we don’t need the costco size for the love of pete.
So true.
Like what outcome?
I have seen gains on cell detection, but it’s “just” a bit better.