cross-posted from: https://lemmy.intai.tech/post/43759
cross-posted from: https://lemmy.world/post/949452
OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train their LLMs, or large language models.
People talk about OpenAI as if it’s some utopian saviour that’s going to revolutionise society, when in reality it’s a large corporation flooding the internet with terrible low-quality content using machine learning models that have existed for years. And the fields it is “automating” are creative ones that specifically require a human touch, like art and writing. Large language models and image generation aren’t going to improve anything. They’re not “AI” and they never will be. Hopefully when AI does exist and does start automating everything, we’ll have a better economic system though :D
The thing that amazes me the most about AI discourse is that we all learned in Theory of Computation that general AI is impossible. My best guess is that people with a CS degree who believe in AI slept through all their classes.
The existence of natural intelligence is the proof that artificial intelligence is possible.
We can simulate all manner of physics using a computer, but we can’t simulate a brain using a computer? I’m having a real hard time believing that. Brains aren’t magic.
we all learned in Theory of Computation that general AI is impossible.
I strongly suspect it is you who has misunderstood your CS courses. Can you provide some concrete evidence for why general AI is impossible?
Evidence, not really, but that’s kind of meaningless here since we’re talking theory of computation. It’s a direct consequence of the undecidability of the halting problem. Mathematical analysis of loops cannot be done because loops, in general, don’t take on any particular value; if they did, the halting problem would be decidable. Given that writing a computer program requires an exact specification, which cannot be provided for the general analysis of computer programs, general AI trips and falls at the very first hurdle: being able to write other computer programs. That should be a simple task compared to the other things people expect of it.
Yes, there’s more complexity here (what about compiler optimization, or Rust’s borrow checker?), which I don’t care to get into at the moment; suffice it to say, those only work under certain special conditions. To posit general AI, you need to think bigger than basic block instruction reordering.
This stuff should all be obvious, but here we are.
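For reference, here is a minimal Python sketch of the standard diagonal argument behind the undecidability of the halting problem mentioned above. The names `Decider`, `make_paradox`, and `always_no` are hypothetical, made up for illustration: given any claimed decider `halts`, we build a program that does the opposite of whatever `halts` predicts about it, so every candidate decider is wrong on at least one input. It only illustrates the theorem itself, not the further claim about AI that this thread is arguing over.

```python
from typing import Callable

# Hypothetical type: a claimed decider takes a zero-argument function
# and predicts whether calling it would terminate.
Decider = Callable[[Callable[[], None]], bool]


def make_paradox(halts: Decider) -> Callable[[], None]:
    """Build the diagonal program that does the opposite of what `halts` predicts about it."""
    def paradox() -> None:
        if halts(paradox):
            while True:  # predicted to halt, so loop forever
                pass
        # predicted to loop forever, so return immediately
    return paradox


def always_no(f: Callable[[], None]) -> bool:
    """A (necessarily wrong) candidate decider that claims nothing ever halts."""
    return False


p = make_paradox(always_no)
p()  # returns immediately, so the prediction always_no(p) == False was wrong
```

A candidate that answered `True` instead would make its own paradox loop forever, so it would be wrong in the other direction; that case is deliberately not executed here.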
Given that humans can write computer programs, how can you argue that the undecidability of the halting problem stops intelligent agents from being able to write computer programs?
I don’t understand what you mean about the borrow checker in Rust or block instruction reordering. These are certainly not attempts at AI or AGI.
What exactly does AGI mean to you?
This stuff should all be obvious, but here we are.
This is not necessary. Please don’t reply if you can’t resist the temptation to call people who disagree with you stupid.
This is proof of one thing: that our brains are nothing like digital computers as laid out by Turing and Church.
What I mean about compilers is that compiler optimizations are only valid if a particular bit of rewritten code does exactly the same thing under all conditions as what the human wrote. This is generally only possible if the code in question doesn’t include any branches (ifs, loops, function calls). A section of code with no branches is called a basic block. Rust is special because it harshly constrains the kinds of programs you can write: another consequence of the halting problem is that, in general, you can’t track pointer aliasing outside a basic block, but Rust’s program constraints make this possible. It just foists the intellectual load onto the programmer. This is also why Rust is far and away my favorite language; I respect the boldness of this play, and the benefits far outweigh the drawbacks.
To me, general AI means a computer program having at least the same capabilities as a human. You can go further down this rabbit hole and read about the question that spawned the halting problem, the Entscheidungsproblem (decision problem), to see that AI is actually more impossible than I let on.
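The point about rewrites only being valid when they behave identically under all conditions, and about aliasing in particular, can be shown with a minimal Python sketch. The function names are hypothetical and Python has no borrow checker; this only demonstrates the failure mode: a rewrite that loads `b[0]` once is equivalent to the original for distinct lists but gives a different answer when `a` and `b` are the same list.

```python
def add_twice(a: list[int], b: list[int]) -> None:
    """Original code: reads b[0] fresh before each addition."""
    a[0] += b[0]
    a[0] += b[0]


def add_twice_rewritten(a: list[int], b: list[int]) -> None:
    """'Optimized' rewrite: loads b[0] once. Equivalent only if a and b are distinct objects."""
    x = b[0]
    a[0] += 2 * x


# Distinct lists: both versions agree (1 + 10 + 10 == 21).
a1, b1 = [1], [10]
a2, b2 = [1], [10]
add_twice(a1, b1)
add_twice_rewritten(a2, b2)
print(a1, a2)  # [21] [21]

# Aliased: a and b are the same list, and the rewrite changes the result.
s1 = [1]
s2 = [1]
add_twice(s1, s1)            # 1 -> 2 -> 4
add_twice_rewritten(s2, s2)  # 1 + 2 * 1 -> 3
print(s1, s2)  # [4] [3]
```

In Rust, the analogous signature `fn add_twice(a: &mut i32, b: &i32)` cannot be called with both arguments referring to the same value, because a mutable borrow and a shared borrow of the same place can't coexist; that is the kind of constraint that lets a compiler assume no aliasing.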
Here are two groups of claims I disagree with, which I think you must agree with:
1 - Brains do things that a computer program can never do. It is impossible for a computer to ever simulate the computation* done by a brain. Humans solve the halting problem by doing something a computer could never do.
2 - It is necessary to solve the halting problem to write computer programs. Humans can only write computer programs because they solve the halting problem first.
*perhaps you will prefer a different word here
I would say that:
- it doesn’t require solving any halting problems to write computer programs
- there is no general solution to the halting problem that works on human brains but not on computers.
- computers can in principle simulate brains with enough accuracy to simulate any computation happening on a brain. However, there would be far cheaper ways to do any computation.
Which of my statements do you disagree with?
I don’t see how this is any different than humans copying or being inspired by something. While I hate seeing companies profiting off of the commons while giving nothing of value back, how do you prove that an AI model is using your work in any meaningful or substantial way? What would make me really mad is if this dumb shit leads to even harsher copyright laws. We need less copyright not more.
I once looked outside. Could I be sued for observing a public space?
i once looked at a picture of spider man and badman then made a crappy drawing biterman
to jail with me!
Good. Technology always makes strides before the law can catch up. The issue is that multi-million-dollar companies use these gaps in the law to get away with legally gray and morally black actions, all in the name of profit.
Edit: This video is the best way to educate yourself on why AI art and writing are bad when they steal from people, as most AI programs currently do. I know it’s long, but it’s broken up into chapters if you can’t watch the whole thing.
Totally agree. I don’t care that my data was used for training, but I do care that it’s used for profit in a way that only a company with big-budget lawyers can manage.
But if we’re drawing the line at “did it for profit”, how much technological advancement will happen? I suspect most advancement is profit driven. Obviously people should be paid for any work they actually put in, but we’re talking about content on the internet that you willingly create for fun and the fact it’s used by someone else for profit is a side thing.
And quite frankly, there’s no way to pay you for this. No company is gonna pay you to use your social media comments to train their AI and even if they did, your share would likely be pennies at best. The only people who would get paid would be companies like reddit and Twitter, which would just write into their terms of service that they’re allowed to do that (and I mean, they already use your data for targeting ads and it’s of course visible to anyone on the internet).
So it’s really a choice between helping train AI (which could be viewed as a net benefit for society, depending on how you view those AIs) vs simply not helping train them.
Also, if we’re requiring payment, frankly only the super big AI companies can afford to pay anything at all. Training an AI is already so expensive that it’s hard enough for small players to enter this business without having to pay for training data too (and at insane prices, if Twitter and Reddit are any indication).
Hundreds of projects on GitHub are supported by donations; innovation happens even without profit incentives. It may slow down the pace of AI development, but I am willing to wait another decade for AIs if it protects user data and lets regulation catch up.
Reddit is currently trying to monetize their user comments and other content by charging for API access, which creates a system where only the corporations profit, and the users generating the content are not only unpaid but expected to pay directly or be monetized by ads. And if the users want to use the technology trained on their content, they also have to pay for it.
Sure seems like a great deal for corporations, with users getting fleeced as much as possible.
I can’t speak for others, but I don’t consider posts I made on a website I don’t own to be my property. If anything, it’s amusing to think of my idiotic rants making up a tiny fraction of an AI’s “knowledge”.
Piracy isn’t stealing and neither is this.
Piracy is literally theft, what are you talking about?
It is absolutely not theft. If you’d like a physical crime to compare it to, forgery would be what you are looking for. But piracy is not at all theft.
That is, unless you are talking about Captain Davy Jones and his pirate ship. That type of piracy is theft.
Hope it goes through and sets a president.
Vote Skynet for 2024 Presidential Election, the efficient choice!
I think you mean precedent
“Massive Trouble”
Step 1 - Scrape everyone’s data to make your LLM and make a high profile deal worth $10B
Step 2 - Get sued by everyone whose data you scraped
Step 3 - Settle, and everyone in the class will be eligible for a $5 ChatGPT-4 credit
Step 4 - Bask in the influx of new data
Step 5 - Profit
I posted on the public internet with the intent and understanding that it would be crawled by systems for all kinds of things. If I don’t want content to be grabbed, I don’t publish it publicly.
You can’t easily have it both ways, imo. Even with systems that do strong PKI, if you want the world in general to see it, you are giving up a certain amount of control over how the content gets used.
Law does not really matter here as much as people would like to try to apply it; this is simply how public content will be used. Go post in a garden if you don’t want to get scraped, just remember the corollary: your reach, your voice, is limited to the walls of that garden.
What you said makes a lot of sense. But here’s the catch: it assumes OpenAI checked the licensing for all the stuff they grabbed. And I can guarantee you they didn’t.
It’s damn near impossible to automatically check the licensing for all the stuff they got, and we know for a fact they got stuff whose licensing does not allow it to be used this way. Microsoft has already been sued over Copilot, and these lawsuits will keep coming. Even assuming they somehow managed to grab only legit material and used excellent legal advisors who assured them it would stand in court, it’s definitely impossible to tell what piece of what goes where after it becomes an LLM token, and also impossible to tell what future lawsuits will decide about it.
Where does that leave OpenAI? With the good ol’ “I grabbed something off the internet because I could”. Why does that sound familiar? It’s something people have been doing since the internet was invented, and it’s commonly referred to as “piracy”. But it’s supposed to be wrong and illegal. Well, either it’s wrong and illegal for everybody, or it’s fine for everybody.
The difference between piracy and having your content used for training a generative model is that in the latter case, the content isn’t redistributed. It’s like downloading a movie from Netflix (and eventually distributing it for free) vs watching a movie on Netflix and using it as inspiration to make your own movie.
The legality of it all is unclear, and most of that is because the technology evolved so quickly that the legal framework is just not equipped to deal with it, despite the obvious moral issues with scraping artists’ content.
It’s wild to see people in the piracy community of all places have an issue with someone benefiting from data they got online for free.
Many of us are sharing without reward and have strong ethical beliefs regarding for-profit distribution of material versus non-profit sharing.
The difference is that they are profiting from other people’s work and property. I don’t profit from watching a movie or playing a game for free; I just save some money.
You do if you make games or movies and those things give you inspiration.
This is just how learning is done though, whether it’s AI or human.
Absolutely not comparable. Inspiration and an amalgamation of everything an LLM consumes are completely different things.
I’d argue that what we do is, to a great extent, an amalgamation of what we are exposed to. And we are exposed to way less information than an LLM.
It really isn’t that bonkers. A lot of software thought is about licensing: see the GPL, Creative Commons, and all that stuff, which is all about how things can be profited from and the responsibilities around that. Benefiting from free data is one thing. Privately profiting at others’ expense, or not sharing the capabilities/advances that came from it, is another. I’m willing to bet there are GPL violations in the training sets.
Is it even possible to attach licenses to text posts on social media?