Some argue that bots should be entitled to ingest any content they see, because people can.
You all really want a Terminator-like future, don't you? Let's use the most inflated wording possible; certainly there will be no issues.
Get a grip. Skynet didn't evolve because it scanned Harry Potter and Watership Down.
This is the best summary I could come up with:
Unfortunately, many people believe that AI bots should be allowed to grab, ingest, and repurpose any data available on the public Internet, whether or not they own it, because the bots are "just learning like a human would." Once a person reads an article, they can use the ideas they just absorbed in their speech or even their drawings for free.
Iris van Rooij, a professor of computational cognitive science at Radboud University Nijmegen in the Netherlands, posits that it's impossible to build a machine that reproduces human-style thinking by using even larger and more complex LLMs than we have today.
NY Times Tech Columnist Farhad Manjoo made this point in a recent op-ed, positing that writers should not be compensated when their work is used for machine learning because the bots are merely drawing “inspiration” from the words like a person does.
“When a machine is trained to understand language and culture by poring over a lot of stuff online, it is acting, philosophically at least, just like a human being who draws inspiration from existing works,” Manjoo wrote.
In his testimony before a U.S. Senate subcommittee hearing this past July, Emory Law Professor Matthew Sag used the metaphor of a student learning to explain why he believes training on copyrighted material is usually fair use.
In fact, Microsoft, which is a major investor in OpenAI and uses GPT-4 for its Bing Chat tools, released a paper in March claiming that GPT-4 has “sparks of Artificial General Intelligence” – the endpoint where the machine is able to learn any human task thanks to it having “emergent” abilities that weren’t in the original model.
The original article contains 4,088 words, the summary contains 274 words. Saved 93%. I’m a bot and I’m open source!
deleted by creator
Prove to me, right now, that you’re sentient. Or I won’t talk to you.
We don’t even know what sentience is, FFS.
Sentience is the little hump that we can at least sort of see some evidence of, judging by how similar regions of brains activate in certain circumstances. Sapience is the real tricky one.
deleted by creator
There is a so-called “hard problem of consciousness”, although I take exception with calling it a problem.
The general problem is that you can’t really prove that you have subjective experience to others, and neither can you determine if others have it, or whether they merely act like they have it.
But, a somewhat obvious difference between AIs and humans is that AIs will never give you an answer that is not statistically derivable from their training dataset. You can give a human a book on a topic, and ask them about the topic, and they can give you answers that seem to be “their own conclusions” that are not explicitly from the book. Whether this is because humans have randomness injected into their reason, or they have imperfect reasoning, or some genuine animus of “free will” and consciousness, we cannot rightly say. But it is a consistent difference between the humans and the AIs.
The Monty Hall problem discussed in the article – in which AIs are asked to answer the Monty Hall problem but are given explicit information that violates its assumptions – is a good example of something a human will tend to get right, through creativity, while an AI will tend to get wrong, due to statistical regression to the mean.
Don’t we humans derive from our trained dataset: our lives?
If you had a human with no "trained dataset," they would have only just been born. But even then you run into an issue, as it's been shown that fetuses respond to audio stimulation while they're in the womb.
The question of consciousness is a really hard one for sure that we may never have an answer that everyone agrees on.
Right now we’re in the infant days of AI.
To be clear, I don’t think the fundamental issue is whether humans have a training dataset. We do. And it includes copyrighted work. It also includes our unique sensory perceptions and lots of stuff that is definitely NOT the result of someone else’s work. I don’t think anyone would dispute that copyrighted text, pictures, sounds are integrated into human consciousness.
The question is whether it is ethical, and should it be legal, to feed copyrighted works into an AI training dataset and use that AI to produce material that replaces, displaces, or competes with the copyrighted work used to train it. Should it be legal to distribute or publish that AI-produced material at all if the copyright holder objects to the use of their work in an AI training dataset? (I concede that these may be two separate, but closely related, questions.)
What level of abstraction is enough? Training doesn’t store or reference the work at all. It derives a set of weights from it automatically. But what if you had a legion of interns manually deriving the weights and entering them in instead? Besides the impracticality of it, if I look at a picture, write down a long list of small adjustments, -2.343, -.02, +5.327, etc etc etc, and adjust the parameters of the algorithm without ever scanning it in, is that legal? If that is, does that mean the automation of that process is the illegal part?
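As a toy sketch of that intern scenario (all numbers, names, and the update rule are invented for illustration), the "work" enters training only as a list of small numeric adjustments to the model's parameters, whether a script computes them or an intern writes them down by hand:

```python
# Toy sketch, not any real model: a training step reduces an image to a
# short list of tiny numeric nudges, then folds them into the weights.
def training_step(weights, image_pixels, learning_rate=0.01):
    """Derive small adjustments (-2.343, -0.02, +5.327, ...) from the work,
    then apply them to the parameters."""
    gradient = [(w - p) for w, p in zip(weights, image_pixels)]
    adjustments = [-learning_rate * g for g in gradient]
    return [w + a for w, a in zip(weights, adjustments)]

weights = [0.5, 0.5, 0.5]
image_pixels = [0.9, 0.1, 0.4]   # the "copyrighted work", as numbers
new_weights = training_step(weights, image_pixels)
# Whether a human or a script applies these adjustments, the result is identical.
```

The point of the sketch is that nothing in the arithmetic distinguishes the automated case from the legion-of-interns case; only the throughput differs.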
Right now our understanding of derivative works is mostly subjective. We look at the famous Obama “HOPE” image, and the connection to the original news photograph from which it was derived seems quite clear. We know it’s derivative because it looks derivative. And we know it’s a violation because the person who took the news photograph says that they never cleared the photo for re-use by the artist (and indeed, demanded and won compensation for that reason).
Should AI training be required to work from legally acquired data, and what level of abstraction from the source data constitutes freedom from derivative work? Is it purely a matter of the output being “different enough” from the input, or do we need to draw a line in the training data, or…?
All good questions.
We were talking about consciousness not AI created works and copyright but I do have some opinions on that.
I think that if an artist doesn’t want their works included in an AI dataset then it is their right to say no.
And yeah, all the extra data that we humans fundamentally acquire in life does change everything we make.
> And yeah, all the extra data that we humans fundamentally acquire in life does change everything we make.
I’d argue that it’s the crucial difference. People on this thread are arguing like humans never make original observations, or observe anything new, or draw new conclusions or interpretations of new phenomena, so everything humans make must be derived from past creations.
Not only is that clearly wrong, but it also fails the test of infinite regress. If humans can only create from the work of other humans, how was anything ever created? It’s a risible suggestion.
Why don’t you like calling it a “problem”? That just means it’s something we have a questions about, not that it’s problematic. It’s like a math problem, it’s a question we don’t have an answer for.
I hesitate to call it a problem because, by the way it’s defined, subjective experience is innately personal.
I’ve gotten into this question with others, and when I began to propose thought problems (like, what if we could replicate sensory inputs? If you saw/heard/felt everything the same as someone else, would you have the same subjective conscious experience?), I’d get pushback: “that’s not subjective experience, subjective experience is part of the MIND, you can’t create it or observe it or measure it…”.
When push comes to shove, people define consciousness or subjective experience as that aspect of experience that CANNOT be shown or demonstrated to others. It’s baked into the definition. As soon as you venture into what can be shown or demonstrated, you’re out of bounds.
So it’s not a “problem”, as such. It’s a limitation of our ability to self-observe the operating state of our own minds. An interesting question, perhaps, but not a problem. Just a feature of the system.
That’s just ridiculous imo, it seems like they’re afraid of the idea that maybe we’re just automata with a different set of random inputs and flaws. And to me, that’s the kind of idea that the problem of consciousness is trying to explore.
But if you just say, “no, that’s off limits,” that’s not particularly helpful. Science can give us a lot of insight into how thoughts work, how people react vs other organisms to the same stimuli, etc. It can be studied, and we can use the results of those studies to reason about the nature of consciousness. We can categorize life by their sophistication, and we can make inferences about the experiences each category of life have.
So I think it’s absolutely a problem that can and should be studied and reasoned about. Though I can see how that idea can be uncomfortable.
Well, it’s a “problem” for philosophers. I don’t think it’s a “problem” for neurology or hard science, that’s the only point I was trying to make.
Well what an interesting question.
Let’s look at the definitions in Wikipedia:
Sentience is the ability to experience feelings and sensations.
Experience refers to conscious events in general […].
Feelings are subjective self-contained phenomenal experiences.
Alright, let’s do a thought experiment under the assumptions that:
- experience refers to the ability to retain information and apply it in some regard
- phenomenal experiences can be described by a combination of sensory data in some fashion
- performance is not relevant, as for the theoretical possibility, we only need to assume that with infinite time and infinite resources the simulation of sentience through AI needs to be possible
AI works by telling it what information goes in and what goes out; it then infers the same for new patterns of information, adjusting by "how wrong it was" to approximate the correction. Every feeling in our body is either chemical or physical, so for simplicity's sake it can be measured / simulated through data input.
Let's also say for our experiment that the appropriate output is to describe the feeling.
Now I think, knowing this, and knowing how well different AIs can already comment on, summarize, or perform any other transformative task on larger texts that requires interpretation of data, that such a system should be able to "express" what it feels. Let's also conclude, based on the above, that everything needed to simulate feeling or sensation can be described using different inputs of data points.
This brings me to the logical second conclusion that there's nothing about sentience, scientifically speaking, that we wouldn't already be able to simulate (in light of our assumptions).
Bonus: while my little experiment is only designed for theoretical possibility, and we'd need some proper statistical calculations to know whether this is already practical in a realistic timeframe with a limited amount of resources, there's nothing saying it can't be. I guess we have to wait for someone to try it to be sure.
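The "adjusts by how wrong it was" mechanism in the thought experiment can be sketched with a one-neuron learner. The sensor readings and the "pain" label below are invented for illustration, and this is of course not a claim of actual sentience, just the error-correction loop described above:

```python
# Minimal sketch of error-driven learning: a single neuron mapping
# simulated sensor readings (invented numbers) to a feeling label.
def train(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
            error = target - out            # "how wrong it was"
            w[0] += lr * error * x1         # nudge weights toward the correction
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# (heat, pressure) -> 1 means "describes pain", 0 means "no pain"
samples = [((0.9, 0.8), 1), ((0.1, 0.2), 0), ((0.8, 0.9), 1), ((0.2, 0.1), 0)]
w, b = train(samples)
predict = lambda x1, x2: 1 if (w[0] * x1 + w[1] * x2 + b) > 0 else 0
```

After training, the model labels new, unseen sensor combinations; whether that counts as "expressing what it feels" is exactly the question the thought experiment leaves open.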
deleted by creator
Interesting, please tell me how ‘parroting back a convincing puree of the model it was trained on’ is in any way different from what humans are doing.
And that is the point.
It sounds stupidly simple, but that was the idea of AI in the first place: to learn and solve problems more like a human would, by learning how to solve similar problems and transferring that knowledge to new ones.
Technically there’s an argument that our brain is nothing more than an AI with some special features (chemicals for feelings, reflexes, etc). But it’s good to remind ourselves we are nothing inherently special. Although all of us are free to feel special of course.
But we make the laws, and have the privilege of making them pro-human. It may be important in the larger philosophical sense to meditate on the difference between AIs and human intelligence, but in the immediate term we have the problem that some people want AIs to be able to freely ingest and repeat what humans spent a lot of time collecting and authoring in copyrighted books. Often, without even paying for a copy of the book that was used to train the AI.
As humans, we can write the law to be pro-human and facilitate human creativity.
deleted by creator
No, it really is.
Sentience is irrelevant to fair use tho. They’re not related at all.
Copyright and fair use are laws written for humans, to protect human creators, ensure them the ability to profit from their creativity for a limited time, and grant other humans immunity for generally accepted uses of that work without compensation.
I agree that sentience is irrelevant, but whether the actors involved are human or not is absolutely relevant.
Legally there is no difference. Remember, corporations are people.
This article acts like it is a privilege to read a book or hear a song and it can be revoked…lol
Rights are irrelevant.
You made something. That doesn’t give u the right to say what can or can’t ingest it.
Under these rules all fanfic would be illegal.
Search engines would be illegal…can’t scan my website that’s copyrighted. Radio would be illegal. Random ppl listening to ur songs…that’s a nono.
AI does not learn as we do when ingesting information.
I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different because in misunderstanding I think I understand but I am wrong. In simply not understanding the information I can not make heads or tails of that portion)
Later when I make use of what I may have learned these same effects will happen again to whatever it was I correctly understood.
Another: I, as a natural intelligence, know what I can quote and what I should not, due to copyright, social mores, and law. AI regurgitates everything that might match, regardless of source.
The third issue: The AI does not understand even with copious training data. It does not know that dogs bark, it does not have a concept of a dog.
I once wrote a much simpler program that took a body of text and noted the third letter following each set of two; it built probability tables from each pair of letters plus the letter that followed. After ingesting what little training information I was able to give it, it would choose two letters at random and then generate the following letter using the statistics it had learned. It had no concept of words, much less the meaning of any words it might form.
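A program like the one described can be reconstructed in a few lines. This sketch follows the description above (pair of letters → observed next letters), with an invented training sentence; everything it emits is stitched together from trigrams it has seen, with no notion of words:

```python
import random
from collections import defaultdict

# Record which letter follows each pair of letters in the training text.
def build_table(text):
    table = defaultdict(list)
    for i in range(len(text) - 2):
        table[text[i:i + 2]].append(text[i + 2])  # pair -> observed next letters
    return table

# Start from a random seen pair, then repeatedly sample a next letter
# according to the learned frequencies.
def generate(table, length=50, seed=None):
    rng = random.Random(seed)
    out = rng.choice(sorted(table))
    for _ in range(length):
        nxt = table.get(out[-2:])
        if not nxt:           # dead end: this pair only ever ended the text
            break
        out += rng.choice(nxt)
    return out

table = build_table("the dog barks at the cat and the cat runs")
print(generate(table, 30, seed=1))
```

Every three-letter window of the output necessarily appears somewhere in the training text, which is the point: the program has statistics, not concepts.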
> I read an article about a subject. I will forget some of it. I will misunderstand some of it. I will not understand some of it. (These two are different because in misunderstanding I think I understand but I am wrong. In simply not understanding the information I can not make heads or tails of that portion)
Just because you're worse at comprehension or have a worse memory doesn't make you any more real. And AIs also "forget" things; they also get stuff imperfectly, because they don't store any actual full-length texts or anything. It's just separate words (more or less) and the likelihood of what should come next.
> Another, I as a natural intelligence know what I can quote, and what I should not due to copyrights, social mores, and law. AI regurgitates everything that might match regardless of source.
Except you don't, not perfectly. You can be absolutely sure that you often say something someone else has said or written, which means they technically have a copyright to it… but no one cares for the most part.
And it goes the other way too - you can quote something imperfectly.
Both actually can/do happen already with AIs, though it would be great if we could train them with proper attribution - at least for the clear cut cases.
> The third issue: The AI does not understand even with copious training data. It does not know that dogs bark, it does not have a concept of a dog.
A sufficiently advanced artificial intelligence would be indistinguishable from natural intelligence. What sets them apart then?
You can look at animals, too. They also have intelligence, and yet there are many concepts that are incomprehensible to them.
The thing is though, how can you actually tell that you don’t work the exact same way? Sure the AI is more primitive, has less inputs - text only, no other outside stimuli - but the basis isn’t all that different.
When creating art do you get to make rules about who or what experiences it? Or is that a selfish asshole take?
Paint a picture but only some ppl get to see it. Sing a song but only some get to hear it.
What planet do you live on where those things are true?
Well, that's the question at hand. Who? Definitely not; people have an innate right to think about what they observe, whether that thing was made by someone else or not.
What? I’d argue that’s a much different question.
Let’s take an extreme case. Entertainment industry producers tried to write language into the SAG-AFTRA contract that said that, if an extra is hired for a production, they can use that extra’s image – including 3D spatial body scans – in perpetuity, for any purpose, and that privilege of eternal image storage and re-use was included in the price of hiring an extra for 1 day of work.
The producers would make precisely the same argument you are – how dare you tell them how they can use the images that they captured, even if it’s to use and re-use a person’s image and shape in visual media, forever. The actors argue that their physiognomy is part of their brand and copyright, and using their image without their express permission (and, should they require it, compensation) is a violation of their rights.
Or, I could just take pictures of somebody in public places without their consent and feed them into an AI to create pictures of the subject flashing children. They were my pictures, taken by me, and how dare anybody get to make rules about who or what experiences them, right?
The fact is, we have rules about the capture and re-use of created works that have applied to society for a very long time. I don’t think we should give copyright holders eternal locks on their work, but neither is it clear that a 100% free use policy on created work is the right answer. It is reasonable to propose something in between.
"What" is not a different question. As a creator, you don't get to say what or who can ingest your creation. If you did, Google Image Search wouldn't exist.
The thing you’re failing to realize is that this isn’t the first time a computer has been used to ingest info. The rules you assert have never been true to this point. Crawlers have been scanning web pages and images since the dawn of the Internet.
You act like this just started happening so now you get to put rules on what gets to look at that image. Too late there’s decades of precedent.
But there are absolutely rules on whether Google – or anything else – can use that search index to create a product that competes with the original content creators.
For example, https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.
Google's indexing of copyrighted works was considered "fair use" only because they offered only a few preview pages from each work. Google's web-page excerpts and image thumbnails are widely believed to pass fair use under the same concept.
Now, let's say Google wants to integrate the content of multiple copyrighted works into an AI, and then give away or sell access to that AI, which can spit out the content (paraphrased, in some capacity) of any copyrighted work it's ever seen. You'll even be able to ask it questions, like "What did Jeff Guinn say about David Koresh's religious beliefs in his 2023 book, Waco?" and in all likelihood it will cough up a summary of Mr. Guinn's uniquely discovered research and journalism.
I don’t think the legal questions there are settled at all.
You just proved my point: there is nothing stopping Google from scanning all those works; they just have to limit what they show of what they scanned. There it is easy to prove, because the content is verbatim.
In the case of AI it is not verbatim. How do you prove the results are directly derived from, say, reading Harry Potter versus ingesting a forum's worth of content about HP? I don't think as a plaintiff you can show damages, or even that your works were used… The only reason this is even an issue is because ChatGPT's creators admitted they scanned books, etc.
> How do you prove the results are directly derived
Mathematically? It’s a computer algorithm. Its output is deterministic, and both reproducible and traceable.
Give the AI two copies of its training dataset, one with the copyrighted work, one without it. Now give it the same prompt and compare the outputs.
The difference is the contribution of the copyrighted work.
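As a toy illustration only (invented texts and a deliberately primitive bigram model, nowhere near a real LLM or real litigation evidence), the with/without comparison might look like this:

```python
from collections import Counter

# Train the same simple model on the corpus with and without one
# "copyrighted" work, then compare what each produces for the same prompt.
def train(corpus):
    counts = Counter()
    for doc in corpus:
        words = doc.split()
        counts.update(zip(words, words[1:]))   # bigram counts are the "weights"
    return counts

def continuation(model, word):
    # Deterministic "prompt": the most likely word to follow `word`.
    candidates = [(c, b[1]) for b, c in model.items() if b[0] == word]
    return max(candidates)[1] if candidates else None

public_docs = ["the boy went to school", "the boy played football"]
copyrighted = "the boy wizard cast a spell"

with_work = train(public_docs + [copyrighted])
without_work = train(public_docs)

# Any divergence between these two outputs is, by construction,
# the contribution of the copyrighted work.
print(continuation(with_work, "wizard"), continuation(without_work, "wizard"))
```

Whether this kind of ablation scales to a production model with a trillion-token dataset (retraining twice is enormously expensive) is a separate practical question; the sketch only shows that the comparison is well defined.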
You mention Harry Potter. In Warner Bros. Entertainment, Inc. v. RDR Books, Warner Brothers lawyers argued that a reference encyclopedia for the Harry Potter literary universe was a derivative work. The court disagreed, on the argument that the human authors of the reference book had to perform significant creative work in extracting, summarizing, indexing and organizing the information from JK Rowling’s original works.
I wonder if the court would use the same reasoning to defend the work of an AI?