Simply explained: how does GPT work?

sizeoftheuniverse@programming.dev · 1 year ago

W^Unt!2@waveform.social · 1 year ago

You know when you're typing on your phone and there's that bar above the keyboard showing you what word it thinks you're writing? If you tap the word before you finish typing it, it can even show you the word it thinks you're going to write next. GPT works the same way; it just has waaaay more data that it can sample from.

It’s all just very advanced predictive text algorithms.

Ask it a question about basketball. It looks through all documents it can find about basketball and sees how often they reference hoops, Michael Jordan, sneakers, the NBA, etc. Then it just outputs things that are highly referenced, in a structure that makes grammatical sense.

For instance, if you have the word ‘basketball’, it knows it’s very unlikely that the word before it is ‘radish’ and much more likely that it’s a word like ‘the’ or ‘play’, so it just strings words together logically.
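
To make the phone-keyboard analogy concrete, here's a toy next-word suggester in Python. It's only a sketch under big simplifying assumptions: it counts which word follows which in a tiny made-up corpus, whereas GPT learns far richer patterns over tokens and looks at much longer context. Note that once the counting step is done, only the counts are kept, not the text itself.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Toy 'training': count how often each word follows each other word."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, prev_word, k=3):
    """Return up to k most frequent next words, like a phone's suggestion bar."""
    return [word for word, _ in follows[prev_word.lower()].most_common(k)]

# A tiny invented corpus; a real model is trained on vastly more text.
corpus = (
    "i like to play basketball . we play basketball at the park . "
    "the basketball hoop is tall . i like the nba ."
)
model = train_bigrams(corpus)

print(suggest(model, "play"))    # ['basketball']
print(suggest(model, "radish"))  # [] because it never saw 'radish'
```

Ties are broken by whichever pair appeared first, which is how `Counter.most_common` orders equal counts.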

That’s the basics anyway.

qwertyasdef@programming.dev · 1 year ago

Ask it a question about basketball. It looks through all documents it can find about basketball…

I get that this is a simplified explanation but want to add that this part can be misleading. The model doesn’t contain the original documents and doesn’t have internet access to look up the documents (that can be added as an extra feature, but even then it’s used more as a source to show humans than as something for the model to learn from on the fly). The actual word associations are all learned during training, and during inference it just uses the stored weights. One implication of this is that the model doesn’t know about anything that happened after its training data was collected.

W^Unt!2@waveform.social · 1 year ago

I wonder what an ELI5 version of ‘stored weights’ would be in this context.

Taival@suppo.fi · edit-2 1 year ago

Not quite ELI5 but I’ll try “basic understanding of calculus” level.

In very broad terms, the model learns complex relationships between words (or tokens to be specific, explained below) as probabilistic scores. At its simplest, this could mean the likelihood of one word appearing next to another in the massive amounts of text the model was trained with: the words “apple” and “pie” are often found together, so they might have a high-ish score of 0.7, while the words “apple” and “chair” might have a lower score of just 0.2. Recent GPT models consist of billions of these scores, known as the weights. Once their values have been established by feeding lots of text through the model’s training process, they are all that’s needed to generate more text.

Without getting into the math too much, this is how a GPT model then uses these numbers to come up with words:

1. The input prompt is first chopped up into tokens that are each assigned a number. For example, the OpenAI tokenizer translates “Hello world!” into the numbers [15496, 995, 0]. You can think of this as the A=1, B=2, C=3… cipher we all learnt as kids, but the numbers are also assigned to common words, syllables and punctuation.
2. These numbers are inserted into a massive system of equations where they are multiplied together with the billions of weights of the model in a specific manner. This calculation results in a probability score from 0 to 1 for each token known by the model, representing how likely that token is to appear next in sequences that look similar to your input.
3. One of the tokens with the highest scores is chosen as the model’s output semi-randomly to provide variance.
4. This cycle is then repeated over and over, generating the text one token at a time.
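
Here's a rough Python sketch of that four-step loop. Everything in it is a made-up toy: a seven-entry vocabulary with invented IDs instead of a real tokenizer, and a hand-written table of scores standing in for the billions of weights. It also only looks at the last token, whereas a real GPT scores the next token using the whole preceding context. The softmax turns raw scores into probabilities between 0 and 1, and `k` controls how many of the top-scoring tokens the semi-random choice is made from (`k=1` always picks the top one).

```python
import math
import random

# Step 1 (toy tokenizer): each "token" is a whole word mapped to a made-up number.
# Real tokenizers use tens of thousands of word pieces; these IDs are hypothetical.
VOCAB = ["<end>", "i", "play", "basketball", "the", "hoop", "."]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Hypothetical "weights": SCORES[a][b] is a raw score for token b coming right
# after token a. Columns follow VOCAB order.
SCORES = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],       # after <end> (never used)
    [-5.0, -5.0, 3.0, 0.0, 0.0, -2.0, -3.0],   # after "i"
    [-5.0, -3.0, -3.0, 2.0, 1.0, -1.0, -2.0],  # after "play"
    [-1.0, -2.0, -2.0, -3.0, -1.0, 1.0, 2.0],  # after "basketball"
    [-5.0, -3.0, -3.0, 2.0, -3.0, 1.5, -3.0],  # after "the"
    [-1.0, -3.0, -3.0, -3.0, -3.0, -3.0, 2.0], # after "hoop"
    [3.0, 1.0, -2.0, -3.0, 1.0, -3.0, -3.0],   # after "."
]

def softmax(scores):
    """Step 2: turn raw scores into probabilities between 0 and 1 that sum to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next(prev_id, k, rng):
    """Step 3: choose semi-randomly among the k most likely next tokens."""
    probs = softmax(SCORES[prev_id])
    top = sorted(range(len(VOCAB)), key=lambda i: probs[i], reverse=True)[:k]
    return rng.choices(top, weights=[probs[i] for i in top])[0]

def generate(prompt, max_tokens=10, k=2, seed=None):
    """Step 4: append one token at a time until <end> or max_tokens is reached."""
    rng = random.Random(seed)
    ids = [TOKEN_ID[word] for word in prompt.lower().split()]
    for _ in range(max_tokens):
        next_id = pick_next(ids[-1], k, rng)
        if VOCAB[next_id] == "<end>":
            break
        ids.append(next_id)
    return " ".join(VOCAB[i] for i in ids)

print(generate("i", k=1))    # always the top token: i play basketball .
print(generate("the", k=2))  # varies from run to run
```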

In reality we’re not quite sure what the weights represent to the model exactly, but this is the gist of it. All we know is that they encode how much (or how little) importance the model places on various patterns that were present in the training data. Some of these patterns could be just simple two-word pairs, but many are probably much more complicated. Lots of researchers are currently trying to get a better idea of how these numbers actually affect the model’s output.

Lmaydev@programming.dev · edit-2 1 year ago

How closely related words and their attributes are to other words.

W^Unt!2@waveform.social · 1 year ago

Edit: I see now it’s an article and not just you asking a question lol. I’ll leave it up anyway.

abhibeckert@lemmy.world · edit-2 1 year ago

If someone types the words “how are you?” into an SMS message box… chances are really high the other person will respond with “I’m good, how are you?” or something along those lines.

That’s how ChatGPT works. It’s essentially a database of likely responses to questions.

It’s not a fixed list of responses to every possible question, it’s a mathematical one that can handle arbitrary questions and deliver the most likely arbitrary response. So for example if you ask “how you are?” you’ll get the same answer as “how are you?”

ChatGPT is also programmed to behave a certain way - for example if you actually ask how it is, it will tell you it’s not a person and doesn’t have feelings/etc. That’s not part of the algorithm, that’s part of the “instructions” OpenAI has given to ChatGPT. They have specifically told it not to imply that it is human or alive.

Finally - it’s a little bit random, so if you ask the same question 20 times, you’ll get 20 slightly different responses.
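
That randomness is usually a deliberate sampling step. Many LLM APIs expose a knob, usually called temperature, that divides the model's scores before they're turned into probabilities: low values let the top answer dominate, high values flatten things out. This Python sketch is a simplification: the three replies and their scores are invented, and a real model samples one token at a time rather than whole replies, so the variation compounds over a long answer.

```python
import math
import random
from collections import Counter

def softmax_with_temperature(scores, temperature):
    """Divide raw scores by the temperature, then normalise them into probabilities."""
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented replies and scores, for illustration only.
REPLIES = ["I'm good, how are you?", "Not bad, thanks!", "Pretty tired, honestly."]
SCORES = [2.0, 1.5, 0.5]

def ask_many_times(n, temperature, seed=None):
    """Sample n replies and count how often each one comes up."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(SCORES, temperature)
    return Counter(rng.choices(REPLIES, weights=probs, k=n))

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(SCORES, t)
    print(f"temperature {t}: {[round(p, 2) for p in probs]}")

print(ask_many_times(20, temperature=1.0))
```

Because the formula divides by the temperature, this sketch needs a temperature above zero.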

ChatGPT is not “impressively smart” at all. It’s just responding with mathematically the most likely answer to every question you ask. It will often give the same answer as a smart person, but not always. For example, I just asked it how far it is from my city to a nearby town, and it said 50 miles and that it’d take 2 hours to drive there. The correct answer is 40 miles / 1 hour.

I expect there’s probably a whole bunch of incorrect information in the training data set: it’s a popular tourist drive, and tourists probably would take longer, stop along the way, take detours, etc. For a tourist, ChatGPT’s answer might be correct. But that’s not because it’s smart; it’s just what the algorithm produces.

qwop@programming.dev · edit-2 1 year ago

I think calling it just like a database of likely responses is too much of a simplification and downplays what it is capable of.

I also don’t really see why the way it works is relevant to it being “smart” or not. It depends how you define “smart”, but I don’t see any proof of the assumptions people seem to make about the limitations of what an LLM could be capable of (with a larger model, better dataset, better training, etc).

I’m definitely not saying I can tell what LLMs could be capable of, but I think saying “people think ChatGPT is smart but it actually isn’t because <simplification of what an LLM is>” is missing a vital step to make it a valid logical argument.

The argument relies on an incorrect intuition people have. Before seeing ChatGPT, I reckon if you’d told people how an LLM worked, they wouldn’t have expected it to be able to do the things it can do (for example, writing a rhyming poem about a niche subject that it wouldn’t have a comparable poem about in its dataset).

A better argument would be to pick something that LLMs can’t currently do but should be able to do if they’re “smart”, and explain the inherent limitation of an LLM that prevents it from doing that. This isn’t something I’ve really seen, I guess because it’s not easy to do. The closest I’ve seen is an explanation of why LLMs are bad at e.g. maths (like adding large numbers), but I’ve still not seen anything to convince me that this is an inherent limitation of LLMs.

qwertyasdef@programming.dev · 1 year ago

Agreed, smartness is about what it can do, not how it works. As an analogy, if a chess bot could explore the entire game tree hundreds of moves ahead, it would be pretty damn smart (easily the best in the world, probably strong enough to solve chess) despite just being dumb minimax plus absurd amounts of computing power.
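
For anyone who hasn't seen minimax: it tries every move, assumes the opponent always picks the reply that's worst for you, and backs the scores up the tree. Chess is far too big to search completely, so this Python sketch swaps in a made-up toy game (players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins) where the whole tree fits easily. The search has no game-specific cleverness, yet because it sees the entire tree it plays this game perfectly.

```python
from functools import lru_cache

MOVES = (1, 2)  # a player may take 1 or 2 stones per turn

@lru_cache(maxsize=None)
def minimax(pile, maximizing):
    """Value of the position for the maximizing player: +1 = win, -1 = loss.

    Searches the entire game tree; lru_cache only avoids re-solving
    positions that are reached through different move orders.
    """
    if pile == 0:
        # The previous player took the last stone, so the player to move has lost.
        return -1 if maximizing else 1
    results = [minimax(pile - m, not maximizing) for m in MOVES if m <= pile]
    return max(results) if maximizing else min(results)

def best_move(pile):
    """Pick the move whose resulting position (opponent to move) is best for us."""
    return max((m for m in MOVES if m <= pile),
               key=lambda m: minimax(pile - m, False))

for pile in range(1, 8):
    outcome = "win" if minimax(pile, True) == 1 else "loss"
    print(f"pile {pile}: take {best_move(pile)} ({outcome} with perfect play)")
```

Nobody told the search the rule of thumb for this game (leave your opponent a multiple of 3), but it ends up playing that way anyway.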

The fact that ChatGPT works by predicting the most likely next word isn’t relevant to its smartness except as far as its mechanism limits its outputs. And predicting the most likely next word has proven far less limiting than I expected, so even though I can think of lots of reasons why it will never scale to true intelligence, how could I be confident that those are real limits and not just me being mistaken yet again?

lasagna@programming.dev · edit-2 1 year ago

I wish I understood it well enough to give a simple explanation.

A good place to start is to understand deep learning.

Edit: I just noticed it’s an article. Could do with some tags lol