I believe lighting plays a very important part in making an artificially created scene look realistic, such as in 3D modelling. That's why I think the lighting in these AI-generated images is the prime source of what impresses people: no matter how unrealistic or distorted the subject is, the lighting makes it look like a natural part of the background. This is clearly different from poorly Photoshopped photos, where the subject feels like a cutout deliberately inserted into the scene.
I'm interested in understanding how these models handle the context of lighting when creating images. Do they rely on training samples that happen to have the exact same lighting positions, or do they add the lighting as an overlay instead? Also, why does the lighting look unconvincing in some cases, such as scenes with multiple subjects together?
The thing about LLMs is that it's very difficult to say which piece of training material went into which output. Everything gets chopped up and mixed together, and the process is computationally difficult to run backwards.
My understanding of the image generators is that they also operate largely locally, looking mainly at neighboring pixels. So in that sense, it's not accurate to say they understand the context of anything.
It kinda understands context.
An image generator starts with an image of static, similar to what a TV shows with a bad signal. The AI looks at the static and "sees" shapes in it; the prompt influences what it's trying to see. It then fills in the static toward a full image, and it does this in steps: more steps generally means a better-quality image.
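That staged denoising can be sketched as a toy loop. This is only an illustration of the idea, not a real diffusion model: here `target` stands in for the clean image the model predicts (in a real generator that prediction comes from learned weights conditioned on the prompt, not a known answer), and each step nudges the static a fixed fraction toward it, so more steps leaves less noise behind.

```python
import random

def toy_denoise(target, steps, rate=0.3, seed=0):
    """Toy sketch of diffusion-style sampling: begin with pure static,
    then repeatedly blend a fraction `rate` toward the predicted image.
    `target` is a stand-in for the model's prediction, an assumption
    made for this illustration only."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in target]  # pure static
    for _ in range(steps):
        x = [xi + rate * (ti - xi) for xi, ti in zip(x, target)]
    return x

target = [0.2, 0.8, 0.5]          # pretend "clean image" (3 pixels)
few = toy_denoise(target, steps=2)
many = toy_denoise(target, steps=50)

# distance from the prediction: smaller means less leftover static
err = lambda img: sum(abs(a - b) for a, b in zip(img, target))
```

With only 2 steps most of the static survives; after 50 steps the result sits almost exactly on the prediction, which mirrors the "more steps, better quality" behavior described above.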
Also, to be clear, an LLM is a Large Language Model, which is different from an image generator, though the processes behind them are quite similar.