A 4-minute 1080p 30fps video taken with my phone camera is 518MB, while a 12-minute 1080p 30fps video ripped from YouTube is 341MB. Both are MP4 files using H.264 as the codec, and the YouTube one isn't of lower quality, so why such a big difference?
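To put numbers on the gap, average bitrate is just file size divided by duration. A minimal sketch, assuming "MB" means 10^6 bytes and counting the whole file (so the audio track and container overhead are included):

```python
# Average bitrate = file size / duration.
# Assumption: MB means 10**6 bytes (decimal megabytes). The figure covers the
# whole file, so audio and container overhead are included, not just video.

def average_bitrate_mbps(size_mb: float, duration_s: float) -> float:
    """Return the average bitrate in megabits per second."""
    bits = size_mb * 1_000_000 * 8
    return bits / duration_s / 1_000_000

phone = average_bitrate_mbps(518, 4 * 60)     # 4-minute phone clip
youtube = average_bitrate_mbps(341, 12 * 60)  # 12-minute YouTube rip

print(f"phone:   {phone:.1f} Mbit/s")
print(f"youtube: {youtube:.1f} Mbit/s")
print(f"ratio:   {phone / youtube:.1f}x")
```

That works out to roughly 17.3 Mbit/s for the phone clip versus roughly 3.8 Mbit/s for the rip, so the phone is spending about 4.5 times as many bits per second on the same resolution and frame rate.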
Compression. YouTube videos usually have a low bitrate, too.
Well yes, they have a lower bitrate, but why is the quality the same as a higher-bitrate video? Same resolution, codec, frame rate…
Because they aren’t the same quality? Lossy compression is lossy.
Because you can compress video without changing the resolution, codec, or frame rate. When a camera records two green pixels, it records (green pixel) (green pixel). When the video is compressed, that becomes (two green pixels), which takes up less storage space but retains the same information. Thorough compression is computationally expensive, which is why cameras, encoding in real time, typically can't compress as aggressively as an offline encoder can.
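The "two green pixels" idea is essentially run-length encoding. H.264 does not literally work this way (it relies on prediction, transforms, and entropy coding), so treat this as a simplified sketch of the lossless principle only:

```python
from itertools import groupby

def run_length_encode(pixels):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

row = ["green", "green", "green", "blue", "blue", "green"]
encoded = run_length_encode(row)
print(encoded)  # [('green', 3), ('blue', 2), ('green', 1)]
assert run_length_decode(encoded) == row  # nothing was lost
```

Six stored pixels become three pairs, and decoding gives back exactly the same row.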
A lot of the data in the video file you record isn’t very visible, but it’s there for when you put it in editing software.
For example, if you took that smaller YouTube rip and threw it into editing software to tinker with the brightness (as a simple example), you would quickly start getting artifacts, whereas the video you recorded would handle the editing much better, since there’s more data to work with.
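A rough way to see why the extra data matters when editing: if a dark gradient has been stored with coarse quantization, brightening it stretches the gaps between the remaining levels into visible steps. This is a simplified sketch on plain lists of numbers rather than real video, and the two step sizes are made-up stand-ins for "more data" and "less data":

```python
def quantize(values, step):
    """Round each value to the nearest multiple of `step` (coarser step = less data kept)."""
    return [round(v / step) * step for v in values]

def brighten(values, gain, max_value=255):
    """Multiply brightness by `gain`, clipping to the valid range."""
    return [min(max_value, v * gain) for v in values]

def largest_jump(values):
    """Biggest difference between neighbouring pixels: a proxy for visible banding."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))

# A smooth dark gradient from 0 to 30 (out of 255).
ramp = [i * 0.5 for i in range(61)]

fine = quantize(ramp, step=1)    # hypothetical "more data" original
coarse = quantize(ramp, step=6)  # hypothetical heavily compressed copy

for name, signal in [("fine", fine), ("coarse", coarse)]:
    print(name, largest_jump(brighten(signal, gain=4)))
```

After a 4x brightness boost, the coarse copy's steps between neighbouring pixels are six times larger than the fine copy's, which is exactly the kind of stair-stepping that shows up as artifacts.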
Basically, the YouTube video has had all the extra data it doesn’t need stripped off, because it’s not going to be edited, only viewed.
It’s the individual frames that are compressed, and largely relative to one another: essentially, the encoder predicts each frame from the frames around it and only stores what differs, culling detail along the way. So if part of the video, for example the sky at the top, doesn’t change, that part is simply carried over from frame to frame.
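A toy version of the "static sky" idea: compare each frame with the previous one block by block, and only the blocks that changed need new data. Real H.264 uses motion-compensated prediction and stores residuals, so this is a deliberately simplified sketch with made-up 4×4 frames:

```python
def changed_blocks(prev_frame, frame, block=2):
    """Return the (row, col) of every block that differs from the previous frame.

    Frames are 2-D lists of pixel values. Unchanged blocks could be reused
    from the previous frame instead of being stored again.
    """
    changed = []
    for r in range(0, len(frame), block):
        for c in range(0, len(frame[0]), block):
            old = [row[c:c + block] for row in prev_frame[r:r + block]]
            new = [row[c:c + block] for row in frame[r:r + block]]
            if old != new:
                changed.append((r, c))
    return changed

sky = [[200] * 4 for _ in range(2)]               # top half: static sky
ground_1 = [[50, 50, 60, 60], [50, 50, 60, 60]]
ground_2 = [[50, 90, 60, 60], [50, 50, 60, 60]]   # something moved, bottom-left

frame_1 = sky + ground_1
frame_2 = sky + ground_2
print(changed_blocks(frame_1, frame_2))  # [(2, 0)]
```

Only one of the four blocks needs new data for the second frame; the sky blocks are reused as-is.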
It’s not so much about the properties of the video as about the properties of each frame. I can take a 1080p image and blow it up to 8K in GIMP, but it still only has the detail of a 1080p image.
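Going from 1080p to 8K is a 4x scale in each dimension. A small sketch of why that adds no detail, assuming nearest-neighbour scaling (GIMP offers smoother interpolation methods, which invent in-between values but still no real detail):

```python
def upscale_nearest(image, factor):
    """Nearest-neighbour upscale: every pixel becomes a factor x factor square."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]

def downscale_pick(image, factor):
    """Keep one pixel from every factor x factor square."""
    return [row[::factor] for row in image[::factor]]

small = [[10, 20], [30, 40]]
big = upscale_nearest(small, 4)             # 2x2 -> 8x8, like 1080p -> 8K
print(len(big), "x", len(big[0]))           # 8 x 8
assert downscale_pick(big, 4) == small      # the original is fully recoverable
unique_values = {p for row in big for p in row}
print(sorted(unique_values))                # [10, 20, 30, 40]: no new detail
```

The upscaled image has 16 times as many pixels but exactly the same information, which is why resolution alone says little about quality.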
Encoding in multiple passes can alleviate some of the downsides of low bitrates. Low-bitrate damage is always easy to spot in dark areas; I despise watching space movies or shows on streaming services because of the resulting excessive banding artifacts.
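The point of a multi-pass encode is that a first pass measures how demanding each part of the video is, so the second pass can spend a fixed bit budget where it is needed instead of evenly. A heavily simplified sketch: splitting bits strictly in proportion to a complexity score is an assumption for illustration, and real rate control (e.g. x264's two-pass mode) is more nuanced:

```python
def allocate_bits(complexities, total_bits):
    """Second pass: split a fixed bit budget in proportion to each scene's complexity.

    `complexities` stands in for statistics gathered during a first analysis pass.
    """
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]

def allocate_uniform(n_scenes, total_bits):
    """Single pass with no look-ahead: every scene gets the same share."""
    return [total_bits / n_scenes] * n_scenes

# Hypothetical complexity scores: static scene, dialogue, fast action.
complexities = [1, 3, 6]
budget = 1000  # arbitrary units

print(allocate_uniform(len(complexities), budget))
print(allocate_bits(complexities, budget))
```

Both plans use the same total budget; the multi-pass plan just moves bits away from the static scene toward the action scene that needs them.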
Video encoding has several tradeoffs:
- Bitrate
- Resolution/frame rate
- Perceived quality
- Computational complexity of encoding
- Computational complexity of decoding
The on-device video encoding chips in cell phones make sacrifices to keep encoding fast and preserve battery life (higher computational complexity costs more processing cycles and tends to use more power). So they use simpler encoding, in exchange for less quality per bit.
YouTube (and all the social media sites) have huge server farms with highly specialized encoding chips for squeezing more quality out of every bit. That makes sense: a video that will be watched millions of times benefits from even a very slight bitrate saving, in exchange for the one-time cost of a complex encode. It’s also why YouTube tends not to convert a video to AV1 (very efficient in quality per bit, but computationally complex to encode) until it has a few hundred views: until they know a lot of people will be watching it, it’s not clear the tradeoff is worth it.
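The reasoning behind waiting for views is a simple break-even calculation: a costlier encode pays off once the per-view savings add up to its one-time cost. A sketch with made-up numbers in an arbitrary cost unit; this is not YouTube's actual policy or data:

```python
import math

def break_even_views(extra_encode_cost, saving_per_view):
    """Views needed before a costlier encode pays for itself.

    extra_encode_cost: one-time cost of the more complex encode.
    saving_per_view:   cost saved on each view (e.g. bandwidth) by the smaller file.
    Both are in the same hypothetical cost unit.
    """
    return math.ceil(extra_encode_cost / saving_per_view)

# Made-up numbers purely for illustration.
print(break_even_views(extra_encode_cost=50.0, saving_per_view=0.25))  # 200
```

The more views a video is expected to get, the easier it is to justify an expensive encode, which is also the logic behind the Netflix approach below.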
Netflix customizes even further, on a per-video basis, and looks for even more specialized tricks scene by scene, because each of its videos only needs to be encoded once per quality/format but will be watched millions of times.
In other words, it’s like any other engineering problem. The engineers choose different tradeoffs based on context, which means that the cell phone applies a different set of tradeoffs compared to the social media site’s server farm.
I don’t know if this is in fact the case for you, but encoders can often achieve better compression if they can spend more CPU time trying to find an optimal encoding.
Cameras have to do real-time encoding on a limited-power device. YouTube doesn’t have those constraints and may spend more computation time on encoding.
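One concrete place where extra CPU time pays off is motion search: the encoder looks for where a block of the current frame came from in the previous frame, and a wider search costs more comparisons but can find a much better match (and so a smaller difference to store). A 1-D sketch with made-up pixel values; real encoders search in 2-D with smarter search patterns:

```python
def best_match(reference, block, center, radius):
    """Search the reference row around `center` for the position that best matches `block`.

    Returns (error, position, comparisons). The error is the sum of absolute
    differences (SAD): smaller means less residual data to store.
    """
    best = None
    comparisons = 0
    lo = max(0, center - radius)
    hi = min(len(reference) - len(block), center + radius)
    for pos in range(lo, hi + 1):
        window = reference[pos:pos + len(block)]
        sad = sum(abs(a - b) for a, b in zip(window, block))
        comparisons += 1
        if best is None or sad < best[0]:
            best = (sad, pos)
    return best[0], best[1], comparisons

# Previous frame: an object sits at position 4. In the current frame it is near position 10.
reference = [0] * 4 + [10, 50, 90, 50, 10] + [0] * 11
block = [10, 50, 90, 50, 10]

print(best_match(reference, block, center=10, radius=2))  # (200, 8, 5): cheap, poor match
print(best_match(reference, block, center=10, radius=8))  # (0, 4, 14): costlier, exact match
```

Nearly three times the comparisons buys a perfect match here, which is the kind of work a real-time phone encoder has to ration and a server farm can afford.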
Here’s the FFmpeg documentation for x264, an open source h.264 encoder: https://trac.ffmpeg.org/wiki/Encode/H.264
So an encoder’s strategy is tunable, and there are also different encoder implementations that may perform differently. They all produce an H.264 video stream that any standard player can decode.
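As an illustration of those knobs, here is a sketch that builds an FFmpeg command for x264 using the `-preset` (encoding effort) and `-crf` (quality target) options described on the linked wiki page. The file names are hypothetical, and the command is only printed, not run:

```python
import shlex

def x264_command(src, dst, preset="medium", crf=23):
    """Build an FFmpeg command that re-encodes the video stream with libx264.

    preset: how much CPU time x264 spends looking for a better encoding
            (e.g. "ultrafast", "medium", "veryslow").
    crf:    constant rate factor; lower means higher quality and bigger files.
    The audio stream is copied unchanged.
    """
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-c:a", "copy",
        dst,
    ]

# Hypothetical file names.
print(shlex.join(x264_command("phone.mp4", "phone_slow.mp4", preset="veryslow")))
```

Passing the list to `subprocess.run(..., check=True)` would actually run the encode. Keeping the CRF fixed and switching to a slower preset generally trades encoding time for a smaller file at similar quality.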
Resolution and quality are different things, especially when different codecs are used for compression. An AVI I record for screen capture might hit 250MB; running that through ffmpeg to get an MP4 might bring it down to 20MB, and if I drop the quality but keep the same resolution, sometimes 7MB.
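For scale, the reductions from those figures work out as follows (simple arithmetic on the sizes quoted above):

```python
def size_reduction(original_mb, new_mb):
    """Return how many times smaller the new file is, and the percentage saved."""
    return original_mb / new_mb, 100 * (1 - new_mb / original_mb)

for label, size in [("MP4", 20), ("MP4, lower quality", 7)]:
    ratio, saved = size_reduction(250, size)
    print(f"{label}: {ratio:.1f}x smaller ({saved:.0f}% saved)")
```

That is about 12.5x and 36x smaller respectively, at the same resolution throughout.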