How Bitrate Differs for Same Resolution and Framerate

Reading about video quality, I found that it depends on resolution, frames per second and bitrate, which determine the size of the video.

My question is how the bitrate is calculated and how it can differ.

Let's say a video has a 360×240 resolution. It takes 86400 pixels per frame.
The frame rate is 30 Hz. So the video takes 86400 × 30 = 2592000 pixels per second.

So let's say 1 pixel takes 3 bytes (24 bits) of data: we have 2592000 × 24 = 62208000 bits per second, that is 62208 kbit/s (this does not sound right to me, maybe there is some problem in my calculation).

But how can the bitrate differ, and how does it make a difference in quality?

Best Answer

What you've calculated is the bitrate for a raw, uncompressed video. You typically won't find these except in research or other specialized applications. Even broadcasters use compressed video, albeit at a much higher bitrate than your typical YouTube video.
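
Just to put numbers on it, here is the raw-bitrate arithmetic from the question as a small Python sketch (the resolution, frame rate and bytes per pixel are simply the values assumed above):

    # Raw (uncompressed) bitrate for the example in the question.
    width, height = 360, 240        # pixels
    fps = 30                        # frames per second
    bytes_per_pixel = 3             # 24-bit colour, as assumed above

    pixels_per_frame = width * height                         # 86400
    bits_per_second = pixels_per_frame * bytes_per_pixel * 8 * fps

    print(f"{pixels_per_frame} pixels per frame")
    print(f"{bits_per_second} bit/s = {bits_per_second // 1000} kbit/s "
          f"= {bits_per_second / 1e6:.1f} Mbit/s")
    # 86400 pixels per frame
    # 62208000 bit/s = 62208 kbit/s = 62.2 Mbit/s

So the calculation in the question is fine: roughly 62 Mbit/s is simply what raw 360×240 video at 30 fps costs, which is why, outside of the specialized applications mentioned above, video is always compressed before it is stored or streamed.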

So, video quality has a lot to do with how the video was compressed. The more you compress it, the fewer bits it takes per frame, but the more you compress, the worse the quality gets. Now, some videos are much easier to compress than others; in essence, this is why they have a lower bitrate even though they have the same resolution and framerate.

In order to understand why this is, you need to be aware of the two main principles video compression uses: exploiting "spatial" and "temporal" redundancy.

Spatial redundancy

Spatial redundancy exists in images that show natural content: neighboring pixels tend to be similar. This is the reason JPEG works so well: it compresses image data by coding blocks of pixels together, for example 8 × 8 pixels. These blocks are called "macroblocks".

Modern video codecs do the same: they basically use similar algorithms to JPEG in order to compress a frame, block by block. So you don't store bits per pixel anymore, but bits per macroblock, because you "summarize" pixels into larger groups. While summarizing them, the algorithm also discards information that is not visible to the human eye, and this is where most of the bitrate reduction comes from. It works by quantizing the data: frequencies that are more perceivable are retained, while those we can't really see are thrown away. The quantization factor is expressed as "QP" in most codecs, and it is the main control knob for quality.
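
To make the quantization idea a bit more concrete, here is a minimal sketch. The 8 × 8 array stands in for the DCT coefficients of one macroblock (the values are made up, not taken from any real codec), and a single step size stands in for the quantization a real encoder derives from QP:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the DCT coefficients of one 8x8 block: a large
    # DC/low-frequency part plus faint high-frequency detail.
    coeffs = np.zeros((8, 8))
    coeffs[0, 0] = 900.0                          # DC (average brightness)
    coeffs[:3, :3] += rng.normal(0, 80, (3, 3))   # low frequencies
    coeffs += rng.normal(0, 5, (8, 8))            # faint high frequencies

    def quantize(block, step):
        """Round every coefficient to the nearest multiple of `step`."""
        return np.round(block / step) * step

    for step in (2, 16, 64):          # small step ~ low QP, large step ~ high QP
        q = quantize(coeffs, step)
        survivors = np.count_nonzero(q)
        error = np.abs(q - coeffs).mean()
        print(f"step {step:>2}: {survivors:2d}/64 coefficients survive, "
              f"mean error {error:5.1f}")

With a small step almost every coefficient survives and the block is reproduced nearly exactly; with a large step only the strong low frequencies are left, which costs far fewer bits but loses detail. That trade-off is the quality knob described above.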

You can even go ahead and predict macroblocks from macroblocks that have been previously encoded in the same image. This is called intra prediction. For example, if a part of a grey wall was already encoded in the upper left corner of the frame, that macroblock can be reused within the same frame, for example for the macroblock right next to it: we just store its difference from the previously coded one and save data. This way, we don't have to encode two very similar macroblocks in full.

Why does bitrate change for the same image size? Well, some images are easier to encode than others: the higher the spatial activity, the more you actually have to encode. Smooth textures take up fewer bits than detailed ones. The same goes for intra prediction: a frame of a grey wall lets you use one macroblock to predict all the others, whereas a frame of flowing water might not work that well.
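
As a toy illustration of both points, the sketch below "intra predicts" each 8 × 8 block by simply copying the block to its left (real codecs have many directional prediction modes; this is only the basic idea) and compares a flat grey-wall frame with a noisy, water-like one:

    import numpy as np

    def intra_residuals(frame, block=8):
        """Toy intra prediction: every block except the first in a row is
        predicted by copying the block to its left, so only the residual
        (the difference) would need to be coded."""
        frame = frame.astype(np.int16)
        res = frame.copy()
        h, w = frame.shape
        for y in range(0, h, block):
            for x in range(block, w, block):
                res[y:y+block, x:x+block] -= frame[y:y+block, x-block:x]
        return res

    rng = np.random.default_rng(1)

    wall = np.full((64, 64), 128, dtype=np.uint8)            # flat grey wall
    water = rng.integers(0, 256, (64, 64), dtype=np.uint8)   # high spatial activity

    for name, frame in (("grey wall", wall), ("water", water)):
        r = intra_residuals(frame)
        # Ignore the first block column, which has to be coded directly anyway.
        print(f"{name:9s}: mean |residual| = {np.abs(r[:, 8:]).mean():5.1f}")

For the wall the residuals are all zero, so there is almost nothing left to code; for the noisy frame the prediction barely helps, and every block still carries most of its information.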

Temporal redundancy

Temporal redundancy exists because a frame is usually very similar to the frame that came before it. Mostly, just a tiny bit changes, and it wouldn't make sense to fully encode every frame. So what video encoders do is encode just the difference between two subsequent frames, much like they do for neighboring macroblocks with intra prediction.

Take the example from Wikipedia's article on motion compensation: the original frame is stored in full, but the difference to the next frame is almost empty, because only a small part of the picture has actually moved.
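
In code, the same idea looks roughly like this. The two frames are made up (a flat scene with one small square that moves a few pixels); nothing here is specific to any particular codec:

    import numpy as np

    # Two consecutive frames of a mostly static scene: a flat background
    # with one bright square that moves four pixels to the right.
    frame1 = np.full((240, 360), 40, dtype=np.uint8)
    frame2 = frame1.copy()
    frame1[100:120, 100:120] = 200      # square in frame 1
    frame2[100:120, 104:124] = 200      # same square, shifted, in frame 2

    diff = frame2.astype(np.int16) - frame1.astype(np.int16)

    changed = np.count_nonzero(diff)
    print(f"{changed} of {diff.size} pixels changed "
          f"({100 * changed / diff.size:.2f}% of the frame)")
    # 160 of 86400 pixels changed (0.19% of the frame)

A real encoder goes further and describes the shift as a motion vector (that is what motion compensation means), but even plain differencing shows how little is left to store for a frame like this.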

The encoder now only stores the actual differences, not the pixel-by-pixel values. This is why the bits used for each frame are not the same every time. These "difference" frames depend on a fully encoded frame, and this is why there are at least two types of frames for modern codecs:

  • I-frames (aka keyframes) — these are the fully encoded ones
  • P-frames — these are the ones that just store the difference

You occasionally need to insert I-frames into a video, and the actual bitrate also depends on how many I-frames are used. Moreover, the more motion there is between two subsequent frames, the more the encoder has to store. A video of "nothing" moving will be easier to encode than a sports video and use fewer bits per frame.
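
Here is a rough, made-up simulation of that effect. The "cost" is just a count of coded pixels (every pixel of an I-frame, only the changed pixels of a P-frame), so the numbers are not bits from any real encoder; the point is the trend, where more motion and more frequent I-frames both push the average up:

    import numpy as np

    rng = np.random.default_rng(2)
    FRAMES, H, W = 60, 120, 160

    def synthetic_video(motion):
        """Flat scene with a 20x20 square that jumps around by up to
        `motion` pixels per frame -- a stand-in for calm vs. sporty content."""
        frames, x, y = [], 40, 40
        for _ in range(FRAMES):
            f = np.full((H, W), 40, dtype=np.uint8)
            f[y:y+20, x:x+20] = 200
            frames.append(f)
            x = int(np.clip(x + rng.integers(-motion, motion + 1), 0, W - 20))
            y = int(np.clip(y + rng.integers(-motion, motion + 1), 0, H - 20))
        return frames

    def average_cost(frames, gop):
        """Count every pixel of an I-frame, only changed pixels of a P-frame."""
        total = 0
        for i, f in enumerate(frames):
            if i % gop == 0:
                total += f.size                     # I-frame: coded in full
            else:                                   # P-frame: only the difference
                total += np.count_nonzero(f.astype(int) - frames[i - 1].astype(int))
        return total / len(frames)

    for gop in (15, 60):
        calm = average_cost(synthetic_video(motion=1), gop)
        sporty = average_cost(synthetic_video(motion=15), gop)
        print(f"I-frame every {gop:2d} frames: calm ~{calm:7.0f}, "
              f"sporty ~{sporty:7.0f} coded pixels/frame")

With little motion the P-frames are almost free and the average is dominated by the occasional I-frame; with a lot of motion every P-frame carries a real payload, so the average cost per frame, and hence the bitrate, goes up even though resolution and framerate never changed.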
