The science of video encoding is intricate and extensive. With TestDevLab's commitment to enhancing video quality evaluation services, we have delved deeper into understanding how various codecs and their settings can impact the quality of video.
In this blog, we take a closer look at H.264, the most popular video compression codec, and compare video quality at different resolutions and bitrates. We also explain how we encoded the test videos with different parameters. To measure video quality, we use two video quality metrics, VMAF and VQTDL. At the end, we compare the scores and share our key findings on how bitrate affects video quality at each resolution.
Let’s get started.
What is H.264?
Advanced Video Coding (AVC), also known as H.264, is currently the most widely used video compression standard. It is a well-established standard that has been in use for over 20 years: development started around 2000, and it was formally approved in March 2003 and finalized in July 2004. Today, H.264 remains a dominant video codec, used by over 83% of industry professionals. It is compatible with most modern streaming protocols and can be played on almost every device or player. H.264 processes each frame in units called macroblocks (16x16 pixels). For each macroblock, the encoder forms a prediction from data it has already encoded, either from elsewhere in the same frame or from other frames, so that it mainly needs to transform and encode the difference between the prediction and the actual content, along with how the prediction was made. By exploiting these similarities within and between frames, it reduces the overall amount of data while still providing high-quality video at low bitrates.
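To make the prediction idea more concrete, here is a toy Python sketch of inter prediction using full-search block matching, with the sum of absolute differences (SAD) as the matching cost. It is a simplification chosen for illustration, not how libx264 or the H.264 standard works in detail: real encoders also predict from within the same frame (intra prediction) and use variable block partitions, sub-pixel motion vectors, multiple reference frames and rate-distortion decisions. The frame layout, search range and function names are assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[y][x]
MB = 16  # macroblock size (16x16 samples)


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int = MB) -> int:
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def best_motion_vector(cur: Frame, ref: Frame, bx: int, by: int,
                       search: int = 4, size: int = MB) -> Tuple[int, int, int]:
    """Full search over a +/-`search` window; returns (dx, dy, cost)."""
    h, w = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= by + dy <= h - size and 0 <= bx + dx <= w - size):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    assert best is not None, "no valid candidate in the search window"
    return best


def residual(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int,
             size: int = MB) -> Frame:
    """What is left to transform and encode after the prediction."""
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


# Usage: a synthetic 48x48 frame whose content shifts 2 pixels to the right.
ref = [[(7 * x + 13 * y) % 256 for x in range(48)] for y in range(48)]
cur = [[(7 * (x - 2) + 13 * y) % 256 for x in range(48)] for y in range(48)]

dx, dy, cost = best_motion_vector(cur, ref, 16, 16)
print(dx, dy, cost)  # -2 0 0
res = residual(cur, ref, 16, 16, dx, dy)
```

When content only moves, or does not move at all, the residual for that block is zero, so very little data beyond the prediction itself needs to be coded.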
Video quality metrics
VMAF
One of the video quality metrics that we use for evaluation is VMAF, a full-reference video quality assessment algorithm developed by Netflix. VMAF gives scores from 0 to 100, which are easier to interpret in terms of human perception than scores from metrics such as PSNR and SSIM. Our implementation of VMAF also performs temporal alignment using ArUco markers added to the video, so that each degraded frame is compared with its matching reference frame.
VQTDL
Video Quality Testing with Deep Learning, or simply VQTDL, is a no-reference image quality evaluation algorithm developed by TestDevLab, meaning it scores the video without needing the original for comparison. Its scores correlate well with subjective human perception, and it was developed specifically for real-time streaming and WebRTC products.
Preparing videos with ffmpeg
Before we begin evaluation, we first need videos to evaluate. For this, ffmpeg is a perfect tool.
For video encoding there are 3 main variables:
- Resolution. This is the width and height of the video in pixels (for example, 1920x1080). Multiplying the two numbers gives the total number of pixels in each frame. Resolutions are often named by their height, so 1080p means a height of 1080 pixels.
- Bitrate. This is the amount of data used to encode each second of video, usually given in kilobits per second (kbps). A higher bitrate typically results in a more detailed picture.
- Codec (and its settings). This is the compression algorithm that tries to use the available bitrate as efficiently as possible to give the best-quality representation of the video.
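Because the bitrate is shared by every pixel of every frame, one rough way to relate resolution and bitrate is bits per pixel. The Python sketch below computes it, assuming a 16:9 source and a frame rate of 30 frames per second (both are illustrative assumptions, not properties of our test video). At the same bitrate, each 1080p pixel gets less than half the bits of a 720p pixel, since 1080p has 2.25 times as many pixels.

```python
# Average bits available per pixel, per frame, for a given bitrate and resolution.
def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    return bitrate_kbps * 1000 / (width * height * fps)


FPS = 30  # assumption: illustrative frame rate
SIZES = {"720p": (1280, 720), "1080p": (1920, 1080)}  # assumes a 16:9 source

for name, (w, h) in SIZES.items():
    print(name, w * h, round(bits_per_pixel(200, w, h, FPS), 4))
# 720p 921600 0.0072
# 1080p 2073600 0.0032
```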
To create videos of different quality for evaluation, we used ffmpeg to re-encode the full-quality reference video with different settings. For resolution, we chose the main options offered by most video players: 144p, 240p, 360p, 480p, 720p and 1080p.
For bitrate, we tried 200kbps, 500kbps, 1000kbps, 2500kbps, 5000kbps and 8000kbps. These are the values YouTube recommends as part of its upload encoding settings. Note that these were the target values passed to the encoder, and the resulting files did not always hit them exactly. How ffmpeg allocates bitrate depends on several factors, such as the efficiency of the codec itself and the content of the video, so the actual bitrates were generally close to, but not exactly, the chosen targets (at 144p, for example, the highest bitrate we could reach was around 1700kbps).
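One simple way to check how close an encode came to its target is to compute its average bitrate from the file size and duration. The sketch below assumes both values are already known, for example from the file system and from `ffprobe -v error -show_entries format=duration,bit_rate -of default=noprint_wrappers=1 output.mp4`. The file size in the usage line is hypothetical.

```python
def average_bitrate_kbps(file_size_bytes: int, duration_s: float) -> float:
    """Average bitrate of the whole file in kbps.

    Includes container overhead and any audio track (ffmpeg keeps one audio
    stream by default), so it slightly overstates the video-only bitrate.
    """
    return file_size_bytes * 8 / duration_s / 1000


def deviation_percent(actual_kbps: float, target_kbps: float) -> float:
    """How far the achieved bitrate is from the target, in percent."""
    return (actual_kbps - target_kbps) / target_kbps * 100


# Hypothetical example: a 10-second encode with a 200kbps target.
actual = average_bitrate_kbps(262_500, 10.0)
print(actual, round(deviation_percent(actual, 200), 1))  # 210.0 5.0
```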
For this test, we used a video of a person sitting at a table, filmed with a fixed camera from a fixed position. Because most of the video is static, the results should be more stable, as content that does not change is easier for the codec to encode. In the future, we also plan to test videos with more dynamic content.
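The ffmpeg command we used is shown below. Here, {target_resolution} is the output height in pixels (the -2 width keeps the aspect ratio and rounds the width to an even number), and {target_bitrate} is the target bitrate in kbps: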
ffmpeg -i reference.mp4 -vf scale=-2:{target_resolution} -c:v libx264 -minrate {target_bitrate}K -maxrate {target_bitrate}K -b:v {target_bitrate}K output.mp4
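To produce the full grid of 36 test videos (six resolutions times six bitrates), the command above can be generated in a loop. This Python sketch mirrors its flags; the helper name and output file names are hypothetical. It also exposes an optional -bufsize argument that is not part of the original command: as far as we know, x264 ignores -maxrate (and warns about it) unless a VBV buffer size is also set, so adding -bufsize is one way to make the rate limit actually apply.

```python
import shlex
from typing import List, Optional

RESOLUTIONS = [144, 240, 360, 480, 720, 1080]       # output heights from the test
BITRATES_KBPS = [200, 500, 1000, 2500, 5000, 8000]  # target bitrates from the test


def encode_cmd(src: str, height: int, kbps: int, dst: str,
               bufsize_kbps: Optional[int] = None) -> List[str]:
    """ffmpeg arguments mirroring the command above for one height/bitrate pair."""
    cmd = [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",  # -2: keep aspect ratio, even width
        "-c:v", "libx264",
        "-minrate", f"{kbps}K", "-maxrate", f"{kbps}K", "-b:v", f"{kbps}K",
    ]
    if bufsize_kbps is not None:
        # Optional addition (not in the original command): a VBV buffer
        # size so that x264 enforces -maxrate.
        cmd += ["-bufsize", f"{bufsize_kbps}K"]
    return cmd + [dst]


jobs = [encode_cmd("reference.mp4", h, b, f"out_{h}p_{b}k.mp4")
        for h in RESOLUTIONS for b in BITRATES_KBPS]
print(len(jobs))  # 36
print(shlex.join(jobs[0]))
# Each job can be run with subprocess.run(job, check=True).
```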
Check out the full list of videos used to create this report.
Evaluation
Our VMAF implementation is based on Netflix's publicly available implementation, but it adds our own spatial and temporal video alignment using special ArUco markers. This lets us evaluate videos that have a different resolution from the reference or that are not originally in sync with it.
For this use case, we scaled each of the videos to 720p to simulate the same viewing experience for all evaluations.
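For readers who want to try a similar setup with stock tools, the sketch below builds an ffmpeg command that scales both videos to 1280x720 and scores them with ffmpeg's libvmaf filter. It assumes an ffmpeg build with libvmaf enabled and two videos that are already frame-aligned at the same frame rate: it does not include our ArUco-based spatial and temporal alignment. Scaling the reference as well, the bicubic scaler and the helper name are assumptions.

```python
import shlex
from typing import List


def vmaf_cmd(distorted: str, reference: str, log_path: str,
             width: int = 1280, height: int = 720) -> List[str]:
    """ffmpeg arguments that scale both inputs to 720p and compute VMAF."""
    graph = (
        f"[0:v]scale={width}:{height}:flags=bicubic[dist];"
        f"[1:v]scale={width}:{height}:flags=bicubic[ref];"
        f"[dist][ref]libvmaf=log_fmt=json:log_path={log_path}"
    )
    # libvmaf takes the distorted video as its first input and the
    # reference as its second; per-frame scores go to the JSON log.
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


print(shlex.join(vmaf_cmd("out_1080p_200k.mp4", "reference.mp4", "vmaf.json")))
```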
Results
VMAF
For each resolution, we generated a graph of scores over time, like this one:
These graphs show how much difference bitrate makes at the same resolution. In this case, there is a big jump in scores between 200kbps and 500kbps, while beyond that the differences are much smaller. At 200kbps, the start of the video also has low quality. Because we use a fixed, non-adaptive bitrate, there isn't enough data to encode the full image quality right from the start, and it takes a few seconds to build it up. After that, the bitrate is mostly spent on changes between frames, and since our video is quite static, even the low bitrate is enough to keep up with those changes.
As the resolution decreases, the jump between 200kbps and 500kbps becomes smaller, although some difference remains visible at every resolution down to 240p.
At 144p, the differences between bitrates become minimal, with no major jumps between scores.
Looking at the average scores for all resolutions, we can make several interesting observations:
- At lower resolutions, especially 144p and 240p, quality does not increase significantly with more bitrate, because there are too few pixels to make use of the extra data.
- At low bitrates, lower resolutions can actually be better, because the same bitrate has to cover fewer pixels. For example, at 200kbps the 720p video (VMAF 84) scores higher than the 1080p video (VMAF 79).
- The highest bitrate we could achieve at 144p was around 1700kbps, and its score was only 6.5% higher than that of the lowest bitrate (200kbps).
- At 1080p, where the resolution can make use of the full 8000kbps, the score at 8000kbps is a much larger 17.5% higher than at 200kbps.
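The percentages above are relative increases between average scores. Assuming they are computed as (higher - lower) / lower, the calculation looks like this in Python, using the 200kbps VMAF averages quoted above and the 1080p VQTDL averages reported in the VQTDL section (3.6 and 4.7).

```python
def percent_increase(low: float, high: float) -> float:
    """Relative increase from `low` to `high`, in percent."""
    return (high - low) / low * 100


# VMAF averages at 200kbps: 1080p scored 79, 720p scored 84.
print(round(percent_increase(79, 84), 1))    # 6.3
# VQTDL averages at 1080p: lowest bitrate 3.6, highest 4.7 ("about 30%").
print(round(percent_increase(3.6, 4.7), 1))  # 30.6
```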
VQTDL
As with VMAF, we measured quality using VQTDL and generated a graph of scores over time.
The 1080p results show some differences from how the data behaves with VMAF:
Scores drop more noticeably starting at 1000kbps, with an especially big jump between 200kbps and 500kbps. The scores over time are also less stable than with VMAF, but these drops go from excellent (4.8) to very high (4.2), so the subjective difference is very minor.
The range of scores across bitrates shrinks as the resolution decreases, and at 240p there is a noticeable difference only between the 200kbps and 500kbps results:
At 144p, there are no longer any noticeable differences between bitrates:
Finally, here are the average results for all bitrates and resolutions:
- As with VMAF, at lower resolutions quality does not increase significantly with higher bitrate, so a lower bitrate is preferable because it gives almost the same result.
- Similar to VMAF, the highest bitrate at 144p scores only about 6% higher than the lowest.
- At 1080p, the highest bitrate scores about 30% higher than the lowest. Because VQTDL uses a smaller scale than VMAF, this is the difference between 4.7, which is considered close to the original image, and 3.6, which is still good quality with no major artifacts.
YouTube
To further compare video quality using VMAF and VQTDL, we also uploaded the static video to YouTube and evaluated the resolution options it offered: 144p, 360p, 720p and 1080p. We downloaded these videos with the yt-dlp tool and evaluated them in the same way as our own encodes.
YouTube encoded each of these resolutions at the following bitrate:
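The yt-dlp download step can also be scripted. The sketch below builds one yt-dlp command per rendition using yt-dlp's format-selection syntax. The URL is a placeholder, and restricting the selection to H.264 (avc1) streams is our assumption, since YouTube can also serve VP9 or AV1 at the same resolution.

```python
import shlex
from typing import List

HEIGHTS = [144, 360, 720, 1080]  # renditions evaluated in this test


def ytdlp_cmd(url: str, height: int) -> List[str]:
    """yt-dlp arguments that download the video-only stream of one rendition."""
    return [
        "yt-dlp",
        # bv = best video-only stream, filtered to this exact height and,
        # as an assumption, to H.264 (avc1) streams only.
        "-f", f"bv[height={height}][vcodec^=avc1]",
        "-o", f"youtube_{height}p.%(ext)s",
        url,
    ]


URL = "https://www.youtube.com/watch?v=VIDEO_ID"  # hypothetical placeholder
for h in HEIGHTS:
    print(shlex.join(ytdlp_cmd(URL, h)))
```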
- 144p - 49kbps
- 360p - 142kbps
- 720p - 330kbps
- 1080p - 638kbps
VMAF
As this graph of average values shows, the results for the YouTube videos are consistent with both their bitrates and the results from our own encodes:
YouTube's 1080p average falls between our 1080p results at 500kbps and 1000kbps, in line with its 638kbps bitrate, while its 720p result is slightly higher than our 720p result at 200kbps.
The 360p and 144p videos have the lowest quality, since YouTube encoded them at bitrates below 200kbps, the lowest value we chose for our own videos.
VQTDL
The VQTDL data shows a very similar picture. As with VMAF, the YouTube results are consistent with the results from the videos we created:
Key takeaways
The data we collected provides us with valuable information regarding the importance of bitrate at different resolutions and how it affects video quality using H.264. Here are some of our key takeaways:
Quality increases become more marginal with higher bitrate
The higher the bitrate, the smaller the additional quality gain. At 1080p, for example, going from 200kbps to 500kbps (300kbps more) raises the VMAF score by around 11%, while going from 5000kbps to 8000kbps (3000kbps more) raises it by only 0.5%.
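One rough way to quantify this diminishing return is to spread each quoted gain evenly over the extra bitrate that produced it. This linear averaging is a simplification for illustration only; the sketch uses the two 1080p figures above.

```python
def gain_per_100_kbps(gain_percent: float, low_kbps: float, high_kbps: float) -> float:
    """Average score gain (in percent) per 100kbps of extra bitrate."""
    return gain_percent / (high_kbps - low_kbps) * 100


low_end = gain_per_100_kbps(11.0, 200, 500)    # 200 -> 500kbps: ~11% gain
high_end = gain_per_100_kbps(0.5, 5000, 8000)  # 5000 -> 8000kbps: 0.5% gain
print(round(low_end, 2), round(high_end, 3), round(low_end / high_end))  # 3.67 0.017 220
```

By this crude measure, each extra kbps at the low end buys about 220 times more VMAF improvement than at the high end.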
Balance between bitrate and resolution is important
The quality gained from extra bitrate can be outweighed by the additional resources needed to encode and deliver it, and by the resulting larger files.
At some point, you can't add more bitrate
As we observed, at some point no more bitrate can be added to a video at a given resolution. At 144p, for example, the highest bitrate we could reach was only around 1700kbps.
At lower resolution, it can be better to use lower bitrates
At lower resolutions, you can achieve a very similar result with far less bitrate. For example, at 144p, 500kbps gives a VMAF score of 48, while the maximum of around 1700kbps only gives 49. Similarly, at 480p, 500kbps gives a VMAF score of 88, while the maximum bitrate of 8000kbps gives around 90: a 2% increase in score for almost 16 times the bitrate.
Both VMAF and VQTDL show consistent scores
Comparing VQTDL and VMAF, there was a clear correlation between the two metrics: in both, higher bitrates resulted in higher scores. Both also showed larger drops between certain bitrates, such as between 500kbps and 200kbps at 1080p.
In this blog post, we presented only a small part of the experiments we have been running on video quality evaluation. Stay tuned for future posts exploring how dynamic content affects video quality and how codecs other than H.264 perform.
Do you have a software solution that relies on high video quality? We can help you find out how well your solution performs under different real-life conditions. Get in touch to learn more about our audio and video quality testing services.