Banding artifacts can be quite annoying. But first, you may be wondering: what is a banding artifact?
You are enjoying a show on your brand-new TV. Great content delivered at great quality. But then you notice some bands in an otherwise beautiful sunset scene. What was that? A sci-fi plot twist? Some device error? It is a banding artifact: false staircase-like edges appearing in image areas that should vary smoothly.
Bands can appear in the sky in sunset scenes, in dark scenes, on flat backgrounds, and so on. In any case, we do not like them, and no one should be distracted from the story by their presence.
Even a subtle change in the video signal can cause banding artifacts, and this slight variation in the value of some pixels has a disproportionate effect on perceived quality. Bands are more visible (and annoying) when the viewing conditions are right: a large TV with good contrast and a dark environment without screen reflections.
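To see how small the signal change behind banding can be, here is a minimal numpy sketch (the gradient range and quantization step are hypothetical, chosen only to illustrate the effect): coarsely quantizing a smooth gradient moves each pixel by at most a couple of code values, yet collapses the gradient into a handful of visible bands.

```python
import numpy as np

# A smooth horizontal luma gradient spanning a narrow range,
# as in a sunset sky (hypothetical 8-bit example).
width = 1920
smooth = np.linspace(100.0, 110.0, width)

# Coarse quantization (e.g., from aggressive compression) snaps
# neighboring pixels onto a few shared levels, creating step edges.
banded = np.round(smooth / 4) * 4

levels = np.unique(banded)
print(len(levels))                       # only 4 distinct bands remain
print(np.max(np.abs(smooth - banded)))  # yet no pixel moved by more than 2
```

Each pixel error is tiny, which is exactly why average-error metrics struggle with banding.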
Below are some examples. Since we don’t know where and when you’re reading this blog post, we exaggerate the banding patterns so you get the gist. The first example is from an opening scene of one of our first shows. Check the sky: can you see the bands? The viewing environment (background brightness, ambient light, screen brightness, contrast, viewing distance) affects the visibility of the bands. You can play with these factors and observe how the visibility of the banding changes.
Banding patterns are also found in compressed images, like the ones we often use in this post to illustrate the point:
Even Voyager encountered banding along the way. Credit: xkcd 🙂
We set up an experiment to measure perceived quality in the presence of banding artifacts. We asked participants to rate the impact of banding on a scale from 0 (unwatchable) to 100 (imperceptible) for videos with different resolutions, bitrates, and dithering. Participants rated a total of 86 videos. Most of the content was prone to banding, some of it was not. The collected Mean Opinion Scores (MOS) span the full scale.
According to conventional metrics, the test videos with perceptible banding should still be of high quality (e.g., PSNR > 40 dB and VMAF > 80). However, the subjective scores tell a very different story, as we will see below.
Netflix encodes video at scale. Likewise, video quality is assessed at scale within the encoding pipeline, not by an army of people rating each encode. This is where objective video quality metrics come in, as they automatically provide actionable insights into the actual quality of an encode.
PSNR has been the primary video quality metric for decades. It is based on the average pixel distance between the encoded video and the source video. In the case of banding, this distance is small relative to the perceptual impact. As a result, PSNR numbers carry very little information about banding. Subjective test data confirm the lack of correlation between PSNR and MOS:
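A quick sketch makes the PSNR blind spot concrete. Reusing the hypothetical quantized-gradient example from earlier (illustrative values, not one of our test videos), a frame with clearly visible bands still scores well above the 40 dB mark usually associated with high quality:

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    # Peak signal-to-noise ratio in dB, computed from mean squared error.
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Smooth 8-bit gradient frame vs. a coarsely quantized (banded) copy.
ref = np.tile(np.linspace(100.0, 110.0, 1920), (1080, 1))
banded = np.round(ref / 4) * 4

print(psnr(ref, banded))  # above 40 dB, despite clearly visible bands
```

The per-pixel error never exceeds 2 code values, so the average distance stays tiny no matter how visible the staircase is.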
Another video quality metric is VMAF, which Netflix jointly developed with several partners and open-sourced on GitHub. VMAF has become a de facto standard for evaluating the performance of encoding systems and driving encoding optimizations, and it is a crucial factor in the quality of Netflix encodes. However, VMAF does not specifically target banding artifacts. It was designed to capture the video quality of movies and shows in the presence of encoding and scaling artifacts, especially in our streaming use case. VMAF works exceptionally well in general but, like PSNR, it correlates poorly with MOS in the presence of banding:
VMAF, PSNR, and other commonly used video quality metrics do not accurately detect banding artifacts, and if we cannot catch the problem, we cannot take steps to fix it. Ideally, our wish list for a banding detector would include the following items:
- High correlation with MOS for content distorted by banding artifacts
- Simple, intuitive, distortion-specific, and based on the principles of the human visual system
- Consistent performance across different resolutions, qualities and bit-depths delivered in our service
- Robust to dithering, which video pipelines commonly apply
We found no algorithm in the literature that fit our purpose, so we set out to develop one.
We hand-crafted an algorithm to meet these needs in a traditional, NNN (non-neural network) fashion: a white-box solution built from first principles with only a few, visually motivated, parameters: the contrast-aware multiscale banding index (CAMBI).
A block diagram describing the steps involved in CAMBI is shown below. CAMBI acts as a no-reference banding detector: it takes a (distorted) video as input and produces a banding visibility score as output. The algorithm generates pixel-level maps at multiple scales for the frames of the encoded video. It then aggregates these maps into a single index motivated by the human contrast sensitivity function (CSF).
Each input frame goes through three pre-processing steps.
The first step is to extract the luma component: although chromatic banding exists, like most prior work we assume that most banding can be captured in the luma channel. The second step is to convert the luma channel to 10-bit (if the input is 8-bit).
Third, we account for the presence of dithering in the frame. Dithering is intentionally applied noise used to randomize quantization error, and it has been shown to reduce banding visibility. To handle both dithered and non-dithered content, we use a 2×2 filter to smooth the intensity values, replicating the low-pass filtering performed by the human visual system.
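The three pre-processing steps can be sketched as follows. This is our reading of the description above, not the libvmaf implementation: we assume the input is already the extracted luma plane, scale 8-bit values by 4 to reach the 10-bit range, and apply a plain 2×2 mean filter for the smoothing step.

```python
import numpy as np

def preprocess(luma_8bit):
    # Hypothetical sketch of CAMBI-style pre-processing; the actual
    # libvmaf implementation differs in detail.
    # Step 2: scale 8-bit luma to the 10-bit range.
    x = luma_8bit.astype(np.float64) * 4.0
    # Step 3: 2x2 mean filter to mimic the low-pass filtering of the
    # human visual system (also tames dithering noise).
    return (x[:-1, :-1] + x[:-1, 1:] + x[1:, :-1] + x[1:, 1:]) / 4.0

frame = np.full((8, 8), 128, dtype=np.uint8)  # flat 8-bit test patch
out = preprocess(frame)
print(out.shape)  # (7, 7): one row/column lost to the 2x2 window
```

On a flat patch the filter is an identity up to the bit-depth scaling, so every output value is 128 × 4 = 512.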
Multiscale banding confidence
We consider banding detection to be a contrast-detection problem, so banding visibility is primarily governed by the CSF. The CSF itself largely depends on the perceived contrast across a step and on the spatial frequency of the steps. CAMBI explicitly computes the contrast across pixels by looking at differences in pixel intensity, and it does so at multiple scales to account for spatial frequency. The result is a set of per-pixel banding confidences at different contrasts and scales, each referred to as a CAMBI map for the frame. The banding confidence computation also accounts for the sensitivity to brightness changes depending on the local brightness. At the end of this process, twenty CAMBI maps are obtained per frame, capturing banding across four contrast steps and five scales.
The CAMBI maps are spatiotemporally pooled to obtain the final banding index. Spatial pooling of the CAMBI maps is performed based on observations relating to the initial linear phase of the CSF. First, pooling is applied across contrast levels, keeping the maximum weighted contrast at each position. The result is five maps, one per scale. Further down this post are examples of such maps.
Since the worst-quality regions dominate the perceived quality of a video, only a percentage of the pixels with the highest banding confidence is considered during the spatial pooling of the map at each scale. The scale-wise scores are then linearly combined with CSF-based weights to derive the CAMBI for each frame.
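The pooling steps described above can be sketched like this. The percentile and the weights below are illustrative placeholders, not the values CAMBI actually uses:

```python
import numpy as np

def pool(maps, csf_weights, top_fraction=0.1):
    # Toy pooling sketch. `maps` holds 20 per-pixel confidence maps,
    # ordered scale-major: 4 contrast levels per scale, 5 scales.
    scale_scores = []
    for s in range(5):
        group = np.stack(maps[4 * s: 4 * s + 4])  # the 4 contrast maps of scale s
        per_pixel = group.max(axis=0)             # keep max contrast per position
        worst = np.sort(per_pixel.ravel())[::-1]  # worst regions dominate quality,
        k = max(1, int(top_fraction * worst.size))
        scale_scores.append(worst[:k].mean())     # so pool only the top fraction
    # linear combination across scales with CSF-based weights
    return float(np.dot(csf_weights, scale_scores))

rng = np.random.default_rng(0)
maps = [rng.random((16, 16)) for _ in range(20)]  # 20 toy confidence maps
weights = np.array([0.3, 0.25, 0.2, 0.15, 0.1])   # hypothetical CSF weights
score = pool(maps, weights)
```

With confidences in [0, 1] and weights summing to one, the pooled score stays in [0, 1]; the real index is scaled differently.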
In our experiments, CAMBI is temporally stable within a video shot, so a simple average suffices as a temporal pooling mechanism across frames. However, keep in mind that this assumption breaks down for videos containing multiple shots with different characteristics.
Our results show that CAMBI correlates highly with MOS while, as shown above, VMAF and PSNR correlate with it very poorly. The table reports two correlation coefficients, Spearman’s Rank Order Correlation Coefficient (SROCC) and Pearson’s Linear Correlation Coefficient (PLCC):
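For readers unfamiliar with the two coefficients, here is a compact numpy-only sketch of both, applied to hypothetical per-video scores (illustrative numbers, not our experimental data):

```python
import numpy as np

def plcc(x, y):
    # Pearson's linear correlation coefficient.
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def srocc(x, y):
    # Spearman's rank-order correlation: Pearson computed on the ranks
    # (no tie handling needed for these toy values).
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return plcc(rank(x), rank(y))

# Hypothetical per-video CAMBI and MOS values.
cambi = np.array([0.2, 1.5, 4.8, 9.3, 15.0, 22.4])
mos = np.array([92.0, 88.0, 75.0, 60.0, 41.0, 25.0])
print(srocc(cambi, mos))  # -1.0: perfectly monotonic, inverse relation
```

Note the sign: since higher CAMBI means worse quality, its correlation with MOS is negative; magnitudes are what the table compares.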
The plot below shows that CAMBI correlates well with the subjective scores and that a CAMBI of around 5 is where banding starts to become slightly annoying. Notice that, in contrast to the two other quality metrics, CAMBI is inversely related to MOS: the higher the CAMBI score, the more perceptible the banding, and thus the lower the quality.
We use this sunset as an example of how banding affects the CAMBI score. Below we also show the same sunset in false colors, so the bands pop out even more.
There is no banding in the sea portion of the picture. In the sky, the size of the bands increases with the distance from the sun. The following five maps, one per scale, capture the banding confidence at different spatial frequencies. These maps are further spatially pooled and combined with CSF-based weights, yielding a CAMBI score of 19 for the frame, which corresponds to banding between ‘annoying’ and ‘very annoying’ according to the MOS data.
A banding detection mechanism that is robust across a variety of encoding parameters can help detect the onset of banding in videos and act as a first step towards its mitigation. In the future, we hope to use CAMBI to develop a new version of VMAF that can account for banding artifacts.
We have open-sourced CAMBI as a new standalone feature in libvmaf. Like VMAF, CAMBI is an evolving project that is expected to improve gradually over time. We welcome any feedback and contributions.
We would like to thank Christos Bampis, Kyle Swanson, Andrey Norkin, and Anush Moorthy for fruitful discussions, and the participants in the subjective tests for making this work possible.