What is an “Image”?

An image is essentially a ‘matrix’ or an ‘array’. The most common type of image is an array with the shape (Height, Width, Channel), where the number at each position is quantised to eight bits, i.e. 256 distinct levels, and stored as an integer or a floating-point number. The data stored here is generally called the code value or pixel value.
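As a minimal sketch of this idea (using numpy; the dimensions are arbitrary):

```python
import numpy as np

# An 8-bit RGB image: a (Height, Width, Channel) array of integers in [0, 255].
height, width = 1080, 1920
image = np.zeros((height, width, 3), dtype=np.uint8)  # all code values 0: black

image[:, :, 0] = 255   # fill the red channel; the image becomes pure red
print(image.shape)     # (1080, 1920, 3)
print(image[0, 0])     # code values of the top-left pixel: [255 0 0]
```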

Encoding

Storing this array directly can result in a very large file, so compression methods, or ‘encoding’, are needed. Encoding methods are constantly evolving. For instance, there’s JPG (JPEG) encoding, which uses techniques like chroma subsampling, Discrete Cosine Transform (DCT), and Huffman coding for compression. There’s also the highly extensible TIFF (*.tif) format, which is a container format that can internally use various lossless/lossy encoding methods like ZIP, LZW, PackBits, and even JPEG. Then there are more advanced codecs like HEIC (HEIF/HEVC) and AV1, which feature flexible partitioning structures, multi-mode intra-prediction, and more advanced entropy coding, enabling very high encoding efficiency.
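As an illustration of these choices, a library like Pillow exposes several of them at save time (the input file name here is hypothetical):

```python
from PIL import Image

img = Image.open("photo.png")  # hypothetical input file

# Lossy JPEG: chroma subsampling, DCT, and Huffman coding happen inside the encoder.
img.save("photo.jpg", quality=90)

# TIFF as a container: the same pixels wrapped around different internal codecs.
img.save("photo_lzw.tif", compression="tiff_lzw")            # lossless LZW
img.save("photo_zip.tif", compression="tiff_adobe_deflate")  # lossless ZIP/Deflate
img.save("photo_pb.tif",  compression="packbits")            # lossless PackBits
```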

The encoding process may introduce some loss, which is known as lossy compression. The commonly used JPG standard includes both lossless and lossy modes, although the lossless mode is rarely used. Many later encoding methods also support lossless compression—AV1 even offers a true lossless profile. As long as one is not deliberately pursuing extreme compression ratios, the quality loss from lossy compression is actually difficult to perceive.

Another difference between various encoding methods is the quantisation bit depth they allow. The JPG standard supports 8-bit and 12-bit quantisation, but the 12-bit mode is also uncommon. Newer formats often support higher bit depths; for example, HEIC and AVIF can handle 10-bit or even 12-bit, while a TIFF container can hold data with 16 or even 32 bits (floating-point) per channel. This is closely related to the requirements of HDR: to avoid visible banding over a wider brightness range, higher quantisation precision is needed.
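A quick sketch of how the number of levels grows with bit depth, which is why higher precision suppresses banding over a wide brightness range:

```python
import numpy as np

ramp = np.linspace(0.0, 1.0, 4096)  # a smooth gradient, normalised to [0, 1]

for bits in (8, 10, 12, 16):
    levels = 2 ** bits
    quantised = np.round(ramp * (levels - 1)) / (levels - 1)
    error = np.abs(quantised - ramp).max()
    print(f"{bits:2d}-bit: {levels:5d} levels, worst quantisation error {error:.2e}")
```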

Additional Information

A pile of numbers is meaningless on its own; we also need to know the space in which they are defined. At a minimum, the primaries, white point, and transfer function (i.e., the definition of the RGB space) must be specified.

This additional information tells the decoder and the system’s colour management the specific meaning of these numbers. Without it, the data is usually treated as sRGB, and if the assumed colour space does not match the actual data, the colours will come out visibly wrong.

After the decoding software extracts the pixel values and additional information, the colour management system converts these pixel values from a known colour space into driving values for the display, allowing them to be displayed correctly. Therefore, in theory, the code values can even be linear tristimulus values, as long as there is corresponding additional information to define them. The images below are two PMCC colour charts tagged with the XYZ space and a linear transfer function; the pixel values in the images are directly the tristimulus values.

PMCC colour chart XYZ image under a D65 illuminant

This is a tristimulus value image of a colour chart under a D65 illuminant, where values exceeding 1 have been clipped. This is actually an incorrect way to handle it: CIEXYZ in nclx requires an equal-energy white point, so the conversion should instead use a chromatic adaptation transform such as the Bradford CAT used in ICC profiles.

PMCC colour chart XYZ image under a D65 illuminant, adapted to white point E

This is the image adapted to an equal-energy white point (done by directly replacing the white point). Because it is adapted to an equal-energy white, no values will exceed 1. When displayed, the system’s colour management will perform chromatic adaptation to the display’s white point, so its appearance should be close to the background in light mode. The page background in light mode has a code value of 245.
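To make this pipeline concrete, here is a rough sketch of what the colour management step could do with such an XYZ-tagged linear image: adapt the tristimulus values from the equal-energy white to the display’s D65 white with a Bradford CAT, then convert to sRGB driving values. The matrices are the standard published ones, but this is only an illustration, not any particular CMM’s actual implementation:

```python
import numpy as np

BRADFORD = np.array([          # Bradford cone response matrix
    [ 0.8951,  0.2664, -0.1614],
    [-0.7502,  1.7135,  0.0367],
    [ 0.0389, -0.0685,  1.0296],
])

XYZ_TO_SRGB = np.array([       # IEC 61966-2-1: D65-relative XYZ -> linear sRGB
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

def bradford_cat(src_white, dst_white):
    """3x3 chromatic adaptation matrix between two white points (XYZ)."""
    scale = np.diag((BRADFORD @ dst_white) / (BRADFORD @ src_white))
    return np.linalg.inv(BRADFORD) @ scale @ BRADFORD

def srgb_encode(linear):
    """sRGB inverse EOTF: linear light -> code values in [0, 1]."""
    linear = np.clip(linear, 0.0, 1.0)
    return np.where(linear <= 0.0031308,
                    12.92 * linear,
                    1.055 * linear ** (1 / 2.4) - 0.055)

E   = np.array([1.0, 1.0, 1.0])        # equal-energy white, as nclx expects
D65 = np.array([0.9504, 1.0, 1.0888])  # the display's white point

xyz = np.array([0.5, 0.5, 0.5])        # one grey pixel of an E-white XYZ image
rgb = srgb_encode(XYZ_TO_SRGB @ bradford_cat(E, D65) @ xyz)
print(rgb)                              # ~[0.735 0.735 0.735]: sRGB driving values
```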

If you are using iOS or iPadOS, you may not be able to see these two images. Testing on other systems has shown they are generally visible.

Additional information can take many forms, such as an embedded ICC profile, XML statements or nclx data stored within the image file, or it can be stored in the EXIF data.

This step is the most critical part of achieving an HDR effect. Once the correct transfer function is specified, the decoder can convert the code values into what is known as “HDR” content, which can exceed the nominal luminance of SDR.

There is a rather special method for implementing HDR called a Gain Map. A single file stores two images (an SDR image and a gain map) along with some corresponding additional information (specific gain coefficients, etc.). The decoder can then compute a new HDR image from the two images. Therefore, the gain map could perhaps also be considered a form of additional information.

Display-Referred Linear Light

When performing image format conversions, all references to linear light space should be Display-Referred. This means working with the light the image actually produces once shown on the display, whether expressed as luminance or as absolute tristimulus values.

During decoding, the EOTF is used. During encoding, the inverse EOTF is used, not the OETF (for transfer functions like PQ, the OETF and the inverse EOTF are genuinely different curves).
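A small sketch of the distinction for SDR video code values, approximating the BT.1886 EOTF as a pure 2.4 gamma:

```python
# Display-referred decoding uses the display EOTF (BT.1886, approximated here
# as a pure 2.4 gamma), not the inverse of the Rec. 709 camera OETF.
def bt1886_eotf(code):          # code value -> relative linear light
    return code ** 2.4

def bt1886_inverse_eotf(y):     # relative linear light -> code value
    return y ** (1 / 2.4)

def rec709_oetf(y):             # the camera-side curve, shown only for contrast
    return 4.5 * y if y < 0.018 else 1.099 * y ** 0.45 - 0.099

y = 0.18  # 18% grey in linear light
print(bt1886_inverse_eotf(y))   # ~0.49
print(rec709_oetf(y))           # ~0.41 -- a visibly different curve
```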

PQ or HLG Transfer Functions

Similar to HDR video, changing the transfer function from Gamma or Rec. 709 to PQ or HLG can achieve the transition from SDR to HDR. For still images, the international standard ISO-22028-5 already exists.

Canon was the first to introduce 10-bit HEIC encoding in its mirrorless cameras, using PQ as the transfer function. Sony offers HLG for still images. In recent versions of ACR, the AVIF and 16-bit TIF files produced when HDR output is enabled (with the maximum-compatibility option off) are PQ-encoded.

For this type of HDR image, one simply needs to apply the correct transfer function to convert to or from linear light.
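For PQ (SMPTE ST 2084), that pair of conversions looks like this, using the constants from the standard:

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(code):
    """PQ code value in [0, 1] -> absolute luminance in cd/m^2 (nits)."""
    p = code ** (1 / M2)
    return 10000 * (np.maximum(p - C1, 0) / (C2 - C3 * p)) ** (1 / M1)

def pq_inverse_eotf(nits):
    """Absolute luminance in nits -> PQ code value in [0, 1]."""
    y = (nits / 10000) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

print(pq_inverse_eotf(100))   # ~0.51: 100 nits lands near mid-code
print(pq_eotf(1.0))           # 10000.0: full code is 10,000 nits
```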

Gainmap

A Gainmap is a method for implementing HDR specifically for still images. Its advantage is excellent compatibility, as it can store both SDR and HDR content simultaneously (rather than relying on dynamic metadata and a TMO) and is very friendly to display drivers.

JPG, JXL, and AVIF can all store this format. In particular, a JPG with a Gainmap is essentially two JPG files concatenated together: image viewers that do not support the format simply read the first file as a standard SDR image. When the original file is sent through social media, the appended Gainmap may survive; even if an app itself does not support it, saving the file and opening it in another app may still reveal the HDR effect.

The first large-scale application of Gainmap was likely on the OPPO Find X6 Pro. Later, Google promoted the UltraHDR format. The ISO is currently developing the ISO-21496-1 standard, and UltraHDR version 1.1 is already compatible with this standard.

A Gainmap can be written for luminance only or for all three channels. The “ProXDR” in the recently released OPPO Find X8 Ultra refers to a three-channel Gainmap.

A Gainmap can be understood as a form of Supplemental Enhancement Information (SEI) or Colour Remapping Information (CRI), which records the difference between the SDR and HDR sources. Additionally, it stores the absolute luminance relationship of the Gainmap through something akin to static metadata.

The metadata includes: content max luminance gain (how much brighter the HDR is compared to the SDR), display max luminance gain (how much brighter the master HDR is compared to the SDR), the Gamma used for encoding the gainmap, and an optional offset.

Regarding the content max and display max luminance gains, an example is the HDR limiter in ACR, which can limit the HDR headroom during post-production to ‘n’ stops. For example, if a three-stop limit is set, the maximum display gain during post-production is three stops, but the content may have a luminance gain exceeding three stops, which is simply clipped. The purpose of setting this display max luminance gain metadata is likely to restore the creative intent from the time of production.
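Putting the pieces together, here is a sketch of how a decoder might apply a gain map and its metadata. The formula follows the general shape of the UltraHDR recovery function; the names and defaults here are illustrative, not a normative implementation:

```python
import numpy as np

def apply_gainmap(sdr_linear, gainmap, map_min_log2, map_max_log2,
                  map_gamma=1.0, offset_sdr=0.0, offset_hdr=0.0, weight=1.0):
    """Recover HDR linear light from SDR linear light plus a gain map.

    gainmap holds normalised values in [0, 1]; the metadata gives the log2
    gain at codes 0 and 1, the gamma used to encode the map, optional
    offsets, and a weight in [0, 1] that lets the display blend the gain
    down to its actually available headroom.
    """
    recovery = gainmap ** (1.0 / map_gamma)       # undo the encoding gamma
    log2_gain = map_min_log2 * (1 - recovery) + map_max_log2 * recovery
    gain = 2.0 ** (log2_gain * weight)
    return (sdr_linear + offset_sdr) * gain - offset_hdr

# Toy example: a mid-grey SDR pixel whose gain map says "+3 stops at full code".
sdr = np.array([0.18])
gm = np.array([1.0])
print(apply_gainmap(sdr, gm, map_min_log2=0.0, map_max_log2=3.0))  # [1.44] = 0.18 * 8
```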

Regarding SDR’s Nominal Luminance

Although rarely adhered to in practice, SDR actually has a specified white point luminance. For example, sRGB is 80 nits, and ITU-R BT.2035 specifies 100 nits.

SDR content can be converted to absolute luminance based on this value, and then encoded using the inverse EOTF. More often, the nominal luminance used is 203 nits. This value originates from the recommendations for various luminance levels in ITU-R BT.2408, where diffuse white is 203 nits, but it also states that this diffuse white luminance should not be interpreted as the nominal luminance for SDR.
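As a worked example (self-contained, reusing the ST 2084 formula from the PQ sketch above), pinning SDR white at a nominal luminance and encoding it with the PQ inverse EOTF:

```python
def pq_inverse_eotf(nits):
    """Absolute luminance in nits -> PQ code value (SMPTE ST 2084)."""
    y = (nits / 10000) ** (2610 / 16384)
    return ((0.8359375 + 18.8515625 * y) / (1 + 18.6875 * y)) ** 78.84375

# SDR code value 1.0 (diffuse white) at the common 203-nit nominal luminance:
print(pq_inverse_eotf(203))   # ~0.58
# The same white at sRGB's specified 80 nits:
print(pq_inverse_eotf(80))    # ~0.49
```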