Tools for Reading RAW Data
Dcraw
The most famous tool for reading RAW data is undoubtedly dcraw. Dcraw can convert RAW files of various encoding formats into TIFF or PPM format.
Running the command dcraw -4 -T -D file_name yields a 16-bit TIFF file that stores the RAW values directly, without any processing such as demosaicing, white balancing, or black level subtraction.
Unfortunately, dcraw was last updated on 1 June 2018. It therefore lacks the additional parameters (white balance, colour matrices, etc.) for cameras released after that date, and even its core function of extracting RAW data may not work reliably for them.
For instance, with the lossless compressed RAW format introduced by Sony in their fourth-generation cameras (e.g., ILCE-7M4), dcraw reports a “cannot decode file” error, even though the file extension is still .arw. In contrast, it can still decode the older uncompressed RAW format.
dcraw.c is the core file of dcraw, consisting of over ten thousand lines of pure C code.
Rawpy/LibRaw
Rawpy is a Python wrapper for LibRaw. LibRaw provides a unified interface for accessing RAW data to extract pixel values. It is based on dcraw, having refactored dcraw.c into a more modern and modular library, and has continued to be maintained since dcraw ceased development.
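As a minimal sketch of what "reading directly with Rawpy" can look like, assuming the rawpy and numpy packages are installed: the helper names load_raw_mosaic and summarize are only illustrative and are not part of either library.

```python
import numpy as np


def load_raw_mosaic(path):
    """Return the undemosaiced sensor data as a 2-D array (rows, columns)."""
    import rawpy  # imported lazily so summarize() works without rawpy

    with rawpy.imread(path) as raw:
        # raw_image is a view into LibRaw's buffer, so copy it before closing.
        return raw.raw_image.copy()


def summarize(mosaic):
    """Report the (height, width) shape and value range of a mosaic."""
    return {
        "shape": mosaic.shape,
        "min": int(mosaic.min()),
        "max": int(mosaic.max()),
    }


# Example (file_name is a placeholder):
# print(summarize(load_raw_mosaic("file_name")))
```

The shapes quoted in the rest of this article are in the same (height, width) order that summarize reports.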
Differences Between Methods and Brands
Sony
Uncompressed RAW
Test model: ILCE-7CM2.
The following methods yield identical results when reading uncompressed RAW files:
- Converting with Dcraw to TIFF and then reading with OpenImageIO
- Reading directly with Rawpy
- Converting with Adobe DNG Converter and then reading with Rawpy
- Converting with Adobe Camera Raw and then reading with Rawpy (same as above, although the dimensions appear different when viewed in ACR)
- Converting with Adobe DNG Converter, then converting to TIFF with dcraw, and finally reading with OIIO
The dimensions read by the above methods are (4688, 7040), which is 33,003,520 pixels, with a value range of 0-16383.
In practice, the DNG files converted by Adobe DNG Converter and Adobe Camera Raw are identical, so this will not be detailed further. Reading DNG files with Rawpy and dcraw is also equivalent.
The following method produces a different result:
- Converting to DNG with Capture One and then reading with Rawpy (or converting to TIFF with dcraw and then reading) gives an image of (4672, 7008) with a value range of 0-65535. The height is reduced by 16 pixels (8 from the top and 8 from the bottom), and the width by 32 pixels (12 from the left and 20 from the right). These offsets were found by aligning the images by eye in Photoshop. Even after cropping, the values differ from those of the other methods: taking the ratio after normalising each image to its own maximum (65535 or 16383), the maximum is 1.04, the minimum is 0.95, and the average is 1.00.
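The crop-and-compare step described above can be sketched as follows. This is only an illustration using numpy; the function names are hypothetical, and skipping zero-valued pixels in the denominator is an assumption made here to avoid division by zero, not something the original test specified.

```python
import numpy as np


def crop_margins(img, top, bottom, left, right):
    """Remove the given number of rows/columns from each edge."""
    h, w = img.shape
    return img[top:h - bottom, left:w - right]


def normalized_ratio_stats(a, a_max, b, b_max):
    """Compare two mosaics after scaling each by its own maximum code value."""
    a_n = a.astype(np.float64) / a_max
    b_n = b.astype(np.float64) / b_max
    valid = b_n > 0  # assumption: ignore zero pixels in the denominator
    ratio = a_n[valid] / b_n[valid]
    return float(ratio.min()), float(ratio.max()), float(ratio.mean())


# reference: (4688, 7040) read with Rawpy; c1: (4672, 7008) from Capture One
# cropped = crop_margins(reference, top=8, bottom=8, left=12, right=20)
# print(normalized_ratio_stats(c1, 65535, cropped, 16383))
```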
Lossless Compressed RAW
The principle behind Sony’s lossless compressed RAW is to first pad the data with zeros to a multiple of 512, then divide it into blocks, and finally separate it into four sub-images based on the Bayer pattern for differential and Huffman coding.
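To make the padding and sub-image steps concrete, here is a rough sketch in numpy. It is not Sony's codec: padding at the bottom and right, and the simple left-neighbour difference standing in for the differential stage, are assumptions chosen for illustration, and the Huffman stage is omitted entirely.

```python
import numpy as np


def pad_to_multiple(mosaic, block=512):
    """Zero-pad the bottom and right edges so both sides are multiples of block."""
    h, w = mosaic.shape
    pad_h = -h % block
    pad_w = -w % block
    return np.pad(mosaic, ((0, pad_h), (0, pad_w)), mode="constant")


def split_bayer(mosaic):
    """Split a mosaic with a 2x2 colour filter pattern into four sub-images."""
    return [mosaic[r::2, c::2] for r in (0, 1) for c in (0, 1)]


def horizontal_residuals(plane):
    """Difference each sample from its left neighbour (first column kept as is)."""
    p = plane.astype(np.int32)
    res = p.copy()
    res[:, 1:] = p[:, 1:] - p[:, :-1]
    return res
```

With a (4688, 7040) input, pad_to_multiple produces (5120, 7168), since 5120 = 10 x 512 and 7168 = 14 x 512.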
For lossless compressed RAW, the situation is more complex because there is currently no way to convert between uncompressed ARW and lossless compressed ARW files (if two separate shots are taken, even with tethered shooting, the resulting displacement errors would be larger than one pixel). The following are the tested scenarios:
- Dcraw does not support lossless compressed RAW (as Sony introduced it after dcraw was no longer being updated).
- Reading the ARW with Rawpy yields dimensions of (5120, 7168). This is due to the block-based compression (a multiple of 512). Only the top-left (4688, 7040) area contains image data; the rest is filled with zeros (not the black level), and the values range from 0-16383.
- After converting to DNG with Adobe DNG Converter and then reading with Rawpy, the resulting dimensions are (4686, 7038), which is two pixels smaller in each dimension than the uncompressed RAW. If you crop 2 pixels from the bottom and 2 from the right of the padded image from the previous scenario, the results match perfectly, also with a range of 0-16383.
- Reading a DNG converted by Capture One with Rawpy again gives dimensions of (4672, 7008) and a range of 0-65535. It is again cropped by 8 pixels at the top and bottom, 12 on the left, and 20 on the right. After cropping the content area of the ARW read by Rawpy to match, the results are close (the average of the differences is even the same, around -5e-6, which requires further testing).
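The cropping checks in the scenarios above could be scripted along these lines. This is a sketch with hypothetical helper names, assuming both files have already been loaded as 2-D numpy arrays.

```python
import numpy as np


def crop_top_left(padded, height, width):
    """Keep only the top-left height x width region of a padded mosaic."""
    return padded[:height, :width]


def padding_is_zero(padded, height, width):
    """Check that everything outside the top-left content area is zero."""
    return not padded[height:, :].any() and not padded[:, width:].any()


def matches_exactly(a, b):
    """True when both arrays have the same shape and identical values."""
    return a.shape == b.shape and bool(np.array_equal(a, b))


# padded: (5120, 7168) from Rawpy; dng: (4686, 7038) from Adobe DNG Converter
# content = crop_top_left(padded, 4688, 7040)
# print(padding_is_zero(padded, 4688, 7040))
# print(matches_exactly(crop_top_left(content, 4686, 7038), dng))
```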
Regarding DNGs from Capture One
Theoretically, a codec that decodes RAW and encodes to DNG should not introduce complex errors. However, DNGs exported from Capture One not only have different dimensions but also stretch the original 14-bit data to 16-bit, and they do not perfectly match the RAW data read by other methods.
With the help of Gemini and DeepSeek, a more detailed analysis was conducted. Regarding the conversion from 14-bit to 16-bit, Capture One appears to perform a left bit shift by two places, which is equivalent to multiplying by 4. After left-shifting the ARW data read by Rawpy and comparing it with the C1-exported DNG (by division and by subtraction), the resulting quotient is 1.000004 and the difference is on the order of 1e-7. The R and B channels match perfectly; all errors come from the two G channels and are content-dependent. In some images the maximum error in the G channels reaches 10%, though in most cases it does not exceed 5%.
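A per-channel comparison of this kind might look like the sketch below. The function name is hypothetical, and the code only indexes the four positions of the 2x2 pattern; which of them are green depends on the camera's pattern (Rawpy exposes it as raw_pattern), so the mapping to R, G, and B is left to the reader.

```python
import numpy as np


def per_channel_error(arw14, dng16):
    """Compare a 14-bit mosaic, left-shifted by two bits, with a 16-bit mosaic.

    Returns, for each position in the 2x2 pattern, the largest relative error.
    """
    shifted = arw14.astype(np.int64) << 2  # same as multiplying by 4
    target = dng16.astype(np.int64)
    errors = {}
    for r in (0, 1):
        for c in (0, 1):
            s = shifted[r::2, c::2]
            t = target[r::2, c::2]
            rel = np.abs(t - s) / np.maximum(s, 1)  # guard against zeros
            errors[(r, c)] = float(rel.max())
    return errors
```

Positions whose error is exactly 0.0 match the shifted data bit for bit; non-zero entries point to the channels that were modified.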
Best Practices for Sony RAW
In summary, the recommended approach for Sony RAW files is to shoot in uncompressed RAW and read the files directly with Rawpy. Adobe DNG Converter can conveniently turn uncompressed RAW files into lossless compressed DNGs, reducing file size without any loss. Alternatively, you can read lossless compressed RAW files with Rawpy and crop them, but be aware that converting lossless compressed RAW to DNG loses two rows and two columns of pixels. DNG files produced by other tools involve unknown processing and should not be used.
Canon
Test model: 600D, which outputs CR2 files.
Reading with Rawpy and reading after conversion to DNG yield identical results. The image dimensions are (3516, 5344). The 142 pixels on the left and 51 pixels on the top appear to be the optical black area (the part physically masked for black level calibration), which reads out values close to the black level. The remainder is the image.
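The split between the masked border and the image area can be sketched as below. The default margins are the 600D values measured above; using the median of the masked pixels as the black-level estimate is an assumption made for this example, not something verified against Canon's own processing.

```python
import numpy as np


def split_optical_black(mosaic, left=142, top=51):
    """Separate the masked border (left columns, top rows) from the image area."""
    image = mosaic[top:, left:]
    masked = np.concatenate([
        mosaic[:, :left].ravel(),      # full-height strip on the left
        mosaic[:top, left:].ravel(),   # remaining strip along the top
    ])
    return image, masked


def estimate_black_level(masked):
    """Use the median of the masked pixels as a robust black-level estimate."""
    return float(np.median(masked))
```

For a (3516, 5344) read, the image area returned here is (3465, 5202).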
Dcraw can process files from the 600D. It reads out the part without the optical black area, which matches the cropped data from Rawpy or a DNG conversion.
The CR3 output from an R6 Mark II (read with Rawpy or via DNG conversion) is similar. The left 154 pixels and top 96 pixels constitute the optical black area, and there is also a white area of 8 pixels on the right.
Hasselblad
Test model: X2D-100C. The camera outputs files in 3FR format.
The sensor in this camera is identifiable; it is highly likely the IMX-461. (Correction: It’s not just “highly likely”; the sensor model IMX-461BQR can be found directly within the 3FR file.)
Hasselblad has several file formats, primarily 3FR and FFF. You can convert from 3FR to FFF using the Phocus software. During conversion, you have options for adjustments and whether to use the embedded 3FR configuration, but these do not affect the resulting FFF file's data when read by other software (e.g., it reads the same with Rawpy). Examining the file headers reveals that 3FR follows the little-endian TIFF specification (starting with 49 49 2A 00), while FFF uses the big-endian TIFF specification (starting with 4D 4D 00 2A).
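The header check is easy to reproduce. The sketch below reads the first four bytes of a file and classifies them; the function names are illustrative, and the two signatures are the standard TIFF byte-order marks quoted above.

```python
def tiff_byte_order(header):
    """Classify the first four bytes of a TIFF-style file."""
    if header[:4] == b"II*\x00":  # 49 49 2A 00
        return "little-endian"
    if header[:4] == b"MM\x00*":  # 4D 4D 00 2A
        return "big-endian"
    return "unknown"


def file_byte_order(path):
    """Read the first four bytes of a file and classify them."""
    with open(path, "rb") as f:
        return tiff_byte_order(f.read(4))
```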
The specification sheet for the IMX461-BQR is public. It lists total pixels as 11760x8896, effective pixels as 11664x8750, and active pixels as 11656x8742.
Reading with dcraw or Rawpy produces an image of size 11904x8842, which exceeds the sensor’s total pixel count. There is a noisy border around the edge (76px on the left, 68px on the right, 2px on the top, and none on the bottom). Further in is the optical black area (48px on the left and right, 90px on the top, and none on the bottom). The innermost part is an image of size 11664x8750. The width of the image including the black area is 11760 (equal to the total pixel width), but the height is 8840 (which is rather strange).
Both 3FR and FFF files can be converted to DNG. The resulting DNG has the dimensions of the inner image area and can be aligned with it, but the content is slightly different.
Currently, the best method for reading Hasselblad RAW files seems to be reading the 3FR with LibRaw, extracting the image area, and potentially using the optical black area for black level correction.
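A sketch of that extraction, using the margins measured above for the X2D-100C: the names NOISE, BLACK, and hasselblad_regions are hypothetical, and the margins are this article's measurements rather than values read from the file's metadata.

```python
import numpy as np

# Margins in pixels as measured in the text.
NOISE = dict(left=76, right=68, top=2, bottom=0)
BLACK = dict(left=48, right=48, top=90, bottom=0)


def hasselblad_regions(mosaic):
    """Return (image, optical_black_pixels) from a full 3FR read."""
    h, w = mosaic.shape
    # Strip the noisy outer border first.
    inner = mosaic[NOISE["top"]:h - NOISE["bottom"],
                   NOISE["left"]:w - NOISE["right"]]
    iw = inner.shape[1]
    top, left, right = BLACK["top"], BLACK["left"], BLACK["right"]
    image = inner[top:, left:iw - right]
    black = np.concatenate([
        inner[:top, :].ravel(),            # top strip
        inner[top:, :left].ravel(),        # left strip
        inner[top:, iw - right:].ravel(),  # right strip
    ])
    return image, black


# One possible black-level correction (the median is an assumption):
# image, black = hasselblad_regions(mosaic)
# corrected = image.astype(np.int32) - int(np.median(black))
```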
Fujifilm
Test model: X-T5.
Fujifilm’s RAW files are rather unique because their colour filter array is not a standard Bayer pattern but an X-Trans pattern, which has a minimum repeating unit of 6x6 pixels. Fortunately, this does not affect our analysis of the raw image data itself.
Feeding the RAF file directly into Rawpy yields an image with a width of 7872 and a height of 5196. After conversion to DNG, the width is 7728 and the height is 5152. The distribution of the extra 144 pixels in width and 44 pixels in height is as follows:
- 12 pixels of image data on the left; 12 pixels of image data and 120 black pixels on the right.
- 16 pixels of image data and 5 black pixels at the top; 16 pixels of image data and 7 black pixels at the bottom.
The pixel values in the overlapping areas are identical.
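Given the margins listed above, the DNG-sized window can be cut out of the full RAF read as in this sketch. The function name is hypothetical, and the offsets assume the extra rows and columns are distributed exactly as described.

```python
import numpy as np


def crop_raf_to_dng(raf, left=12, top=16 + 5, dng_shape=(5152, 7728)):
    """Cut the DNG-sized window out of the full RAF read."""
    h, w = dng_shape
    return raf[top:top + h, left:left + w]


# raf: (5196, 7872) from Rawpy; dng: (5152, 7728) from the converted file
# print(np.array_equal(crop_raf_to_dng(raf), dng))
```

The remaining margins work out to 12 + 120 columns on the right and 16 + 7 rows at the bottom, matching the list.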
Additionally, Rawpy reads the raw pattern from the RAF file incorrectly, whereas the raw pattern in the DNG is correct.
Nikon
To be continued
I’ve found that the return on investment for this research is extremely low, and there is still much to learn. I will pause after a preliminary investigation of Nikon files. For practical use, the recommendation is to convert files to DNG using Adobe DNG Converter and then read them with Rawpy to directly access the image content.