Solving the “Cornell DICOM Bug” with PICTools Medical

Image compression and conversion is often taken for granted today, but it once represented a significant hurdle for many industries. Organizations needed the ability to manipulate image formats in order to share and view files between different systems. As early solutions to this problem took shape, there were more than a few unexpected bumps along the way. In the healthcare industry, one of the best-known examples of these challenges involved an open-source code library for JPEG compression often referred to as the “Cornell codec.”

The Cornell DICOM Bug

The development of the JPEG standard for compressing digital images proved vital to the growth of the internet in the early to mid-1990s. In its widely used baseline form, JPEG is a lossy method. It made it possible to easily convert, transmit, and view images without special hardware or software. Unfortunately, lossy compression reduces file size by discarding some image data and approximating the rest, which makes it unsuitable for some industry use cases. The medical industry, for instance, requires reversible (lossless) compression that leaves the original pixel data of DICOM images intact when they are compressed or converted to other formats.

One of the first open-source implementations of lossless JPEG compression came from Cornell University. Developed by Kongji Huang and Brian Smith, the Cornell codec source code was made freely available for public use in 1994. The codec used an automatic prediction selection value (PSV) selector, which picked the predictor that produced the best compression ratio for each image. Because the compression is lossless, the choice of predictor affects only file size, never image quality.
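
To make the idea of automatic PSV selection concrete, here is a minimal sketch, not the Cornell code. It lists the seven lossless JPEG predictors defined in ITU-T T.81 (Table H.1), where a is the left neighbour, b the sample above, and c the sample above-left. It then picks the PSV whose prediction errors are smallest. The scoring rule, a sum of absolute residuals over interior pixels, is an assumption used as a stand-in for compressed size. The names residual_cost and select_psv are hypothetical.

```python
from typing import List

Image = List[List[int]]

# The seven lossless JPEG predictors (ITU-T T.81, Table H.1).
# a = left neighbour, b = sample above, c = sample above-left.
PREDICTORS = {
    1: lambda a, b, c: a,
    2: lambda a, b, c: b,
    3: lambda a, b, c: c,
    4: lambda a, b, c: a + b - c,
    5: lambda a, b, c: a + ((b - c) >> 1),
    6: lambda a, b, c: b + ((a - c) >> 1),
    7: lambda a, b, c: (a + b) >> 1,
}


def residual_cost(img: Image, psv: int) -> int:
    """Sum of absolute prediction errors over interior pixels.

    Assumption: used here as a rough proxy for the coded size; edge pixels,
    which T.81 predicts with special rules, are simply skipped.
    """
    predict = PREDICTORS[psv]
    cost = 0
    for y in range(1, len(img)):
        for x in range(1, len(img[0])):
            a, b, c = img[y][x - 1], img[y - 1][x], img[y - 1][x - 1]
            cost += abs(img[y][x] - predict(a, b, c))
    return cost


def select_psv(img: Image) -> int:
    """Return the PSV (1-7) with the lowest residual cost; ties go to the lowest PSV."""
    return min(PREDICTORS, key=lambda psv: residual_cost(img, psv))
```

As a usage example, take an image whose rows are identical copies of a horizontal gradient. It is predicted perfectly from the row above, so select_psv returns 2. Whichever predictor wins, the decoder reverses the prediction exactly, so the choice changes only how well the residuals compress.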

Because the codec was open source, many developers quickly incorporated it into their applications to compress a variety of image files with no loss of quality. Even large corporations turned to the Cornell codec (as well as a similar open-source code library from Stanford) when adding lossless compression to a broad range of products.

However, a hidden problem soon surfaced. While the Cornell codec worked well for many lossless JPEG applications, it repeatedly produced errors when compressing certain types of DICOM images. The error wrote invalid values into the Huffman table the codec used for JPEG entropy coding, which left the images unreadable when they were decompressed. For developers building applications around the DICOM format, this bug in the Cornell code caused serious problems, especially once the library was already deeply integrated into their software.
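
As an illustration of what “invalid values in the Huffman table” can mean in practice, the sketch below runs a few sanity checks on one lossless JPEG Huffman table as stored in a DHT segment. BITS gives the number of codes of each length from 1 to 16, and HUFFVAL lists the symbols. The checks are: the counts must match the symbol list, the canonical codes must fit in the code space (JPEG reserves the all-ones code word), and lossless symbols must be difference categories 0 to 16. This is a hypothetical validator (validate_lossless_dht is not a PICTools or libjpeg function). It does not reproduce the specific values the Cornell codec wrote.

```python
from typing import List, Sequence


def validate_lossless_dht(bits: Sequence[int], huffval: Sequence[int]) -> List[str]:
    """Return a list of problems found in one lossless JPEG Huffman table.

    bits[i] is the number of codes of length i + 1 (16 entries, as in a DHT
    segment); huffval lists the coded symbols in code order.
    """
    problems: List[str] = []
    if len(bits) != 16:
        problems.append(f"expected 16 BITS counts, got {len(bits)}")
        return problems
    if sum(bits) != len(huffval):
        problems.append(f"BITS total {sum(bits)} != {len(huffval)} HUFFVAL entries")

    # Canonical code assignment: codes of each length continue from the
    # previous length. After assigning `count` codes, `code` is one past the
    # last code used; it must stay below 2**length so that no code word is
    # all ones (or overflows its length).
    code = 0
    for length, count in enumerate(bits, start=1):
        code += count
        if code >= (1 << length):
            problems.append(f"code space overflows at length {length}")
            break
        code <<= 1

    # Lossless JPEG difference categories (SSSS) run from 0 to 16 (T.81 Table H.2).
    bad = sorted({v for v in huffval if not 0 <= v <= 16})
    if bad:
        problems.append(f"symbols outside 0-16: {bad}")
    return problems
```

A table that passes these checks can still be the wrong table for the data. A decoder that rejects a table on these grounds, though, has at least pinpointed where a file went bad.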

The Source of the Problem

As a longtime player in the digital imaging field, Accusoft (then Pegasus Imaging) first became aware of the Cornell bug through conversations with customers in the medical industry. Electronic health records (EHR) and picture archiving and communication systems (PACS) were still in the early stages of adoption across healthcare. Even so, providers were already looking for software tools to help them manage the massive image files generated by medical scanning equipment. Our image conversion and compression SDKs were an ideal fit, especially once development teams and healthcare providers started running into problems with open-source tools like the Cornell codec.

After investigating the recurring errors, our team discovered that the issue stemmed from the way the codec handled 16-bit images during compression. Because most image files use 8 bits per sample, the Cornell library handled the majority of formats without trouble. The higher bit depths common in DICOM imaging, however, posed a serious challenge.

Although DICOM image files can be written at a variety of bit depths, MRIs, CT scans, and X-rays require high-precision grayscale images so that medical professionals can make an accurate diagnosis. These images typically use 12 to 16 bits per pixel. Any lossless JPEG compression solution must therefore preserve every one of those bits exactly rather than reducing the image to 8-bit precision.
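
Lossless JPEG keeps full 16-bit precision by coding prediction differences rather than pixel values. Per ITU-T T.81 Annex H, those differences are taken modulo 2^16. The minimal sketch below encodes and decodes one row of 16-bit samples and recovers them exactly. For brevity it assumes predictor 1 (left neighbour) throughout the row. It also assumes full precision with no point transform, so the first sample is predicted as 2^15. The function names are hypothetical.

```python
from typing import List

MOD = 1 << 16  # lossless JPEG differences are computed modulo 2**16


def to_signed(diff: int) -> int:
    """Map a modulo-2**16 difference into the coded range -32767..32768."""
    diff %= MOD
    return diff - MOD if diff > 32768 else diff


def encode_row(samples: List[int]) -> List[int]:
    """Residuals for one row of 16-bit samples using predictor 1 (left neighbour).

    The first sample is predicted as 2**15, the T.81 default for 16-bit
    precision with no point transform.
    """
    pred = 1 << 15
    residuals = []
    for s in samples:
        residuals.append(to_signed(s - pred))
        pred = s
    return residuals


def decode_row(residuals: List[int]) -> List[int]:
    """Invert encode_row: add each residual to the prediction, modulo 2**16."""
    pred = 1 << 15
    samples = []
    for d in residuals:
        s = (pred + d) % MOD
        samples.append(s)
        pred = s
    return samples
```

Every residual fits in the range -32767 to 32768, and decoding returns the original 16-bit values bit for bit. A correct lossless codec never needs to reduce 12- to 16-bit medical data to 8 bits.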

Unfortunately, the Cornell lossless codec appears to have misinterpreted the JPEG specification as it applies to 16-bit DICOM images, which caused an overflow error in the Huffman table. The bug created two distinct problems for customers. First, they had to find a way to compress 16-bit DICOM files correctly. More importantly, they needed a way to decompress files they received that had already been compressed with the Cornell code library.
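
For context on why 16-bit data is a special case, the sketch below shows how ITU-T T.81 codes each lossless prediction difference. The encoder emits a Huffman code for the difference category SSSS (the bit length of the difference's magnitude), followed by SSSS additional bits. Only 16-bit images can produce category 16. That category always means a difference of exactly 32768 and, unlike every other category, is followed by no additional bits. The article does not say which detail the Cornell codec got wrong, so treat this as an illustration of one 16-bit-only rule that encoders and decoders must agree on, not a reproduction of the bug. The function names are hypothetical.

```python
from typing import Tuple


def ssss(diff: int) -> int:
    """Difference category: the number of bits needed for |diff| (T.81 Table H.2)."""
    return abs(diff).bit_length()


def additional_bits(diff: int) -> Tuple[int, int, int]:
    """Return (category, extra_bits_value, extra_bit_count) for one difference.

    Positive differences append their low SSSS bits; negative differences
    append the low SSSS bits of diff - 1. Category 16 occurs only for a
    difference of 32768 and appends no bits at all.
    """
    cat = ssss(diff)
    if cat == 0 or cat == 16:
        return cat, 0, 0
    value = diff if diff > 0 else diff - 1
    return cat, value & ((1 << cat) - 1), cat
```

An encoder that appended 16 extra bits after category 16 would leave a stray 16-bit field in the bitstream. A decoder that followed the standard would then lose sync at that point, or the reverse. Because 8-bit images never reach category 16, a codec could handle 8-bit files flawlessly and still fail on 16-bit DICOM data.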

The Accusoft Solution

After encountering the DICOM error in a variety of circumstances, the Accusoft team eventually developed a code-based workaround that allowed our customers to decompress images they feared were corrupted beyond repair. Because we can now recognize the error when it occurs, the PICTools Medical SDK can reconstruct an affected DICOM image in its original condition.

Although more than twenty years have passed since the Cornell DICOM bug first became apparent, healthcare organizations are still encountering the problem today. That’s because many software applications are still using elements of the Cornell lossless JPEG codec library and often don’t even realize it. Thanks to our PICTools Medical solution, we can quickly address the issue to help organizations access their compressed files.

To learn more about how PICTools Medical can resolve the infamous Cornell DICOM bug and enhance your medical imaging workflow, have a look at our developer resources and download a free trial of the SDK today.