What Is a Linearized PDF and Why Are They Important?
When it comes to downloading or viewing documents over the internet, PDFs have long served as a de facto standard for most organizations. Because PDF is an open standard (ISO 32000), there’s rarely any risk that someone will be unable to open one. However, just because PDFs have become so commonplace doesn’t mean that they all share the same characteristics. For anyone who has ever wondered why some PDFs seem to take so much longer to load than others, the answer often has less to do with connection and processing speeds than with the way the PDF’s content is organized.
More specifically, it’s a matter of whether or not the document is a linearized PDF.
What Is a Linearized PDF?
Sometimes called “fast web view,” linearization is a special way of saving a PDF file that organizes its internal components so they can be read efficiently when the file is streamed over a network connection. A standard, non-linearized PDF may scatter the objects that make up each page throughout the entire file. A linearized PDF reorders those objects so that everything needed for a given page is grouped together, on an ordered, page-by-page basis. When a reader opens a linearized PDF, then, all of the information needed to render the first page is available at the start of the file, allowing it to load that page quickly without having to search the entire document for a specific object like an embedded font.
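To make the idea concrete, the Python sketch below checks whether a file appears to be linearized. It relies on the PDF specification’s requirement that the linearization parameter dictionary (the object carrying the /Linearized key) lie entirely within the first 1,024 bytes of the file. The function name `looks_linearized` is hypothetical, and this is only a heuristic: a full check would also validate the dictionary’s entries against the rest of the file.

```python
import re

# Per the PDF specification, the linearization parameter dictionary must lie
# entirely within the first 1024 bytes of a linearized file.
HEADER_WINDOW = 1024


def looks_linearized(pdf_bytes: bytes) -> bool:
    """Hypothetical heuristic: is there a /Linearized dictionary near the start?"""
    head = pdf_bytes[:HEADER_WINDOW]
    # Look for a dictionary opener "<<" followed, before any ">", by /Linearized.
    return re.search(rb"<<[^>]*/Linearized\b", head) is not None
```

Because only the first 1,024 bytes matter, a caller could pass just the beginning of a file, for example `looks_linearized(open("report.pdf", "rb").read(1024))`, where `report.pdf` is a placeholder name.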
Originally introduced with the PDF 1.2 standard in 1996, linearized PDFs were critical to the format’s early internet success. To view a non-linearized PDF, the entire document typically had to be downloaded first, or else retrieved piece by piece through many separate HTTP request-response transactions. Given the bandwidth limitations of early internet connections (often still between 28.8 and 33.6 kbps in 1996), this created a serious bottleneck when it came to document viewing. While it was possible to view a document without downloading it in full, the multiple HTTP requests needed to do so could easily be disrupted if the connection was lost, something that was all too common in the days before reliable broadband connections were introduced.
Non-Linearized vs Linearized PDFs
To visualize the difference between a non-linearized PDF and a linearized PDF, imagine two separate people sitting down to file their business taxes. One person has all of their receipts, invoices, and financial documents scattered across their office, with some stacked in unordered piles, others crammed into unlabeled folders, and even more stuffed into assorted drawers and file cabinets. Finding and organizing all of this documentation would take almost as much time as actually filing the taxes themselves! The second person, however, has all of the records they need stored in a neatly labeled file cabinet, allowing them to retrieve everything quickly and easily.
The first example is similar to a non-linearized PDF, while the second shows how much easier a linearized file makes it for a reader to access the information it needs to render the document. Even better, since each page is organized in the same way, jumping to a different page in a multi-page PDF doesn’t require the reader to load the entire file. It can simply retrieve the portion of the file that contains the requested page and get everything necessary to display it correctly.
Why Linearized PDFs Are Still Valuable
In a world dominated by high-speed internet connections, it’s fair to wonder whether PDF linearization is still necessary. For small PDFs of only a few pages, linearization may not be essential, but for larger documents it can still deliver substantial performance and user experience benefits.
Consider, for instance, a document that consists of several hundred, or even several thousand, pages. Loading that entire document and keeping it cached may be possible, but it’s an inefficient use of processing and bandwidth resources. With a linearized PDF, a reader typically encounters a linearization dictionary and hint tables near the beginning of the file, which tell it where to locate any necessary resources within the file. After loading the hint tables and the first page, the reader can pause the download rather than retrieving the entire file. When the user navigates to another page, the reader can consult the hint tables and request just the bytes it needs for that page (for example, with HTTP byte-range requests).
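As a rough illustration of how a reader might use that dictionary, the sketch below pulls out its integer entries (/L file length, /O first-page object number, /E end of the first page, /N page count, /T offset of the main cross-reference table) plus the /H hint-stream offset and length, then builds an HTTP Range header covering only the first page. The helper names are hypothetical, the regex-based parsing assumes a simple, uncompressed dictionary, and a production viewer would rely on a real PDF parser instead.

```python
import re


def linearization_params(pdf_bytes: bytes) -> dict:
    """Hypothetical helper: read the linearization dictionary's main entries."""
    # Assumes a simple, uncompressed dictionary in the first 1024 bytes.
    match = re.search(rb"<<([^>]*/Linearized\b[^>]*)>>", pdf_bytes[:1024])
    if match is None:
        raise ValueError("no linearization dictionary in the first 1024 bytes")
    body = match.group(1)
    params = {}
    # Integer entries: L = file length, O = first-page object number,
    # E = end of first page, N = page count, T = main xref table offset.
    for key in ("L", "O", "E", "N", "T"):
        m = re.search(rb"/" + key.encode() + rb"\s+(\d+)", body)
        if m:
            params[key] = int(m.group(1))
    # H = [offset length] of the primary hint stream, optionally followed by
    # the offset and length of an overflow hint stream.
    h = re.search(rb"/H\s*\[([\d\s]+)\]", body)
    if h:
        params["H"] = [int(x) for x in h.group(1).split()]
    return params


def first_page_range_header(params: dict) -> str:
    """Build an HTTP Range header covering byte 0 through the end of page one."""
    # HTTP byte ranges are inclusive at both ends, so the last byte is E - 1.
    return f"bytes=0-{params['E'] - 1}"
```

A client could send the resulting value as the `Range` header with any HTTP library; a server that supports byte serving answers with `206 Partial Content` and only the requested bytes.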
This ensures that the reader is only ever loading the pages that actually need to be displayed, which helps to conserve memory, processing resources, and bandwidth. For mobile devices with limited file and cache storage, linearized PDFs are much easier to manage than their non-linearized counterparts. They also provide some protection against network interruptions, which could make it difficult to download and view an entire document.
How to Linearize PDFs
Although the linearization process is clearly defined in the PDF specification, many PDFs are created using software that doesn’t linearize the content automatically. In addition, a linearized PDF can be “broken” by incremental saving, which appends minor updates to the end of the file rather than rewriting the existing structure. Even a single incremental save changes the file length recorded in the linearization dictionary, and over time these appended updates can undermine the effectiveness of a linearized PDF.
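One way to spot a linearized file that has since been incrementally saved is to compare the /L entry, which records the length of the file at the time it was linearized, with the file’s actual size. The PDF specification treats a mismatch as a sign that the linearization data can no longer be relied on as-is. The sketch below performs that comparison; the function name is hypothetical, and it assumes the same simple, uncompressed dictionary layout as a plain regex can handle.

```python
import re


def linearization_is_stale(pdf_bytes: bytes) -> bool:
    """Hypothetical check: has the file changed size since it was linearized?"""
    # Assumes a simple, uncompressed linearization dictionary in the first 1024 bytes.
    dict_match = re.search(rb"<<([^>]*/Linearized\b[^>]*)>>", pdf_bytes[:1024])
    if dict_match is None:
        raise ValueError("file does not appear to be linearized")
    length_match = re.search(rb"/L\s+(\d+)", dict_match.group(1))
    if length_match is None:
        raise ValueError("linearization dictionary has no /L entry")
    declared_length = int(length_match.group(1))
    # An incremental save appends data to the end of the file, so its actual
    # size no longer matches the length recorded in /L.
    return declared_length != len(pdf_bytes)
```

A `True` result suggests the file is a good candidate for re-saving as a fresh linearized version; it does not mean the document itself is damaged.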
The best way to resolve such problems is to save a new, linearized version of the file using PDF editing and conversion tools like Accusoft’s ImageGear PDF or PDF Xpress. Saving a linearized version rewrites the file, reordering and renumbering its objects for more efficient web viewing.
Take Control of PDFs with Accusoft SDKs
Accusoft’s ImageGear SDK collection provides a broad range of document and image functionality that allows applications to more effectively create, convert, and compress PDF files. In addition to linearization and annotation support, ImageGear PDF allows developers to use encryption, password, and signature tools to strengthen PDF security.