Technical FAQs

Question

In PrizmDoc, my document appears to be small on the page relative to the viewer. How can I fix this?

enter image description here

Answer

By default, PrizmDoc renders a PDF file according to the MediaBox, which is normally the same as CropBox, though sometimes this is not the case. The larger area you see in the PrizmDoc Viewer is the size of the MediaBox. Please note that the product provides the fileTypes.pdf.pageBoundaries control option (or useCropBox in the older versions) to change the default behavior. Try setting the option to cropBox in the Central Configuration File in order to get the PDF content rendered according to the CropBox. You can read more about configuring image frame rendering in our documentation here.

For additional reading, see 7.7.3.3 on “User Space” of Adobe’s PDF 1.7 specification:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Note: In some older versions of PrizmDoc, there exists an issue where setting the pageBoundaries field to cropBox can cause light blurring/distorting on the page. This issue was addressed in version 13.4.

Question

In PrizmDoc, my document appears to be small on the page relative to the viewer. How can I fix this?

enter image description here

Answer

By default, PrizmDoc renders a PDF file according to the MediaBox, which is normally the same as CropBox, though sometimes this is not the case. The larger area you see in the PrizmDoc Viewer is the size of the MediaBox. Please note that the product provides the fileTypes.pdf.pageBoundaries control option (or useCropBox in the older versions) to change the default behavior. Try setting the option to cropBox in the Central Configuration File in order to get the PDF content rendered according to the CropBox. You can read more about configuring image frame rendering in our documentation here.

For additional reading, see 7.7.3.3 on “User Space” of Adobe’s PDF 1.7 specification:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

Note: In some older versions of PrizmDoc, there exists an issue where setting the pageBoundaries field to cropBox can cause light blurring/distorting on the page. This issue was addressed in version 13.4.

When it comes to downloading or viewing documents over the internet, PDFs have long served as a de facto standard for most organizations. Since PDFs are not a proprietary file format, there’s rarely any risk that someone will be unable to open them. However, just because PDFs have become so commonplace doesn’t mean that they all share the same characteristics. For anyone who has ever wondered why some PDFs seem to take so much longer to load than others, the answer often has less to do with connection and processing speeds as it does with the way the PDF’s content is organized.

More specifically, it’s a matter of whether or not the document is a linearized PDF.

What Is a Linearized PDF?

Sometimes called “fast web view,” linearization is a special way of saving a PDF file that organizes its internal components to make them easier to read when the file is streamed over a network connection. While a standard, non-linearized PDF stores information associated with each page across the entire file, linearized PDFs use an object tree format to consolidate page elements in an ordered, page by page basis. When a reader opens a linearized PDF, then, all of the information needed to render the first page is readily available, allowing it to load the page quickly without having to search the entire document for a specific object like an embedded font.

Originally introduced with the PDF 1.2 standard in 1996, linearized PDFs were critical to the format’s early internet success. In order to view a non-linearized PDF, the entire document needs to be downloaded or read via HTTP request-response transactions. Given the bandwidth limitations of early internet connections (often still between 28.8k and 33.6k in 1996), this created a serious bottleneck problem when it came to document viewing. While it was possible to view a document without downloading it, the multiple HTTP requests needed to do so could easily be disrupted if the connection was lost, something that was all too common in the days before reliable broadband connections were introduced.

Non-Linearized vs Linearized PDFs

To visualize the difference between a non-linearized PDF and a linearized PDF, imagine two separate people sitting down to file their business taxes. One person has all of their receipts, invoices, and financial documents scattered across their office, with some stacked in unordered piles, others crammed into unlabeled folders, and even more stuffed into assorted drawers and file cabinets. Finding and organizing all of this documentation would take almost as much time as actually filing the taxes themselves! The second person, however, has all of the records they need stored in a neatly labeled file cabinet, allowing them to retrieve everything quickly and easily.

The first example is similar to a non-linearized PDF, while the second shows how much easier it is for a reader to access the information it needs to render the file. Even better, since each page is organized in the same way, jumping to a different page in a multi-page PDF doesn’t require the reader to reload the entire file. It can simply read the current page and get everything necessary to display the PDF correctly.

Why Linearized PDFs Are Still Valuable

In a world dominated by high speed internet connections, it’s fair to wonder whether or not PDF linearization is still necessary. For small PDFs that are only a few pages, linearization may not be essential, but when it comes to larger documents, linearization can still deliver substantial performance and user experience benefits.

Consider, for instance, a document that consists of several hundred, or even several thousand, pages. Loading that entire document and keeping it cached may be possible, but it’s an inefficient use of processing and bandwidth resources. With a linearized PDF, a reader typically encounters a linearization directory and hint tables at the top of the document, which provides it with instructions on where to locate any necessary resources within the file. After loading the hint tables and the first page, the reader stops the download process rather than opening the entire file. When the user navigates to another page, the reader can quickly reference the hint tables and jump to that page.

This ensures that the reader is only ever loading the pages that actually need to be displayed, which helps to conserve memory, processing resources, and bandwidth. For mobile devices with limited file and cache storage, linearized PDFs are much easier to manage than their non-linearized counterparts. They also provide some protection against network interruptions, which could make it difficult to download and view an entire document.

How to Linearize PDFs

Although the linearization process is well laid out in the current PDF standards documentation, many PDFs are created using software that doesn’t automatically linearize the content. More importantly, some linearized PDFs are “broken” by a process called incremental saving, which saves minor updates at the end of the file, rather than changing existing structure. Over time, too much incremental saving can undermine the effectiveness of a linearized PDF.

The best way to resolve such problems and linearize the PDF is to save a new, linearized version of the file using PDF editing and conversion tools.

Take Control of PDFs with PrizmDoc

Accusoft’s PrizmDoc provides a broad range of document functionality that allows applications to more effectively create, convert, and compress PDF files.

For a closer look at PrizmDoc and to see its powerful document processing capabilities in action, download a free trial today.

Question

After searching a document, an error icon appears in the search results panel. Clicking on it displays the following error message: “x page(s) cannot be searched.” Why does this occur and how can I find out which specific pages couldn’t be searched?

Answer

When the PrizmDoc Viewer text-service cannot find any text for a given page in the document, it provides an array of all the pages without text in the response from searchTask results.

In short, the document is fine and simply contains pages without text. If you look at the pagesWithoutText array contained within the response data from searchTasks, you’ll see something like this:

[0, 1, 7, 17, 43, 45, 65, 67, 77, 79,…]

The values reported are pages that do not contain any text but instead are either blank or contain an image. This data can then be used to inform the user of how many pages are not searchable.

Question

After searching a document, an error icon appears in the search results panel. Clicking on it displays the following error message: “x page(s) cannot be searched.” Why does this occur and how can I find out which specific pages couldn’t be searched?

Answer

When the PrizmDoc Viewer text-service cannot find any text for a given page in the document, it provides an array of all the pages without text in the response from searchTask results.

In short, the document is fine and simply contains pages without text. If you look at the pagesWithoutText array contained within the response data from searchTasks, you’ll see something like this:

[0, 1, 7, 17, 43, 45, 65, 67, 77, 79,…]

The values reported are pages that do not contain any text but instead are either blank or contain an image. This data can then be used to inform the user of how many pages are not searchable.

scalable vector graphics

The scalable vector graphic (SVG) format continues to enjoy steady adoption across the web. According to data from W3Techs, SVG now accounts for 25 percent of website images worldwide. But it wasn’t always this way. In 1998, it became apparent that vector-based graphics had a future on the web, and the W3C received six different file format submissions from technology companies that year. Some were mere proposals ready for a complete revamp, while others were proprietary products that W3C wasn’t permitted to modify. Instead of forging a format from one of the submissions, however, W3C’s SVG working group decided to start from the ground up — and SVG was born.

While the file format had lofty ambitions, focusing on common use rather than specific syntax, the original iteration was cumbersome and complex. However, SVG has improved year after year after year. With increased support came more streamlined functionality and usable features. Now, SVG is often the first choice for meeting the evolving demands of scalable, responsive, and accessible web content.


What is a Scalable Vector Graphic (SVG) and how does it work?

Today, SVG is the de-facto standard for vector-based browser graphics. But what exactly is this file format, and how does it work?

Based on XML, SVG supports three broad types of objects: 

  • Vector graphics including paths and outlines that are both straight and curved
  • Bitmap images such as .jpeg, .gif, and .png
  • Text

What sets SVG apart from bitmap-based images is the use of lines and curves along the edges of graphical objects. Because bitmap images use a fixed set of pixels, scaling them up creates blurriness where the edges of pixels meet. In the case of vector images, meanwhile, a fixed-shape approach allows the preservation of smooth lines and curves no matter the image size.

SVG also offers the benefit of interoperability. Because it’s a W3C open standard, SVG plays well with both other image format and web markup languages including JavaScript, DOM, CSS, and HTML. This allows the format to easily support responsive design approaches that scale websites and web content based on the user device rather than defining standardized size parameters. Thanks to the curves and lines of SVG, scaling presents no problem for responsive designers looking to ensure consistency across device types.


The Benefits of SVG

While scalability is often cited as the biggest benefit of SVG, this format also offers other advantages, including:

  • Responsiveness — Images can be easily scaled up or down and modified as necessary to meet web design and development demands.
  • Accessibility — Since SVG is text-based, content can be indexed and searched, allowing both users and developers to quickly find what they’re looking for.
  • Performance Image rendering is quick and doesn’t require substantive resources, allowing sites to load quickly and completely.
  • Use in Web ApplicationsBrowser incompatibilities and missing functions often frustrate web design efforts, forcing developers to use multiple tool sets and spend time checking content and images for potential format conflicts. SVG, meanwhile, offers powerful scripting and event support, in turn allowing developers to leverage it as a platform for both graphically rich applications and user interfaces. The result? Better-looking sites that enhance the overall user experience.
  • InteroperabilityBecause SVG is based on W3C standards, the format is entirely interoperable, meaning developers aren’t tied to any specific implementation, vendor, or authoring tool. From building their own framework from the ground up to leveraging third-party SVG applications, web developers can find their format best-fit.

SVG in PrizmDoc Viewer

Accusoft’s PrizmDoc Viewer offers multiple ways for developers to make the most of SVG elements at scale, such as:

  • File TransformationConversion is critical for effective and efficient web design. If development teams need different file transformation tools for every format, the timeline for web projects expands significantly. PrizmDoc Viewer streamlines this process with support for the conversion of more than 100 file types — including PDFs, Microsoft Office files, HTML, EML, rich text, and images — into browser-compliant SVG outputs. In practice, this permits near-native document and image rendering that’s not only fast, but also accessible anytime, anywhere, and from any device.
  • HTML5 FunctionalityUsing SVG in PrizmDoc Viewer is made easier thanks to native HTML5 design. The use of HTML5-native framework not only improves load times with smaller document sizes but means that PrizmDoc Viewer works in all modern web browsers — while also dramatically enhancing document display quality.
  • Pre-Conversion One of the biggest challenges with viewing large documents in a browser is delay. Pages toward the end of the document may take longer to load and frustrate users looking to quickly find a specific image or piece of information. PrizmDoc Viewer solves this problem with a pre-conversion API that returns the first page as an SVG while the rest of the document is being converted, allowing users to interact with documents as conversion takes place and lowering the chance that files will experience format-based delays.

SVG hasn’t always been the go-to web image format. Despite a promising start based on open, interoperable standards, the lack of early support and specific use cases for vector-based file formats saw SVG sitting on the sidelines for decades. 

The advent of on-demand access requirements and mobile-first development realities has changed the conversation. SVG is now continuously gaining ground as companies see the benefit in this scalable, streamlined, and superior-quality file format. Get the big picture and see SVG in action with our online document viewing demo, or start a free PrizmDoc Viewer trial today!

Question

If I have a PDF document that only has an embedded image in it (no text objects, etc.), can PrizmDoc Viewer take it and create a searchable PDF file from it?

Answer

Yes. PrizmDoc’s Content Conversion Services can take an image-only PDF and create a searchable PDF file from it. This can be done by modifying the input.dest.pdfOptions.ocr options object; see our documentation here.

If you are attempting to make a searchable PDF from an existing PDF document, please note that the source PDF file should be an image-only PDF. PrizmDoc will not create a searchable file from already-existing vector content.

This feature was introduced in PrizmDoc 13.1, please see our Release Notes for more information.

Question

If I have a PDF document that only has an embedded image in it (no text objects, etc.), can PrizmDoc Viewer take it and create a searchable PDF file from it?

Answer

Yes. PrizmDoc’s Content Conversion Services can take an image-only PDF and create a searchable PDF file from it. This can be done by modifying the input.dest.pdfOptions.ocr options object; see our documentation here.

If you are attempting to make a searchable PDF from an existing PDF document, please note that the source PDF file should be an image-only PDF. PrizmDoc will not create a searchable file from already-existing vector content.

This feature was introduced in PrizmDoc 13.1, please see our Release Notes for more information.