
How to Optimize PrizmDoc for Large Document Viewing and Server-Side Search

Looking to achieve faster end-user interaction with source documents containing hundreds or thousands of pages? PrizmDoc’s new Large Document Viewing and Server-Side Search feature allows your users to do just that. If you have PrizmDoc, you may be wondering, “Well, that’s great and all, but how do I turn it on?”

If you’re more interested in learning how server-side search is set up: the feature is enabled through a configuration setting, but PrizmDoc doesn’t perform server-side search by default, so as a developer you have to invoke it explicitly. When you create or set up a viewing session, you choose whether to use server-side search by calling either the clientSearch API or the serverSearch API.

Use Viewing Packages to Pre-Convert Content Whenever Possible

Ensure that PrizmDoc runs smoothly on your website or application by pre-converting content whenever possible. The most important thing you can do to make large documents load quickly in the browser is to make sure the document content has already been converted for viewing in the browser before an end user starts to view it. This is especially true for Microsoft Office documents.

If you are using PrizmDoc Application Services (PAS), take advantage of our Viewing Packages feature to pre-convert an entire document for fast viewing in the browser. Once created, a viewing package persists until you explicitly delete it, and it allows PAS to simply return static content for any page of a document, even if the document has thousands of pages.

Use Server-Side Search to View Large Documents

As we discussed above, PrizmDoc allows end users to easily search a document and navigate the results through server-side search. Server-side search offloads the search work to the server and populates the Viewing Client UI as results become available. This new feature makes it possible to process much larger documents than before.

There are a variety of benefits to using server-side search. Previous versions of PrizmDoc could fail to convert or extract all text in large documents, such as PDFs over 1,000 pages and Word documents over 250 pages. If you process documents with a large number of pages, this feature will likely benefit you. Search results on large documents are much faster using server-side search mode.

However, converting large documents is very resource intensive, particularly with Office documents. Specifically, the number of system cores has a direct effect on performance. For example, text extraction for a 1,000-page Word document may take several minutes and may fully utilize two cores for the duration of the conversion. As a result, three or four concurrent users, each converting a 1,000-page document, would consume all conversion resources on that server during that time.

Subsequent users attempting to convert documents during this time may encounter errors until the system resources are released by the completed text extraction process. This may result in a sub-optimal user experience when resources are overloaded. We strongly recommend servers with a large number of cores if you are working frequently with large documents.
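The arithmetic behind this recommendation can be sketched as a rough capacity estimate. This is only an illustration: the function names are hypothetical, the 8-core server is an assumed example, and the two-cores-per-conversion figure is the approximate number quoted above for a large Word document, not a guaranteed value.

```javascript
// Rough capacity estimate. All figures are illustrative assumptions,
// not measured PrizmDoc limits.
function maxConcurrentConversions(totalCores, coresPerConversion = 2) {
  if (totalCores <= 0 || coresPerConversion <= 0) {
    throw new RangeError('core counts must be positive');
  }
  // Each large conversion is assumed to keep `coresPerConversion` cores busy.
  return Math.floor(totalCores / coresPerConversion);
}

// True while the server still has room for one more large conversion.
function canAcceptConversion(activeConversions, totalCores, coresPerConversion = 2) {
  return activeConversions < maxConcurrentConversions(totalCores, coresPerConversion);
}

// Hypothetical 8-core server: four large conversions saturate it.
console.log(maxConcurrentConversions(8)); // 4
console.log(canAcceptConversion(4, 8));   // false
```

Under these assumptions, a fifth user arriving while four large conversions are running is the one likely to see errors, which is why adding cores raises the ceiling directly.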

In addition, the text extraction process will continue if a user abandons a session before the text extraction process completes. This will be improved in a subsequent release, but it is important to know that a user abandoning a document does not necessarily release conversion resources on the server. Our performance tests in both modes show only nominal differences in performance and resource usage when viewing smaller documents. Converting Office documents with server-side search mode enabled will consume roughly 10 percent more resources.

Client-Side vs. Server-Side Search

Ideally, PrizmDoc would perform a client-side search whenever possible and a server-side search whenever necessary. In reality, we make an educated guess based on page count. By default, our Viewing Client will perform a client-side search if a document contains no more than 80 pages; otherwise, the Viewing Client will offload the search work to the server (when server-side search is enabled). For many kinds of documents this arbitrary 80-page threshold works. However, if you are using documents of 80 pages or fewer that contain a substantial amount of text, or if your end user’s browser is particularly memory constrained, you may find that this default is not aggressive enough in offloading search work to the server.

When constructing a viewer control, you can use the Viewer Control Options searchMethodPageCountThreshold property to adjust the maximum number of pages a document can have before the Viewing Client switches to server-side search. Additionally, you can use the searchMethodType property to force the Viewing Client to only use server-side search (or only use client-side search).
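The decision described above can be modeled as a small function. This is a sketch of the logic as this article describes it, not PrizmDoc's actual implementation: the function name, the parameter names, and the 'auto', 'client', and 'server' labels are all hypothetical placeholders, and the real values accepted by searchMethodType should be taken from the documentation.

```javascript
const DEFAULT_PAGE_THRESHOLD = 80; // default page-count threshold stated in this article

// Models which search method the Viewing Client would pick.
// `forced` stands in for searchMethodType and `threshold` for
// searchMethodPageCountThreshold; the string labels are assumptions.
function chooseSearchMethod({
  pageCount,
  serverSearchEnabled,
  threshold = DEFAULT_PAGE_THRESHOLD,
  forced = 'auto',
}) {
  // Without server-side search enabled, client-side search is the only option.
  if (!serverSearchEnabled) return 'client';
  // An explicit choice overrides the page-count guess.
  if (forced === 'client' || forced === 'server') return forced;
  // "No more than `threshold` pages" stays on the client.
  return pageCount <= threshold ? 'client' : 'server';
}

console.log(chooseSearchMethod({ pageCount: 80, serverSearchEnabled: true })); // client
console.log(chooseSearchMethod({ pageCount: 81, serverSearchEnabled: true })); // server
// Lowering the threshold offloads a text-heavy 50-page document to the server.
console.log(chooseSearchMethod({ pageCount: 50, serverSearchEnabled: true, threshold: 20 })); // server
```

Lowering the threshold is the gentler of the two adjustments, since small documents still get client-side search; forcing one method removes the page-count guess entirely.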

Use APIs to Manage Your Server-Side Search Capabilities

The following excerpt from the PrizmDoc documentation provides instructions on how to use APIs to customize your desired usage of server-side search.

clientSearch(searchQuery) → {PCCViewer.SearchRequest}

Searches the text of the document for the given searchQuery. The search is performed client-side, which requires requesting the text of each page from the server. This is efficient for smaller documents, but for large documents, it is more efficient to use the PCCViewer.ViewerControl#serverSearch method instead.

This query can be a single search term or a hash specifying one or more terms and options. If only a single search term (string) is supplied, then default options are used. Search completes asynchronously. The returned PCCViewer.SearchRequest object provides events for search progress and members to access search results.

serverSearch(searchQuery) → {PCCViewer.SearchRequest}

Searches the text of the document for the given searchQuery. The search is performed server-side. This is efficient for larger documents, but for smaller documents it is more efficient to use the PCCViewer.ViewerControl#clientSearch method instead.

This query can be a single search term or a hash specifying one or more terms and options. If only a single search term (string) is supplied, then default options are used. Search completes asynchronously. The returned PCCViewer.SearchRequest object provides events for search progress and members to access search results.
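As a minimal sketch of how the two methods might be called from application code, the helper below dispatches to clientSearch or serverSearch on a viewer control. The helper name startSearch and the stand-in object are hypothetical and exist only so the example runs outside a browser; a real viewer control returns a PCCViewer.SearchRequest, whose events and members are described in the documentation and are not reproduced here.

```javascript
// Dispatches to one of the two documented ViewerControl methods.
// `searchQuery` may be a single term (string) or a hash of terms and options.
function startSearch(viewerControl, searchQuery, useServer) {
  return useServer
    ? viewerControl.serverSearch(searchQuery)
    : viewerControl.clientSearch(searchQuery);
}

// Stand-in for a real viewer control (an assumption for illustration only).
// The returned objects are placeholders, not real SearchRequest instances.
const fakeViewerControl = {
  clientSearch: (query) => ({ mode: 'client', query }),
  serverSearch: (query) => ({ mode: 'server', query }),
};

const request = startSearch(fakeViewerControl, 'invoice', true);
console.log(request.mode); // server
```

Because both methods take the same query and return the same kind of request object, the rest of the search-handling code does not need to know which side performed the search.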

To learn more about PrizmDoc’s Large Document and Server-Side Search feature, please visit the documentation.