Technical FAQs for "PrizmDoc"

Question

If I have a PDF document that only has an embedded image in it (no text objects, etc.), can PrizmDoc Viewer take it and create a searchable PDF file from it?

Answer

Yes. PrizmDoc’s Content Conversion Services can take an image-only PDF and create a searchable PDF file from it. This can be done by modifying the input.dest.pdfOptions.ocr options object; see our documentation here.

If you are attempting to make a searchable PDF from an existing PDF document, please note that the source PDF file should be an image-only PDF. PrizmDoc will not create a searchable file from already-existing vector content.

This feature was introduced in PrizmDoc 13.1, please see our Release Notes for more information.

Question

When loading PDF documents into PrizmDoc that contain embedded highlights, rather than appearing translucent, the highlights are appearing opaque and are covering the underlying text. Why is this happening?

Answer

Prior to version 13.4 of PrizmDoc, this was occurring due to limitations in web browsers. The SVG attribute comp-op="multiply" was not widely supported in modern browsers.

With PrizmDoc version 13.4, the way highlights were rendered was changed to resolve this issue.

Question

How can I get a document’s dimensions with PrizmDoc?

Answer

There are two methods you can use to do this with PrizmDoc:

The first method is using the requestPageAttributes() method from ViewerControl. This method allows you to get the width and height of a page in the document in pixels. Below is sample code on how to use requestPageAttributes() to get the attributes of page 1 of a document:

viewerControl.requestPageAttributes(1).then(function(attributes) {
    var pageWidth = attributes.width;
    var pageHeight = attributes.height;
});

The second method is done by making a GET request to the PrizmDoc server to get metadata for a page of the source document in a viewing session. The request is:

GET /PCCIS/V1/Page/q/{{PageNumber}}/Attributes?DocumentID=u{{viewingSessionId}}&ContentType={{ContentType}}

The content type needs to be set to “png” for raster content and “svgb” for SVG content. The request returns the data in a JSON object containing the image’s width and height. The units for the width and height are in pixels when the contentType is set to “png” and unspecified units when the content type is set to “svgb”.

The request also returns the horizontal and vertical resolution of raster content when the content type is set to “png”. This information is similar to pixels per inch, but the units are unspecified, so if you wanted to calculate the size of the document you can calculate it by width divided by horizontal resolution or height divided by vertical resolution. The resolution is hard-coded to 90 when contentType is set to “svgb”.

Question

I would like to be able to query my PrizmDoc Server for all documents currently in a state of processing. I want to be able to do this to determine if a document is “hanging” during conversion, to determine my system’s efficiency (My RAM and CPU are at X% with ten documents converting), or for other tasks. This is currently possible for individual processes if you know the process ID. Is this possible for all processes?

Answer

The current version of PrizmDoc does not have an API to determine if any file is currently converting on PrizmDoc Server. PrizmDoc provides viewingPackageCreator, contentConverter, redactionCreator, and markupBurner APIs that report the status of a specific process, and whether it is in progress or not. However, it is currently necessary to know a specific processId to obtain that information.

There is an active Feature Request for this item available for viewing here.

Question

We are interested to know how well the cloud offering
scales. For example, can we process thousands of transactions per second? Is there a limit to the number of transactions that we can use
concurrently?

Answer

The limitation will vary depending on the size of the documents and the type of work being performed on them.

PrizmDoc Cloud utilizes AWS servers to scale our services as demand increases, and we do currently rate limit total requests which should not exceed 100 requests within an eight second window. The window is rechecked every four seconds to determine if the rate limit is still in excess.

Question

What causes Session Expired errors in PrizmDoc?

Answer

Session expired errors normally occur when a viewing session is open for longer than 20 minutes. 20 minutes is the default value of viewing.sessionLifetime in the PrizmDoc central configuration file, and you can increase it if you would like your Viewing Sessions to be active longer. On PrizmDoc Cloud this value is set to 5 hours.

You can find the relevant config file using the following file paths:

  • Linux: /usr/share/prizm/prizm-services-config.yml
  • Windows: C:\Prizm\prizm-services-config.yml
Question

I have installed PrizmDoc based on the documentation against a clean CentOS 7/RedHat 7 system, and Prizm services starts and is showing healthy. However, one of two issues are occurring:

  1. I cannot view HTML or picture files but can view PDF files.
  2. I cannot view PDF, Excel, or Word documents but can view HTML and Picture files.
Answer

If you cannot view HTML or picture files but can view PDF files, it is often due to specific required libraries not being installed. The following procedure can be executed on CentOS/RedHat 7 to ensure all required PrizmDoc libraries are installed.

  1. Stop the Prizm service: sudo /usr/share/prizm/scripts/pccis.sh stop

  2. Copy and paste all of the library installers into a terminal and wait for them to finish:

    yum install -y libbz2* libc* libcairo* libcups* libdbus-glib-1* libdl* libexpat* libfontconfig* libfreetype* libgcc_s* libgif* libGL* libjpeg* libm* libnsl* libopenjpeg* libpixman-1* libpng12* libpthread* librt* libstdc++* libthread_db* libungif* libuuid* libX11* libXau* libxcb* libXdmcp* libXext* libXi* libXinerama* libxml2* libXrender* libXtst* libz* linux-vdso*
    
  3. Restart the server.

If you cannot view PDF, Excel, or Word documents but can view HTML and Picture files, this is often due to installing the Generic PrizmDoc installer, which ends in either client_x86_64.tar.gz or server_x86_64.tar.gz. To resolve this issue you will need to re-install using the links that end in client_x86_64.rpm.tar.gz and server_RHEL7.tar.gz.

Question

Why do the fonts in my document look different when rendered in PrizmDoc?

Answer

There are times when a document was created on a machine where a user installed a non-standard font (e.g., Maestro-Regular). When the document is viewed on another system, without that font installed, the text will look different.

PrizmDoc uses the fonts installed on the server where it is running when converting the document. If the document has a non-standard font, and the font is not installed on the server where PrizmDoc is running, then it will render the document with a default font.

To fix the issue, you need to install the font onto the PrizmDoc server that converted the document. The procedure below outlines how to install the font on a Windows server. You will need to ensure that you are logged in as the same UserId that the Prizm Service is using.

  1. Identify the name of the font that needs to be installed and download the font file associated with that font. The font file name will typically be in the format of fontname.ttf.
  2. Copy the .ttf file to the server running PrizmDoc.
  3. On the server, right-click the .ttf file and select Install from the menu.
  4. Stop all of the Prizm services.
  5. Clear all folders under \Prizm\Cache – this will clear the previously rendered files that had the wrong font rendered.
  6. Reboot the server and attempt to open the document again.
Question

How can I improve font fidelity?

Answer

We are working on providing you a way to provide us with your own licensed fonts to be used. Watch for this feature in an upcoming release.

Question

The PrizmDoc Office Conversion Service on my localhost:18681/admin page says it’s “Starting” or has failed to start. What could the issue be?

Answer

This indicates that the license key provided is not licensed for the Microsoft Office Conversion feature.

To fix this, modify the fidelity.msOfficeDocumentsRenderer property in prizm-services-config.yml.

For information on the supported values, see the Fidelity section in the Central Configuration topic.

Question

In PrizmDoc, why do I fail to load/convert Excel documents with the error “Exception from HRESULT: 0x800AC472”?

Answer

The error message Exception from HRESULT: 0x800AC472 is usually associated with a failure involving an Excel document, found in the MsOfficeConverter.log. Below are some known triggers of it:

If the user is logged in as "SYSTEM", "LocalSystem", or any other non-user-account variant, this will cause PrizmDoc to fail when using MSO services. This is expected behavior when working with Microsoft Office documents in PrizmDoc. Please see step 6 of the Windows Installation documentation regarding this:

http://help.accusoft.com/PrizmDoc/latest/HTML/webframe.html#windows-installation.html

"Specify the login account (account name and password) that PrizmDoc Server will run under. If you are using the Microsoft Office (MSO) Conversion add-on, please make sure that the "login account" is a real user account with Administrator rights. Running PrizmDoc under the LocalSystem user or another Microsoft Windows integrated service account is not supported for this option."

It’s also crucial that the copy of Microsoft Office on the system has been activated. A not-licensed, not-activated, expired, or trial license will all cause Microsoft Office to not work with PrizmDoc.

More information: https://help.accusoft.com/PrizmDoc/latest/HTML/windows-requirements.html

"The installed copy of Microsoft Office must be activated in order for PrizmDoc’s Microsoft Office Conversion Service to work properly. Not licensed, not activated, an expired or trial version of Microsoft Office will not work with PrizmDoc."

Your default printer must be the Microsoft XPS Document Writer when working with Excel documents in PrizmDoc. Specifying another printer could possibly lead to this exception.

More information: http://help.accusoft.com/PrizmDoc/latest/HTML/natively-render-mso-documents.html

"The Microsoft Office Conversion Service requires the Microsoft XPS Document Writer printer driver to be installed for the best conversion performance and rendering fidelity of MS Excel documents"

Ensure the Print Spooler service is started and the Microsoft XPS Document Writer is the default printer.

There is a known issue with version 13.3 of PrizmDoc where completely blank Excel files are not loadable in the Viewer. They will fail to load and throw the aforementioned HRESULT exception. This has been fixed in PrizmDoc version 13.6.

In short, please set up the PrizmDoc service correctly to run with a real user account, ensure the copy of Microsoft Office has been activated, and make sure the default printer is set to "Microsoft XPS Document Writer", then restart the service. This should fix this particular issue in most cases.


For more reading on considerations that Microsoft recommends when running their client-side MSO applications on the server, see this article:

Considerations for server-side Automation of Office

Question

After searching a document, an error icon appears in the search results panel. Clicking on it displays the following error message: “x page(s) cannot be searched.” Why does this occur and how can I find out which specific pages couldn’t be searched?

Answer

When the PrizmDoc Viewer text-service cannot find any text for a given page in the document, it provides an array of all the pages without text in the response from searchTask results.

In short, the document is fine and simply contains pages without text. If you look at the pagesWithoutText array contained within the response data from searchTasks, you’ll see something like this:

[0, 1, 7, 17, 43, 45, 65, 67, 77, 79,…]

The values reported are pages that do not contain any text but instead are either blank or contain an image. This data can then be used to inform the user of how many pages are not searchable.