Technical FAQs

Question

Can I host multiple PrizmDoc viewers on a single page?

Answer

It is possible to host multiple viewers on a single page. The following example leverages Bootstrap’s tab implementation:

<!DOCTYPE html>

<html lang="en">
<head>
    <!-- Metadata -->
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no" />
    <meta name="description" content="" />

    <!-- Title -->
    <title>AccuSample</title>

    <!-- Libraries -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/7.0.0/normalize.min.css">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0/css/bootstrap.min.css">

    <!-- PrizmCSS -->
    <link rel="stylesheet" href="https://pcc-demos.accusoft.com/static/viewer-latest/css/viewercontrol.css">
    <link rel="stylesheet" href="https://pcc-demos.accusoft.com/static/viewer-latest/css/viewer.css">

    <!-- Inline Stylesheet -->
    <style>
        body {
            overflow-y: hidden;
        }
        #viewer1, #viewer2 {
            height: calc(100vh - 3em);
            width: 100%;
        }
    </style>

</head>
<body>
    <!-- #main -->
    <main class="container">
        <ul class="nav nav-tabs" role="tablist">
            <li class="nav-item">
                <a class="nav-link active" id="viewer1-tab" data-toggle="tab" href="#viewer1">Viewer 1</a>
            </li>
            <li class="nav-item">
                <a class="nav-link" id="viewer2-tab" data-toggle="tab" href="#viewer2">Viewer 2</a>
            </li>
        </ul>

        <div class="tab-content">
            <div class="tab-pane fade show active" id="viewer1" role="tabpanel">
                <div id="viewer1">
                </div>
            </div>
            <div class="tab-pane fade" id="viewer2" role="tabpanel">
                <div id="viewer2">
                </div>
            </div>
        </div>
    </main>

    <!-- Libraries -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.1/umd/popper.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0/js/bootstrap.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.8.3/underscore-min.js"></script>

    <!-- PrizmJS -->
    <script src="https://api.accusoft.com/v1/docstore/viewer/assets/classic/bundle.js"></script>
    <script src="https://pcc-demos.accusoft.com/static/viewer-latest/js/jquery.hotkeys.min.js"></script>
    <script src="https://pcc-demos.accusoft.com/static/viewer-latest/js/viewercontrol.js"></script>
    <script src="https://pcc-demos.accusoft.com/static/viewer-latest/js/viewer.js"></script>

    <!-- Inline Script -->
    <script>
        var viewingSessionId1;
        var viewerControl1;
        var viewingSessionId2;
        var viewerControl2;

        $(document).ready(function() {
            $.ajax({
                "type": "post",
                "url": "https://api.accusoft.com/PAS/V1/ViewingSession",
                "contentType": "application/json",
                "headers": {
                    "acs-api-key": "" // Your PrizmDoc Cloud API key
                },
                "data": JSON.stringify({
                    "source": {
                        "type": "url",
                        "url": "" // URL of the document to display in Viewer 1
                    }
                })
            })
            }).done(function(response) {
                viewingSessionId1 = response["viewingSessionId"];

                // Initialize viewer
                viewerControl1 = $("#viewer1").pccViewer({ 
                    documentID: viewingSessionId1,
                    imageHandlerUrl: "https://api.accusoft.com/v2/viewers/proxy",
                    language: languageItems,
                    template: htmlTemplates
                }).viewerControl;
            });

            $.ajax({
                "type": "post",
                "url": "https://api.accusoft.com/PAS/V1/ViewingSession",
                "contentType": "application/json",
                "headers": {
                    "acs-api-key": "" // Your PrizmDoc Cloud API key
                },
                "data": JSON.stringify({
                    "source": {
                        "type": "url",
                        "url": "" // URL of the document to display in Viewer 2
                    }
                })
            })
            }).done(function(response) {
                viewingSessionId2 = response["viewingSessionId"];

                // Initialize viewer
                viewerControl2 = $("#viewer2").pccViewer({ 
                    documentID: viewingSessionId2,
                    imageHandlerUrl: "https://api.accusoft.com/v2/viewers/proxy",
                    language: languageItems,
                    template: htmlTemplates
                }).viewerControl;
            });
        });
    </script>
</body>
</html>

PII Detection and Redaction

The landscape of legal content management is undergoing a transformative change, thanks to advancements in artificial intelligence (AI). Legal entities, burdened by the immense volumes of sensitive data they handle daily, are finding respite in AI-driven solutions for managing Personally Identifiable Information (PII).

By leveraging the innovative benefits of AI-enabled integrations, Independent Software Vendors (ISVs) can improve the case management, eDiscovery, and practice management software solutions they provide to law firms by securely identifying and redacting PII more efficiently than ever.

Navigating the Data Deluge with AI

Legal practices are inundated with vast quantities of PII, encompassing sensitive documents, client records, and case-related information. The manual management and protection of such extensive and intricate data pose a significant challenge.

AI technology is revolutionizing this process by automating the identification and categorization of PII within large datasets. This minimizes the risk of oversight or human error. Machine learning algorithms, integral to these AI systems, adapt to evolving data structures, ensuring comprehensive and up-to-date protection.

Ensuring Compliance with AI

Software applications used in the legal sector are tightly bound by various data protection regulations. Ensuring adherence to these complex and ever-evolving regulations is a daunting task for legal professionals.

AI solutions are adept at automating compliance checks, significantly reducing the burden on legal practices. These tools assist in adhering to the specific requirements of different data protection regulations, minimizing the risk of legal repercussions.

Revolutionizing Document Review and Redaction with AI

The manual review and redaction of sensitive information in legal documents are not only error-prone but also extremely time-consuming.

AI-powered tools are transforming this landscape by automatically identifying and redacting PII. This not only enhances accuracy and efficiency but also maintains the confidentiality of sensitive information.

Introducing PrizmDoc’s AI Capabilities

As we look towards the future, it’s exciting to introduce PrizmDoc’s new AI capabilities in identifying and flagging PII within documents. PrizmDoc’s AI stands out in its ability to identify sensitive or non-compliant content. Its functionality, accessible via APIs, enables the creation of workflow automations that are both efficient and secure.

Moreover, PrizmDoc offers user interface tools that extend AI functionality to end-users, making it more accessible and practical in everyday legal practice.
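
To make this concrete, here is a rough sketch of what an API-driven PII workflow automation could look like. The endpoint path, request shape, and response fields below are illustrative assumptions made for the sake of the sketch, not PrizmDoc's documented API; consult the PrizmDoc AI documentation for the actual interface:

    // Hypothetical sketch of an API-driven PII detection step (Node.js 18+).
    // The endpoint and response fields are assumptions, not PrizmDoc's
    // documented API; check the PrizmDoc AI docs for the real interface.
    const BASE = "https://your-prizmdoc-server"; // assumption: server base URL

    async function detectPii(workFileId) {
        // Start a PII detection process for a previously uploaded work file.
        const start = await fetch(`${BASE}/v2/piiDetectors`, { // hypothetical endpoint
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ input: { source: { fileId: workFileId } } })
        });
        const { processId } = await start.json();

        // Poll until the detection process finishes.
        let status;
        do {
            await new Promise(resolve => setTimeout(resolve, 1000));
            status = await (await fetch(`${BASE}/v2/piiDetectors/${processId}`)).json();
        } while (status.state === "processing");

        // Assumed result: a list of detected entities (type, page, position)
        // that a workflow could hand off to an automated redaction step.
        return status.output.entities;
    }

A workflow automation could then turn each returned entity into a redaction, producing a redacted copy for human review before release.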

The Multifold Benefits of PrizmDoc AI for PII

Incorporating PrizmDoc’s AI capabilities into legal content management systems can lead to a multitude of benefits:

  • Reduced bottlenecks and faster decision-making processes.
  • Improved efficiency in handling and managing legal documents.
  • Enhanced data security, protecting sensitive client information.
  • Improved compliance with regulatory standards.
  • Overall enhancement in the quality and reliability of legal services.

The integration of AI in legal content management, especially with tools like PrizmDoc, is not just a step toward innovation; it’s a leap toward a more efficient, secure, and compliant legal practice. As technology continues to evolve, it’s clear that AI will play a crucial role in shaping the future of legal data management.

Question

I have a PDF of a form that I’m sending to PrizmDoc to have it auto-detect, but PrizmDoc does not find any fields in the document. What would cause this?

Answer

Currently, only PDF files with embedded AcroForm fields will be auto-detected. If the PDF document contains only an embedded image of a form, PrizmDoc will not find any fields during auto-detection.
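
One quick way to check whether a PDF actually contains AcroForm fields, rather than just a scanned picture of a form, is to inspect it with a PDF library before sending it to PrizmDoc. The sketch below uses the open-source pdf-lib package for Node.js; the file name is hypothetical, and the check is a diagnostic aid, not part of PrizmDoc's API:

    // Diagnostic sketch: count the AcroForm fields in a PDF with pdf-lib.
    // A scanned image of a form will report zero fields, which is why
    // PrizmDoc's auto-detection finds nothing in such documents.
    const fs = require("fs");
    const { PDFDocument } = require("pdf-lib");

    async function countAcroFormFields(path) {
        const doc = await PDFDocument.load(fs.readFileSync(path));
        const fields = doc.getForm().getFields();
        console.log(`${path}: ${fields.length} AcroForm field(s)`);
        return fields.length;
    }

    countAcroFormFields("suspect-form.pdf").catch(console.error); // hypothetical file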

Question

What quality should my images be for processing form data and recognition using FormSuite?

Answer

In all cases, you want your images to be as clear and clean as possible. For each type of processing, consider the following:

OCR and ICR: Capture images at a resolution of at least 300 DPI. Ideally, work in black and white, which allows the objects of interest on your image to be better defined and recognized. Free the image from as much noise as possible. Just as if a human were reading it, you want the text on the image to be as legible as possible. For ICR, ensure that the characters are hand-printed (no cursive text, etc.).

Barcode recognition: As with OCR and ICR, capture images at 300 DPI or higher; black-and-white content can likewise produce excellent results. Ensure that the bars in each barcode are clearly defined on the image and are not malformed (for example, each barcode should have the proper start and stop sequence). Clear as much noise from the image as possible.

Forms matching and registration: As with the two items above, capture your documents at 300 DPI or higher. Ensure that the resolution is consistent between your form templates and incoming batch images. Form templates should contain only the data that is common to every image being processed (e.g., form fields and the text that appears on the blank form itself). The template should not contain filled-in field data, as this will interfere with the forms matching process.

Question

When selecting download and including annotations, the resulting PDF does not include the comments. How can we obtain the file as a PDF with the annotation comments included?

Answer

By design, when using the download button, annotation comments are not included in the downloaded file.

Workaround

  1. Under the View tab, select Print.
  2. From the Print dialog, select the More options drop-down.
  3. Under the Comments section, select either After Each Page or At End of Document.
  4. When selecting a printer, use the Print to PDF option; this will create a PDF with the comments included.

Question

I have EML and MSG files that contain attachments. I want to combine the original EML/MSG document with each of its attachments into a single PDF. How can I do this?

Answer

To do this, first create a viewing session for the EML or MSG file. Once you've created the viewing session, you can get the viewingSessionIds of the attachments, and from each viewing session you can then get its associated workfile ID. With those workfile IDs, you can use the Content Conversion Service (CCS) to combine the email with its attachments into a single PDF.

Use the following API requests to do this:

Create a new viewing session with:

POST {server_base_url}/PCCIS/V1/ViewingSession 

This will give you a viewingSessionId which will be used in future requests.

The body for this request will be JSON:

    {
        "render": {
            "html5": {
                "alwaysUseRaster": false
            }
        }
    }

Upload the document with:

PUT {server_base_url}/PCCIS/V1/ViewingSession/u{viewingSessionId}/SourceFile?FileExtension={extension} 

The request body must be the bytes of the source document. Make sure to include the FileExtension query parameter, or PrizmDoc won't be able to detect the file as an EML or MSG file.
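
For illustration, a minimal version of this upload in Node.js 18+ (its built-in fetch is assumed here, along with a hypothetical local sample.eml) might look like this:

    // Minimal sketch: send the raw EML bytes as the viewing session's
    // source document. serverBaseUrl and viewingSessionId come from the
    // previous step.
    const fs = require("fs");

    async function uploadSource(serverBaseUrl, viewingSessionId) {
        const res = await fetch(
            `${serverBaseUrl}/PCCIS/V1/ViewingSession/u${viewingSessionId}/SourceFile?FileExtension=eml`,
            {
                method: "PUT",
                headers: { "Content-Type": "application/octet-stream" },
                body: fs.readFileSync("sample.eml") // hypothetical local file
            }
        );
        if (!res.ok) throw new Error(`Upload failed: HTTP ${res.status}`);
    }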

Poll for and get the viewingSessionIds of the attachments with:

GET {server_base_url}/PCCIS/V1/ViewingSession/u{viewingSessionId}/Attachments

This will return the viewingSessionIds for each of the attachments.

Get the workfile ID for each of the viewing sessions with:

GET {server_base_url}/PCCIS/V1/ViewingSession/u{viewingSessionId}/FileId 

You’ll need to get a fileId for each attachment’s viewingSession and the email’s viewingSession.

With those workfile IDs, you can use the Content Conversion Service (CCS) to combine each of the workfiles with:

POST {server_base_url}/v2/contentConverters

The body for this request will be JSON and needs to include the workfileId of the email itself and each of its attachments. The body would look like this:

    {
        "input": {
            "sources": [
                {
                    "fileId": "{EmailWorkFileId}"
                },
                {
                    "fileId": "{Attachment1WorkFileId}"
                },
                {
                    "fileId": "{Attachment2WorkFileId}"
                }
            ],
            "dest": {
                "format": "pdf"
            }
        }
    }

This will return a processId that you can poll using:

GET /v2/contentConverters/{processId}

Once the process is complete, the request will return a results array that will contain the fileId of the final document.

You can get the final document that contains all the attachments and original EML/MSG file combined into a single PDF with:

GET /PCCIS/V1/WorkFile/{fileId}
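
Putting the steps together, a minimal end-to-end sketch in Node.js 18+ might look like the following. The endpoint paths are the ones listed above; the exact response shapes (noted in the comments) are assumptions, so verify them against your PrizmDoc version's documentation:

    // End-to-end sketch: combine an EML file and its attachments into one
    // PDF, following the steps above. Response shapes are assumptions.
    const fs = require("fs");

    const BASE = "https://your-prizmdoc-server"; // assumption: server base URL
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

    async function combineEmailWithAttachments(emlPath, outputPath) {
        // 1. Create a viewing session for the email.
        let res = await fetch(`${BASE}/PCCIS/V1/ViewingSession`, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ render: { html5: { alwaysUseRaster: false } } })
        });
        const { viewingSessionId } = await res.json();

        // 2. Upload the EML bytes; FileExtension is required for detection.
        await fetch(
            `${BASE}/PCCIS/V1/ViewingSession/u${viewingSessionId}/SourceFile?FileExtension=eml`,
            { method: "PUT", body: fs.readFileSync(emlPath) }
        );

        // 3. Poll for the attachments' viewing sessions (assumed to return an
        //    error status until extraction completes, and a body like
        //    { "attachments": [{ "viewingSessionId": "..." }] } when done).
        do {
            await sleep(1000);
            res = await fetch(`${BASE}/PCCIS/V1/ViewingSession/u${viewingSessionId}/Attachments`);
        } while (!res.ok);
        const attachments = (await res.json()).attachments;

        // 4. Get the workfile ID for the email and each attachment
        //    (assumed response shape: { "fileId": "..." }).
        const sessionIds = [viewingSessionId, ...attachments.map(a => a.viewingSessionId)];
        const fileIds = [];
        for (const id of sessionIds) {
            res = await fetch(`${BASE}/PCCIS/V1/ViewingSession/u${id}/FileId`);
            fileIds.push((await res.json()).fileId);
        }

        // 5. Ask CCS to combine the workfiles into a single PDF.
        res = await fetch(`${BASE}/v2/contentConverters`, {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
                input: {
                    sources: fileIds.map(fileId => ({ fileId })),
                    dest: { format: "pdf" }
                }
            })
        });
        const { processId } = await res.json();

        // 6. Poll the conversion until it completes (assumed status fields:
        //    "state", then "output.results[0].fileId" on completion).
        let status;
        do {
            await sleep(1000);
            status = await (await fetch(`${BASE}/v2/contentConverters/${processId}`)).json();
        } while (status.state === "processing");

        // 7. Download the combined PDF workfile and save it locally.
        res = await fetch(`${BASE}/PCCIS/V1/WorkFile/${status.output.results[0].fileId}`);
        fs.writeFileSync(outputPath, Buffer.from(await res.arrayBuffer()));
    }

    // Example (hypothetical file names):
    // combineEmailWithAttachments("message.eml", "combined.pdf");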

OCR vs ICR

The days of manually transcribing scanned documents into an editable, digital document are thankfully long behind most organizations. Error-prone manual processes have largely given way to automated document and forms processing technology that can turn scanned documents into a more manageable form with a much higher degree of accuracy. 

Much of this transition was made possible by the proliferation of optical character recognition (OCR) and intelligent character recognition (ICR). While the two perform very similar tasks, there are some key differences between them that developers need to keep in mind as they build their document and form processing applications.

How Does Character Recognition Technology Work?

Character recognition technology allows computer software to read and recognize text contained in an image and then convert it into a document that can be searched or edited. Since the process involves something that humans can do quite easily (namely, reading text), it’s easy to assume that this would be a rather trivial task for a computer to accomplish.

In reality, getting a computer program to correctly identify text and convert it into editable format is an incredibly complex challenge complicated by a wide range of variables. The problem is that when a computer examines an image, it doesn’t see people, backgrounds, or text as distinct images, but rather as a pattern of pixels. Character recognition technology helps computers distinguish text by telling them what patterns to look for.

Unfortunately, even this isn't as straightforward as it sounds. That's because there are so many different text fonts that depict the same characters in different ways. For example, a computer must be able to recognize that an "a" rendered in a serif, a sans-serif, a monospace, or a decorative font is, in every case, the same character.

When humans read text, they have a mental concept of what the letter “a” looks like, but that concept is incredibly flexible and can easily accommodate a broad range of variations. Computers, however, require precision. Programmers must provide them with clear parameters that help them to navigate unexpected variations and identify characters accurately.

Pattern Recognition

The earliest versions of character recognition, developed in the 1960s, relied on pattern recognition techniques, which scanned images and searched for pixel patterns that matched a catalog of font characters stored in memory. Once those patterns were located, the software could translate the characters into searchable, editable text in a document format. Unfortunately, the patterns had to be an exact pixel match, which severely limited how broadly the technology could be applied.

One of the first specialized fonts developed to facilitate pattern recognition was OCR-A. A simple monospace font (meaning that each character has the same width), OCR-A was used on bank checks to help banks quickly scan them electronically. Although pattern recognition libraries expanded over the years to incorporate common print fonts like Times New Roman and Arial, this still presented serious limitations, especially as the variety of fonts continued to grow. With one popular font-finding website indexing more than 775,000 available fonts in 2021, pattern recognition needed to be supplemented by another approach to character recognition.

Feature Detection

Also known as feature extraction, feature detection focuses on the component elements of printed characters rather than looking at each character as a whole. Where pattern recognition tries to match characters to known libraries, this approach looks for the specific features that distinguish one character from another. A character that features two angular lines that come to a point and are crossed by a horizontal line in the middle, for instance, is almost always an "A," regardless of the font used. Feature detection focuses on these qualities, which allows it to identify a character even if the program has never encountered a particular font before. Given the enormous variety of fonts, however, this approach still needs to take several different ways of rendering the character "A" into consideration when setting parameters.

Most character recognition software tools utilize feature detection because it offers far more flexibility than pattern recognition. This is especially valuable for reading document images with faded ink or some degradation that could prevent an exact pattern match. Feature detection provides enough flexibility for a program to be able to identify characters under less than ideal circumstances, which is important for any application that has to deal with scanned images.

OCR vs ICR: What’s the Difference?

Optical character recognition (OCR) is typically understood to apply to any recognition technology that reads machine printed text. A classic OCR use case would involve reading the image of a printed document, such as a book page, newspaper clipping, or a legal contract, and then translating the characters into a separate file that could be searched and edited with a document viewer or word processor. It’s also incredibly useful for automating forms processing. By zonally applying the OCR engine to form fields, information can be quickly extracted and entered elsewhere, such as a spreadsheet or database.

When it comes to form fields, however, information is frequently entered by hand rather than typed. Reading hand-printed text adds another layer of complexity to character recognition. The range of more than 700,000 printed font types is insignificant compared to the near-infinite variations in hand-printed characters. Not only must the recognition software account for stylistic variations, but also for the type of writing implement used, the quality of the paper, mistakes, steadiness of hand, and smudges or running ink.

Intelligent character recognition (ICR) utilizes constantly updating algorithms to gather more data about variations in hand-printed characters to identify them more accurately. Developed in the early 1990s to help automate forms processing, ICR makes it possible to translate manually entered information into text that can be easily read, searched, and edited. It is most effective when used to read characters that are clearly separated into individual areas or zones, such as fixed fields used on many structured forms.

Both OCR and ICR can be set up to read multiple languages, although limiting the range of expected characters to fewer languages will produce better recognition results. Critically, ICR does not read cursive handwriting, because it must still be able to evaluate each individual character. With cursive handwriting, it's not always clear where one character ends and another begins, and the individual variations from one sample to another are even greater than with hand-printed text. Intelligent word recognition (IWR) is a newer technology that focuses on reading an entire word in context rather than identifying individual characters.

To learn more about OCR and ICR technology and how each can transform your application when it comes to managing documents and automating forms processing, download our whitepaper on the topic today.