Technical FAQs

Question

In PrizmDoc Viewer, when viewing Excel documents that have pictures on certain spreadsheets within that document, the pictures are not displayed.

This appears to happen only if PrizmDoc has the Microsoft Office Conversion (MSO) feature enabled. This issue does not occur if PrizmDoc is using LibreOffice.

Why is this happening?

Answer

The issue is related to an Excel “Page Setup” option called “Black and white”. The option is located in Excel under File, Print, Page Setup, on the Sheet tab, and it is only respected when PrizmDoc has the MSO feature enabled.

LibreOffice does not support this setting and ignores it, which is why the pictures are displayed when PrizmDoc uses LibreOffice.

This option is disabled by default in Excel, so it must have been enabled manually by the creator of the document.

As a workaround, ensure that the “Page Setup” option for “Black and white” is not checked on any spreadsheets in an Excel document that has pictures.
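You can also find the affected worksheets by inspecting the file directly instead of opening each Page Setup dialog. The following is a minimal C# sketch that assumes an unencrypted .xlsx file. Excel saves the “Black and white” option per worksheet as the blackAndWhite attribute of the <pageSetup> element in each worksheet part under xl/worksheets/. The function name FindBlackAndWhiteSheets is illustrative and is not part of any PrizmDoc or ImageGear API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Lists worksheet parts whose <pageSetup> element has blackAndWhite turned on.
// Assumption: an unencrypted .xlsx (a ZIP package) with worksheets under xl/worksheets/.
List<string> FindBlackAndWhiteSheets(Stream xlsx)
{
    XNamespace ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    var hits = new List<string>();
    using var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true);
    foreach (var entry in zip.Entries.Where(e =>
                 e.FullName.StartsWith("xl/worksheets/", StringComparison.Ordinal) &&
                 e.FullName.EndsWith(".xml", StringComparison.Ordinal)))
    {
        using var part = entry.Open();
        var pageSetup = XDocument.Load(part).Root?.Element(ns + "pageSetup");
        var value = (string?)pageSetup?.Attribute("blackAndWhite");
        // xsd:boolean allows "1" or "true"; an absent attribute means the option is off.
        if (value == "1" || value == "true")
            hits.Add(entry.FullName);
    }
    return hits;
}

// Usage: pass the workbook path as the first command-line argument.
if (args.Length > 0)
{
    using var workbook = File.OpenRead(args[0]);
    foreach (var sheet in FindBlackAndWhiteSheets(workbook))
        Console.WriteLine($"Black and white is enabled in {sheet}");
}
```

Each reported part (for example xl/worksheets/sheet1.xml) is a worksheet whose “Black and white” option should be unchecked in Excel. This sketch does not map part names back to sheet tab names, because that requires reading xl/workbook.xml and its relationships.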

OnTask API
TAMPA, Fla. – OnTask, a workflow automation and eSignature tool, announces the launch of its new product, OnTask API.

OnTask API is a solution created for software developers looking to integrate eSignature functionality into new or existing applications. Often, creating an in-house solution to solve for these needs is a costly, time-consuming process involving a large lift from team members. This new offering aims to cut down on deployment time, while scaling with the needs of growing businesses. 

The automated eSignature solution from OnTask can easily be embedded into applications, and it is a white-label product that is fully configurable to company needs and branding guidelines.

“We believe OnTask API is going to be a gamechanger for start-ups and businesses with fewer development resources,” states Steve Wilson, President. “This solution is going to help a lot of businesses save resources, while still being able to accomplish their digital document goals.”

OnTask API allows users to embed legally-binding eSignature technology, as well as workflows complete with document routing, fillable digital forms, and document upload features, into their applications. Additionally, users can bulk-launch workflows to collect large numbers of eSignatures and participant data at once.

“Our developers have taken all of the features users love about OnTask, and given them the ability to integrate it directly into the applications they’re already using,” says Wilson. “Users can still build complex workflows with conditional logic, but now they can be launched directly from their site or application.”

OnTask has helped small to mid-sized businesses across a variety of industries stay in compliance, collect legally-binding signatures, and save time where it matters most. The OnTask team is looking forward to the feedback from this latest release.

About OnTask

OnTask is a workflow automation tool that makes it easy for small to mid-sized businesses to digitally send and fill forms, get signatures on documents and automate overall business processes, saving time and resources. For more information on OnTask, visit www.ontask.io.

October 11, 2023 – Tampa, FL – Accusoft is pleased to announce the newest additions to PrizmDoc’s industry-leading document processing capabilities: video playback and an advanced optical character recognition (OCR) API integration. These new additions allow PrizmDoc to provide even more support to developers looking to add essential features to their applications.

PrizmDoc’s new video playback feature makes it easy for clients to natively embed videos into their software without having to rely on external hosting platforms or third-party plug-ins. The feature not only enhances security, but also delivers a seamless user experience that today’s customers expect from their applications. 

With PrizmDoc’s new advanced OCR API, web developers can now access Accusoft’s optical character recognition technology that was previously only accessible via an SDK. The features included in the new OCR API enable full page and zonal recognition for document and forms processing as well as support for location and confidence information for each character. With a simple API call, PrizmDoc can extract searchable text from any supported raster file. The new OCR API add-on option for PrizmDoc also offers support for 60+ languages plus an option for Asian languages.

“Today’s applications need more than the ability to view and manage documents,” says Jack Berlin, CEO of Accusoft. “By enabling video playback and allowing developers to tap into our proven OCR technology with a simple API call, we’re making it easier for PrizmDoc customers to deliver an all-in-one solution for their customers that provides a better overall user experience.”

To learn more about PrizmDoc or to download a free trial and experience the new video playback and OCR API features first-hand, visit our website.

About Accusoft

Founded in 1991, Accusoft is a software development company specializing in document processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

  1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
  2. Then loop through these elements and check if they are of the type ImGearPDEImage.
  3. Convert each image to an ImGearPage and find its bit depth.
  4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output using compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Rasterize, recognize, and import each page of the source PDF
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            // Find the highest bit depth among images at the top level of the page content
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage pdeImage)
                {
                    // Create an ImGearPage from the embedded image and read its bit depth
                    int bitDepth = pdeImage.ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            if (highestBitDepth == 0)
            {
                // No image was found at the top level of this page (for example, images are
                // nested inside containers), so fall back to 24 bits per pixel to be safe
                highestBitDepth = 24;
            }
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                // AUTO picks the image compression based on the raster page's bit depth
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are: 

  • 1 Bit Per Pixel – ImGearCompressions.CCITT_G4
  • 8 Bits Per Pixel – ImGearCompressions.DEFLATE
  • 24 Bits Per Pixel – ImGearCompressions.JPEG
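The table above can be expressed as a small lookup. The lookup also shows why the detected bit depth matters: if a 1-bit scan is rasterized at 24 bits, AUTO would pick JPEG instead of CCITT_G4. This sketch returns the ImGearCompressions member names as strings so that it compiles without ImageGear. It only covers the three bit depths listed above, and the table does not state what AUTO does for other depths.

```csharp
using System;

// Mirrors the FAQ's table for ImGearCompressions.AUTO; names are returned as strings
// so the sketch has no ImageGear dependency. Other bit depths are not covered.
string AutoCompressionFor(int bitsPerPixel) => bitsPerPixel switch
{
    1 => "CCITT_G4",
    8 => "DEFLATE",
    24 => "JPEG",
    _ => "unspecified",
};

foreach (var depth in new[] { 1, 8, 24 })
    Console.WriteLine($"{depth}-bit page -> {AutoCompressionFor(depth)}");
```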

Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above only checks the top layer of the page content, so images embedded inside containers will not be detected.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

The above code will set the bit depth to 24 if it wasn’t able to detect any images in the first layer, just to be on the safe side.
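The recursive approach mentioned above could look roughly like the sketch below. It uses a hypothetical, simplified content model instead of ImageGear's PDE classes. An int stands for an image's bit depth, a List<object> stands for a container that holds further content, and anything else (such as a string standing in for text) is ignored. To adapt it, you would replace those cases with the corresponding PDE element checks. The exact ImageGear types for containers are not shown here and are left as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical content model: int = image bit depth, List<object> = container
// with nested content, anything else (e.g. a string for text) = ignored.
int HighestImageBitDepth(IEnumerable<object> elements)
{
    int highest = 0;
    foreach (var element in elements)
    {
        int depth = element switch
        {
            int imageDepth => imageDepth,
            IEnumerable<object> container => HighestImageBitDepth(container), // recurse into nested layers
            _ => 0,
        };
        highest = Math.Max(highest, depth);
    }
    return highest;
}

// Same fallback as the FAQ code: rasterize at 24 bits when no image was found.
int RasterBitDepth(IEnumerable<object> pageContent)
{
    int depth = HighestImageBitDepth(pageContent);
    return depth == 0 ? 24 : depth;
}

Console.WriteLine(RasterBitDepth(new List<object> { "caption", new List<object> { 1, 8 } })); // prints 8
```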

InsurTech SDK

The insurance market is booming. As noted by research firm Deloitte, the property and casualty (P&C) sector saw a massive income uptick in 2018 and steady growth last year that’s predicted to carry forward through 2020. To help manage the influx of new clients and handle more claims, many firms are spending on insurance technology (insurtech) — digital services and solutions that make it possible to reduce error rates and enhance operational efficiency. InsurTech SDKs are important components of this transformation.

Both in-house insurtech solutions and third-party platforms often excel in specific areas but come up short in others, putting insurance firms at risk of writing off potential gains. While solution switching and ground-floor rebuilds offer one route to success, there’s another option that’s better tailored to your business needs: software development kits (SDKs). Here’s a look at three top SDKs that offer customized functionality potential.


FormSuite for Structured Forms: Solving for Data Capture

Time is money. The faster insurance companies accurately complete and file documents, the greater their revenue potential. And as noted by KPMG, the need for speed is more pressing than ever. Many insurance sectors have seen substantial increases in both claims and new applications as the COVID-19 crisis evolves. 

As a result, accurate and agile forms processing is critical to keep up with demand. If current insurance software can’t quickly capture forms data, recognize standard form fields, and let users easily create standard form libraries, policy processing falls behind.

FormSuite for Structured Forms makes it easy for developers to build in form identification and data capture that includes comprehensive form field detection with OCR, ICR, and OMR functionality and the ability to automatically identify scanned forms and match them to existing templates.

ImageGear for .NET and C/C++: Simplifying Conversion

Conversion is critical for insurance firms. Depending on the type and complexity of insurance claims, companies are often dealing with everything from Word documents for initial client assessments and .GIF or .JPG images of existing damage to contractor-specific PDFs or spreadsheets that detail necessary materials, time, and labor costs. The result? A mash-up of multiple file types that forces adjusters to spend valuable time searching for specific data instead of helping clients get their claims process up and running. This makes it difficult to recognize value from emerging digital initiatives. 

Accusoft’s ImageGear for .NET and ImageGear for C/C++ empower developers to integrate enterprise-class file viewing, annotation, conversion, and image processing functions into existing applications, allowing staff to both quickly collaborate on key tasks and find essential data across a single, easy-to-search document.

ImageGear: Streamlining PDF Capabilities

While insurance technology offers substantive opportunities for end-users to capture, convert, and retain data, this technology can also come with the challenge of increased complexity. According to recent research from PwC, for example, firms looking to capitalize on insurtech potential must be prepared to rapidly develop new product offerings and embrace the expectations

As a result, companies need applications that streamline current functions and allow them to focus on creating cutting-edge solutions. For example, PDF is a file format that is still used by enterprises worldwide to maintain document format consistency and maximize security. When it comes to converting multiple files into a PDF, software can be expensive and introduce data security issues. 

This can all be solved with an SDK like ImageGear, which makes it possible to integrate the total PDF package into any document management application, both reducing overall complexity and freeing up time for staff to work on new insurance initiatives.

Insurtech underpins the future of policy applications, claims processing, and compliance reporting, but existing software systems may not provide the complete capability set companies need to make the most of digital deployments. These top SDKs offer insurance IT teams the ability to integrate key services, improve speed, and boost security at scale. Learn more about Accusoft’s SDKs at www.accusoft.com/products.

There are two ways to password-protect a document:

  • Protected sheet for locked cells – we display the document as is and respect locked and hidden cells, but we don’t allow entering the password to unlock it.
  • Password protected document for opening – we currently don’t support this use case; instead, processing the file fails with UnsupportedFileFormat.
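To reject password-to-open files before sending them to PrizmDoc, you could check the file signature. This is a heuristic of our own and not a PrizmDoc feature. A normal .xlsx, .docx, or .pptx file is a ZIP package that begins with “PK”. An Office file encrypted with a password to open is stored inside an OLE compound file, which begins with the bytes D0 CF 11 E0 A1 B1 1A E1. Legacy .xls and .doc files use the same compound file format without being encrypted, so apply this check only to the OOXML extensions. Sheet protection for locked cells does not change the signature, so this sketch does not detect it.

```csharp
using System;
using System.IO;
using System.Linq;

// Returns true when the stream starts with the OLE compound file signature.
// Assumption: for a file with an .xlsx/.docx/.pptx extension, this usually means it is
// encrypted with a password to open. Verify against your own files before relying on it.
bool LooksPasswordProtectedOoxml(Stream file)
{
    byte[] compoundFileSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    var header = new byte[compoundFileSignature.Length];
    int read = 0;
    while (read < header.Length)
    {
        int n = file.Read(header, read, header.Length - read);
        if (n == 0) return false; // file is shorter than the signature
        read += n;
    }
    return header.SequenceEqual(compoundFileSignature);
}

// Usage: pass the path of an .xlsx/.docx/.pptx file as the first argument.
if (args.Length > 0)
{
    using var input = File.OpenRead(args[0]);
    Console.WriteLine(LooksPasswordProtectedOoxml(input)
        ? $"{args[0]} looks encrypted and is likely to fail with UnsupportedFileFormat"
        : $"{args[0]} is not a compound file");
}
```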

When the time comes to extract data from standard forms, simply scanning the entire document isn’t an ideal solution. This is especially true of forms that include instructional text, since you probably don’t want to keep capturing “Directions” from every form. Even when looking only at fillable information, there can be a lot of text to capture. Optical character recognition makes it simple to automate data extraction as part of a forms processing workflow, but the most effective frameworks utilize a specialized form of recognition known as zonal OCR. 

What Is Zonal OCR?

While zonal OCR still identifies machine-printed text and matches it to existing character sets before handing it off to another stage of a predetermined workflow, what sets the process apart is the way it goes about reading a document page. A typical standard form often features multiple fillable boxes where someone can enter their information. It could also include drop-down menus with predetermined responses (suffix, state, and country are all common examples of this). Trying to recognize all of that text at once greatly increases the number of possible results, which could impact both accuracy and performance.

Zonal OCR addresses this challenge by splitting the page up into several distinct zones, each of which typically corresponds to a form field (although it doesn’t have to). Instead of reading the entire page, then, the OCR engine selectively recognizes the text in these zones. It can also be combined with form image dropout, which removes text and graphical elements that don’t need to be read and might interfere with the recognition process. By reducing the amount of text that needs to be matched, zonal OCR can significantly improve recognition speed and accuracy.

Limiting Recognition Results

The most effective OCR solutions then go a step further by designating the type of information that should be found within those zones. This reduces the range of potential outcomes, which makes it easier for the OCR engine to return an accurate reading.

For example, the letter “Z” bears superficial similarities to the number “2.” If the OCR engine needs to take into account all possible responses, it may struggle to distinguish between the two accurately, especially if an unusual font was used to complete the form. However, if developers stipulate that a particular “zone” should only include numerical values, the OCR engine suddenly goes from having to consider dozens of letters and special characters to just ten digits. This makes it much easier to obtain an accurate recognition result.

For hand-printed form responses, applying the same zonal strategy to Intelligent Character Recognition (ICR) is especially helpful. Going back to the “Z” and “2” example, the distinctions between the two characters are often much more subtle in hand-printing. If a form field expects a date written in month/day/year format, there is no reason to include a “Z” in the list of potential characters for that field, because no month includes a “Z.” When the ICR engine comes across a “2,” then, it’s more likely to identify it correctly because there are fewer potential alternative characters.

By constraining possible recognition results over a smaller range of defined character sets, zonal OCR and ICR both greatly improve accuracy when it comes to forms processing. The list of potential results is typically referred to as a data validation list.
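The effect of a data validation list can be sketched in a few lines of C#. This example assumes a hypothetical recognizer that returns several candidate characters per glyph, each with a confidence value. The zone's allowed character set filters those candidates before the best one is chosen. The confidence numbers are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Picks the highest-confidence candidate that the zone's character set allows,
// or null when none of the candidates is allowed. Candidates are hypothetical recognizer output.
char? BestAllowed(IEnumerable<(char Ch, double Confidence)> candidates, ISet<char> allowed)
{
    char? best = null;
    double bestConfidence = double.MinValue;
    foreach (var (ch, confidence) in candidates)
    {
        if (allowed.Contains(ch) && confidence > bestConfidence)
        {
            best = ch;
            bestConfidence = confidence;
        }
    }
    return best;
}

var glyph = new[] { ('Z', 0.55), ('2', 0.45) };                     // an ambiguous glyph
var anyChar = new HashSet<char>("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
var digits = new HashSet<char>("0123456789");                        // a numeric-only zone
Console.WriteLine(BestAllowed(glyph, anyChar)); // prints Z
Console.WriteLine(BestAllowed(glyph, digits));  // prints 2
```

The same call works for the date example above: a zone restricted to digits and “/” never returns a “Z”, however the character was printed.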

In addition to constraining character sets, regular expressions can also be applied to different zones to specify what kind of data is expected to be found there. A regular expression is simply a string pattern that sets rules for how characters are formatted, such as a phone number, Social Security number, or credit card number.
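A per-zone regular expression check might look like the following C# sketch. The zone names and patterns are hypothetical examples: a US-style phone number, a Social Security number, and a month/day/year date. A real form would define its own patterns, and a failed match would typically flag the field for review rather than silently accept the recognized text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical zone names mapped to the format expected in each zone.
var zonePatterns = new Dictionary<string, Regex>
{
    ["Phone"] = new Regex(@"^\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}$"),
    ["SSN"] = new Regex(@"^\d{3}-\d{2}-\d{4}$"),
    ["Date"] = new Regex(@"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$"),
};

// True when the text recognized in a zone matches that zone's expected format.
bool IsValidForZone(string zone, string recognizedText) =>
    zonePatterns.TryGetValue(zone, out var pattern) && pattern.IsMatch(recognizedText.Trim());

Console.WriteLine(IsValidForZone("Phone", "(813) 555-0100")); // prints True
Console.WriteLine(IsValidForZone("SSN", "123-45-67B9"));      // prints False (a misread digit)
```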

Setting Up Zonal OCR

Integrating zonal OCR capabilities into a forms processing workflow first requires the creation of specialized templates that map out the location of each field that contains data. In any organization, each type of standard form received should be built as a template within the solution. This allows the application both to match incoming forms to existing templates and to align them so that everything is in the proper location.

The alignment step is extremely important for effective data extraction. Zonal OCR is set up to read only specific areas on a document page. These zones have clear boundaries: anything outside a boundary will not be read, and any character that is only partly within the field will likely return an error result of some kind.
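The boundary rule above can be illustrated by comparing each recognized character's bounding box with the zone rectangle. This C# sketch uses System.Drawing.Rectangle for the geometry. The zone coordinates are hypothetical pixel values. Treating a character that straddles the border as an error candidate follows the description above, not any particular SDK's behavior.

```csharp
using System;
using System.Drawing;

// Classifies a character's bounding box relative to a zone:
// fully inside -> read, no overlap -> ignored, straddling the border -> likely error.
string Classify(Rectangle zone, Rectangle glyph)
{
    if (zone.Contains(glyph)) return "inside";
    if (!zone.IntersectsWith(glyph)) return "outside";
    return "partial";
}

var zipZone = new Rectangle(x: 400, y: 120, width: 150, height: 30); // hypothetical ZIP-code zone
Console.WriteLine(Classify(zipZone, new Rectangle(410, 125, 12, 20))); // prints inside
Console.WriteLine(Classify(zipZone, new Rectangle(545, 125, 12, 20))); // prints partial
Console.WriteLine(Classify(zipZone, new Rectangle(600, 125, 12, 20))); // prints outside
```

If a scanned form is shifted by even a few pixels, characters near a zone edge move from “inside” to “partial”. This is why the form should be aligned to its template before zonal recognition runs.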

Accusoft’s SmartZone OCR/ICR integration, for instance, works most effectively when paired with the FormFix SDK, which handles form template creation, identification, and alignment. As part of the broader FormSuite solution, these integrations are extremely effective when it comes to streamlining data capture.

Improve Data Capture Accuracy with SmartZone

With OCR and ICR support for multiple languages, SmartZone is a powerful data extraction tool that can be incorporated into an application individually or with the rest of the FormSuite collection. It provides fast, accurate text recognition on both a zonal and a full-page basis. Developers can set up expected character patterns for fields and designate a different regular expression for each of them to deliver significantly more accurate results.

SmartZone not only provides out-of-the-box support for pre-defined character sets, such as upper and lower case characters, arithmetic symbols, and currency symbols, it also allows developers to edit those sets to improve accuracy, confidence, and speed.

Find out how SmartZone OCR/ICR can enhance your application’s forms processing data extraction today by downloading a free trial.

TAMPA, FLA. (Dec. 1, 2021) Last night, the Tampa Bay Software CEOs (TBSC) met for their quarterly social meeting at the Current Hotel in the Rox Rooftop Bar. Hosted by Accusoft, this networking event provided a great venue for discussion about attracting top talent to the area’s innovative businesses, driving growth opportunities for the high tech industry in Tampa, and sharing mutual problems and uncovering solutions.

“Our tech community and the opportunity for further innovation in Tampa Bay is growing,” said Jack Berlin, CEO at Accusoft. “With this great collaboration of software CEOs, we can bring the local tech community together, to attract top talent, effectively communicate to our leaders what our tech community needs, and learn from each other to drive further growth.”

The Software CEO Council comprises the area’s premier businesses, executives, and entrepreneurs of Tampa Bay’s technology community. Its mission is to create the largest communal ecosystem for tech startups in the state of Florida and put Tampa Bay on the map as a beacon for innovation and success, to foster talent and fuel growth. Council companies include A-LIGN, Accusoft, Applied Data Corporation, ComplianceQuest, CrossBorder Solutions, Digital Hands, Geographic Solutions, Haneke Design, MercuryWorks, Sourcetoad, Spirion, Transcendent and Vendita.

Pictured above left to right: Greg Ross-Munro, Prashanth Rajendran, Kevin Coppins, Dan Gaertner, Chris Karlo, Shamus Hines, Charlotte Baker, Jack Berlin, Jody Haneke.

For more information about TBSC, visit the group’s website at https://www.tampasoftwareceos.com/.

About Tampa Bay Tech

Tampa Bay Tech is a 501(c)(6) non-profit technology council that has been engaging and uniting the local technology community for 20 years. With over 100 companies representing thousands of tech employees – as well as thousands of students within the area’s colleges and universities – Tampa Bay Tech provides programming and initiatives to support all those in the technology space. Through their membership and partnerships, their mission is to build a radically connected, flourishing tech hub where opportunity is abundant for all. Join the TBTech community at tampabay.tech and follow us on Facebook, LinkedIn, Instagram, and Twitter.

About Accusoft

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.

Question

What features does PAS provide that I don’t get from exclusively using the backend?

Answer

The following features are provided through PrizmDoc Application Services (PAS):

  • Viewing Packages
  • Annotation storage/retrieval
  • Form Definition storage/retrieval
  • Simplified API to communicate with the backend
  • Affinity token management (clustered mode)

It is also important to note that all new feature development will involve PAS.

Features not offered through PAS include:

  • Content Conversion output to anything other than PDF.

Question

Where does PrizmDoc store E-Signatures and how can I retrieve them?

Answer

PrizmDoc does not store E-Signatures on the server. However, PrizmDoc does store them in the browser’s local storage so that an end user can use the same signature across multiple documents and multiple sessions within the same browser.

In the Viewer Sample, end users are able to save their individual signatures for their own record using the “Download Signature” button under the Manage E-Signatures menu. This will download a plain-text JSON file of the selected signature.

In the E-Signing Sample, if you want to retrieve the E-Signature from your own browser, open the developer tools (F12 in Chrome), go to the Application tab, and select Local Storage. The JSON of the E-Signature is stored in the value of the pccvEsignSignatures key.