Technical FAQs

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

  1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
  2. Then loop through these elements and check if they are of the type ImGearPDEImage.
  3. Convert the image to an ImGearPage and find its bit depth.
  4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output using compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Import pages
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            // Find the highest bit depth among the images at the top level of the page content
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage pdeImage)
                {
                    // Create an ImGearPage from the embedded image and read its bit depth
                    int bitDepth = pdeImage.ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            if (highestBitDepth == 0)
            {
                // No images were found at the top level of this page (there are none, or they are
                // nested inside containers), so default to 24 bits per pixel to be safe
                highestBitDepth = 24;
            }
            // Rasterize the page at the detected bit depth, then run OCR on it
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                // AUTO chooses the image compression from the bit depth of the rasterized page
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        // Save the OCR output next to the source file
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are: 

  • 1 Bit Per Pixel – ImGearCompressions.CCITT_G4
  • 8 Bits Per Pixel – ImGearCompressions.DEFLATE
  • 24 Bits Per Pixel – ImGearCompressions.JPEG
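To make the reason for matching the raster bit depth concrete, the sketch below encodes the table above as a plain C# lookup. It is a hypothetical model of the documented AUTO behavior, not ImageGear code: SketchCompression and AutoCompressionSketch.ChooseFor are names invented for this example and do not correspond to ImGearCompressions. The point it shows is that a page rasterized at 24 bits per pixel always lands on JPEG, so CCITT G4 and Deflate are only reachable when the page is rasterized at 1 or 8 bits per pixel. Bit depths the FAQ does not list are rejected rather than guessed.

```csharp
using System;

// Hypothetical stand-in for the three compression schemes named in the FAQ.
// This is NOT ImageGear's ImGearCompressions enum.
public enum SketchCompression
{
    CcittG4,
    Deflate,
    Jpeg
}

public static class AutoCompressionSketch
{
    // Mirrors the bit-depth-to-compression table documented for AUTO.
    // Only 1, 8 and 24 bits per pixel are documented, so anything else throws.
    public static SketchCompression ChooseFor(int bitsPerPixel)
    {
        switch (bitsPerPixel)
        {
            case 1:
                return SketchCompression.CcittG4; // bitonal scans
            case 8:
                return SketchCompression.Deflate; // 8-bit images
            case 24:
                return SketchCompression.Jpeg;    // 24-bit color
            default:
                throw new ArgumentOutOfRangeException(
                    nameof(bitsPerPixel),
                    bitsPerPixel,
                    "The FAQ only documents AUTO for 1, 8 and 24 bits per pixel.");
        }
    }
}
```

For example, AutoCompressionSketch.ChooseFor(1) returns SketchCompression.CcittG4, while the same scanned page rasterized at the default of 24 bits per pixel would map to SketchCompression.Jpeg.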

Disclaimer: This may not work for all PDF documents because of the way some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above only checks the first layer of the page content, so it will not detect images embedded inside containers.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

The above code will set the bit depth to 24 if it wasn’t able to detect any images in the first layer, just to be on the safe side.
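The recursive approach suggested above can be sketched without depending on any particular PDF library. In the sketch below, SketchElement, SketchImage, and SketchContainer are hypothetical types: SketchImage stands in for ImGearPDEImage, and SketchContainer stands in for whatever ImageGear element types hold nested content. The actual container classes and how to enumerate their children are not shown in this FAQ, so check the ImageGear PDF documentation before adapting it. The walk is depth-first, keeps the highest bit depth it sees, and falls back to 24 when no image is found at any level, matching the default used in the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-independent model of PDF page content.
// These types are NOT ImageGear classes.
public abstract class SketchElement { }

// Stands in for an embedded image element such as ImGearPDEImage.
public sealed class SketchImage : SketchElement
{
    public int BitDepth { get; }
    public SketchImage(int bitDepth) { BitDepth = bitDepth; }
}

// Stands in for any element that holds nested content.
public sealed class SketchContainer : SketchElement
{
    public IReadOnlyList<SketchElement> Children { get; }
    public SketchContainer(params SketchElement[] children) { Children = children; }
}

public static class BitDepthScanner
{
    // Same safe fallback the FAQ code uses when no image is found.
    public const int DefaultBitDepth = 24;

    // Highest image bit depth anywhere in the tree, or DefaultBitDepth if there are no images.
    public static int HighestBitDepth(IEnumerable<SketchElement> pageContent)
    {
        int highest = Scan(pageContent);
        return highest == 0 ? DefaultBitDepth : highest;
    }

    // Depth-first walk: images report their own depth, containers recurse into their children.
    private static int Scan(IEnumerable<SketchElement> elements)
    {
        int highest = 0;
        foreach (SketchElement element in elements)
        {
            int depth = 0;
            if (element is SketchImage image)
            {
                depth = image.BitDepth;
            }
            else if (element is SketchContainer container)
            {
                depth = Scan(container.Children);
            }
            highest = Math.Max(highest, depth);
        }
        return highest;
    }
}
```

For example, a page whose top level holds a 1-bit image and whose 8-bit image sits two containers deep yields 8 here, whereas the one-layer loop in the FAQ code would report 1.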

InsurTech SDK

The insurance market is booming. As noted by research firm Deloitte, the property and casualty (P&C) sector saw a massive income uptick in 2018 and steady growth last year that’s predicted to carry forward through 2020. To help manage the influx of new clients and handle more claims, many firms are spending on insurance technology (insurtech) — digital services and solutions that make it possible to reduce error rates and enhance operational efficiency. InsurTech SDKs are important components of this transformation.

Both in-house insurtech solutions and third-party platforms often excel in specific areas but come up short in others, putting insurance firms at risk of writing off potential gains. While switching solutions or rebuilding from the ground floor offers one route to success, there’s another option that’s more tailored to your business needs: software development kits (SDKs). Here’s a look at three top SDKs that make it possible to build customized functionality.


FormSuite for Structured Forms: Solving for Data Capture

Time is money. The faster insurance companies accurately complete and file documents, the greater their revenue potential. And as noted by KPMG, the need for speed is more pressing than ever. Many insurance sectors have seen substantial increases in both claims and new applications as the COVID-19 crisis evolves. 

As a result, accurate and agile forms processing is critical to keep up with demand. If current insurance software can’t quickly capture forms data, recognize standard form fields, and let users easily create standard form libraries, policy processing falls behind.

FormSuite for Structured Forms makes it easy for developers to build in form identification and data capture that includes comprehensive form field detection with OCR, ICR, and OMR functionality and the ability to automatically identify scanned forms and match them to existing templates.

ImageGear for .NET and C/C++: Simplifying Conversion

Conversion is critical for insurance firms. Depending on the type and complexity of insurance claims, companies are often dealing with everything from Word documents for initial client assessments and .GIF or .JPG images of existing damage to contractor-specific PDFs or spreadsheets that detail necessary materials, time, and labor costs. The result? A mash-up of multiple file types that forces adjusters to spend valuable time searching for specific data instead of helping clients get their claims process up and running. This makes it difficult to recognize value from emerging digital initiatives. 

Accusoft’s ImageGear for .NET and ImageGear for C/C++ empower developers to integrate enterprise-class file viewing, annotation, conversion, and image processing functions into existing applications, allowing staff to both quickly collaborate on key tasks and find essential data across a single, easy-to-search document.

ImageGear: Streamlining PDF Capabilities

While insurance technology offers substantive opportunities for end-users to capture, convert, and retain data, this technology can also come with the challenge of increased complexity. According to recent research from PWC, for example, firms looking to capitalize on insurtech potential must be prepared to rapidly develop new product offerings and embrace the expectations

As a result, companies need applications that streamline current functions and allow them to focus on creating cutting-edge solutions. For example, PDF is a file format that is still used by enterprises worldwide to maintain document format consistency and maximize security. When it comes to converting multiple files into a PDF, software can be expensive and introduce data security issues. 

This can all be solved with an SDK like ImageGear, which makes it possible to integrate the total PDF package into any document management application, both reducing overall complexity and freeing up time for staff to work on new insurance initiatives.

Insurtech forms the framework for the future of policy applications, claims processing, and compliance reporting, but existing software systems may not provide the complete set of capabilities companies need to make the most of digital deployments. These top SDKs offer insurance IT teams the ability to integrate key services, improve speed, and boost security at scale. Learn more about Accusoft’s SDKs at www.accusoft.com/products.

TAMPA, Fla. – Accusoft, the leader in document and imaging solutions for developers, is proud to announce its beta release testing program, which provides participants with real-time access to its latest product developments.

Customer input is a key factor in Accusoft’s mission to build better software integrations that deliver functionality like OCR, image cleanup, forms processing, file manipulation, and viewing solutions. Thanks to the new beta program, participants will get early access to brand new products and have the opportunity to provide feedback on the latest features for existing products. Developers can also customize what types of betas they would like to opt into so they can focus on products most relevant to their business.

“Our previous betas for PrizmDoc Editor and PrizmDoc Cells were extremely beneficial for everyone involved,” says Mark Hansen, Product Manager. “Our team received rapid feedback that helped make our products better, while participants had the opportunity to shape those products to meet their specific requirements.”

By signing up for the beta program now, you can participate in the active beta for the PrizmDoc Forms integration, which will allow you to repurpose your PDF forms as web forms that you can easily create, customize, and deploy anywhere. You’ll also be the first to know about new product offerings and have the ability to opt into beta releases for Accusoft’s existing products, such as ImageGear, FormSuite for Structured Forms, and PrizmDoc Suite.

To learn more about Accusoft’s exciting new beta program, please visit our website at https://www.accusoft.com/company/customers/beta-release-program.

About Accusoft:

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.

Organized each year by ALM, LegalTech is one of the most important events for the legal industry. The conference brings together a broad variety of experienced legal professionals and innovative LegalTech providers to highlight the business, regulatory, technology, and talent trends in the market. In previous years, LegalTech was held in New York City and attended by more than 8000 people.

LegalTech 2021 Is Now Legalweek(year)

This year, however, the COVID-19 pandemic has forced the organizers to take a different approach. The first decision involved shifting LegalTech from an in-person conference to a fully virtual event in order to protect the health of both attendees and organizers. While many industry events have made a similar transition, the LegalTech team went a step further by breaking the conference into a series of five interactive virtual events held over the course of 2021. This new virtual series was dubbed Legalweek(year) and aims to provide legal professionals with a powerful resource for working through an unprecedented era.

“This decision was made to address the needs of our legal community during these trying times of COVID-19 and to provide the type of innovative education, solutions, and connections that is so crucial to legal leaders,” said ALM’s Mark Fried. “The 2021 series will set the stage for a resurgence in the legal sector and a big ‘Welcome Back’ to attendees for our in-person Legalweek event (in 2022).”

The first virtual Legalweek(year) event is scheduled for February 2-4, 2021 and will feature bestselling author and political leader Stacey Abrams, legal AI expert Joshua Walker, and former New Jersey governor and federal prosecutor Chris Christie as keynote speakers. Attendees will not only be able to participate remotely, but they will also have an additional six months’ worth of on-demand access to virtual content following each event.

Visit the Accusoft Legalweek(year) Virtual Booth

As a longtime sponsor of LegalTech, Accusoft is proud to participate in this groundbreaking series of virtual events. The conference has historically been a great opportunity for us to speak directly with independent software vendors and legal IT professionals about the latest industry trends and LegalTech applications.

This year, we’ll be hosting a “virtual booth” through the Legalweek(year) event site. Whether you’re a developer looking to solve a particular software challenge or a project manager building an in-house solution for your firm, you’ll find plenty of resources and support at the Accusoft booth. Read through our numerous case studies and LegalTech whitepapers or schedule a meeting with one of our product specialists to learn more about our SDK and API integrations for legal software. You can even chat with someone in real time if you need a quick answer!

After completing registration, Legalweek(year) attendees can access the Accusoft virtual booth during the event simply by logging into their account.

Our LegalTech Solutions

Accusoft’s combination of content processing and conversion integrations helps today’s innovative LegalTech applications reach their full potential. As law firms and legal departments incorporate more technology into their everyday operations, they need software tools capable of automating workflows, simplifying eDiscovery, and facilitating secure collaboration.

PrizmDoc Viewer

Our feature-rich HTML5 document viewer allows users to seamlessly view a variety of document and image files within their secure web application. Thanks to PrizmDoc Viewer’s powerful REST APIs, developers can provide additional functionality, such as annotations and redactions, that is essential for legal organizations.

PrizmDoc Editor

In addition to allowing users to edit DOCX files within the secure confines of their LegalTech applications, PrizmDoc Editor’s automated document assembly features streamline the contract creation process to improve efficiency and accuracy. Documents can be assembled programmatically, incorporating commonly used or specific clauses, special language, and client data to eliminate “cut and paste” errors. Once documents are assembled, PrizmDoc Editor’s sharing tools allow firms to control access and ensure that everyone is working from the same up-to-date version.

ImageGear

With the ability to read, convert, and compress a wide range of files, our ImageGear SDK integration provides LegalTech applications with the tools they need to manage almost any type of file collected during the eDiscovery process. Powerful optical character recognition (OCR) capabilities allow ImageGear to read a wide variety of languages from around the world and convert scanned documents into searchable plain text or PDF files.

LegalTech in 2021 and Beyond

As legal organizations continue to make strides toward achieving true digital transformation, they will need versatile LegalTech applications capable of adapting along with them. Accusoft’s family of SDK and API integrations can help developers leverage the power of their innovative software tools and free up resources to focus on improving their core capabilities.

We hope you’ll join us at Legalweek(year) on February 2-4, 2021. Our booth will be available throughout the virtual event, so stop by to find out how Accusoft can help you realize the potential of your LegalTech applications.

Sept. 7, 2022 – TAMPA, Fla. – Accusoft, a software development company specializing in content processing, conversion, and automation solutions, and Snowbound, a leader in document viewing and conversion SDK solutions, announced today that they have entered into a definitive agreement under which Accusoft will acquire Snowbound. In the largest acquisition in its 30-year history, the transaction will significantly expand Accusoft’s presence and product portfolio.

Snowbound’s VirtualViewer® technology, backed by its powerful RasterMaster® SDK, supports numerous formats including PDF, MS Office, AFP, DWG, TIFF, email, video, audio files, and more within one universal interface. Its REST API and RESTful content handler provide flexible development and deployment options, so it can be easily integrated into most applications. In addition, the company offers connectors for IBM FileNet, Alfresco, and Pega. This acquisition will enable Accusoft to expand into new viewing and collaboration technologies, offering customers a more robust web-based document viewing experience.

“Today, we celebrate the joining of two companies who have both driven significant innovation for web-based viewing, conversion, and imaging SDK technologies. I have always had the utmost respect for Snowbound’s leadership team and their employees as we have competed against one another for sales opportunities over the decades.  I am honored to bring Snowbound into the Accusoft family,” said Jack Berlin, CEO of Accusoft.

“We were incredibly selective as we looked for the right acquisition partner. We were deliberate in selecting an organization with a leadership team and product portfolio that would be compatible with our own, and that would continue to grow, develop and nurture what we have built at Snowbound. We have proudly driven 26 years of innovation in the way that companies securely share, collaborate, and process documents and images. With the acquisition, our technology will expand RasterMaster®’s and VirtualViewer®’s Java-based feature set and allow continued empowerment to customers as they navigate the ever-changing world of digital transformation and the complexities of document management,” said Simon Wieczner, CEO of Snowbound.

While the acquisition is complete, Accusoft will wait until January 2023 to take full operational control of Snowbound. In the meantime, the two leadership teams will partner to close out a strong 2022 and transition the team and its assets.

For more information about Accusoft, please visit https://www.accusoft.com/.

About Snowbound

For over two decades, Snowbound Software has been the independent leader in document viewing and conversion technology. It plays an integral role in enhancing and speeding company workflows for the Fortune 2000, including insurance claims processing, financial transactions, and more. Snowbound excels in providing customers with powerful solutions for capturing, viewing, processing, and archiving hundreds of different document and image types. Thanks to its pure Java technology and multi-environment support, Snowbound’s products operate across all popular platforms and can be integrated into new or existing enterprise content management systems. Nine of the 10 largest banks in the United States (seven of 10 in the world), as well as some of the biggest healthcare providers, government agencies, and insurance companies rely on Snowbound for their mission-critical needs. For more information, contact us at 617-607-2010 or info@snowbound.com, or visit www.snowbound.com.