Technical FAQs

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

  1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
  2. Then loop through these elements and check if they are of the type ImGearPDEImage.
  3. Convert the image to an ImGearPage and find its bit depth.
  4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output using compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Rasterize, recognize, and export each page of the source document
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            // Find the highest bit depth among the images in the page's top-level content
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage)
                {
                    // Create an ImGearPage from the embedded image and read its bit depth
                    int bitDepth = (el as ImGearPDEImage).ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            if (highestBitDepth == 0)
            {
                // No images were found at the top level (either the page has none or they are
                // nested inside containers), so fall back to a bit depth of 24 to be safe
                highestBitDepth = 24;
            }

            // Rasterize at the detected bit depth and 200 x 200 DPI
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are: 

  • 1 Bit Per Pixel – ImGearCompressions.CCITT_G4
  • 8 Bits Per Pixel – ImGearCompressions.DEFLATE
  • 24 Bits Per Pixel – ImGearCompressions.JPEG
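
As a quick illustration, the mapping above can be written as a lookup. This is only a sketch in plain JavaScript: the function name autoCompressionFor is hypothetical, the strings simply echo the ImGearCompressions names from the list, and bit depths that the list does not mention are returned as undefined rather than guessed.

```javascript
// Hypothetical helper: mirrors the AUTO mapping listed above.
// It does not call ImageGear; it only restates the list as data.
function autoCompressionFor(bitDepth) {
  switch (bitDepth) {
    case 1:
      return 'CCITT_G4';
    case 8:
      return 'DEFLATE';
    case 24:
      return 'JPEG';
    default:
      // The list above does not cover other bit depths.
      return undefined;
  }
}

console.log(autoCompressionFor(1));  // CCITT_G4
console.log(autoCompressionFor(24)); // JPEG
```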

Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above only checks the top layer of the page content, so it will not detect images that are embedded inside containers.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

The above code will set the bit depth to 24 if it wasn’t able to detect any images in the first layer, just to be on the safe side.
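
The recursive approach mentioned above could follow the shape sketched here. This is a language-neutral illustration written in plain JavaScript, not ImageGear code: the element objects, their type and children fields, and both function names are hypothetical stand-ins for the PDF content elements and containers. The logic is the same as in the C# sample (track the highest image bit depth, fall back to 24 when nothing is found), with one extra step that descends into nested content.

```javascript
// Hypothetical element shapes used only for this sketch:
//   image:     { type: 'image', bitDepth: 8 }
//   container: { type: 'container', children: [ ...elements ] }
//   anything else (text, paths, ...) is ignored.
function highestImageBitDepth(elements) {
  let highest = 0;
  for (const el of elements) {
    if (el.type === 'image') {
      highest = Math.max(highest, el.bitDepth);
    } else if (Array.isArray(el.children)) {
      // Descend into nested content and keep the largest value found.
      highest = Math.max(highest, highestImageBitDepth(el.children));
    }
  }
  return highest;
}

// Returns the bit depth to rasterize with, using 24 when no image is found.
function rasterBitDepth(elements, fallback = 24) {
  const highest = highestImageBitDepth(elements);
  return highest === 0 ? fallback : highest;
}

const page = [
  { type: 'text' },
  {
    type: 'container',
    children: [
      { type: 'image', bitDepth: 1 },
      { type: 'container', children: [{ type: 'image', bitDepth: 8 }] },
    ],
  },
];

console.log(rasterBitDepth(page));               // 8
console.log(rasterBitDepth([{ type: 'text' }])); // 24
```

In a real implementation, the recursion would go wherever the loop encounters an element type that can hold other content, and the image branch would read the bit depth the same way the C# sample does.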

TAMPA, Fla. – Accusoft, the leader in document and imaging solutions for developers, is proud to announce its beta release testing program, which provides participants with real-time access to its latest product developments.

Customer input is a key factor in Accusoft’s mission to build better software integrations that deliver functionality like OCR, image cleanup, forms processing, file manipulation, and viewing solutions. Thanks to the new beta program, participants will get early access to brand new products and have the opportunity to provide feedback on the latest features for existing products. Developers can also customize what types of betas they would like to opt into so they can focus on products most relevant to their business.

“Our previous betas for PrizmDoc Editor and PrizmDoc Cells were extremely beneficial for everyone involved,” says Mark Hansen, Product Manager. “Our team received rapid feedback that helped make our products better, while participants had the opportunity to shape those products to meet their specific requirements.”

By signing up for the beta program now, you can participate in the active beta for the PrizmDoc Forms integration, which will allow you to repurpose your PDF forms so you can easily create, customize, and deploy them as web forms anywhere. You’ll also be the first to know about new product offerings and have the ability to opt into beta releases for Accusoft’s existing products, such as ImageGear, FormSuite for Structured Forms, and PrizmDoc Suite.

To learn more about Accusoft’s exciting new beta program, please visit our website at https://www.accusoft.com/company/customers/beta-release-program.

About Accusoft:

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.

Organized each year by ALM, LegalTech is one of the most important events for the legal industry. The conference brings together a broad variety of experienced legal professionals and innovative LegalTech providers to highlight the business, regulatory, technology, and talent trends in the market. In previous years, LegalTech was held in New York City and attended by more than 8,000 people.

LegalTech 2021 Is Now Legalweek(year)

This year, however, the COVID-19 pandemic has forced the organizers to take a different approach. The first decision involved shifting LegalTech from an in-person conference to a fully virtual event in order to protect the health of both attendees and organizers. While many industry events have made a similar transition, the LegalTech team went a step further by breaking the conference into a series of five interactive virtual events held over the course of 2021. This new virtual series was dubbed Legalweek(year) and aims to provide legal professionals with a powerful resource for working through an unprecedented era.

“This decision was made to address the needs of our legal community during these trying times of COVID-19 and to provide the type of innovative education, solutions, and connections that is so crucial to legal leaders,” said ALM’s Mark Fried. “The 2021 series will set the stage for a resurgence in the legal sector and a big ‘Welcome Back’ to attendees for our in-person Legalweek event (in 2022).”

The first virtual Legalweek(year) event is scheduled for February 2-4, 2021 and will feature bestselling author and political leader Stacey Abrams, legal AI expert Joshua Walker, and former New Jersey governor and federal prosecutor Chris Christie as keynote speakers. Attendees will not only be able to participate remotely but will also have an additional six months’ worth of on-demand access to virtual content following each event.

Visit the Accusoft Legalweek(year) Virtual Booth

As a longtime sponsor of LegalTech, Accusoft is proud to participate in this groundbreaking series of virtual events. The conference has historically been a great opportunity for us to speak directly with independent software vendors and legal IT professionals about the latest industry trends and LegalTech applications.

This year, we’ll be hosting a “virtual booth” through the Legalweek(year) event site. Whether you’re a developer looking to solve a particular software challenge or a project manager building an in-house solution for your firm, you’ll find plenty of resources and support at the Accusoft booth. Read through our numerous case studies and LegalTech whitepapers or schedule a meeting with one of our product specialists to learn more about our SDK and API integrations for legal software. You can even chat with someone in real time if you need a quick answer!

After completing registration, Legalweek(year) attendees can access the Accusoft virtual booth during the event simply by logging into their account.

Our LegalTech Solutions

Accusoft’s combination of content processing and conversion integrations helps today’s innovative LegalTech applications reach their full potential. As law firms and legal departments incorporate more technology into their everyday operations, they need software tools capable of automating workflows, simplifying eDiscovery, and facilitating secure collaboration.

PrizmDoc Viewer

Our feature-rich HTML5 document viewer allows users to seamlessly view a variety of document and image files within their secure web application. Thanks to PrizmDoc Viewer’s powerful REST APIs, developers can provide additional functionality, such as annotations and redactions, that is essential for legal organizations.

PrizmDoc Editor

In addition to allowing users to edit DOCX files within the secure confines of their LegalTech applications, PrizmDoc Editor’s automated document assembly features streamline the contract creation process to improve efficiency and accuracy. Documents can be assembled programmatically, incorporating commonly used or specific clauses, special language, and client data to eliminate “cut and paste” errors. Once documents are assembled, PrizmDoc Editor’s sharing tools allow firms to control access and ensure that everyone is working from the same up-to-date version.

ImageGear

With the ability to read, convert, and compress a wide range of files, our ImageGear SDK integration provides LegalTech applications with the tools they need to manage almost any type of file collected during the eDiscovery process. Powerful optical character recognition (OCR) capabilities allow ImageGear to read a wide variety of languages from around the world and convert scanned documents into searchable plain text or PDF files.

LegalTech in 2021 and Beyond

As legal organizations continue to make strides toward achieving true digital transformation, they will need versatile LegalTech applications capable of adapting along with them. Accusoft’s family of SDK and API integrations can help developers leverage the power of their innovative software tools and free up resources to focus on improving their core capabilities.

We hope you’ll join us at Legalweek(year) on February 2-4, 2021. Our booth will be available throughout the virtual event, so stop by to find out how Accusoft can help you realize the potential of your LegalTech applications.

Developers have plenty of options for viewing PDFs in their applications. With so many solutions to choose from, it’s easy to put off thinking about PDF support until much later in the development process. But doing so is often a recipe for trouble, resulting in ad hoc workarounds and settling for third-party plug-ins or native browser support that could impact application performance and security.

Directly embedding a web-based PDF viewer provides developers with much more flexibility and control over how their application manages and presents PDFs. By integrating a PDF JavaScript (PDF JS) viewer early in the development process, it’s easier to build a better user experience that doesn’t force users to take additional steps in order to interact with PDFs.

5 Benefits of an Embedded Web PDF Viewer

1. Consistent Viewing Experience

One of the original intentions of the PDF format was to ensure that documents would look the same no matter where or how they were being viewed. Unfortunately, not every viewer renders documents in the same way. More importantly, there are so many different ways of building a PDF that it can be hard to know whether a file contains elements that certain viewers struggle to handle. This is typically the case with fonts and other formatting issues. While flattening the PDF can often address many of these issues, there are many instances where rasterizing the document robs it of valuable functionality (fillable form fields, for example).

By embedding a web PDF viewer directly into an application with a PDF JS integration, developers can ensure that users will always have a consistent viewing experience. Since the application will automatically open PDFs rather than handing the viewing task off to a browser plug-in or an external program on the user’s device, the document should render exactly as it was intended to look. This avoids confusion and enhances the user’s overall experience within the application.

2. Control Over Files

Organizations put a lot of time and resources into safeguarding confidential assets, but they can quickly undermine those efforts by failing to maintain control over their documents when sharing them. Many PDF viewing solutions allow or even require someone to download the document without having to obtain any special permissions. While this typically isn’t a major concern for public-facing documents, it could be disastrous for any shared PDFs that contain sensitive data or private information.

When developers use an embedded web PDF viewer, they allow document owners to maintain control over what people can and cannot do with shared PDFs. If they simply want someone to be able to view a document, but not edit it or download it, they can set the right permissions and restrictions to maintain control over the file. Embedding a PDF JS viewer is essential to this approach because it creates the conditions of the viewing experience.

3. Responsive Viewing

Today’s PDF viewing solutions need to account for what the viewing experience looks like on multiple screen sizes and devices. Not everyone will be reading a document on a conventional computer screen. They may want to view PDFs on a tablet or smartphone, both of which call for different viewing controls due to the nature of the device interface.

Without an embedded web PDF viewer, mobile users may not be able to readily access PDF-based content within an application. For customer-facing solutions, this can seriously compromise the user experience. It’s also a major obstacle for organizations seeking to leverage an application to support a collaborative workplace. By integrating that viewing support, developers can ensure that users will be able to view PDF documents easily no matter what device they’re using.

4. No Dependencies

Many PDF viewing solutions offer extensive features, but at the cost of impacting application performance and security. That’s because they require cumbersome, memory-intensive plug-ins or complex server configurations. Even worse, they may be completely separate third-party solutions that require PDF files to be shared outside a developer’s secure application environment.

With the right PDF JS library, developers can easily integrate web PDF viewing capabilities directly into the browser without resorting to any external dependencies. Since JavaScript PDF viewers are so lightweight, they can be installed with a small amount of code that doesn’t have an impact on application performance. And since a PDF JS viewer can render PDFs within the solution, there’s no reason to risk exposing them to external software environments.

5. Easy Annotation and eSignature

Many organizations have understandably come to expect PDF viewers to come fully equipped with annotation tools that allow them to mark up files without having to transfer them to another program. They also frequently need the ability to sign documents as part of ongoing business dealings. For customer-facing applications, these features are incredibly valuable because they streamline many processes for users.

By embedding a web PDF viewer capable of supporting annotations and eSignatures, developers can quickly provide that functionality without having to build a new solution from scratch. Many annotation tools require complex backend server dependencies, so having those essential features integrated within a lightweight JavaScript PDF library can greatly improve web application performance.

Embed a Web PDF Viewer Today with Accusoft PDF Viewer

As a lightweight JavaScript PDF library, Accusoft PDF Viewer allows developers to add dynamic PDF support to their applications in a snap. While many PDF JS solutions require complicated coding to integrate properly, Accusoft PDF Viewer delivers PDF functionality to a web application with just ten lines of code.

See for yourself:

<div id="viewer"></div>
<script>
   (async () => {
     const pdfViewer = await window.Accusoft.PdfViewerControl.create({
       sourceDocument: 'https://MyURL.com/MyPDF.pdf',
       container: document.getElementById('viewer')
     });
   })();
 </script>

Accusoft PDF Viewer builds upon the versatility and reliability of the popular PDF.js open-source library, which serves as the foundation of many commercial PDF viewing solutions. From there, we used our extensive imaging technology expertise to push the boundaries in terms of rendering performance and usability. Optimized for speed and ease of use, the Standard Version of this JavaScript-based PDF viewer provides multiple benefits to developers looking to add robust viewing support without bogging down their development cycle:

  • Responsive UI: Easily view and interact with PDFs on any screen size thanks to optimized mobile controls.
  • Powerful Rendering: Smart rendering technology ensures that images remain crisp at all zoom levels.
  • Lightning-Fast Search: Get near-instant search results when trying to locate specific text, even when viewing large documents.
  • 100% Client-Side: With no server configurations or plug-ins, all viewing sessions remain entirely within a secure application environment.

For developers looking for expanded functionality, the Professional Version of Accusoft PDF Viewer adds a number of important features:

  • UI Customization: Adjust the PDF viewer UI by adding or removing toolbar elements to create a better viewing experience.
  • Annotation: Mark up PDF files with multiple annotation tools, then store or retrieve markups with an API.
  • eSignature: Create freehand signatures to sign documents on computers, tablets, or smartphones.
  • White Labeling: Add customized branding to the viewer for a more consistent experience.

Accusoft PDF Viewer Standard Version can be downloaded today at no cost to quickly embed PDF features into any web-based application with just a few lines of code. When it’s time to add expanded functionality, we make it quick and easy to upgrade to Professional Version.

Check out the Accusoft PDF Viewer fact sheet for a detailed breakdown of the two versions. If you’re ready to get started, you can download Standard Version right now to try it for yourself.

Sept. 7, 2022 – TAMPA, Fla. – Accusoft, a software development company specializing in content processing, conversion, and automation solutions, and Snowbound, a leader in document viewing and conversion SDK solutions, announced today that they have entered into a definitive agreement under which Accusoft will acquire Snowbound. In the largest acquisition in its 30-year history, the transaction will significantly expand Accusoft’s presence and product portfolio.

Snowbound’s VirtualViewer® technology, supported by its powerful RasterMaster® SDK, supports numerous formats including PDF, MS Office, AFP, DWG, TIFF, email, video, audio files, and more within one universal interface. Its REST API and RESTful content handler provide a more flexible development and deployment capability enabling it to be easily integrated into most applications. In addition, the company offers connectors for IBM FileNet, Alfresco, and Pega. This acquisition will enable Accusoft to expand into new viewing and collaboration technologies offering customers a more robust web-based document viewing experience. 

“Today, we celebrate the joining of two companies who have both driven significant innovation for web-based viewing, conversion, and imaging SDK technologies. I have always had the utmost respect for Snowbound’s leadership team and their employees as we have competed against one another for sales opportunities over the decades. I am honored to bring Snowbound into the Accusoft family,” said Jack Berlin, CEO of Accusoft.

“We were incredibly selective as we looked for the right acquisition partner. We were deliberate in selecting an organization with a leadership team and product portfolio that would be compatible with our own, and that would continue to grow, develop and nurture what we have built at Snowbound. We have proudly driven 26 years of innovation in the way that companies securely share, collaborate, and process documents and images. With the acquisition, our technology will expand RasterMaster®’s and VirtualViewer®’s Java-based feature set and allow continued empowerment to customers as they navigate the ever-changing world of digital transformation and the complexities of document management,” said Simon Wieczner, CEO of Snowbound.

While the acquisition is complete, Accusoft will wait until January 2023 to take full operational control of Snowbound. In the meantime, the two leadership teams will partner to close out a strong 2022 and transition the team and its assets.

For more information about Accusoft, please visit https://www.accusoft.com/.

About Snowbound

For over two decades, Snowbound Software has been the independent leader in document viewing and conversion technology. It plays an integral role in enhancing and speeding company workflows for the Fortune 2000, including insurance claims processing, financial transactions, and more. Snowbound excels in providing customers with powerful solutions for capturing, viewing, processing, and archiving hundreds of different document and image types. Thanks to its pure Java technology and multi-environment support, Snowbound’s products operate across all popular platforms and can be integrated into new or existing enterprise content management systems. Nine of the 10 largest banks in the United States (seven of 10 in the world), as well as some of the biggest healthcare providers, government agencies, and insurance companies, rely on Snowbound for their mission-critical needs. For more information, contact us at 617-607-2010 or info@snowbound.com, or visit www.snowbound.com.