Technical FAQs

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

  1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
  2. Then loop through these elements and check if they are of the type ImGearPDEImage.
  3. Convert the image to an ImGearPage and find its bit depth.
  4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output with compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Import pages
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            // Find the highest bit depth among the images in the page's top-level content.
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage image)
                {
                    // Create an ImGearPage from the embedded image and read its bit depth.
                    int bitDepth = image.ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            if (highestBitDepth == 0)
            {
                // No images were found at the top level (the page has none, or they are
                // nested inside containers), so fall back to 24 bits to be safe.
                highestBitDepth = 24;
            }

            // Rasterize at the detected bit depth and 200 x 200 DPI, then run OCR.
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are: 

  • 1 Bit Per Pixel – ImGearCompressions.CCITT_G4
  • 8 Bits Per Pixel – ImGearCompressions.DEFLATE
  • 24 Bits Per Pixel – ImGearCompressions.JPEG

Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above checks only the first layer of the PDF, so it will not detect images that are embedded inside containers.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

The above code will set the bit depth to 24 if it wasn’t able to detect any images in the first layer, just to be on the safe side.
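The recursive approach mentioned above can be sketched as follows. This is a minimal illustration only: `Element`, `ImageElement`, and `ContainerElement` are hypothetical stand-ins for PDF content elements, not ImageGear types, so the walk would need to be adapted to whatever container classes your PDF actually exposes. A result of 0 means no image was found at any level, in which case falling back to 24 still applies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for PDF content elements; these are NOT ImageGear types.
public abstract class Element { }

public sealed class ImageElement : Element
{
    public int BitDepth { get; }
    public ImageElement(int bitDepth) { BitDepth = bitDepth; }
}

public sealed class ContainerElement : Element
{
    public List<Element> Children { get; } = new List<Element>();
}

public static class BitDepthScanner
{
    // Returns the highest image bit depth found at any nesting level,
    // or 0 when the tree contains no images.
    public static int FindHighestBitDepth(IEnumerable<Element> elements)
    {
        int highest = 0;
        foreach (Element el in elements)
        {
            if (el is ImageElement image)
            {
                highest = Math.Max(highest, image.BitDepth);
            }
            else if (el is ContainerElement container)
            {
                // Recurse into nested content instead of stopping at the first layer.
                highest = Math.Max(highest, FindHighestBitDepth(container.Children));
            }
        }
        return highest;
    }
}
```

For example, a page holding a 1-bit image at the top level and an 8-bit image two containers deep would report 8, where the single-layer loop would report 1.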

Redacting documents is critically important for legal departments and government agencies. By removing sensitive information from a digital file before sharing it publicly, it’s possible to protect private data or classified materials from being exposed. 

In the days before digital documents, redaction involved a simple, if crude, process of covering text with a black marker. Since redactions were done by hand, it was easy for mistakes to be made, which could range from using insufficiently dark ink to leaving portions of text exposed. The development of high-powered photo enhancement has rendered this approach all but useless, as even inexpensive image processing technology can distinguish blacked-out text.

With the transition to digital documents, organizations finally have access to true redaction capabilities. Unfortunately, they still tend to make mistakes when it comes to flattened PDFs that could leave redacted content exposed and vulnerable.

What Is a Flattened PDF?

A modern PDF file consists of multiple layers, each of which can contain separate elements. One layer might feature text, another an image, and yet another a fillable form. The flattening process removes all interactive elements from form fields and combines all of the document’s elements into a single layer.

Organizations frequently use this process to “lock in” form content to prevent anyone from altering the information after a user completes the forms. It also removes elements like dropdown selections within form fields and can burn in other annotations or markups, making them a permanently visible element of the document.

Flattened PDF Redactions

Unfortunately, simply flattening a PDF is usually not sufficient to securely redact a document. That’s because obscured elements are still present in the document; they’re just not visible when the file is viewed and printed. 

Recovering improperly redacted content is actually quite trivial in many cases. Two of the most infamous recent examples include information released during the investigation of political campaign chairman Paul Manafort in 2019 and court documents related to Facebook’s use of personal data in 2017. In both cases, journalists were able to copy redacted text from PDF files and paste it into a text editor to reveal the obscured content.

There are typically two ways that improper redactions occur:

  1. Covering Text with Boxes: This frequent mistake occurs when people try to treat a digital document like a physical piece of paper. They place annotations over the sensitive content, usually in the form of a black box, and then save a flattened version of the PDF thinking that no one will be able to separate the text from the annotation element. As the Manafort and Facebook cases demonstrate, however, getting around these “redactions” is usually quite easy.
  2. Changing the Color of Text: Another common redaction error involves altering the color of the sensitive text to match the document background. Changing the text color to white, for instance, might make it invisible to the human eye, but it does nothing to alter the content itself. The text can be made visible again by using the copy/paste trick described above or by altering the background characteristics in another program. 

The only way to make these methods viable for true redactions would be to actually print the documents with the content hidden and then scan them back into digital form, where OCR could be used to reconstruct a new file. But even in this case, there’s a chance that a powerful OCR engine might be able to pick up the hidden elements.

Using Proper Redaction Prior to Flattening with PrizmDoc Viewer

In order to redact documents securely, applications need to have access to specialized redaction tools that are capable of actually removing content from the document itself before applying redaction indicators. PrizmDoc Viewer’s redaction API can find and extract key text while also providing single or multiple reasons for the removal. 

This not only allows organizations to redact documents quickly, but it also ensures that the redacted information won’t be exposed later because it no longer even exists within the document. More importantly, the outputted document is entirely new, so there is no deleted information to recover. 

While most people are familiar with the distinctive black bars that indicate redacted content, even this leaves behind significant context clues that could provide hints of what was removed. Consider, for instance, a document involving multiple parties where the names of conversation participants have been redacted.

The following example illustrates this:

[Figure: PDF redaction example]

The length of the redaction, then, would at least indicate when the redaction did not involve one person or the other. There are also many instances involving government documents where the length of the redacted information in classified material might suggest its relevance or importance.

When it comes to GovTech applications that need to remove large portions of information for security reasons, it often helps to perform redaction BEFORE turning a document into a flattened PDF. The PrizmDoc Viewer redaction API can be used to quickly extract text from a document and then redact it as a plain text file.

Unlike a static PDF document, plain text accounts for width variations, so all redactions can be replaced with a standardized <Text Redacted> marker that makes it impossible to know the length of the redacted content. The text could then be converted into a PDF after the redaction process is complete.
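The idea of a standardized marker can be sketched in a few lines of plain C#. This is an illustration only and does not use the PrizmDoc Viewer redaction API; `PlainTextRedactor` and its `Redact` method are hypothetical names, and the list of sensitive terms is assumed to be supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlainTextRedactor
{
    public const string Marker = "<Text Redacted>";

    // Replaces every occurrence of each sensitive term with the same fixed marker,
    // so the output does not reveal how long the removed text was.
    public static string Redact(string text, IEnumerable<string> sensitiveTerms)
    {
        // Escape each term so it is matched literally; try longer terms first
        // so a short term never splits a longer one.
        List<string> escaped = sensitiveTerms
            .Where(t => !string.IsNullOrEmpty(t))
            .OrderByDescending(t => t.Length)
            .Select(Regex.Escape)
            .ToList();

        if (escaped.Count == 0)
        {
            return text;
        }

        string pattern = string.Join("|", escaped);
        return Regex.Replace(text, pattern, _ => Marker, RegexOptions.IgnoreCase);
    }
}
```

Calling `PlainTextRedactor.Redact("Alice met Bob.", new[] { "Alice", "Bob" })` yields a string in which both names are replaced by the same marker, regardless of their original length.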

Take Control of PDFs with PrizmDoc Viewer

As a fully-featured HTML5 viewer, Accusoft’s PrizmDoc Viewer delivers powerful viewing, annotation, and conversion functionality to your web application. It provides a broad range of redaction capabilities that allow legal, financial, and government organizations to keep their sensitive data secure and protect their customers. 

By integrating these complex features into your applications, you can focus your development efforts on building the tools that set your solution apart from the competition while our proven technology powers your customers’ viewing and redaction needs. To learn more about PrizmDoc Viewer’s powerful capabilities, download a free trial and test how it can support and enhance your application.


Post-secondary schools look very different this year as colleges and universities embrace both blended learning and online-only approaches to content delivery and engagement. But this isn’t a one-off operation. Even as pandemic pressures ease, the shift to distance learning as the de facto solution for many students won’t disappear.  As a result, it’s critical for schools to develop and deploy learning management systems (LMSs) that both meet current needs and ensure they’re capable of keeping up with educational evolution. But what does this look like in practice? How do developers and team leaders build fully-functional LMS solutions that empower student success without breaking the bank?

 

Learning Management Systems (LMS) Challenges

When schools first made the shift to distance learning directives, speed was of the essence. While students were barred from campus for safety reasons, they’d paid for a full semester of instruction, and schools needed to deliver. As a result, patchwork programs became commonplace. Colleges and universities combined existing education software with video conferencing and collaboration tools to create “good enough” learning models that got them through to summer break. Despite best educational efforts, however, some students still went after schools with lawsuits, alleging that the quality of instruction didn’t align with tuition totals.

So it’s no surprise that as fall semesters kick off, students aren’t willing to put up with learning management systems that barely make the grade. They want full-featured distance learning that helps them engage with instructors and connect with new content no matter how, where, or when they access campus networks. 

As a result, development teams can’t simply correct for current COVID conditions. Instead, they need to create systems that deliver both blended and purely online interactions, and have the power to ensure students who choose to continue with digital-first learning can still stay connected even after returns to campus become commonplace.

 

How to Create a Functional LMS Framework

So what does a fully-functional LMS framework look like in practice? Seven features are critical for ongoing success. Let’s explore how these features can enhance your learning management system and set your end-users up for success in the classroom and at home:

 

Diverse Document Viewing

As schools make the shift to distance learning, the ability to view multiple document types is critical for long-term LMS success. From standard Word documents, Excel spreadsheets, and PowerPoint presentations to more diverse image types — such as those used in medical educational programming or manufacturing courses — students and instructors need the ability to both send and view diverse document types on-demand. 

While both free and paid solutions for viewing exist outside LMS ecosystems, choosing this route creates two potential problems. Students with diverse technological and economic backgrounds may face challenges in finding and using these tools, and data security may be compromised. This is especially critical as schools handle greater volumes of students’ personal and financial information. If document viewing happens outside internal systems, privacy concerns become paramount.

 

In-Depth Annotations

With students now submitting assignments and exams via educational software, viewing isn’t enough. Staff also need the ability to annotate assets as they arrive. Here, professors and teaching assistants are best-served by built-in tools that allow them to quickly redline papers or projects, add comments, highlight key passages, and mark up documents with specific instructions or corrections.

Without this ability, staff have two equally unappealing choices. They can either print out, manually correct, and then re-scan documents, or send all comments as separate email attachments. Both are problematic, since they limit the ability of students and teachers to easily interact with the same document.

 

Comprehensive Conversion

File conversion is critical for effective learning management systems (LMSs). Specifically, schools need ways to quickly convert multiple document types into single, searchable PDFs. Not only do PDFs offer the ability to control who can edit, view, or comment on papers or exams, they make it easy for teachers to quickly find specific content. The permissions-based nature of PDFs makes them ideal for post-secondary applications and a must-have for any education software solution. 

 

Cutting-Edge OCR and ICR

Optical character recognition and intelligent character recognition also form a key part of distance learning directives. With some students still more comfortable with hand-written hard copies and some classes that require students to show specific work, OCR can help bridge the gap between form and function. By integrating tools with the ability to recognize and convert multiple character types and sets, schools are better equipped to deal with any document type. Search is also bolstered by cutting-edge OCR; instead of forcing staff to manually examine documents for key data, OCR empowers digital discovery.

 

Complete Data Capture

Forms are a fundamental part of university and college life, but the myriad of digital documents can quickly overwhelm legacy education software. Integrating tools with robust form-field detection allows schools and staff to streamline the process of complete data capture, both increasing the speed of information processing and reducing the potential for human error.

 

Barcode Benefits

As campuses shift to hybrid learning models, students occupy two worlds, both physical and digital. But this duality introduces complexity when it comes to tracking who’s on campus, when, and why. These are currently key metrics for schools looking to keep students safe in the era of social distancing. 

By deploying full-featured barcode scanning solutions as part of LMS frameworks, colleges and universities can get ahead of this complexity curve. From scanning ID cards to take attendance and track resource use to using barcodes as no-contact purchase points or metric measurements for ongoing analytics, barcode solutions are an integral part of LMS solutions.

 

Automation Advantages

The sheer volume of digital documents now generated and handled by post-secondary schools poses the problem of practicality. Teachers and administrators simply don’t have time to evaluate and enter data at scale and speed while also ensuring accuracy. By automating key processes including document conversion, capture, and character recognition, schools can reduce the time required to process documents, leaving more room for student engagement.

 

Building an LMS Product for Teachers & Students

The bottom line for LMS solutions? If they don’t work for end-users, they won’t work for the broader school system as a whole. Gone are the days of invisible IT infrastructure. Now, students and staff alike are school stakeholders with evolving expectations around technology.

By deploying distance learning solutions that prioritize end-user outcomes with enhanced document viewing, editing, data capture, and automation, developers can create LMS tools capable of both solving immediate issues and offering sustained student success over time. Learn more about these functionality integrations for your learning management system at accusoft.com/products

Document image cleanup is a vital step in building an efficient and accurate processing workflow. In a perfect world, every file an organization receives would be in pristine, high-resolution condition so it could be processed quickly and easily. Unfortunately, the reality is that documents come in all sizes, conditions, and formats. Companies can receive vital information in the form of email, traditional mail, fax, or even text. Documents scanned into a crooked, low-resolution file are just as likely to be received alongside digital versions submitted entirely through a web application.

This poses a significant challenge for software developers building the next generation of automation solutions. Without some way of cleaning up document images, companies that still rely upon manual processes will struggle to read and process files. More importantly, poor image quality interferes with optical character recognition (OCR) engine accuracy, making more human interaction necessary to verify recognition results. By integrating document image cleanup tools into their applications, developers can enhance the speed and accuracy of their automated processes and help their customers leverage the full potential of digital transformation.

7 Essential Document Image Cleanup Features Your Application Needs

A few document image cleanup tools should be considered essential for any application that has to manage multiple file formats. To see these tools in action and understand why they’re so vital, let’s take a look at how these features work in ImageGear, Accusoft’s document and image processing SDK.

1. Despeckling

Speckles can appear on document images for a variety of reasons. In some cases, they are unwanted image noise created during the original scanning process (the classic “salt and pepper” noise), but in other instances, they’re simply the result of dust particles on the surface of a scanned document or on the scanner itself. They are frequently encountered when converting old documents into digital form. Speckling not only interferes with OCR engine performance, but can also make it difficult to maintain image fidelity when compressing or converting files. 

ImageGear can reduce or eliminate speckling as part of the document image cleanup process. There are two ways to approach speckle removal:

  • Despeckle Method: This function removes color noise from 1-bit images by taking the average color value in a square area around the speckle and replacing its pixels with that value.
  • GeomDespeckle Method: This function uses the Crimmins algorithm to send the image through a geometric filter, reducing the undesired noise while preserving edges of the original image. This process is applied only to 8-bit grayscale images.
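The averaging idea behind the first method can be sketched on a plain 1-bit image. This is not ImageGear's implementation: `SimpleDespeckle` is a hypothetical helper, the 3 x 3 window size is an assumption, and the image is modeled as a `bool[,]` where `true` means a black pixel. For 1-bit data, the rounded average of a neighborhood is the same as a majority vote.

```csharp
public static class SimpleDespeckle
{
    // Replaces each pixel with the majority value of its 3 x 3 neighborhood
    // (the rounded average for 1-bit data). true = black, false = white.
    public static bool[,] Apply(bool[,] src)
    {
        int height = src.GetLength(0);
        int width = src.GetLength(1);
        var dst = new bool[height, width];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int black = 0;
                int total = 0;
                for (int dy = -1; dy <= 1; dy++)
                {
                    for (int dx = -1; dx <= 1; dx++)
                    {
                        int ny = y + dy;
                        int nx = x + dx;
                        // Skip neighbors that fall outside the image.
                        if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                        total++;
                        if (src[ny, nx]) black++;
                    }
                }
                dst[y, x] = black * 2 > total;
            }
        }
        return dst;
    }
}
```

An isolated black pixel on a white background is outvoted by its neighbors and disappears, while solid regions are left unchanged. A filter this simple can also thin very fine strokes, which is one reason production tools offer tunable parameters.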

2. Image Inversion

With so many documents being scanned, converted, and transferred between applications, there’s a greater likelihood of something going wrong along the way. One of the most frequent problems is image inversion, which swaps pixel colors and turns a standard white background with black text into a black background with white text. This mix-up can render documents completely unreadable by OCR engines.

ImageGear can be configured to automatically recognize when image inversion is necessary. The invert method can also be used to immediately change the color of each pixel contained in the entire image, turning white to black and black to white.
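As a rough sketch of what inversion and its detection involve, the following works on an 8-bit grayscale image stored as a `byte[,]`. It does not call ImageGear: `SimpleInverter` is a hypothetical helper, and the "more than half the pixels are dark" rule is an assumed heuristic, not the detection logic the SDK uses.

```csharp
public static class SimpleInverter
{
    // Flips every pixel: 0 (black) becomes 255 (white) and vice versa.
    public static byte[,] Invert(byte[,] src)
    {
        int height = src.GetLength(0);
        int width = src.GetLength(1);
        var dst = new byte[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                dst[y, x] = (byte)(255 - src[y, x]);
            }
        }
        return dst;
    }

    // Assumed heuristic: a document page is mostly background, so if more
    // than half of the pixels are dark the page is probably inverted.
    public static bool LooksInverted(byte[,] src, byte darkThreshold = 128)
    {
        long dark = 0;
        long total = src.Length;
        foreach (byte value in src)
        {
            if (value < darkThreshold) dark++;
        }
        return dark * 2 > total;
    }
}
```

A caller could check `LooksInverted` first and only call `Invert` when it returns `true`, so correctly oriented pages pass through untouched.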

3. Deskewing

Skewed document images are both cumbersome to manage and challenging for OCR engines to read accurately. Unfortunately, manually scanned documents are often uneven, and the problem is only becoming worse now that many people are using their phone cameras as makeshift document scanners. That’s why the first step in the document image cleanup process is often deskewing, which rotates and aligns the images to enhance recognition accuracy.

The deskewing process often involves more than just rotating a document, especially where images taken by a digital camera are concerned. ImageGear’s 3D deskew feature uses a sophisticated algorithm to correct for perspective distortion, which can occur whenever a document is captured with a handheld camera.

4. Blank Page Detection

Many documents converted into digital format contain information on both sides. If they are fed into a scanner along with single page documents, the resulting file will contain multiple blank pages. This might not seem like much of a problem, but if there is enough speckling or noise around the edge of the image, an application may try to apply an OCR engine to it and generate an error result. Blank page detection can quickly identify any image that is blank or mostly white and flag it for deletion.
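One simple way to approximate blank page detection is to measure how much of the page is covered by ink. The sketch below is an assumption-laden illustration rather than ImageGear's method: `BlankPageDetector` is a hypothetical helper, the page is modeled as a `bool[,]` where `true` means a black pixel, and the default 1% threshold is arbitrary and would need tuning against real scans.

```csharp
public static class BlankPageDetector
{
    // Returns true when the share of black pixels is at or below maxInkFraction.
    // A small nonzero threshold tolerates stray speckles and edge noise.
    public static bool IsBlank(bool[,] ink, double maxInkFraction = 0.01)
    {
        long total = ink.Length;
        if (total == 0)
        {
            return true; // An empty image has nothing to recognize.
        }

        long black = 0;
        foreach (bool pixel in ink)
        {
            if (pixel) black++;
        }
        return (double)black / total <= maxInkFraction;
    }
}
```

Pages flagged this way can be skipped before OCR runs, which avoids spending recognition time on scans that contain only noise.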

5. Line Removal

Although they may not seem very troublesome at first glance, lines can create a number of problems for OCR engines. When lines and printed text overlap, it can be difficult for the engine to distinguish between the two. In some instances, the engine may even misread a line as a letter or number. Removing lines from a document prior to OCR reading ensures that the remaining text will be recognized more quickly and analyzed more accurately.

ImageGear supports both solid line removal and dotted line removal. The first method automatically detects and removes any horizontal and vertical lines contained in the document (like frames or tables), while the second method determines which dotted lines to remove by measuring the number and diameter of dots.

6. Border Removal

When scanned documents don’t align properly with the boundaries of the scanner or were copied onto paper that was larger than the original image at some point, the remaining space is often filled in with black. These borders are not only unsightly, but they also interfere with other document image cleanup processes. Although they can usually be cropped out easily, the cropping process alters the proportions of the image, which could create more problems later.

Removing these large black regions is easy with ImageGear’s CleanBorders option. It focuses on the areas near the edge of the page, which typically should not contain any important image data. 

7. Remove Hole Punches

Important documents were often stored in binders before they were prepared for digitization. When scanned, the blank space from the hole punch leaves a large, black dot along the edge of the document. Unfortunately, these holes sometimes overlap with text or could be picked up as filled-in bubbles by an optical mark recognition (OMR) engine.

ImageGear can identify and remove punch holes created by common hole punchers, including two, three, and five hole configurations. The RemovePunchHoles method can be adjusted to account for differing hole diameters in addition to different locations.

Unlock Your Application’s Document Image Cleanup Potential with ImageGear

Although ImageGear can perform a variety of document handling functions such as viewing, conversion, annotation, compression, and OCR processing, its document image cleanup capabilities help applications overcome key content management challenges and enhance performance in other areas. Improved document image quality allows data to be extracted more quickly, enhances the viewing experience, and reduces complications when it comes to file compression and conversion.

Learn more about the ImageGear collection of SDKs to discover how they can deliver versatile document and image processing to your applications.


The software development industry is changing more rapidly than ever before. With new technology hitting the market on a regular basis, software vendors need to become flexible enough to adapt to the top coding trends if they want to remain competitive.

After a tumultuous 2020, the industry has seen a number of key trends emerge in the first half of 2021. Here are some of the top coding trends worth watching in the second half of the year.

Top 5 Coding Trends of 2021 (So Far)

1. Open-Source Evolution

Developers have been turning to open-source solutions for some time now as a quick way to integrate new features into their applications. While there are a lot of great benefits to using open-source code, it’s not always the simple solution that it appears to be. Substantial work may need to be done to implement the specific features an application requires. More importantly, open-source solutions rarely offer much in the way of support or security updates, and there can also be complicated intellectual property issues to consider when incorporating open source code into a proprietary application.

That’s why many innovative developers are using stable open-source solutions as a foundation for creating more feature-rich software SDKs. For teams building new applications, it’s often much easier to implement one of these integrations because it will require far less configuration and additional coding to get up and running. They can also get the benefits of dedicated support and not have to worry about whether their new integration will create any legal issues down the road.

2. UX Design

With the proliferation of Software as a Service (SaaS) platforms and the widespread use of open source development resources, it’s becoming easier for organizations to find the applications that suit their business needs. What they can’t always find, however, is a solution that’s easy for their employees and customers to use. That’s why the quality of an application’s user experience (UX) is quickly becoming a key differentiator in the software market.

Rather than implementing UX features at a later stage of the coding process, developers need to consider how users will interact with their solution from the very beginning. Software needs to be intuitive and easy to implement out-of-the-box. This applies equally to end-user products and developer-focused SDK integrations. No one has time to struggle with software that’s difficult to use. If a solution proves too cumbersome and hard to implement, customers will likely turn to a competing product that offers a better user experience. The more time developers spend considering their software’s UX, the better they’ll be able to adapt it to customer needs in the future.

3. Responsive Mobile Support

For many years, there was a somewhat artificial distinction between mobile software development and desktop development. But in a world where half of all internet activity comes from mobile devices, no developer working on web-based applications can afford to consider their software “just” for desktops. Just as website designers have been building pages that respond dynamically to different screen sizes and control interfaces, developers must also account for the unique characteristics of mobile devices.

The unique characteristics of mobile screens present specific challenges regarding the application’s user interface (UI). Simply providing standard desktop controls is bound to result in a frustrating mobile experience. Mobile responsive applications can accommodate touch-specific controls (such as pinch-to-zoom) without compromising the desktop experience at the same time. Developers must think about what kinds of devices their software solutions will be used on if they’re to build features and tools that will truly benefit their customers.

4. API Integrations

Today’s developers no longer need to build every feature their application might require from scratch. Thanks to a new generation of web API technology, it’s easier than ever to find software integrations that can quickly and easily add vital features without having to dedicate weeks of development time to building them. Understanding which web application features can be incorporated via a REST API helps development teams to focus their limited resources and time on the truly unique features that will help set them apart from the competition.

Utilizing web API technologies can streamline sprints and shorten development time significantly. That’s because much of the “trial and error” work of building a new feature is eliminated. Rather than designing and testing new capabilities for months, developers can simply implement a tested and proven web API integration within a matter of days. That helps to keep budgets under control and development schedules on track to make targeted launch days.

5. Remote Work

When the COVID-19 pandemic struck the world in early 2020, many software developers transitioned to a remote workplace arrangement. As other industries begin to tentatively return to the office, tech workers seem to have become quite accustomed to working remotely. According to a late 2020 survey conducted by Indeed, nearly half of participants reported that they now have the option to work remotely on a permanent basis, with 95 percent of them planning to do so. Perhaps even more telling, however, was the finding that 60 percent of tech workers are willing to take a pay cut in order to keep working from home.

Software vendors will have to accommodate these expectations if they hope to remain competitive when it comes to finding and retaining talent. Project managers should not expect work patterns to go back to the way they were before the pandemic. They will be better served focusing on how to organize remote work efficiently and how to provide the resources developers need to be productive while working from home. Transitioning to a more remote workforce is also allowing organizations to tap into a much broader pool of talent, which will help to bring more diverse voices and experiences into the development process.

Keeping an Eye on Future Trends

The software development teams at Accusoft are always looking ahead to see where today’s coding trends are leading the industry. That’s why we’ve been building easy-to-implement, lightweight SDKs like the free-to-use Accusoft PDF Viewer alongside our stable of versatile API solutions like PrizmDoc Viewer. We also continue to make ongoing improvements to our products to provide a better user experience for customers.

Our collection of software integrations can help development teams keep up with today’s top coding trends. Whether you’re looking to quickly integrate new features into an existing application or are looking for the right tools to support your next project, we have the API and SDK resources to keep you on budget and on time. Check out the Accusoft Resource Center to learn more.

Accusoft’s FormSuite for Structured Forms is a powerful SDK that allows you to integrate character recognition, form identification, document cleanup, and data capture capabilities into your software applications. You can set up unique form templates based on your processing needs and then design customized output architecture to extract data for delivery to a database or other downstream applications, helping you get to production faster or bring a new level of functionality to your legacy systems.

Setting all of that functionality up, however, can be a daunting task, especially if you’re working with a wide variety of form types. That’s why our FormSuite enablement services team is available to help you implement the features you need to ensure lasting results. Whether you’re facing bandwidth constraints or lack the resources to build expertise quickly, our FormSuite experts bridge the gap to make your project a success. Our enablement services team takes a five-step approach to every engagement.

The Accusoft Approach to Enablement Services

Step 1: Thorough Architecture Review

We start by conducting a top-to-bottom analysis of your production or operational environment. Our review not only evaluates your system architecture and data workflow, but also breaks down the details of your potential use cases and existing work samples.

Step 2: Identifying the Right Fit

Next, we determine the best FormSuite options based on your unique requirements and build you a custom enablement plan that will equip you with the instruction and assistance you need for a successful implementation.

Step 3: Training Your Team

Armed with information about your application’s specific requirements, we develop a customized training program to give your team a solid foundation for future development and ongoing maintenance. From form template creation and image enhancement to working with the forms API, we provide targeted guidance designed to help you solve potential challenges unique to your application environment.

Step 4: Implementation Support

Once the training is complete, you’ll have the foundational knowledge required to build the forms processing workflows your application requires. Our FormSuite experts remain on call to answer your questions so you can achieve your integration faster and ensure that you’re processing forms accurately.

Step 5: Preparing for Long-Term Success

Our enablement services prepare you to manage your implementation over the long term. We not only show you how to maintain the current environment, but also identify potential opportunities to deploy new features as your application scales in the future.

Keep the Partnership Going

Following your integration, we also provide ongoing support options to our customers whether or not they’ve utilized our enablement services. You get free Upgrade Support for 90 days after initial purchase, which includes email support and product upgrades. After that period, you can extend Upgrade Support, or elect to transition to our Standard Support or Priority Support annual plans.

To learn more about FormSuite for Structured Forms enablement services, talk to one of our solutions engineers. We’re ready to help you get your integration started!

InsurTech SDK

The insurance market is booming. As noted by research firm Deloitte, the property and casualty (P&C) sector saw a massive income uptick in 2018 and steady growth last year that’s predicted to carry forward through 2020. To help manage the influx of new clients and handle more claims, many firms are spending on insurance technology (insurtech) — digital services and solutions that make it possible to reduce error rates and enhance operational efficiency. InsurTech SDKs are important components of this transformation.

Both in-house insurtech solutions and third-party platforms often excel in specific areas but come up short in others, putting insurance firms at risk of writing off potential gains. While solution switching and ground-floor rebuilds offer one route to success, there’s another option that’s better tailored to your business needs: software development kits (SDKs). Here’s a look at three top SDKs that offer customized functionality potential.


FormSuite for Structured Forms: Solving for Data Capture

Time is money. The faster insurance companies accurately complete and file documents, the greater their revenue potential. And as noted by KPMG, the need for speed is more pressing than ever. Many insurance sectors have seen substantial increases in both claims and new applications as the COVID-19 crisis evolves. 

As a result, accurate and agile forms processing is critical to keep up with demand. If current insurance software can’t quickly capture forms data, recognize standard form fields, and let users easily create standard form libraries, policy processing falls behind.

FormSuite for Structured Forms makes it easy for developers to build in form identification and data capture that includes comprehensive form field detection with OCR, ICR, and OMR functionality and the ability to automatically identify scanned forms and match them to existing templates.

ImageGear for .NET and C/C++: Simplifying Conversion

Conversion is critical for insurance firms. Depending on the type and complexity of insurance claims, companies are often dealing with everything from Word documents for initial client assessments and .GIF or .JPG images of existing damage to contractor-specific PDFs or spreadsheets that detail necessary materials, time, and labor costs. The result? A mash-up of multiple file types that forces adjusters to spend valuable time searching for specific data instead of helping clients get their claims process up and running. This makes it difficult to recognize value from emerging digital initiatives. 

Accusoft’s ImageGear for .NET and ImageGear for C/C++ empower developers to integrate enterprise-class file viewing, annotation, conversion, and image processing functions into existing applications, allowing staff to both quickly collaborate on key tasks and find essential data across a single, easy-to-search document.

 


ImageGear: Streamlining PDF Capabilities

While insurance technology offers substantive opportunities for end-users to capture, convert, and retain data, this technology can also come with the challenge of increased complexity. According to recent research from PWC, for example, firms looking to capitalize on insurtech potential must be prepared to rapidly develop new product offerings and embrace the expectations

As a result, companies need applications that streamline current functions and allow them to focus on creating cutting-edge solutions. For example, PDF is a file format that is still used by enterprises worldwide to maintain document format consistency and maximize security. When it comes to converting multiple files into a PDF, software can be expensive and introduce data security issues. 

This can all be solved with an SDK like ImageGear, which makes it possible to integrate the total PDF package into any document management application, both reducing overall complexity and freeing up time for staff to work on new insurance initiatives.

Insurtech forms the framework of functional futures in policy applications, claims processing, and compliance reporting, but existing software systems may not provide the complete capability set companies need to make the most of digital deployments. These top SDKs offer insurance IT teams the ability to integrate key services, improve speed, and boost security at scale. Learn more about Accusoft’s SDKs at www.accusoft.com/products

For today’s healthcare organizations, having a versatile electronic health records (EHR) system is essential for running an efficient practice and connecting to other medical providers. Thanks to EHRs, practices can ensure that they’re getting a complete picture of a patient’s health and treatment history, which allows them to deliver much better care outcomes. As developers continue to refine the usability of these systems, they need to consider how they can improve core features like healthcare electronic document management and medical imaging support.

Managing Medical Documents

A typical EHR system has to be able to handle quite a lot of document types. Anyone who has visited a healthcare provider is quite familiar with the myriad forms used to gather patient information. Many of those forms end up being converted into digital formats that need to be managed within the EHR system. Then there are digital versions of lab reports, physician notes, invoices, and financial documents. 

While EHR systems may utilize databases to store much of the information they need, healthcare providers still need to be able to produce physical documents and view digital files in many situations. This could include communicating information to patients, complying with regulatory requests, or filing a financial claim of some kind. More importantly, they also rely on digital documents to enter data into the EHR system. The push toward interoperability between EHR systems has improved information sharing, but there are still many instances where medical records are delivered in the form of a document that needs to be managed securely.

Document Conversion

If an EHR application lacks the right file conversion capabilities, viewing and extracting data from those documents could prove difficult. The last thing a practice wants to do is remove them from the secure EHR system to open and convert the files using separate software that may not be compliant when it comes to handling healthcare information. Even if the external application is secure, transferring files over, converting them, and then transferring them back is inefficient and creates unnecessary risk (especially if someone forgets to delete the original file or move it back into the EHR environment).

ImageGear Medical has a document conversion feature that supports a wide range of file types, allowing developers to build EHR applications capable of quickly converting incoming documents. They can even set up their solution to perform conversion tasks programmatically to help streamline workflows and minimize human error. This helps practices to get a better handle on document management, ensuring that they will be able to do everything they need with files completely within the EHR application.

Other Essential Document Features

But ImageGear Medical’s document capabilities go far beyond just conversion. With full annotation support, developers can provide markup tools within the EHR system that allow physicians to make notes and comments on various documents. This allows them to share information much more easily. If a physician has a question about a diagnosis or a prescription, for instance, they can simply leave an annotation note directly on the document rather than referring to it in a separate message.

ImageGear Medical also allows applications to perform full-page optical character recognition (OCR), which can quickly read and extract text from document and image files. This feature is especially useful for capturing text from scanned images of documents, which can then be used to create a searchable PDF or fill form fields within the EHR system. The OCR engine not only reads most Western languages, but also detects and reads several Eastern language characters.

Managing DICOM Files

One of the biggest challenges healthcare organizations face is managing medical imaging files. When providers need to send X-rays, MRIs, or CT scans, they use a standardized file format known as Digital Imaging and Communications in Medicine (DICOM). These files are more than just image files, however. They contain extensive datasets that provide a patient’s information along with image pixel data for multi-dimensional medical scans. A DICOM file can be quite large due to the high-resolution image data used by most medical imaging equipment.

Although most EHR systems are capable of transmitting DICOM files (via a DICOM out or DICOM send feature), they usually can’t actually view them in their native format. Since Windows doesn’t recognize them as image files, additional viewing software is typically needed to open and view them. This is why physical storage media, like discs and flash drives, are often used to transfer DICOM files along with the necessary viewing software.

ImageGear Medical helps to solve the DICOM dilemma thanks to its extensive conversion and compression capabilities. By decoding the complex data contained within the file, ImageGear Medical can convert DICOM files into image formats that are much easier to view and manage. This is especially useful for smaller practices that don’t have a picture archiving and communication system (PACS) capable of storing, retrieving, distributing, and viewing high-quality medical images. 

Converting DICOM files makes it possible for healthcare professionals to view them on any device connected with their EHR system. That could include tablets or other IoT devices that healthcare technology companies are rolling out to put critical medical data on the front lines of everyday care. Developers can also use ImageGear Medical’s conversion tools to allow their EHR system to share viewable versions of diagnostic scans with patients, allowing practices to make good on the promise of providing patients access to their essential health data at all times. 

The sheer size of DICOM files makes them difficult for many practices to manage. Compressing them with lossy methods tends to degrade the image data, which can create significant problems when files are unpacked and opened for viewing. Losing even a small degree of image quality can make it much harder to render an accurate diagnosis. In some cases, poorly designed compression can even make it nearly impossible to uncompress the file again at all. Thanks to powerful lossless compression technology, ImageGear Medical makes it easier to share medical images between providers without damaging the integrity of the original data.
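To make the idea of lossless compression concrete, here is a minimal Python sketch. It is an illustration only: the standard-library zlib codec stands in for whatever codec a medical imaging toolkit would actually use, and the `pixels` buffer is synthetic data, not a real DICOM payload. The point it shows is that a lossless round trip returns every byte unchanged while still shrinking the data.

```python
import zlib


def roundtrip_is_lossless(data: bytes) -> bool:
    """Compress and decompress, then check that every byte survived."""
    return zlib.decompress(zlib.compress(data, 9)) == data


# Synthetic 64x64 8-bit "image" made of flat horizontal bands.
# This is stand-in data, not pixel data read from a DICOM file.
pixels = bytes((y // 8) * 16 for y in range(64) for x in range(64))

compressed = zlib.compress(pixels, 9)
print(len(pixels), "bytes before,", len(compressed), "bytes after")
print("lossless:", roundtrip_is_lossless(pixels))
```

A lossy codec would fail the equality check in `roundtrip_is_lossless`, which is exactly the property that matters when the decompressed image is used for diagnosis.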

Expand EHR Capabilities with ImageGear Medical

Accusoft’s imaging, conversion, and compression technology has been supporting the needs of the healthcare industry for decades. As developers work to expand the capabilities of their EHR applications, our engineers are busy improving the medical SDKs that will provide them with the features they need to stand out in a competitive market. 

ImageGear Medical utilizes a combination of efficient code and elegant APIs to deliver the document and image processing tools EHR systems require. For a closer look at this dynamic SDK’s capabilities, check out our extensive developer resources today or download a free trial to get started.

OCR API Capabilities

The Accusoft engineering team is always exploring ways to improve PrizmDoc’s document processing capabilities. We regularly consult with our active customers to ensure that we’re focusing on features that will help them push the boundaries of innovation and deliver a better experience to end users.

That’s why we’re excited to talk about PrizmDoc’s new OCR API feature, which allows Independent Software Vendors (ISVs) to tap into the power of Accusoft’s industry-leading optical character recognition technology to enhance their application’s document processing capabilities.

Wait, What Is OCR Again?

Optical Character Recognition (OCR) is a technology that converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. At its core, OCR works by analyzing the graphical elements of a document and recognizing the patterns of characters or symbols present in it.

Initially, the OCR software segments the document into elements like lines or words and then further breaks them down into individual characters. Using machine learning and pattern recognition, it then matches these individual graphical components to their corresponding textual elements in a pre-defined character database. This process allows for the extraction of textual data from images, enabling digital storage and efficient searching, which facilitates streamlined management and utilization of information across various sectors.
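The segment-then-match process described above can be sketched with a toy example in Python. This is a deliberately simplified illustration, not how PrizmDoc or any production OCR engine is implemented: the three-letter `TEMPLATES` database, the text-art bitmaps, and the pixel-difference score are all assumptions chosen to keep the example short.

```python
# Toy character database: each glyph is a 3x5 bitmap ('#' = ink, '.' = blank).
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def split_glyphs(rows):
    """Segment a text-line bitmap into glyphs at fully blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank and start is None:
            start = x  # first inked column of a new glyph
        elif blank and start is not None:
            glyphs.append(tuple(row[start:x] for row in rows))
            start = None
    return glyphs


def distance(glyph, template):
    """Count differing pixels; glyphs of a different size never match."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        return float("inf")
    return sum(a != b for g, t in zip(glyph, template) for a, b in zip(g, t))


def recognize(rows):
    """Match each segmented glyph to the closest template in the database."""
    return "".join(
        min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))
        for glyph in split_glyphs(rows)
    )


LINE = (
    "###.#...###",
    ".#..#....#.",
    ".#..#....#.",
    ".#..#....#.",
    ".#..###.###",
)

print(recognize(LINE))
```

Real engines replace the blank-column split with layout analysis and the pixel count with trained models, but the two stages (segmentation, then matching against known characters) are the same ones the text describes.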

Benefits of PrizmDoc’s OCR API

Building OCR features into an application is a time-consuming and expensive process. The technology behind OCR is not only quite sophisticated, but it also requires access to complex and evolving language libraries that allow it to identify text accurately. Obtaining the licenses for these libraries, incorporating them into a new OCR solution, and keeping them updated can be a challenge for developers who are unfamiliar with OCR processing.

With PrizmDoc’s OCR API, ISVs can easily incorporate OCR capabilities into their applications with a simple API call. We’re constantly updating our OCR features to add new languages and forms of character recognition, all of which can be rolled directly into software applications as part of the PrizmDoc API integration.

What Makes Accusoft’s OCR Different?

Accusoft has long been an innovator in processing solutions that incorporate OCR technology. Where many solutions offer only full-page recognition, our OCR products support zonal field recognition, which allows applications to focus on predefined form field types to extract key data like names, dates, emails, and identification numbers.

Zonal OCR significantly increases processing speed, allowing applications to extract data from documents more quickly. It also enhances accuracy since the OCR engine is only reading specific areas of the page instead of scanning the entire page.

Of course, if your application needs to OCR an entire page or document, our OCR technology is more than capable of doing so quickly and accurately. We support multiple Western and Eastern languages, including Central European, Cyrillic, Baltic, and Asian characters. You can even set confidence levels for recognition results to incorporate manual reviews into your document process.
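As a rough sketch of how confidence levels can feed a manual review step, the Python snippet below routes zonal results by a threshold. Everything in it is hypothetical: the `FieldResult` shape, the field names, the 0.0 to 1.0 confidence scale, and the 0.85 cutoff are assumptions for illustration and do not reflect the actual PrizmDoc response format.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str          # hypothetical zone label, e.g. "date"
    text: str          # text the OCR engine returned for that zone
    confidence: float  # assumed to be a score between 0.0 and 1.0


def route_results(results, threshold=0.85):
    """Split results into auto-accepted fields and fields needing review."""
    accepted = [r for r in results if r.confidence >= threshold]
    review = [r for r in results if r.confidence < threshold]
    return accepted, review


# Made-up sample results; the low score on "date" reflects an O/0 mix-up.
results = [
    FieldResult("name", "Jane Doe", 0.97),
    FieldResult("date", "2O21-03-14", 0.62),
    FieldResult("email", "jane@example.com", 0.91),
]

accepted, review = route_results(results)
print("accepted:", [r.name for r in accepted])
print("needs review:", [r.name for r in review])
```

Raising the threshold sends more fields to a human reviewer and lowers the chance of bad data reaching downstream systems; the right value depends on how costly an extraction error is for the application.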

Industry Applications of OCR Technology

Fintech Applications

By integrating OCR technology into Fintech applications, financial institutions can automate the extraction of data from physical or digital documents, such as invoices, contracts, and bank statements, eliminating manual entry and reducing errors. This not only saves time but also enhances accuracy and efficiency, facilitating quicker decision-making processes. It can also aid in compliance and auditing tasks by easily retrieving information from a vast array of documents. By incorporating OCR APIs, Fintech applications can significantly enhance the finance industry’s service quality, fostering a more data-driven and customer-centric approach.

Legaltech

When OCR technology is integrated into a Legaltech application, lawyers, paralegals, and other professionals can use it to swiftly convert scanned documents, agreements, and legal briefs into searchable text. This can significantly expedite research and case preparation, allowing legal practitioners to efficiently sift through large volumes of text to locate pertinent information. It also enables the creation of digital databases that can be easily navigated and organized, enhancing the retrieval of case-related documents and fostering a more streamlined approach to legal work, thereby saving time and resources.

Insurtech

For ISVs building solutions to support insurance companies, an OCR API can serve as a pivotal tool in modernizing and streamlining the processing of numerous document types, including claims, policies, and supporting paperwork. It facilitates the quick conversion of scanned documents and images to searchable text formats, which can automate data extraction and reduce manual handling, minimizing the risk of errors and expediting claim processing times. By automating a significant portion of administrative tasks, insurance companies can focus more on developing customer-centric strategies and solutions, fostering greater efficiency and effectiveness within the industry.

Govtech

Governments handle a vast array of documents – from forms and applications to historical records. By integrating OCR technology into a Govtech application, governmental agencies can automate the data extraction process, thereby drastically reducing manual labor and minimizing errors. This makes the archival and retrieval of documents more efficient, fostering transparency and ease of access to public records. Furthermore, OCR can aid in analyzing data from various documents to formulate better policies and decisions based on historical and current data trends. Ultimately, integrating an OCR API can pave the way for more streamlined, cost-effective, and citizen-friendly governmental operations, promoting inclusivity and digital literacy.

Expand Your Application’s Potential with PrizmDoc OCR API

Incorporating advanced OCR capabilities into your application is easier than ever with the release of PrizmDoc’s OCR API feature. To learn more about how you can quickly add full-page and zonal character recognition that supports multiple languages, talk to one of our PrizmDoc experts today.