Achieving the Best Results with the Accusoft OCR Component in FormSuite
An automated forms processing solution can significantly improve accuracy and efficiency when it comes to managing large quantities of documents containing structured content. Whether an organization needs to digitize existing records or is continuously processing new documents within application workflows, having a versatile optical character recognition (OCR) component working to identify and extract text from multiple languages allows them to capture data more effectively.
Although a good OCR engine operates quickly and efficiently, the process of recognizing and extracting text is a highly complex undertaking that can be impacted by a variety of factors. Under optimal conditions, for example, the OCR component within Accusoft’s FormSuite can generate results quickly and accurately, with the ability to read several languages from around the world. However, if an application’s forms processing workflow is not set up efficiently or overlooks a few important considerations, recognition performance may suffer in terms of speed and accuracy.
6 Ways to Achieve the Best Results with the Accusoft OCR Component in FormSuite
1. Pay Attention to Image Resolution
As a general rule, OCR components should be provided with high resolution images so the recognition engine is able to distinguish the details that would otherwise be missed on low resolution images. This helps them to recognize the differences between “l” and “i” or “O” and “0” (zero), which results in better, more accurate results.
However, there could be a problem if the image resolution is too high. These images require much more time to process without delivering any benefits since the required letter properties are clearly distinguishable in a lower resolution.
To strike a balance between speed and accuracy, it’s better to scan all images in a 150-400 dots-per-pixel range. This allows the recognition engine to identify all possible letter properties and avoid being bogged down with analyzing a lot of data at the same time.
2. Don’t Lose Image Properties While Preparing to Recognize
To achieve the best results, it’s important to provide the recognition engine with a few helpful hints. In some cases, resolution properties may be lost while an image is being prepared for recognition, leading to worse than expected results. This happens most frequently when working with System.Drawing.Image or SystemDrawing.Bitmap classes directly during operations like clipping, merging, or reducing the bit depth.
In this case, the best solution is to make sure that HorizontalResolution and VerticalResolution properties are set correctly and reflect initial image resolution values. The ScanFix component within FormSuite can perform this task automatically and is designed to be compatible with the OCR component to help achieve better recognition results.
3. Clean Up Underlined Text Before Recognition
Specks, dirt, and other imperfections within the source image can significantly reduce recognition quality. Sometimes, however, even a seemingly good image can be recognized incorrectly when there are underlined words like URLs, emails, or specifically formatted generic text.
From the software’s point of view, this kind of text isn’t very different from other types of image distortion. ScanFix’s LineRemovalOptions can clean up the text by eliminating lines that could interfere with recognition. The API also features special parameters that ensures characters with low hanging elements (such as “j” or “y”) will be restored after line removal to avoid another potential recognition problem.
4. Use Long-Living Objects to Avoid Recognition Performance Drop
Creating a new instance requires OCR engine initialization and loading neural network data suitable for specific recognition parameters. This process is not resource free because of the data complexity and may cause delays from ~200 msec to 2 sec depending on the hardware and recognition properties.
Existing Accusoft OCR instances may be reused to recognize other images with different properties. This will speed up the overall process because initialization will be done only once during the first AnalyzeField call and subsequent calls will be much cheaper in terms of computing resources.
5. Assign Instances to Their Own Worker Threads
Objects are thread safe and can be called from different threads. However, assigning an object to its own thread can avoid extra locking. One of the simplest ways to do this is to use C# Parallel.ForEach loop and create ConcurrentQueue with the pre-allocated objects.
This ensures that the number of threads will not exceed the number of available CPUs. Any available instance can then be automatically assigned to recognize the images in their own thread while extra possible threads will wait until busy instances will be free to acquire.
Other common patterns are producer-consumer and map-reduce, which are more complex to implement but provide better flexibility when managing input data.
6. Dispose Objects to Avoid Memory High Memory Consumption
This is a generic rule for the C# to call a Dispose for the objects which use non-managed resources. FormSuite’s OCR component uses an external recognition engine, so it is highly recommended to call Dispose when the instance will not be required anymore. This can avoid a situation where the memory will not be available for different parts of the application, especially when a high amount of data exists for post-processing or the amount of available memory is low because of the different processes running in parallel.
Get Accurate OCR Data Capture Results with FormSuite
When properly configured and incorporated into a forms processing workflow, the FormSuite OCR component can accelerate automated data capture and reduce manual errors. Its zonal field recognition capabilities allow it to hone in on predefined field types to improve processing speed and accuracy. Developers can also adjust confidence values for recognition results to determine how frequently manual review is necessary.
To get a hands-on look at how FormSuite incorporates OCR seamlessly into its collection of forms processing tools, schedule a free trial today.