ImageGear Features: OCR

ImageGear offers a variety of features across all of our platforms. See which options are available on your preferred platform.

OCR Library For .NET/C# or C/C++

Add highly accurate optical character recognition (OCR) to your .NET (C#) or C/C++ application.

The ImageGear OCR SDK is available for multiple platforms and languages, including C, C++, C#, and other .NET languages on Windows. ImageGear provides full-page optical character recognition (OCR) for Western languages and for Asian languages such as Chinese, Japanese, and Korean. With ImageGear's automatic language detection feature (.NET OCR library only), OCR can be completed without the language being specified in advance.

OCR can be purchased as an add-on to provide a complete document imaging library for your application development. Our C# OCR library:

  • Detects and reads Chinese, Korean, and Japanese
  • Recognizes characters from multiple languages within a single image
  • Includes samples for C# OCR, VB.NET OCR, C, and C++

Full Page OCR

With our auto-zoning and segmentation, your users can:

  • Automatically segment a page into individual zones for processing
  • Process an entire image or an individual region of the page
  • Use zones defined by the user, loaded from a file, or detected automatically by the engine
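
To make the idea of a zone concrete, here is a minimal, language-neutral sketch in Python. It does not use the ImageGear API: the `Zone` class and the `crop_to_zone` helper are hypothetical names chosen for illustration, and the page is assumed to be a simple list of pixel rows. It only shows how "process the whole image" and "process one region" can be treated as the same operation with different rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular zone, in pixel coordinates."""
    left: int
    top: int
    width: int
    height: int


def crop_to_zone(page, zone):
    """Return the pixels of `page` (a list of rows) that fall inside `zone`."""
    rows = page[zone.top:zone.top + zone.height]
    return [row[zone.left:zone.left + zone.width] for row in rows]


# A tiny 4x4 "page": 1 = ink, 0 = background.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

whole_page = Zone(left=0, top=0, width=4, height=4)   # entire image
center = Zone(left=1, top=1, width=2, height=2)       # one region only

center_pixels = crop_to_zone(page, center)
full_pixels = crop_to_zone(page, whole_page)
```

In a real application the rectangles would come from the user, from a saved zone file, or from the engine's automatic segmentation; the cropping step stays the same in each case.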

Image Pre-Processing for Maximum Accuracy

What happens before OCR? Take a look at the OCR pre-processing steps:

  • Advanced image processing methods are available to improve OCR accuracy
  • Auto-inversion detects whether the image needs to be inverted for the highest accuracy
  • Automatic image orientation detects and adjusts images so they are properly oriented
  • Deskew methods detect image misalignment and automatically correct it, improving segmentation and recognition accuracy
  • Despeckling methods remove minor dots and imperfections introduced during the image capture process
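
As a rough illustration of two of the steps above, the following Python sketch applies auto-inversion and despeckling to a tiny binary image. It is not ImageGear code and does not reflect how the engine implements these steps: the function names are hypothetical, the image is assumed to be a list of rows with 1 for ink and 0 for background, and the heuristics (a majority-ink test, and removal of ink pixels with no ink neighbours) are deliberately simple stand-ins.

```python
def needs_inversion(image):
    """Guess that the image is light-on-dark if more than half the pixels are ink."""
    total = sum(len(row) for row in image)
    ink = sum(sum(row) for row in image)
    return ink * 2 > total


def invert(image):
    """Swap ink (1) and background (0)."""
    return [[1 - px for px in row] for row in image]


def despeckle(image):
    """Clear ink pixels that have no ink among their eight neighbours."""
    height = len(image)
    width = len(image[0]) if image else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0
    return cleaned


def preprocess(image):
    """Invert if the image looks light-on-dark, then remove isolated specks."""
    if needs_inversion(image):
        image = invert(image)
    return despeckle(image)


# Light text on a dark background, with one stray pixel in the corner.
scan = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
]
cleaned_scan = preprocess(scan)
```

After `preprocess`, the two adjacent stroke pixels survive as ink on a clean background, while the isolated corner pixel is treated as a speck and removed.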

Superior Results Processing

When you retrieve the OCR recognition details, each character is returned with a confidence level that indicates how reliable its recognition is. Separate word confidence values provide an additional indication of accuracy. Advanced font and location information allows the OCR library to create text representations of the original file with a similar layout.
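
One common use of character and word confidence values is to route doubtful results to manual review. The Python sketch below illustrates that idea only; it is not the ImageGear result model. The `RecognizedChar` and `RecognizedWord` classes, the 0.0 to 1.0 confidence scale, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


@dataclass
class RecognizedWord:
    chars: list        # list of RecognizedChar
    confidence: float  # separate word-level confidence, same assumed scale

    @property
    def text(self):
        return "".join(c.char for c in self.chars)


def words_needing_review(words, word_threshold=0.8, char_threshold=0.5):
    """Return the text of words whose word or weakest character confidence is low."""
    flagged = []
    for word in words:
        weakest = min((c.confidence for c in word.chars), default=0.0)
        if word.confidence < word_threshold or weakest < char_threshold:
            flagged.append(word.text)
    return flagged


def make_word(text, char_confidences, word_confidence):
    """Small helper for building sample data."""
    chars = [RecognizedChar(ch, conf) for ch, conf in zip(text, char_confidences)]
    return RecognizedWord(chars, word_confidence)


sample_words = [
    make_word("Invoice", [0.95] * 7, 0.93),
    make_word("t0tal", [0.9, 0.3, 0.9, 0.9, 0.9], 0.85),  # one doubtful character
    make_word("due", [0.7, 0.7, 0.7], 0.6),               # low word confidence
]
flagged_words = words_needing_review(sample_words)
```

Checking both levels catches different problems: a word can score well overall while containing one doubtful character, and a word made of individually plausible characters can still score poorly as a whole.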

The ImageGear OCR engine processes all data in Unicode. The output can be formatted for a specific code page, with multiple output options such as:

  • Image over PDF
  • Text-based PDF
  • XML
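
For readers unfamiliar with what formatting Unicode text for a specific code page involves, the short Python sketch below shows the general mechanism using the standard library codecs. It says nothing about how ImageGear performs the conversion; the sample string and the choice of Windows-1252 and ASCII as targets are assumptions for illustration.

```python
# Recognized text held as Unicode, as an OCR engine might return it.
recognized = "Café: 5 €"

# Windows-1252 can represent every character in this string,
# so the conversion is lossless and round-trips cleanly.
cp1252_bytes = recognized.encode("cp1252")
round_trip = cp1252_bytes.decode("cp1252")

# ASCII cannot represent "é" or "€"; with errors="replace" each
# unrepresentable character becomes "?" instead of raising an error.
ascii_bytes = recognized.encode("ascii", errors="replace")
```

The point of the example is that a code page is a lossy target whenever the text contains characters outside it, which is why keeping the recognized data in Unicode until the final output step is useful.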