OCR Library For .NET/C# or C/C++
Add highly accurate optical character recognition (OCR) to your .NET (C#) or C/C++ application.
The ImageGear OCR SDK is available on Windows for C, C++, C#, and other .NET languages. ImageGear provides full-page optical character recognition (OCR) for over 100 languages, including Western languages and Asian languages such as Chinese, Japanese, and Korean. ImageGear's automatic language detection feature (.NET OCR library only) identifies the language of a document so that recognition can proceed without the language being specified in advance.
OCR can be purchased as an add-on to provide a complete document imaging library for your application development. Our C# OCR library:
- Includes over 100 different languages
- Detects and reads Chinese, Korean, and Japanese
- Recognizes characters from multiple languages within a single image
- Includes samples for C#, VB.NET, C, and C++
Full Page OCR
With our auto-zoning and segmentation, your users have the ability to:
- Automatically segment a page into individual zones for processing
- Assign each located zone a type: flow, table, or graphic
- Detect tables so that tabular data is reconstructed more accurately in the results
- Process an entire image or an individual region of the page
- Use zones defined by a user, loaded from a file, or detected automatically by the engine
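To make the idea of typed zones and region processing concrete, here is a minimal C++ sketch. It does not use the ImageGear API: the `Zone`, `Rect`, and `ZoneType` types and the `zonesInRegion` function are hypothetical names chosen for illustration, and the rectangle convention (right and bottom edges exclusive) is an assumption.

```cpp
#include <vector>

// Hypothetical zone model for illustration only; these are not ImageGear types.
enum class ZoneType { Flow, Table, Graphic };

// Pixel rectangle; right and bottom edges are treated as exclusive (assumption).
struct Rect {
    int left;
    int top;
    int right;
    int bottom;
};

struct Zone {
    Rect bounds;
    ZoneType type;
};

// True when the two rectangles share at least one pixel.
bool intersects(const Rect& a, const Rect& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Returns the zones that overlap the requested region of the page,
// for the case where only part of the page should be processed.
std::vector<Zone> zonesInRegion(const std::vector<Zone>& zones, const Rect& region) {
    std::vector<Zone> selected;
    for (const Zone& zone : zones) {
        if (intersects(zone.bounds, region)) {
            selected.push_back(zone);
        }
    }
    return selected;
}
```

The same `Zone` list could come from any of the three sources named above (user-defined, loaded from a file, or detected automatically); the selection step does not need to know which.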
Image Pre-Processing for Maximum Accuracy
What happens before OCR? Take a look at the OCR pre-processing steps:
- Advanced image processing methods are available to improve OCR accuracy
- Auto inversion functionality detects if the image needs to be inverted for highest accuracy
- Automatic image orientation detects and adjusts images so they are properly oriented
- Deskew methods detect image misalignment and automatically correct it, improving segmentation and recognition accuracy
- Despeckling methods remove minor dots and imperfections introduced during image capture
- Resolution enhancement improves the quality of low-resolution images
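Two of the steps above, auto inversion and despeckling, can be illustrated with a simple sketch. This is not how ImageGear implements them; it is a basic heuristic version under stated assumptions: an 8-bit grayscale image stored row-major, a fixed threshold of 128 separating dark from light, and a hypothetical `GrayImage` type with hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image, row-major; not an ImageGear type.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // 0 = black, 255 = white
};

// Heuristic: if more than half of the pixels are dark, assume light text
// on a dark background and report that the image should be inverted.
bool needsInversion(const GrayImage& img, std::uint8_t threshold = 128) {
    assert(img.pixels.size() == img.width * img.height);
    std::size_t dark = 0;
    for (std::uint8_t p : img.pixels) {
        if (p < threshold) ++dark;
    }
    return dark * 2 > img.pixels.size();
}

// Inverts every pixel in place.
void invert(GrayImage& img) {
    for (std::uint8_t& p : img.pixels) {
        p = static_cast<std::uint8_t>(255 - p);
    }
}

// Simple despeckle: a dark pixel with no dark pixel among its 8 neighbours
// is treated as noise and set to white.
GrayImage despeckle(const GrayImage& img, std::uint8_t threshold = 128) {
    assert(img.pixels.size() == img.width * img.height);
    GrayImage out = img;
    const long long w = static_cast<long long>(img.width);
    const long long h = static_cast<long long>(img.height);
    for (long long y = 0; y < h; ++y) {
        for (long long x = 0; x < w; ++x) {
            if (img.pixels[static_cast<std::size_t>(y * w + x)] >= threshold) continue;
            bool hasDarkNeighbour = false;
            for (long long dy = -1; dy <= 1; ++dy) {
                for (long long dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    const long long nx = x + dx;
                    const long long ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (img.pixels[static_cast<std::size_t>(ny * w + nx)] < threshold) {
                        hasDarkNeighbour = true;
                    }
                }
            }
            if (!hasDarkNeighbour) {
                out.pixels[static_cast<std::size_t>(y * w + x)] = 255;
            }
        }
    }
    return out;
}
```

A typical order would be to check `needsInversion` first and call `invert` if it returns true, then run `despeckle`, because the despeckle step here assumes dark content on a light background.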
Predefined and Customizable Dictionaries
ImageGear’s OCR SDK uses predefined dictionaries and data dictionaries when scanning your document. ImageGear provides advanced spell checking for 17 languages, each with its own dictionary. Each of the 17 dictionaries contains between 100,000 and 200,000 entries. Vertical dictionaries improve spell checking and OCR accuracy for the medical and legal industries. You can also customize validation by defining user dictionaries with values specific to your needs, and you can validate results using regular expressions.
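As an illustration of user-dictionary and regular-expression validation, the following C++ sketch checks recognized text with the standard library only. It is not the ImageGear API: the function names are hypothetical, and the date pattern in the usage note is an assumed example of a field format.

```cpp
#include <regex>
#include <string>
#include <unordered_set>

// Hypothetical helper: true when a recognized word appears in a
// user-supplied dictionary of domain-specific terms (exact match).
bool inUserDictionary(const std::unordered_set<std::string>& dictionary,
                      const std::string& word) {
    return dictionary.count(word) > 0;
}

// Hypothetical helper: true when the whole recognized string matches
// the expected field format.
bool matchesFormat(const std::string& text, const std::regex& format) {
    return std::regex_match(text, format);
}
```

For example, with a format of `std::regex(R"(\d{4}-\d{2}-\d{2})")`, the string `2024-01-31` passes, while `2O24-01-31` (a letter O misread in place of a zero) is rejected and can be flagged for review.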
Superior Results Processing
In the recognition results, each character is returned with a confidence level that indicates how reliable the match is. Separate word confidence values provide an additional accuracy indication. Advanced font and location information allows the OCR library to create text representations of the original file with a similar layout.
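One common use of character and word confidence values is to route uncertain words to manual review. The sketch below shows that idea in C++ under several assumptions: the `RecognizedChar` and `RecognizedWord` types are hypothetical stand-ins rather than ImageGear types, confidence is assumed to be a value from 0.0 to 1.0, and both thresholds are arbitrary.

```cpp
#include <string>
#include <vector>

// Hypothetical result types for illustration; not ImageGear types.
struct RecognizedChar {
    char value;
    double confidence;  // assumed range: 0.0 (no confidence) to 1.0 (certain)
};

struct RecognizedWord {
    std::vector<RecognizedChar> chars;
    double confidence;  // separate word-level confidence, same assumed range
};

// Rebuilds the word text from its characters.
std::string textOf(const RecognizedWord& word) {
    std::string text;
    for (const RecognizedChar& c : word.chars) {
        text.push_back(c.value);
    }
    return text;
}

// Returns the text of every word whose word-level confidence is below
// minWordConfidence or that contains a character below minCharConfidence.
std::vector<std::string> wordsNeedingReview(const std::vector<RecognizedWord>& words,
                                            double minWordConfidence,
                                            double minCharConfidence) {
    std::vector<std::string> flagged;
    for (const RecognizedWord& word : words) {
        bool uncertain = word.confidence < minWordConfidence;
        for (const RecognizedChar& c : word.chars) {
            if (c.confidence < minCharConfidence) {
                uncertain = true;
            }
        }
        if (uncertain) {
            flagged.push_back(textOf(word));
        }
    }
    return flagged;
}
```

Checking both levels catches a word that scores well overall but contains a single doubtful character, such as a digit misread as a letter.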
The ImageGear OCR engine processes all data in a Unicode format. The data output can be formatted for a specific code page with multiple output options such as:
- Image over PDF
- Text-based PDF
- Microsoft Office (Word, Excel, PowerPoint)
OCR Editions: Functionality Options for ImageGear
ImageGear’s OCR library has three different functionality options that you can choose for your website or application. The primary difference between the three options is the output formats created by the OCR engine. The options for your development are as follows:
- Standard Edition
The Standard edition creates text-only output for Western languages such as English. The file formats it includes are searchable text PDFs and text documents.
- Standard Plus
The standard plus edition creates formatted outputs for Western languages like English. The formatted output is created with recognition technology that identifies font detail, locates image zones, and recognizes table structure in order to create a representation of the original document. The file formats it includes are Word, Excel, HTML, searchable PDF, and text documents.
- Asian Edition
The Asian edition creates formatted output for Asian languages such as Chinese, Japanese, and Korean. The formatted output is created with the same recognition technology as Standard Plus, which identifies font detail, locates image zones, and recognizes table structure in order to create a representation of the original document. The file formats it includes are Word, Excel, HTML, searchable PDF, and text documents.
Automatic language detection is currently available only in the .NET OCR SDK.
Over 120 languages supported