OCR SDK Features

OCR Xpress provides the features you need to quickly add OCR to your applications. The OCR library delivers accurate, full-page recognition results with a variety of output options to suit your application's needs. The SDK ships with help files and samples so you can get up to speed quickly, and its simple API allows for rapid integration. The OCR engine supports a broad range of languages, listed below.

Available for Node.js
Available for Java on 64-bit Linux
Available for C/C++ on 64-bit Linux

speed and ease of use

Setup is simple and straightforward, and a clean, easy-to-use API allows quick integration into your applications. The high-level API lets developers convert an image to text or searchable PDF with only nine lines of code. For more complex implementations, developers can access the recognized document information through data structures that expose page, paragraph, text line, and character data.

accuracy

Convert images into text and reduce manual data entry with accuracy that meets or exceeds industry standards. Page orientation is detected automatically so that text is recognized correctly. A confidence value is returned with each character, enabling you to check your extraction results.
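As an illustration of how per-character confidence values can drive a review step, here is a minimal sketch. It does not use the OCR Xpress API: the `(character, confidence)` tuples, the 0 to 100 confidence scale, the threshold of 80, and the function names are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical per-character result: (character, confidence).
# The 0-100 scale is an assumption for this sketch, not a documented range.
CharResult = Tuple[str, float]


def uncertain_positions(chars: List[CharResult], threshold: float = 80.0) -> List[int]:
    """Return the indices of characters whose confidence is below the threshold."""
    return [i for i, (_, conf) in enumerate(chars) if conf < threshold]


def mark_for_review(chars: List[CharResult], threshold: float = 80.0) -> str:
    """Render the text with low-confidence characters wrapped in brackets."""
    return "".join(f"[{ch}]" if conf < threshold else ch for ch, conf in chars)


if __name__ == "__main__":
    # "Total" where the engine was unsure about two glyphs.
    sample = [("T", 98.0), ("0", 61.5), ("t", 93.0), ("a", 96.2), ("l", 74.0)]
    print(uncertain_positions(sample))  # [1, 4]
    print(mark_for_review(sample))      # T[0]ta[l]
```

A review queue built this way only needs a person to look at the bracketed characters rather than retype the whole field.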

full page optical character recognition

Convert full-page and multi-page images to text, or output them as multi-page, searchable PDF files.

automatic image and text segmentation

Compatible with a variety of page layouts, including pages with photos and graphics interspersed within the text. Image-over-text PDF output contains searchable text that aligns with the text on the image and adjusts for varying font sizes on the page.
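To give a sense of what aligning searchable text with the image involves, the sketch below maps a text line's pixel bounding box to PDF page coordinates and derives a font size from the box height. This is not OCR Xpress code and does not describe its internals: `PixelBox`, `to_pdf_rect`, and `font_size_for` are hypothetical names, and using the box height as the font size is a simplifying assumption. The one fixed fact relied on is that the default PDF unit is 1/72 inch with the origin at the bottom-left of the page.

```python
from dataclasses import dataclass
from typing import Tuple

POINTS_PER_INCH = 72.0  # default PDF user-space unit is 1/72 inch


@dataclass
class PixelBox:
    """Bounding box of a text line in image pixels, origin at top-left."""
    left: int
    top: int
    width: int
    height: int


def px_to_pt(value: float, dpi: float) -> float:
    """Convert a length in pixels to PDF points for an image scanned at `dpi`."""
    return value * POINTS_PER_INCH / dpi


def to_pdf_rect(box: PixelBox, dpi: float, page_height_px: int) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) in points, with y measured from the page bottom."""
    x = px_to_pt(box.left, dpi)
    # Images count rows downward; PDF counts upward, so flip the vertical axis.
    y = px_to_pt(page_height_px - (box.top + box.height), dpi)
    return (x, y, px_to_pt(box.width, dpi), px_to_pt(box.height, dpi))


def font_size_for(box: PixelBox, dpi: float) -> float:
    """Approximate the font size (in points) as the height of the line box."""
    return px_to_pt(box.height, dpi)


if __name__ == "__main__":
    # A 50-pixel-tall line on a letter-size page (11 in) scanned at 300 DPI.
    line = PixelBox(left=300, top=150, width=600, height=50)
    print(to_pdf_rect(line, dpi=300, page_height_px=3300))
    print(font_size_for(line, dpi=300))
```

Because each line gets its own box, lines set in larger or smaller type end up with correspondingly larger or smaller invisible text behind the image.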

generate text and PDF files

To make images searchable, export them as image-over-text PDF documents. Export to text files to update metadata or tag data for image-based documents in your ECM system.

output data structures

Access results through an output structure that provides information about layout, content, and recognition confidence. Developers can access the text at multiple levels, including a text line, a word within that line, or a character within that line or word.
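The nesting described above can be pictured with a small model. The sketch below is only an illustration of a page, paragraph, text line, word, and character hierarchy; the class names, fields, and the choice to report a word's confidence as that of its weakest character are assumptions for this example and are not the OCR Xpress data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Char:
    text: str
    confidence: float  # assumed 0-100 scale


@dataclass
class Word:
    chars: List[Char]  # expected to be non-empty

    @property
    def text(self) -> str:
        return "".join(c.text for c in self.chars)

    @property
    def confidence(self) -> float:
        # Report the weakest character so one doubtful glyph flags the word.
        return min(c.confidence for c in self.chars)


@dataclass
class TextLine:
    words: List[Word]

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    lines: List[TextLine]

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    paragraphs: List[Paragraph]

    @property
    def text(self) -> str:
        return "\n\n".join(p.text for p in self.paragraphs)


def word_from(text: str, confidences: List[float]) -> Word:
    """Helper for building sample data: pair each character with a confidence."""
    return Word([Char(t, c) for t, c in zip(text, confidences)])


if __name__ == "__main__":
    line = TextLine([word_from("Total", [98, 97, 99, 96, 95]),
                     word_from("due", [91, 58, 94])])
    page = Page([Paragraph([line])])
    print(page.text)                 # Total due
    print(line.words[1].text)        # due
    print(line.words[1].chars[1])    # Char(text='u', confidence=58)
    print(line.words[1].confidence)  # 58
```

Walking the structure top-down gives plain text for export, while walking it bottom-up lets an application inspect individual characters and their confidence.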

supported languages for OCR Xpress

Arabic
Bulgarian
Catalan
Chinese (Simplified)
Chinese (Traditional)
Croatian
Czech
Danish
Danish Fraktur script
Dutch
English
German
German Fraktur script
Greek
Finnish
French
Hebrew
Hindi
Hungarian
Indonesian
Italian
Japanese
Korean
Latvian
Lithuanian
Norwegian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovak Fraktur script
Slovenian
Spanish
Swedish
Tagalog
Thai
Turkish
Ukrainian
Vietnamese