Pegasus Imaging Enhances Full-Page Text Recognition with OCR Xpress™ v2 SDK

OCR Xpress is ideal for extracting data from scanned images for indexing, and converting images to searchable image-over-text documents


TAMPA, Fla. - November 19, 2008 - Pegasus Imaging Corporation, the leader in document imaging technology, today announced the release of OCR Xpress version 2. This Software Development Kit (SDK) provides Optical Character Recognition (OCR) for the integration of text recognition and searchable document creation into document imaging, e-discovery, knowledge management, and records management solutions. This major upgrade dramatically improves recognition accuracy and adds a pattern matching engine that finds character patterns for automatic redaction, indexing, or highlighting of phone numbers, social security numbers, and other data. OCR Xpress also includes additional recognition metrics for improved ease of use.

"OCR Xpress is an ideal solution for extracting text from scanned or FAXed documents," said Paul Firth, product manager at Pegasus Imaging. "The recognized text can be stored in a database, used for indexing, or inserted beneath the original document image in a new, searchable PDF file. The pattern matching engine lets you locate text that follows a typical pattern, such as social security, credit card, or phone numbers. You can automatically redact such data to hide sensitive information. Our government and legal customers are particularly interested in this solution."

OCR Xpress combines multiple recognition engines to yield unmatched accuracy over a wide variety of fonts. It provides superior recognition of damaged characters and employs sophisticated analysis using both dictionaries and character probabilities. Additional recognition metrics include alternative candidates for each recognized character, each with a relative confidence value. These alternatives are useful for dictionary-based error correction.
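
As an illustration of how per-character alternatives can support dictionary-based error correction, the following Python sketch picks the most probable dictionary word from a set of candidates. It is not the OCR Xpress API: the candidate layout (a list of (character, confidence) pairs per position), the function name correct_word, and the sample dictionary are all assumptions made for this example.

```python
from itertools import product
from math import prod


def correct_word(candidates, dictionary):
    """Pick the most likely dictionary word from per-character alternatives.

    candidates: one list per character position, each holding
                (character, confidence) pairs. Hypothetical layout,
                not the format returned by any particular OCR engine.
    dictionary: a set of known lowercase words.
    """
    best_word, best_score = None, 0.0
    # Try every combination of alternatives and score it by the
    # product of its character confidences.
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        score = prod(conf for _, conf in combo)
        if word.lower() in dictionary and score > best_score:
            best_word, best_score = word, score
    if best_word is None:
        # No combination is a known word: keep the top choice per position.
        best_word = "".join(max(pos, key=lambda p: p[1])[0] for pos in candidates)
    return best_word


# The engine's top choice would read "c1ear"; the dictionary recovers "clear".
sample = [
    [("c", 0.90)],
    [("1", 0.60), ("l", 0.40)],
    [("e", 0.95)],
    [("a", 0.90)],
    [("r", 0.90)],
]
print(correct_word(sample, {"clear", "clean"}))
```

Enumerating every combination is only practical for short words with a few alternatives per position; a production system would likely prune the search.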

Pattern matching adds further enhancements to the new version of OCR Xpress. Using approximate regular expressions, users can find text even when it is not an exact match. Sample code is provided to search for phone numbers, social security numbers, and dates with ease. Using standards-compliant regular expression syntax, users may create their own search patterns. When a match is found, users may redact, index, highlight, or replace the text.
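
The general idea of pattern-based redaction can be sketched with Python's standard re module. This is not the sample code shipped with OCR Xpress, and it does not use approximate regular expressions; as a rough stand-in, the digit class below also accepts a few letters that OCR commonly confuses with digits. The pattern names, the look-alike set, and the redact function are assumptions made for this example.

```python
import re

# A "digit" as OCR might render it: real digits plus common look-alikes
# (O and o for 0, l and I for 1). This look-alike set is an assumption.
D = r"[0-9OolI]"

PATTERNS = {
    # U.S. social security number, e.g. 123-45-6789
    "ssn": re.compile(rf"\b{D}{{3}}-{D}{{2}}-{D}{{4}}\b"),
    # U.S. phone number, e.g. (813) 555-0100 or 813-555-0100
    "phone": re.compile(rf"\(?\b{D}{{3}}\)?[-. ]{D}{{3}}[-. ]{D}{{4}}\b"),
}


def redact(text, kinds=("ssn", "phone")):
    """Mask every match with X characters; return the new text and the hits."""
    hits = []
    for kind in kinds:
        def mask(match, kind=kind):
            hits.append((kind, match.group()))
            return "X" * len(match.group())
        text = PATTERNS[kind].sub(mask, text)
    return text, hits


redacted, found = redact("SSN 123-45-6789, call (813) 555-0100.")
print(redacted)
print(found)
```

The recorded hits could just as easily drive indexing or highlighting instead of masking. Accepting look-alike letters increases the chance of false positives, so the tolerance would need tuning against real scanned text.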

OCR Xpress is available as a .NET SDK, an ActiveX SDK, and Activities for Windows Workflow Foundation (WF). It delivers:

  1. High-accuracy OCR
  2. Searchable document creation
  3. Full-page text recognition of scanned or FAXed images
  4. Pattern matching for automatic redaction, indexing, or highlighting of phone numbers, social security numbers, and other data
  5. Superior auto binarize, auto rotate, and auto deskew
  6. Support for 13 languages
  7. Preservation of photos and graphics


Pegasus is a registered trademark of Pegasus Imaging Corporation in the United States and/or other countries. OCR Xpress is a trademark of Pegasus Imaging Corporation in the United States and/or other countries. All other marks are the property of their respective owner(s) in the United States and/or other countries.


About Pegasus Imaging

Founded in 1991 and headquartered in Tampa, Florida, Pegasus Imaging Corporation delivers digital imaging software development components, image compression and image editing technologies. The company exceeds speed and quality requirements for document imaging, forms processing, medical imaging, color/photo imaging, video applications and more. Technology is delivered as Microsoft .NET controls, COM controls, Windows Workflow Activity Libraries, DLLs and applications. Multiple 32-bit and 64-bit platforms are supported, including Windows, Linux, Solaris and IBM AIX. Visit www.accusoft.com for more information.