Technical FAQs

Question

ImageGear .NET v24.6 added support for viewing PDF documents with XFA content. I’m using v24.8, and upon trying to open an XFA PDF, I get a SEHException for some reason…

Why might this be happening?

Answer

One possible reason is that XFA support has not been enabled. Execute the following lines after initializing the PDF component and before loading an XFA PDF:

// Allow opening of PDF documents that contain XFA form data.
IImGearFormat pdfFormat = ImGearFileFormats.Filters.Get(ImGearFormats.PDF);
pdfFormat.Parameters.GetByName("XFAAllowed").Value = true;

This will enable XFA PDFs to be opened by the ImageGear .NET toolkit.
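To make the ordering concrete, here is a minimal sketch of where those two lines sit relative to initialization and loading. It only reuses calls that appear elsewhere in these articles; the class and method names (XfaLoadSketch, LoadXfaPdf) are hypothetical, licensing setup is omitted, and the namespaces are assumed to match the other samples.

```csharp
using System.IO;
using ImageGear.Core;
using ImageGear.Formats;
using ImageGear.Formats.PDF;

static class XfaLoadSketch
{
    // Hypothetical helper: loads a PDF that may contain XFA form data.
    public static ImGearDocument LoadXfaPdf(string path)
    {
        // 1. Initialize the common formats and the PDF component first.
        ImGearCommonFormats.Initialize();
        ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
        ImGearPDF.Initialize();

        // 2. Enable XFA after initialization and before any XFA PDF is loaded.
        IImGearFormat pdfFormat = ImGearFileFormats.Filters.Get(ImGearFormats.PDF);
        pdfFormat.Parameters.GetByName("XFAAllowed").Value = true;

        // 3. Only now open the document.
        using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            return ImGearFileFormats.LoadDocument(stream);
        }
    }
}
```

As in the other samples, ImGearPDF.Terminate() would still be called once when the application is finished with PDF support.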

Is .NET or .NET Core the better bet for application development? Both frameworks are designed and supported by Microsoft, and both let developers build apps using multiple languages and libraries, but they’re not the same. .NET Core is rapidly becoming the framework of choice, and for most app developers it comes out ahead of its .NET counterpart.

To understand why, let’s break down both .NET and .NET Core basics, examine the rise of .NET Core deployments, and dig into some key .NET Core benefits.


What is .NET?

Originally developed in the late 1990s and known as Next Generation Windows Services (NGWS), .NET reached beta release in 2000. As noted by Microsoft, .NET helps streamline application creation by allowing developers to write apps in C#, F#, or Visual Basic. By using a common language runtime (CLR) and class library, .NET manages system resources such as memory, thread execution, code execution, and security validation. This allows apps written in one language to talk with code written in other languages, significantly reducing the potential for interoperability issues.

Despite the benefits offered by .NET and work by Microsoft to standardize the software stack, concerns emerged among developers around the proprietary nature of .NET, especially in the realm of software patents. Microsoft has since changed its development approach to more closely follow open-source development models.

What is .NET Core?

.NET Core is an evolution of .NET that delivers both cross-code and cross-platform support. While Microsoft designed .NET to support Windows-based applications, .NET Core applications run on Windows, Linux, and macOS. It also uses MIT and Apache 2 licenses to deliver true open-source architecture and ensure consistency across operating environments.

While .NET remains the more popular framework by sheer volume of app development, .NET Core is quickly gaining ground. A recent survey found that .NET Core is now one of the “most loved” frameworks by developers.


Why is .NET Core becoming popular?

So, what’s driving the adoption of the .NET Core framework over .NET? Several factors now contribute to this uptake, including:

  1. Linux & Windows Deployment – One of the biggest reasons .NET Core is gaining popularity is its ability to deploy on both Linux and Windows, which can provide significant cost savings to development teams.
  2. Run-Time and Performance Improvements – Whether your application users are internal employees or external partners and customers, runtime speed and performance are always critical to adoption. Since Core is the future of the .NET platform, all future performance improvements will be implemented there, and most will not be present in .NET Framework. .NET Core enables APIs and applications to serve up application interfaces and data faster, regardless of which mobile, IoT, or desktop device is interacting with them.
  3. Community – Because .NET Core follows an open-source development model, there’s a large (and growing) community of devs and designers now supporting its ongoing evolution, allowing IT teams to leverage prebuilt .NET Core code rather than building their own from scratch.

.NET Core vs. .NET Framework

Beyond increased popularity driven by the shift in software development and standardization, the .NET Core framework also offers key benefits such as:

  • Ongoing Improvements – In September 2019, Microsoft released version 3.0 of .NET Core and followed it up with version 3.1 in November of the same year. In addition, the company plans to release a new version every year and remove the “Core” distinction between the two frameworks to deliver increased interoperability.
  • Improved Security – From security-specific APIs to Microsoft’s Secret Manager and the Azure Key Vault Provider, along with straightforward support for two-factor authentication (2FA), .NET Core improves application security without increasing complexity.
  • Cloud-Based Development Support – With cloud now the de facto framework for everything from mobile applications and the Internet of Things (IoT) to responsive web applications, .NET Core makes it easy for developers to design cloud-native applications across multiple platforms and devices.
  • Easy Updates – .NET Core updates are easily managed and applied, allowing developers to spend more time building great applications instead of dealing with complex and convoluted service upgrades.
  • Cross-Platform Code – .NET Core code can run not only on Windows but also on Linux platforms, which can help greatly reduce your cloud deployment costs. Combined with Docker, developers are able to leverage a greater amount of modularity and flexibility than ever before.

The Accusoft Advantage

To help companies compete on the cutting edge of app development and integration, Accusoft is making .NET Core available for specific products, starting with Barcode Xpress for .NET Core. With just a few lines of code, companies can deploy multi-platform, open-source support for industry-leading barcode recognition. From reading damaged, broken, and incorrect barcodes to scanning multiple barcodes on one document in milliseconds, Barcode Xpress makes it possible to easily integrate cutting-edge barcode functionality into any application across any platform. 

Several of our other SDKs are on deck for .NET Core capabilities. While .NET broke new ground for interoperable frameworks, .NET Core offers the next iteration of interoperable development with cloud-based, cross-platform support for applications at scale. Learn more about our .NET Core plans here.

Companies have a love/hate relationship with PDFs. While Adobe’s portable file format has been around for decades and remains one of the most popular document types available, some of its best features are overshadowed by frustration around conversion. Faced with a barrage of read-only PDF files or looking for ways to ensure the integrity of critical document data, you can spend significant time and effort searching for the ideal PDF converter application. This is particularly true when trying to convert Excel to PDF.

In some cases, this means ignoring IT best practices to leverage web-based “convert PDF free” tools that offer the benefit of speed, but could introduce potential security risk. In others, you might opt for large-scale document solutions that make the process of PDF conversion cumbersome and complex.

As noted by recent research from Deloitte, shifting market trends make both approaches problematic. Consider converting a familiar spreadsheet format — Excel — into PDF. What should be a simple task is often torturous and time-consuming and can significantly impact staff productivity. Let’s break down this situation further. In this blog, we’ll explore the operational impact of PDFs, consider the case for conversion, assess the spreadsheet-specific situation, and offer a step-by-step solution for potential PDF permutations.

 


The History of the PDF

A quick search turns up multiple articles for and against the use of PDFs for business documents. Detractors cite the sometimes cumbersome process of converting and modifying this format, while electronic evangelists focus on the consistency of content across PDF files. To understand the impact of PDFs, let’s take a quick historical detour. First developed in 1991 by Adobe co-founder Dr. John Warnock, the Camelot Project focused on document consistency across user, location, and device. By 1992, Camelot became PDF and introduced two key features that keep it front-and-center for businesses:

  • Preservation – PDFs are designed to preserve all data in the original file in its original format. As a result, any content, from text to graphics to spreadsheets, remains consistent when converted to PDF.
  • Standardization – Not only do PDFs meet ISO 32000 standards for electronic document exchange, but the format also includes specialty standards such as PDF/A for archiving, PDF/X for printing, and PDF/E for engineering.

 


The Case for Conversion

While preservation and standardization speak to the benefits of PDF creation, why do so many companies prioritize conversion? First is the read-only nature of basic PDF files. Consider documents that contain customers’ personally identifiable information (PII) or employees’ HR data. Demands for intra-company interoperability mean these documents are often widely distributed across multiple departments and even outside the organization, and a read-only format helps ensure that the data arrives unaltered.

Storage is also a key consideration. While many files — including Excel spreadsheets — can quickly balloon in size as data volumes increase, compression comes standard with PDFs. This permits greater storage with a smaller footprint to help maximize the capacity of local storage infrastructure.

 


The Situation with Spreadsheets

Spreadsheets offer a specific situation for PDF conversion. With spreadsheets often the standard format for financial reporting and offering critical functionality for structured data analysis, Excel files are everywhere. The challenge? Ensuring the right people can access the right data at the right time — with the right context. Consider spreadsheets sent from a desktop to a mobile device that isn’t equipped with the same office software. What appears as tidy rows and columns on a computer monitor may be a contextually convoluted mess on mobile devices, forcing you to work against existing formats rather than finding common function. 

Excel to PDF conversion offers three benefits to help solve the spreadsheet situation:

  • Format Persistence – From standard spreadsheets to charts and graphs, the original format of Excel files is maintained in PDF. As a result, recipients don’t need specific office software to read Excel documents; in-app or online PDF readers are the only requirement.
  • Content Curation – With the right PDF conversion tools, staff can easily choose what to share and how to share it. From converting entire documents to specific pages, making comments, or adding redactions, sharing is secure and simple.
  • Password Protection – Speaking of security, PDFs also permit password protection for both access and editing. This both reduces the risk of unintended access and ensures that only authorized personnel can alter spreadsheet data.

The Market for Modification

Given the popularity of PDFs and the potential benefits of effective conversion, it’s no surprise that the market for modification is rapidly diversifying. From lightweight applications that allow users to convert PDFs for free to online PDF converters, there are now multiple options to make the move from spreadsheet files to portable document formats. The challenge? Finding your best fit. For example, while free online tools offer the benefit of quick conversion, they introduce potential security issues if spreadsheets are converted outside the confines of local networks. 

Robust and reliable options from well-known providers, meanwhile, offer ways to maximize security without losing speed. Solutions like Accusoft’s ImageGear integrate alongside your existing applications, allowing document conversion under the auspices of local networks, while the PrizmDoc Cloud Conversion API lets you leverage the power of cloud resources customized to meet your needs. Even better? Start converting PDFs for free right now with an ImageGear trial or 300 free transactions in the Accusoft Cloud.

 


A Step-by-Step Guide: How to Convert an Excel File to PDF

Ready to start converting spreadsheets with us? It’s easy. If you’re using the PrizmDoc Cloud Conversion API, easy is the operative word. Simply select your source format, pick the pages you want to convert, and then define your destination format. Need pages 1-5 of your XLS document in a PDF? No problem. Looking to merge multiple pages into a single document? We’ve got you covered.

If SDKs are more your style, there’s a simple, step-by-step process to convert Excel files into PDFs:

Step 1: Create an instance of Microsoft Excel format after initializing ImageGear.NET

In C#:


ImGearFileFormats.Filters.Add(ImGearOffice.CreateExcelFormat());

 

Step 2: Modify the open dialog box to accept *.xlsx and *.xls extensions.

In C#


// After installation, make sure you include the following using statements:
using ImageGear.Formats.PDF;
using ImageGear.Formats;
using ImageGear.Formats.Office;
using ImageGear.Core;
using System.IO;
using ImageGear.Evaluation;

// If you are evaluating our product, initialize the evaluation license:
ImGearEvaluationManager.Initialize();

// After some initializations, load the necessary ImGear filters to create an instance
// of Microsoft Excel format for input and an instance of PDF format for output using
// code that looks like:
ImGearFileFormats.Filters.Add(ImGearOffice.CreateExcelFormat());
ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());

// Next, the PDF library requires its own initialization:
ImGearPDF.Initialize();

// Then, simply read in all pages of the Excel document using the
// ImGearFileFormats.LoadDocument() method:
ImGearDocument igDocument;
using (FileStream fileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    igDocument = ImGearFileFormats.LoadDocument(fileStream);
}

// Finally, write out the document as PDF using the ImGearFileFormats.SaveDocument()
// method with the saving format set to ImGearSavingFormats.PDF and no special options:
using (FileStream fileStream = new FileStream(outputFileName, FileMode.Create, FileAccess.ReadWrite))
{
    ImGearFileFormats.SaveDocument(igDocument, fileStream, 0, ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, null);
}
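The sample above does not show the open-dialog change that Step 2 describes, so here is a minimal sketch of one way to restrict input to *.xlsx and *.xls. It uses only the .NET base library; the class name ExcelFileFilter and its members are hypothetical, and the filter string simply follows the "description|pattern" format accepted by the OpenFileDialog.Filter property in Windows Forms and WPF.

```csharp
using System;
using System.IO;

static class ExcelFileFilter
{
    // Assign this to OpenFileDialog.Filter so the dialog lists only Excel files.
    public const string DialogFilter = "Excel files (*.xlsx;*.xls)|*.xlsx;*.xls";

    // Returns true when the path ends in .xlsx or .xls (case-insensitive).
    public static bool IsExcelFile(string path)
    {
        // GetExtension returns null for a null path and "" when there is no extension.
        string extension = Path.GetExtension(path) ?? string.Empty;
        return extension.Equals(".xlsx", StringComparison.OrdinalIgnoreCase)
            || extension.Equals(".xls", StringComparison.OrdinalIgnoreCase);
    }
}
```

Checking the extension again after the dialog closes is a cheap guard, since users can still type an arbitrary file name into the dialog.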

Ready to accelerate output and improve productivity? Keep conversion close to home with ImageGear, or opt for secure operational outsourcing with the PrizmDoc Cloud Conversion API.

Today’s organizations are inundated with a variety of document and image formats on a regular basis. By integrating comprehensive PDF functionality into their applications, developers can provide the tools to manage those files much more easily. Converting files into PDFs makes them easier to share, modify, and annotate without having to worry about compatibility issues across applications.

Simply converting documents or images into searchable PDF files is easy enough, but in many cases, several files need to be merged into a single document or one large file must be split into multiple documents. Accusoft’s ImageGear SDK gives applications the ability to process PDFs programmatically, allowing users to quickly prepare documents for viewing and collaboration.

How to Merge PDF Files with ImageGear Using C#

ImageGear can merge two multi-page PDF documents into a single document. This is especially useful for organizations that have multiple files associated with the same workflow or account, such as loan applications or medical records. The following steps will walk you through the merge PDF process using ImageGear.NET in C#.

Step 1: Initialize PDF Support

Before getting started, you’ll need to initialize PDF support within ImageGear.NET (if you haven’t done so already during deployment). This initialization will allow your application to load, save, and process PDF files.

After creating a new “Console Application” and adding the required assembly reference and resources, you can use the following code snippet to load and save PDF files.

 

using System.IO;
using ImageGear.Formats;
using ImageGear.Formats.PDF;
using ImageGear.Evaluation;
namespace MyPDFProject
{
    class Program
    {
        public void Initialize()
        {
            // Initialize evaluation license.
            ImGearEvaluationManager.Initialize();
            ImGearEvaluationManager.Mode = ImGearEvaluationMode.Watermark;
            // Initialize common formats.
            ImGearCommonFormats.Initialize();
            // Add support for PDF files.
            ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
            ImGearPDF.Initialize();
        }
        public void Terminate()
        {
            // Dispose of support for PDF files.
            ImGearPDF.Terminate();
        }
        public void LoadAndSave(string fileIn, string fileOut)
        {
            ImGearPDFDocument igPDFDocument = null;
            try
            {
                // Load the PDF document.
                using (FileStream inStream = new FileStream(fileIn, FileMode.Open))
                    igPDFDocument = (ImGearPDFDocument)ImGearFileFormats.LoadDocument(inStream, 0, (int)ImGearPDFPageRange.ALL_PAGES);
                // Save the PDF document to a new file.
                ImGearPDFSaveOptions pdfOptions = new ImGearPDFSaveOptions();
                using (FileStream outStream = new FileStream(fileOut, FileMode.Create))
                    ImGearFileFormats.SaveDocument(igPDFDocument, outStream, 0, ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, pdfOptions);
            }
            finally
            {
                igPDFDocument?.Dispose();
            }
        }
        static void Main(string[] args)
        {
            Program myProgram = new Program();
            myProgram.Initialize();
            myProgram.LoadAndSave(@"C:\PATHTOPDF\FILENAME.pdf", @"C:\PATHTOPDF\NEWNAME.pdf");
            myProgram.Terminate();
        }
    }
}

       

 

Step 2: Set the Merge PDF Parameters

You will need to determine what order the documents will be combined in and set the page numeration for the new document.
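As an illustration of what “page numeration” means here, the sketch below works out where each source document’s first page will land in the merged file, given the page counts in merge order. It is plain C# with no ImageGear calls; the MergePlan class and FirstPageIndexes method are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;

static class MergePlan
{
    // For each source document (in merge order), returns the zero-based index
    // that its first page will have in the merged document.
    public static List<int> FirstPageIndexes(IReadOnlyList<int> pageCounts)
    {
        var offsets = new List<int>(pageCounts.Count);
        int next = 0;
        foreach (int count in pageCounts)
        {
            if (count < 0)
                throw new ArgumentOutOfRangeException(nameof(pageCounts), "Page counts cannot be negative.");
            offsets.Add(next);   // this document starts where the previous one ended
            next += count;
        }
        return offsets;
    }
}
```

For example, merging a 3-page document followed by a 5-page document places the second document’s first page at index 3, which is page 4 in one-based numbering.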

Step 3: Merge the PDF Documents

Once you’ve identified the files you want to merge and the order they should go in, you can use the “MergePdfDocuments” command to assemble the new PDF file. In the C# snippet below, the Merge method does this by cloning every page of each source document into a new, empty ImGearPDFDocument:

 

        // Merges two PDF documents into a third PDF document.
        public void Merge(string fileInFirst, string fileInSecond, string fileOut)
        {
            // ImageGear uses zero-based page numbers.
            const int FIRST_PAGE_INDEX = 0;
            ImGearPDFDocument igPDFDocumentFirst = null;
            ImGearPDFDocument igPDFDocumentSecond = null;
            ImGearPDFDocument igPDFDocumentResult = null;
            try
            {
                // Load the source PDF documents.
                using (FileStream inStream = new FileStream(fileInFirst, FileMode.Open))
                    igPDFDocumentFirst = (ImGearPDFDocument)ImGearFileFormats.LoadDocument(inStream, 0, (int)ImGearPDFPageRange.ALL_PAGES);
                using (FileStream inStream = new FileStream(fileInSecond, FileMode.Open))
                    igPDFDocumentSecond = (ImGearPDFDocument)ImGearFileFormats.LoadDocument(inStream, 0, (int)ImGearPDFPageRange.ALL_PAGES);
                // Create the resulting PDF document.
                igPDFDocumentResult = new ImGearPDFDocument();
                // Copy all pages of first document into resulting PDF document.
                for (int pageIndex = FIRST_PAGE_INDEX; pageIndex < igPDFDocumentFirst.Pages.Count; pageIndex++)
                    igPDFDocumentResult.Pages.Add(igPDFDocumentFirst.Pages[pageIndex].Clone());
                // Copy all pages of second document into resulting PDF document.
                for (int pageIndex = FIRST_PAGE_INDEX; pageIndex < igPDFDocumentSecond.Pages.Count; pageIndex++)
                    igPDFDocumentResult.Pages.Add(igPDFDocumentSecond.Pages[pageIndex].Clone());
                // Save the resulting PDF document to a new file.
                ImGearPDFSaveOptions pdfOptions = new ImGearPDFSaveOptions();
                using (FileStream outStream = new FileStream(fileOut, FileMode.Create))
                    ImGearFileFormats.SaveDocument(igPDFDocumentResult, outStream, 0, ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, pdfOptions);
            }
            finally
            {
                igPDFDocumentFirst?.Dispose();
                igPDFDocumentSecond?.Dispose();
                igPDFDocumentResult?.Dispose();
            }
        }
        static void Main(string[] args)
        {
            Program myProgram = new Program();
            myProgram.Initialize();
            myProgram.Merge(@"C:\PATHTOPDF\FIRSTFILENAME.pdf", @"C:\PATHTOPDF\SECONDFILENAME.pdf", @"C:\PATHTOPDF\NEWNAME.pdf");
            myProgram.Terminate();
        }

 

After merging your files into a new document, you can begin working with the resulting PDF using ImageGear’s other PDF features.

How to Split PDF Files with ImageGear Using C#

While the merge PDF command is used to combine multiple documents into a new, single PDF file, the split PDF command saves pages from an existing document as a separate document. A three-page PDF file, for instance, can be broken into three, single-page PDF documents.

Once you’ve initialized PDF support for ImageGear.NET, you can split a PDF document by following a few simple steps:

Step 1: Read the PDF into a System.IO.Stream Object

This allows ImageGear to read the stream using the ImGearFileFormats.LoadDocument() method, which takes the stream (rather than a file name) as its first argument.

Step 2: Determine PDF Page Count

The ImGearPDFDocument.Pages property provides access to the document’s page array. Its Count property gives the number of pages, and the array can then be used to assemble a new document.

Step 3: Create a New PDF Document

The ImGearPDFDocument object can be used to create an empty PDF document that will serve as the destination file for the split pages.

Step 4: Insert Pages into the New Document

The InsertPages command takes specific pages from the source document (the PDF you’re splitting), and inserts them into the destination document. After the pages are inserted, you can save the new PDF to disk or memory. Keep in mind that the original document will still contain all pages, so splitting it into two documents will require you to create two new documents.

Here is what splitting a single PDF document into several single-page PDF documents looks like in C#:

 

        public void Split(string fileIn, string directoryOut)
        {
            // ImageGear uses zero-based page numbers.
            const int FIRST_PAGE = 0;
            // Ensure output directory exists.
            if (!System.IO.Directory.Exists(directoryOut))
                Directory.CreateDirectory(directoryOut);
            // Load the source PDF document.
            using (FileStream inStream = new FileStream(fileIn, FileMode.Open))
            {
                using (ImGearPDFDocument igPDFDocument = ImGearFileFormats.LoadDocument(inStream, FIRST_PAGE, (int)ImGearPDFPageRange.ALL_PAGES) as ImGearPDFDocument)
                {
                    // Write each page in source PDF document to a separate PDF file.
                    for (int pageIndex = FIRST_PAGE; pageIndex < igPDFDocument.Pages.Count; pageIndex++)
                    {
                        // Construct the output filepath.
                        string outputFileName = string.Format("{0}_{1}.pdf", Path.GetFileNameWithoutExtension(fileIn), pageIndex + 1);
                        string outputPath = System.IO.Path.Combine(directoryOut, outputFileName);
                        // Create a new empty PDF document.
                        using (ImGearPDFDocument igPDFDocumentResult = new ImGearPDFDocument())
                        {
                            // Insert page into new PDF document.
                            igPDFDocumentResult.InsertPages((int)ImGearPDFPageNumber.BEFORE_FIRST_PAGE, igPDFDocument, pageIndex, 1, ImGearPDFInsertFlags.DEFAULT);
                            // Save new PDF document to file.
                            igPDFDocumentResult.Save(outputPath, ImGearSavingFormats.PDF, FIRST_PAGE, FIRST_PAGE, 1, ImGearSavingModes.OVERWRITE);
                        }
                    }
                }
            }
        }
        static void Main(string[] args)
        {
            Program myProgram = new Program();
            myProgram.Initialize();
            myProgram.Split(@"C:\PATHTOPDF\FILENAME.pdf", @"C:\PATHTOPDF\OUTPUTDIRECTORY");
            myProgram.Terminate();
        }
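The sample above always writes one page per output file. If you want larger pieces, such as splitting a document into two halves, the page arithmetic might look like the sketch below. It is plain C# with hypothetical names (SplitPlan, PageRanges). It assumes that the fourth argument of InsertPages, which is 1 in the sample above, is a page count, so that each (Start, Count) pair could be passed as the third and fourth arguments when filling a new destination document.

```csharp
using System;
using System.Collections.Generic;

static class SplitPlan
{
    // Divides pageCount pages into consecutive ranges of at most pagesPerFile pages.
    // Start is zero-based, matching ImageGear's page numbering in the samples above.
    public static List<(int Start, int Count)> PageRanges(int pageCount, int pagesPerFile)
    {
        if (pageCount < 0)
            throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (pagesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        var ranges = new List<(int Start, int Count)>();
        for (int start = 0; start < pageCount; start += pagesPerFile)
        {
            // The last range may be shorter than pagesPerFile.
            int count = Math.Min(pagesPerFile, pageCount - start);
            ranges.Add((start, count));
        }
        return ranges;
    }
}
```

Each range needs its own new ImGearPDFDocument, because the source document keeps all of its pages after the insert.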

 

Enhance Your PDF Capabilities with ImageGear

Accusoft’s ImageGear SDK provides a broad range of document and image processing functions beyond the ability to split and merge PDFs. Whether you need powerful file conversion capabilities, multi-language OCR support, or image cleanup, correction, and transformation functions, ImageGear integrations can enhance your application’s performance and versatility. 

Learn more about the ImageGear collection of SDKs and see how they can help you shorten your development cycle and get your innovative products to market faster.

Effective document management is now a top priority for organizations, but for many, it remains a challenge. As noted by recent AIIM survey data, companies are struggling to handle both the documents they have and the rapid uptake of new information. In fact, 43 percent said their biggest priority is effectively leveraging the structured and unstructured content they already have, while 57 percent are focused on making sense of overwhelming volumes of big data. Optical character recognition (OCR) is a critical component of document management.

For software development firms, this poses a particular challenge. Products are no longer feature complete without critical end-user functions such as advanced optical character recognition and powerful search. However, adding this functionality is not as easy as it sounds. Building it from the ground up requires time, effort, and continued maintenance, which is a large undertaking for any company.

Accusoft’s ImageGear SDK offers a way to bridge the OCR gap with comprehensive image processing and manipulation capabilities that both streamline software development and deliver on end-user expectations.


What is ImageGear?

ImageGear easily integrates into existing applications to deliver cutting-edge document management functionality at scale. Available for both .NET and C/C++ frameworks, ImageGear allows developers to quickly deploy and white-label key features including image processing, manipulation, conversion, and PDF and document search.

ImageGear’s add-on OCR functionality delivers highly accurate optical character recognition to any .NET (C#) or C/C++ application. It provides full-page character recognition for more than 100 languages, including both Western and Asian languages such as Korean, Japanese, and Chinese, and it can recognize multiple languages within a single image for enhanced document management. Other OCR features include:

  • Automatic page segmentation into individual zones for processing
  • Type assignment per zone based on defined flows, tables, or graphics
  • Table detection with advanced technology to enhance data reconstruction output
  • Entire page or individual region image processing
  • Zone definition by the user, from existing files, or automatically by the OCR engine

In addition, software developers can enhance ImageGear OCR functionality by leveraging both predefined and customizable dictionaries, and by using regular expressions to validate results.
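To illustrate the idea of validating recognized text with a regular expression, here is a minimal sketch that uses only the .NET base library. It does not use ImageGear’s own dictionary or validation API, which is not shown in this article; the OcrResultValidator class and the invoice-number pattern are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class OcrResultValidator
{
    // Hypothetical field format: "INV-" followed by exactly six ASCII digits.
    private static readonly Regex InvoiceNumber =
        new Regex(@"^INV-[0-9]{6}$", RegexOptions.CultureInvariant);

    // Returns true when the recognized text, ignoring surrounding whitespace,
    // matches the expected invoice-number format.
    public static bool IsValidInvoiceNumber(string recognizedText)
    {
        return recognizedText != null && InvoiceNumber.IsMatch(recognizedText.Trim());
    }
}
```

A check like this catches common misreads, such as the letter O recognized in place of a zero, so that the affected field can be flagged for review instead of being stored as-is.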


Why Optical Character Recognition (OCR) Matters to End-Users

Advanced OCR integration makes it easier for end-users to find what they’re looking for, when they’re looking for it. Instead of forcing users to find additional apps that deliver specific services, in-app OCR delivers increased satisfaction by streamlining user search functionality.

Common use cases include:

  • Legal eDiscovery – The eDiscovery process is a critical, and often complex, stage of legal case preparation. Firms need to quickly find key terms, phrases, and images within legal documents to ensure they meet both client expectations and compliance obligations. With many forms now scanned and stored in non-standard file formats that contain form fields, text boxes, and digital imagery, OCR is essential to help lawyers streamline the process of eDiscovery at scale.
  • Financial Document Processing – Clients now expect loan applications and credit card applications to be processed at scale and speed. This is especially critical as firms embrace the idea of remote work: both staff at home and those in the office need end-to-end OCR functionality to deliver complete document management.
  • Insurance Documentation Assessment – Insurance claims are both complex and comprehensive, requiring complete documentation from clients, contractors, and compliance agencies. As insurance firms move to tech-first frameworks to enhance document processing, speed, and accuracy, OCR makes it easy for staff to find specific data and ensure documentation is complete.

Integrating OCR

Advanced OCR functionality won’t deliver expected outcomes if integration is cumbersome and complex. ImageGear streamlines this process with easy SDK implementation for both .NET and C/C++.

ImageGear .NET can be easily deployed on multiple platforms. These .NET deployments include ASP.NET functions such as image display, thumbnail display, annotation support, and cloud capture along with WPF printing and annotation support. ImageGear for C/C++, meanwhile, offers support for several platforms as well. Check out the developer resources section to see an updated list.


How Your Clients Use Optical Character Recognition (OCR)

PDFs remain the go-to file format for many industries, offering both standardized image and text conversion along with the ability to easily set or restrict document permissions. The problem? PDFs are notoriously difficult to search, making it hard for end-users to quickly find the text or data they need.

ImageGear makes it easy to OCR PDFs using the ImGearRecPage.Recognize Method, which leverages the zone list of the image to deliver accurate OCR — or, if this list is empty, automatically calls the page-layout decomposition process (auto-zoning) to complete the OCR process.

The following C# sample runs OCR on each page of a multi-page TIFF and writes the recognized text to an output file:


using System.IO;
using ImageGear.Core;
using ImageGear.Formats;
using ImageGear.Evaluation;
using ImageGear.Recognition;

namespace ImageGearTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize evaluation license.
            ImGearEvaluationManager.Initialize();
            ImGearEvaluationManager.Mode = ImGearEvaluationMode.Watermark;

            // Initialize the Recognition Engine.
            ImGearRecognition igRecognition = new ImGearRecognition();

            // ImageGear assemblies require explicit initialization at application startup.
            ImGearCommonFormats.Initialize();

            // Open a FileStream for our output document.
            using (FileStream outputStream = new FileStream(@"c:\temp\outputDoc.txt", FileMode.Create, FileAccess.ReadWrite))
            {
                // Open a FileStream for our source multi-page image.
                using (FileStream multiPageDocument = new FileStream(@"c:\temp\test.tif", FileMode.Open))
                {

                    // Load every page of the multi-page document, starting at page 0.
                    // A page count of -1 specifies that all pages shall be loaded.
                    ImGearDocument doc = ImGearFileFormats.LoadDocument(multiPageDocument, 0, -1);

                    // Determine the number of pages in the multi-page image.
                    int numPages = ImGearFileFormats.GetPageCount(multiPageDocument, ImGearFormats.UNKNOWN);

                    // Recognize each page of the multi-page document and add the results to outputStream.
                    for (int pageNumber = 0; pageNumber < numPages; pageNumber++)
                    {

                        // Cast the current page to a raster page and import that page.
                        using (ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)doc.Pages[pageNumber]))
                        {

                            // Preprocess the page.
                            igRecPage.Image.Preprocess();

                            // Perform recognition.
                            igRecPage.Recognize();

                            // Add OCR results to the outputStream.
                            igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);

                        }
                    }
                }

            }
            // Dispose of objects we are no longer using.
            igRecognition.Dispose();
        }
    }
}

 


OCR Access and Analysis

Advanced OCR isn’t enough in isolation — developers must also empower end-users to quickly access and analyze OCR output. ImageGear offers multiple options to help streamline this process, such as:

  • Storage of Output as Code Pages
  • Export to Text Format
  • Export to PDF
  • Export to MRC PDF
  • Export to a Formatted Document

Find Your Best Fit

ImageGear OCR makes it easy for end-users to quickly search critical documents, find the data they need, and analyze optical character recognition output, but don’t take our word for it. Seeing is believing. Test ImageGear in your own environment and discover the difference of advanced OCR. 

*Optical character recognition is an ImageGear add-on and must be requested upon purchase of a license.

 

share confidential documents

Data privacy continues to be a significant concern for businesses, employees, customers, and stakeholders alike. Privacy breaches can expose problems with document management and digital document security practices. They can also pose significant risks and costs to companies and stakeholders.  The importance of ensuring the secure sharing of confidential documents can’t be stressed enough.

When developing an application with SDKs or APIs or integrating new features into a workflow, developers must be aware of the security risks. Project managers, security engineers, and architects must work in tandem to identify and address all potential security breaches. This holds especially true for commercially-confidential, highly-sensitive, or private documents while in transit.

The Risks of Document Sharing

Document sharing, in general, can present opportunities for malicious actors to attempt to gain access to a competitor’s documents. It could also pave the way for uploading data containing malware accidentally. Protecting the enterprise as a whole should be a priority to prevent loss or compromise of customer-sensitive information. This is vital because even minor damage to a company’s reputation can have a devastating impact. 

When building applications with document sharing capabilities, developers need to think about the inherent risks that come along with allowing users access to upload and edit documents. Fortunately, there are a number of practical steps that developers can take to share sensitive documents securely without putting confidential information or mission-critical data at risk. 

5 Ways to Ensure Confidential Documents Are Shared Securely

1. Strengthen Application Security

Any conversation about document security needs to start with a focus on the application’s cybersecurity architecture. If document management software contains multiple vulnerabilities or doesn’t provide the necessary controls to safeguard data, it will be difficult to share sensitive documents securely. Here are a few best practices developers should have in place to create a secure application ecosystem: 

  • Perform threat-modeling any time there is a major design change in the application or ecosystem to identify potential new threats.
  • Encrypt customer sensitive documents both in transit and in storage. Ideally, the keys will be held by clients with an emergency access vault backup system, so that even the software developer cannot access any sensitive customer data. This way, even if an application or data centers are breached, customer documents will still be protected.
  • Spend more time testing releases for weaknesses and allow security engineers and architects to weigh in on the product feature roadmap. Security patches and improvements should be given the same value as other new product features.
  • Conduct periodic audits or external penetration testing to ensure that applications and customer data cannot be compromised.

2. Design Applications with Segregated Access

Secure documents and sensitive information should only be available to the people authorized to view or edit it. Access to one document should not allow someone to access other documents stored in the same application. By segregating access to data and assigning specific user permissions, developers can provide the tools customers need to manage their assets and share sensitive documents securely.

3. Eliminate External Viewing Dependencies

Although many organizations use secure applications to manage their document workflows, they frequently open themselves up to risk by relying on external software for document viewing. Without some way of sharing and viewing documents within the application itself, files will inevitably be shared over email and opened on local devices that may not have the latest security updates in place. Developers can avoid this problem by integrating HTML5 viewing capabilities into their application. This ensures that documents never have to leave a secure environment, even when they’re being shared with people outside an organization.

4. Create Unique Viewing Sessions

One of the challenges with many cloud-based document management systems is that once someone is granted access to a file, they typically retain that access until it is manually changed at a later date. In most instances, those privileges are also associated with the source file itself. This can create a number of security gaps if an organization doesn’t closely monitor access privileges. By implementing an HTML5 viewer that can generate unique viewing sessions for individual users, developers can provide more control over how to share confidential documents. Viewing sessions can be set to expire after use, and since the session is viewing a rendered version of the document instead of the source document itself, system administrators have more control over what aspects of it are shared. They may decide, for instance, to share only certain pages rather than the entire document.
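
The expiring, page-limited sessions described above can be sketched in a few lines of C#. This is a minimal in-memory illustration, not PrizmDoc Viewer’s API: the ViewingSession and ViewingSessionStore types, their members, and the 15-minute lifetime are all assumptions made for the example, and a production system would also need persistence and thread safety.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical types for illustration only; they are not part of PrizmDoc Viewer.
public sealed class ViewingSession
{
    public string Id { get; }
    public string DocumentId { get; }
    public ISet<int> AllowedPages { get; }
    public DateTime ExpiresAtUtc { get; }

    public ViewingSession(string id, string documentId, IEnumerable<int> allowedPages, DateTime expiresAtUtc)
    {
        Id = id;
        DocumentId = documentId;
        AllowedPages = new HashSet<int>(allowedPages);
        ExpiresAtUtc = expiresAtUtc;
    }
}

public sealed class ViewingSessionStore
{
    private readonly Dictionary<string, ViewingSession> sessions = new Dictionary<string, ViewingSession>();

    // Creates a session that grants access to selected pages for a limited time.
    public ViewingSession Create(string documentId, IEnumerable<int> allowedPages, TimeSpan lifetime, DateTime nowUtc)
    {
        var session = new ViewingSession(NewSessionId(), documentId, allowedPages, nowUtc + lifetime);
        sessions[session.Id] = session;
        return session;
    }

    // Returns true only for a known, unexpired session and a page it was granted.
    public bool CanView(string sessionId, int page, DateTime nowUtc)
    {
        if (!sessions.TryGetValue(sessionId, out ViewingSession session)) return false;
        if (nowUtc >= session.ExpiresAtUtc)
        {
            sessions.Remove(sessionId); // Expired sessions are dropped on first use.
            return false;
        }
        return session.AllowedPages.Contains(page);
    }

    // Session IDs come from a cryptographic random source so they cannot be guessed.
    private static string NewSessionId()
    {
        byte[] bytes = new byte[16];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}

public static class ViewingSessionDemo
{
    public static void Main()
    {
        var store = new ViewingSessionStore();
        DateTime now = DateTime.UtcNow;
        ViewingSession session = store.Create("contract-001", new[] { 1, 2 }, TimeSpan.FromMinutes(15), now);

        Console.WriteLine(store.CanView(session.Id, 1, now));                // True: granted page
        Console.WriteLine(store.CanView(session.Id, 3, now));                // False: page not shared
        Console.WriteLine(store.CanView(session.Id, 1, now.AddMinutes(16))); // False: session expired
    }
}
```

Because access is tied to the session rather than to the source file, revoking it is a matter of letting the session expire instead of editing permissions on the document itself.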

5. Implement Redaction Capabilities

Redaction has long been used to protect private or confidential information in documents. Although organizations still frequently make embarrassing mistakes when it comes to redaction, it remains one of the most effective tools for anyone who needs to share sensitive documents securely. By integrating true redaction capabilities that not only obscure, but also completely remove sensitive information, developers provide applications that have the ability to screen documents for privacy risks before they’re shared with anyone. Performing redactions within the application environment also has the benefit of further limiting external dependencies that could threaten security.

Protect Confidential Documents with Accusoft Integrations

Accusoft’s collection of processing integrations gives developers a variety of document management tools for controlling privacy within their applications. The HTML5 capabilities of PrizmDoc Viewer offer powerful redaction tools and make it easier for administrators to control viewing access. 

To learn more about how Accusoft SDKs and APIs can provide the document management features you need to protect confidential information and privacy, visit our products page today or talk to one of our integration specialists.

Introduction

A large government agency with millions of employees needed to manage personnel records and payment information while also making documents and images easily accessible for all users. Each document trail spanned the duration of the employee’s time with the agency, resulting in a repository containing hundreds of millions of documents. The existing system used a combination of custom-developed, Windows-based image viewers that were cumbersome, difficult to maintain, and presented security risks. 

Overview

As the system struggled to keep up with demand, employees became increasingly frustrated with it. Rather than viewing documents within the system, they began using workarounds like printing out files or saving them to another device, both of which presented serious security risks since the records contained personally identifiable information. The agency needed a solution that could manage hundreds of millions of documents while still providing users with simple, secure, and quick access to personnel records and payment information. 

Although the document repository’s legacy viewing solution was no longer able to meet the agency’s needs, developing a new document viewer was prohibitively expensive and required expertise its IT team did not possess. Rather than building a solution from scratch, it made more sense to evaluate viewing integrations available on the market. The agency set out to find a viewer that could support multiple document types, including TIFF and PDF/A, for both viewing and archival purposes. Whatever solution the IT team chose to adopt would have to integrate smoothly into the existing legacy system to avoid a costly and time-consuming overhaul. 

Challenges

After evaluating several options, the agency turned to Accusoft’s PrizmDoc® for Java, formerly VirtualViewer®. As a Java-based viewer that uses HTML5 to render files for viewing, PrizmDoc® for Java would allow the document repository to access files quickly and easily. PrizmDoc® for Java’s extensive file format support allowed the agency to render PDF, Word, Excel, AFP, DWG, TIFF, and more in one universal viewer. The integration’s built-in library eliminated the need for a third-party application or additional license, further simplifying the agency’s document management processes. 

Another key priority was giving employees the ability to access documents from multiple devices. In the past, the system’s legacy viewer only allowed them to open and view documents on a desktop with the right software installed. To streamline the employee experience, the agency required a viewer that could be accessed from any device that supports a web browser, including tablets and smartphones. PrizmDoc® for Java’s true cross-platform support and complete mobility provided the flexibility the agency needed to access documents from anywhere, at any time. The integration’s HTML5 technology and Java-based viewing allowed users to view and manipulate files within the browser, eliminating the need for an external application. 

PrizmDoc® for Java’s quick installation and integration process made it the ideal solution for the agency. Installing PrizmDoc® for Java’s document viewer took less than 10 minutes for proof of concept (POC) testing on any desktop, laptop, or virtual machine. The integration also provided APIs and developer tools to make integrating and leveraging the technology simple and easy, including RESTful content handlers that allowed for a more flexible development and deployment process.

Results

PrizmDoc® for Java’s high-speed viewing for large files allowed document rendering and processing to be split between the server and browser, delivering an extremely high-speed response. The integration’s advanced features, including annotation, redaction, splitting, merging, and more, provided far more functionality than was possible under the legacy viewer. PrizmDoc® for Java’s robust thumbnail panels also simplified working on large documents using full-panel thumbnail displays, as well as the option to adjust thumbnail size. 

By implementing PrizmDoc® for Java, the government agency was able to provide more streamlined access to personnel and payroll records, saving employees and HR personnel time and money while eliminating security and management issues associated with the previous viewing solution. Millions of users are now able to view critical information related to their records via any device that has access to a web browser. PrizmDoc® for Java’s robust document support and easy-to-use interface made it an essential tool for the agency’s document management and collaboration processes. The agency was able to continue using their existing repository system while also gaining the benefits of a modern, user-friendly document viewer.

About Accusoft

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including Docubee, PrizmDoc Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.  

barcode xpress pricing

 

When you’re looking to integrate a toolkit into your application, your first priority is assessing the capabilities. Once you’ve found the right fit, you need to make sure that the pricing matches your budget and makes sense for your needs.

Historically, Accusoft has offered a variety of different licensing options for our clients. Products were always licensed by core or number of installs, depending on which runtime fits best. 

These practices have been great for our clients so far, but we realized a missing piece that could help bridge the gap for new clients with smaller projects that dynamically scale based on workload.


Introducing a New Way to License

Barcode Xpress is the first product in the Accusoft line to offer metered licensing based on transactions. Metered licensing helps a variety of clients scale up or down when they need to. While our traditional licensing plan is great for projects with a defined scope, smaller projects that have the potential to grow need more flexible pricing options.

“The main reason we wanted to introduce metered licensing is to address SDKs used in scalable environments,” says Mark Hansen, Product Manager of SDKs. “Clients can use metered licensing to spin up their projects based on the traffic they’re getting, without the need to obtain additional licenses. While many companies opt for an ongoing usage, this new model allows for estimating a specific amount of use, so they only pay for what they need.”


The Benefit of Metered Licensing

Why would a company choose to specify the usage from the start? Metered licensing is great for short-term projects with a limited lifespan, as clients only pay for what they use. It’s also great for clients who don’t know the scale of their project and need to be cautious with how much they spend. 

Metered licensing is based on a pay-per-transaction model. A transaction is defined by the page. For example, if Barcode Xpress reads 30 pages with five barcodes on each page, the transaction total is 30 instead of 150.
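
The per-page counting rule above can be expressed as a short C# sketch. The MeteredUsage class and CountTransactions method are hypothetical names used only to illustrate the arithmetic; they are not part of Barcode Xpress, and the sketch assumes every processed page counts as one transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not a Barcode Xpress API.
public static class MeteredUsage
{
    // Each element is the number of barcodes found on one page.
    // A page counts as one transaction regardless of how many barcodes it holds.
    public static int CountTransactions(IEnumerable<int> barcodesPerPage)
    {
        return barcodesPerPage.Count();
    }

    public static void Main()
    {
        // 30 pages with five barcodes on each page, as in the example above.
        int[] barcodesPerPage = Enumerable.Repeat(5, 30).ToArray();

        Console.WriteLine("Barcodes read: " + barcodesPerPage.Sum());                 // 150
        Console.WriteLine("Transactions:  " + CountTransactions(barcodesPerPage));    // 30
    }
}
```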

This transaction-based model makes for a very low cost of entry, and a no-stress agreement. With this value in mind, clients don’t have to worry about how much they will use the integration after a specific project is complete. In addition, if the project grows, metered licensing allows the client to scale up.


Reliability of Metered Licensing

Accusoft understands that our customers are building mission-critical apps with our SDKs and has carefully architected the system to be extremely reliable. Our server-side authorization and reporting system runs in Amazon AWS and is split into online and offline parts. 

The online portion is built on AWS CloudFront and S3, two of the most reliable systems available on the Internet. All of Accusoft’s backend code is in the offline portion of the system and an outage in that portion of the system will have no effect whatsoever on our customers, even if it were down for several days. Nevertheless, we have 24/7 alerting in place so we will be notified soon after an outage occurs and will quickly get it back up.

Metered licensing is now available for Barcode Xpress 13.1 and later versions.

convert pdf

PDFs are everywhere. Vice calls them “the world’s most important file format,” and that’s not far off the mark. The sheer number of documents converted to, from, and often back to PDFs is astounding. The hard truth? They’re also frustrating to work with. Start a Google search with the word “convert” and three of the top five results involve PDFs. 

While this portable document format lives up to its name by making it easy for users to attach and send documents across their organizations, PDFs often run into problems when it comes to conversion, collaboration, and communication. Many tools offer piecemeal PDF functionality but lack a complete set of critical capabilities, in turn forcing software engineers to use multiple software solutions for seemingly simple tasks. 

ImageGear offers a different take on the standard software development kit (SDK) designed to help developers maximize their PDF potential. Here’s how it works. 


The Value of PDF Conversion

While PDF conversion is one of the top sought-after functionalities, there’s another area that’s often overlooked: modifying the characteristics of PDFs on-screen. With companies now handling PDFs from multiple sources that may include everything from computer-generated form data to handwritten information and images, it’s no surprise that staff encounter a wide variety of viewing issues.

ImageGear PDF helps solve these problems by allowing users to call the shots on PDF content at scale with features such as:

  • Conversion
  • Metadata Management
  • Content and Font Editing
  • Text Extraction
  • PDF Watermarking
  • Container, Dictionary, and Layer Creation
  • 3D Asset Modification

ImageGear PDF also helps improve document processing with document cleanup and advanced optical character recognition (OCR). With the ability to encrypt and decrypt entire images (or part of an image), automatic ImageClean correction of white text blocks, borders, and inverted images, plus intelligent re-sizing, any PDF can be cleaned and made more readable for the user. 

OCR support for almost any document type is also a benefit. This includes those produced on typewriters, dot-matrix printers, ink-jet printers, laser printers, and photocopied, scanned, and faxed documents. ImageGear PDF helps users control and customize multiple PDF variables, making it a fully functional PDF conversion solution for your application.


PDF Pain Points

One of the biggest PDF frustrations? The inability to break apart and combine PDF documents. Let’s imagine you have a massive legal PDF or in-depth medical file. In these circumstances, professionals only need a portion of the PDF, but without the right tools they’re stuck sending entire files when all they need is a single page. In other cases, employees might have a host of related PDFs that are part of the same project, but can’t be easily combined to save space and time.

ImageGear PDF has you covered with the ability to easily delete or insert PDF pages, render pages in a single PDF, split a PDF, merge two or more PDFs into a single file, or even merge specific pages from two or more PDFs into a single PDF. This not only makes a massive difference in time spent working with PDF documents, it helps reduce unnecessary storage and transmission of multiple files. 


Convert PDF: Multiple File Formats for Conversion

Conversion is critical for PDF success. Instead of creating complexity by forcing end-users to stick with original file formats, implementing an SDK with cutting-edge conversion empowers corporate consistency and saves on storage space. ImageGear PDF supports a host of common file formats for conversion including Microsoft Office, JPEG 2000, CAD, and SVG.

Of course, no feature-forward PDF framework is complete without robust annotation, redaction, and commenting capabilities. These features make it easy for other users to see exactly what’s been changed, when, and why, along with providing a critical, auditable paper trail to meet evolving compliance and regulatory standards.


PDF Functionality for Your Application

Best of all, ImageGear isn’t designed to replace your current software, but integrate alongside existing workflows. Rather than adding another application to already-overloaded IT arsenals, straightforward SDK integration means everything happens within your own application, making it easy for everyone to find exactly what they’re looking for within familiar territory. Need help jumpstarting your SDK deployment? Check out our full list of ImageGear .NET samples for ASP.NET, CAD, OCR support, and more.

PDFs remain eternally popular and continually frustrating. Solve for document viewing, split and merge, and conversion issues and streamline employee efforts with ImageGear.

OCR form

An automated forms processing solution can significantly improve accuracy and efficiency when it comes to managing large quantities of documents containing structured content. Whether an organization needs to digitize existing records or is continuously processing new documents within application workflows, having a versatile optical character recognition (OCR) component working to identify and extract text from multiple languages allows them to capture data more effectively.  Solid OCR form capture is critical.

Although a good OCR engine operates quickly and efficiently, the process of recognizing and extracting text is a highly complex undertaking that can be impacted by a variety of factors. Under optimal conditions, for example, the OCR component within Accusoft’s FormSuite can generate results quickly and accurately, with the ability to read several languages from around the world. However, if an application’s forms processing workflow is not set up efficiently or overlooks a few important considerations, recognition performance may suffer in terms of speed and accuracy.

6 Ways to Achieve the Best Results with the Accusoft OCR Component in FormSuite

 

1. Pay Attention to Image Resolution

As a general rule, OCR components should be provided with high resolution images so the recognition engine is able to distinguish details that would otherwise be missed on low resolution images. This helps the engine recognize the differences between “l” and “i” or “O” and “0” (zero), which leads to more accurate results.

However, there could be a problem if the image resolution is too high. These images require much more time to process without delivering any benefits since the required letter properties are clearly distinguishable in a lower resolution.

To strike a balance between speed and accuracy, it’s better to scan all images in a 150-400 dots-per-inch (DPI) range. This allows the recognition engine to identify all possible letter properties and avoid being bogged down with analyzing a lot of data at the same time.
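
A pre-flight check for the recommended range might look like the following C# sketch. The ScanResolution class, its constants, and the returned messages are illustrative assumptions; only the 150 and 400 bounds come from the guidance above.

```csharp
using System;

// Hypothetical helper for illustration; not part of FormSuite.
public static class ScanResolution
{
    public const float MinDpi = 150f; // Lower bound recommended above.
    public const float MaxDpi = 400f; // Upper bound recommended above.

    // Classifies a scan resolution against the recommended range (bounds inclusive).
    public static string Describe(float dpi)
    {
        if (dpi < MinDpi) return "too low: fine character details may be lost";
        if (dpi > MaxDpi) return "too high: slower processing without an accuracy gain";
        return "ok";
    }

    public static void Main()
    {
        foreach (float dpi in new[] { 96f, 300f, 600f })
        {
            Console.WriteLine(dpi + " DPI -> " + Describe(dpi));
        }
    }
}
```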

2. Don’t Lose Image Properties While Preparing to Recognize

To achieve the best results, it’s important to provide the recognition engine with a few helpful hints. In some cases, resolution properties may be lost while an image is being prepared for recognition, leading to worse than expected results. This happens most frequently when working with System.Drawing.Image or System.Drawing.Bitmap classes directly during operations like clipping, merging, or reducing the bit depth. 

In this case, the best solution is to make sure that HorizontalResolution and VerticalResolution properties are set correctly and reflect initial image resolution values. The ScanFix component within FormSuite can perform this task automatically and is designed to be compatible with the OCR component to help achieve better recognition results.
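
For applications that handle this step manually with System.Drawing, the following C# sketch shows one way to carry the source resolution over when clipping. It assumes a Windows environment with System.Drawing available (the System.Drawing.Common package on .NET Core and later); the ResolutionPreservingCrop class and its Crop method are hypothetical names, and the sketch does not use ScanFix.

```csharp
using System;
using System.Drawing;

// Hypothetical helper for illustration; requires System.Drawing (Windows).
public static class ResolutionPreservingCrop
{
    // Copies a region of the source bitmap into a new bitmap and
    // carries over the source's horizontal and vertical resolution.
    public static Bitmap Crop(Bitmap source, Rectangle region)
    {
        // A newly created Bitmap takes the display resolution (typically 96 DPI),
        // not the resolution of the image it is derived from.
        var cropped = new Bitmap(region.Width, region.Height);
        cropped.SetResolution(source.HorizontalResolution, source.VerticalResolution);

        using (Graphics g = Graphics.FromImage(cropped))
        {
            var destination = new Rectangle(0, 0, region.Width, region.Height);
            g.DrawImage(source, destination, region, GraphicsUnit.Pixel);
        }
        return cropped;
    }

    public static void Main()
    {
        using (var scanned = new Bitmap(600, 400))
        {
            scanned.SetResolution(300f, 300f); // Pretend this page was scanned at 300 DPI.
            using (Bitmap field = Crop(scanned, new Rectangle(50, 50, 200, 100)))
            {
                Console.WriteLine(field.HorizontalResolution + " x " + field.VerticalResolution);
            }
        }
    }
}
```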

3. Clean Up Underlined Text Before Recognition

Specks, dirt, and other imperfections within the source image can significantly reduce recognition quality. Sometimes, however, even a seemingly good image can be recognized incorrectly when there are underlined words like URLs, emails, or specifically formatted generic text. 

From the software’s point of view, this kind of text isn’t very different from other types of image distortion. ScanFix’s LineRemovalOptions can clean up the text by eliminating lines that could interfere with recognition. The API also features special parameters that ensure characters with low hanging elements (such as “j” or “y”) will be restored after line removal to avoid another potential recognition problem.

4. Use Long-Living Objects to Avoid Recognition Performance Drop

Creating a new instance requires OCR engine initialization and loading neural network data suitable for specific recognition parameters. Because of the data complexity, this process is not free and may cause delays of roughly 200 ms to 2 s, depending on the hardware and recognition properties. 

Existing Accusoft OCR instances may be reused to recognize other images with different properties. This will speed up the overall process because initialization will be done only once during the first AnalyzeField call and subsequent calls will be much cheaper in terms of computing resources.
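
The effect of reuse can be illustrated with a small C# sketch. StubOcr below is a hypothetical stand-in that only simulates a one-time initialization cost on the first AnalyzeField call; it is not the FormSuite OCR class, and the real method signature is not shown in this article.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an OCR instance; the real FormSuite types differ.
public sealed class StubOcr
{
    public static int InitializationCount; // How many times the costly setup ran.
    private bool initialized;

    public string AnalyzeField(string imageName)
    {
        if (!initialized)
        {
            Thread.Sleep(50); // Simulates engine start-up and model loading.
            InitializationCount++;
            initialized = true;
        }
        return "recognized:" + imageName;
    }
}

public static class ReuseDemo
{
    // Creates a new instance per image, so the setup cost is paid every time.
    public static int RunWithFreshInstances(string[] images)
    {
        StubOcr.InitializationCount = 0;
        foreach (string image in images) new StubOcr().AnalyzeField(image);
        return StubOcr.InitializationCount;
    }

    // Keeps one long-lived instance, so the setup cost is paid once.
    public static int RunWithSharedInstance(string[] images)
    {
        StubOcr.InitializationCount = 0;
        var ocr = new StubOcr();
        foreach (string image in images) ocr.AnalyzeField(image);
        return StubOcr.InitializationCount;
    }

    public static void Main()
    {
        string[] images = { "form1.tif", "form2.tif", "form3.tif" };
        Console.WriteLine("Fresh instances:  " + RunWithFreshInstances(images) + " initializations"); // 3
        Console.WriteLine("Shared instance:  " + RunWithSharedInstance(images) + " initialization");  // 1
    }
}
```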

5. Assign Instances to Their Own Worker Threads

Objects are thread safe and can be called from different threads. However, assigning an object to its own thread can avoid extra locking. One of the simplest ways to do this is to use a C# Parallel.ForEach loop together with a ConcurrentQueue of pre-allocated objects. 

When the loop’s degree of parallelism is capped at the number of available CPUs, the number of worker threads will not exceed it. Any available instance can then be assigned to recognize images in its own thread, while any additional threads wait until a busy instance becomes free.

Other common patterns are producer-consumer and map-reduce, which are more complex to implement but provide better flexibility when managing input data.
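
A minimal sketch of the Parallel.ForEach and ConcurrentQueue approach is shown below. StubRecognizer is a hypothetical placeholder for an OCR instance, not a FormSuite type, and the sketch assumes the pool size and the loop’s MaxDegreeOfParallelism are set to the same value so that a free instance is always available to each running iteration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an OCR instance; not a FormSuite type.
public sealed class StubRecognizer : IDisposable
{
    public string Recognize(string imagePath) { return "text:" + imagePath; }
    public void Dispose() { }
}

public static class PooledRecognition
{
    public static IDictionary<string, string> RecognizeAll(IEnumerable<string> imagePaths, int workers)
    {
        // Pre-allocate one recognizer per worker.
        var pool = new ConcurrentQueue<StubRecognizer>(
            Enumerable.Range(0, workers).Select(_ => new StubRecognizer()));
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        try
        {
            Parallel.ForEach(imagePaths, options, path =>
            {
                // At most `workers` iterations run at once and each holds one
                // instance, so the pool cannot be empty here.
                if (!pool.TryDequeue(out StubRecognizer recognizer))
                    throw new InvalidOperationException("Recognizer pool exhausted.");
                try
                {
                    results[path] = recognizer.Recognize(path);
                }
                finally
                {
                    pool.Enqueue(recognizer); // Return the instance for the next image.
                }
            });
        }
        finally
        {
            // Release every pooled instance once all images are processed.
            while (pool.TryDequeue(out StubRecognizer leftover)) leftover.Dispose();
        }
        return results;
    }

    public static void Main()
    {
        string[] images = { "a.tif", "b.tif", "c.tif", "d.tif" };
        IDictionary<string, string> results = RecognizeAll(images, Environment.ProcessorCount);
        foreach (string image in images) Console.WriteLine(image + " -> " + results[image]);
    }
}
```

Borrowing an instance for the duration of one iteration means no two threads ever share a recognizer, so the instances themselves need no additional locking.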

6. Dispose Objects to Avoid High Memory Consumption

As a general rule in C#, call Dispose on objects that use unmanaged resources. FormSuite’s OCR component uses an external recognition engine, so it is highly recommended to call Dispose when an instance is no longer needed. This avoids situations where memory is unavailable to other parts of the application, especially when a large amount of data is held for post-processing or available memory is low because other processes are running in parallel.
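
In C#, the using statement is the usual way to guarantee that Dispose runs, even when recognition throws. The sketch below uses a hypothetical StubNativeOcr class in place of the FormSuite OCR component, whose actual type names are not shown in this article.

```csharp
using System;

// Hypothetical stand-in for a component that wraps unmanaged resources.
public sealed class StubNativeOcr : IDisposable
{
    public bool IsDisposed { get; private set; }

    public string AnalyzeField(string imageName)
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(StubNativeOcr));
        if (imageName == null) throw new ArgumentNullException(nameof(imageName));
        return "recognized:" + imageName;
    }

    public void Dispose()
    {
        // A real component would release its native engine resources here.
        IsDisposed = true;
    }
}

public static class DisposeDemo
{
    // Runs recognition and reports whether the instance was disposed afterward,
    // including when AnalyzeField throws.
    public static bool DisposedAfterUse(string imageName)
    {
        var ocr = new StubNativeOcr();
        try
        {
            using (ocr)
            {
                ocr.AnalyzeField(imageName);
            }
        }
        catch (ArgumentNullException)
        {
            // The failure is swallowed only to show that Dispose still ran.
        }
        return ocr.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedAfterUse("form1.tif")); // True
        Console.WriteLine(DisposedAfterUse(null));        // True: disposed despite the exception
    }
}
```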

Get Accurate OCR Data Capture Results with FormSuite

When properly configured and incorporated into a forms processing workflow, the FormSuite OCR component can accelerate automated data capture and reduce manual errors. Its zonal field recognition capabilities allow it to home in on predefined field types to improve processing speed and accuracy. Developers can also adjust confidence values for recognition results to determine how frequently manual review is necessary. 

To get a hands-on look at how FormSuite incorporates OCR seamlessly into its collection of forms processing tools, schedule a free trial today.

Question

I changed the value of viewingSessionTimeout

Answer

If you are using Windows, the value that you are looking to modify is the viewing.sessionLifetime in the central config file prizm-services-config.yml located in the root of the PrizmDoc installation directory (C:\Prizm on Windows, /usr/share/prizm on Linux).

Make sure it is uncommented and without any leading whitespace.

Additionally, please make sure the viewing.cacheLifetime is greater than the viewing.sessionLifetime value.

More information can be found here.

You must restart the PrizmDoc service in order for your changes to take effect.