Technical FAQs

Question

How do I ensure temp files are deleted when closing ImageGear .NET?

Answer

All PDF objects are based on underlying low-level PDF objects that are not controlled by .NET resource manager and garbage collector. Because of this, each PDF object that is created from scratch should be explicitly disposed of using that object’s Dispose() method.

Also, any ImGearPDEContent object obtained from ImGearPDFPage should be released using the ImGearPDFPage.ReleaseContent() in all cases.

This should cause all temp files to be cleared when the application is closed.

Question

ImageGear .NET v24.6 added support for viewing PDF documents with XFA content. I’m using v24.8, and upon trying to open an XFA PDF, I get a SEHException for some reason…

SEHException

Why might this be happening?

Answer

One reason could be because you need to execute the following lines after initializing the PDF component, and prior to loading an XFA PDF:

// Allow opening of PDF documents that contain XFA form data.
IImGearFormat pdfFormat = ImGearFileFormats.Filters.Get(ImGearFormats.PDF);
pdfFormat.Parameters.GetByName("XFAAllowed").Value = true;

This will enable XFA PDFs to be opened by the ImageGear .NET toolkit.

Question

I am combining multiple PDF documents together, and I need to create a new bookmark collection, placed at the beginning of the new document. Each bookmark should go to a specific page or section of the new document.
Example structure:

  • Section 1
    • Document 1
  • Section 2
    • Document 2

How might I do this using ImageGear .NET?

Answer

You are adding section dividers to the result document. So, for example, if you are to merge two documents, you might have, say, two sections, each with a single document, like so…

  • Section 1
    • Document 1
  • Section 2
    • Document 2

…The first page will be the first header page, and then the pages of Document 1, then another header page, then the pages of Document 2. So, the first header page is at index 0, the first page of Document 1 is at index 1, the second header is at 1 + firstDocumentPageCount, etc.

The following code demonstrates adding some blank pages to igResultDocument, inserting pages from other ImGearPDFDocuments, and modifying the bookmark tree such that it matches the outline above, with "Section X" pointing to the corresponding divider page and "Document X" pointing to the appropriate starting page number…

// Create new document, add pages
ImGearPDFDocument igResultDocument = new ImGearPDFDocument();
igResultDocument.CreateNewPage((int)ImGearPDFPageNumber.BEFORE_FIRST_PAGE, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igFirstDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);
igResultDocument.CreateNewPage(igFirstDocument.Pages.Count, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igSecondDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);

// Add first Section
ImGearPDFBookmark resultBookmarkTree = igResultDocument.GetBookmark();
resultBookmarkTree.AddNewChild("Section 1");
var child = resultBookmarkTree.GetLastChild();
int targetPageNumber = 0;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add first Document
child.AddNewChild("Document 1");
child = child.GetLastChild();
targetPageNumber = 1;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Section
resultBookmarkTree.AddNewChild("Section 2");
child = resultBookmarkTree.GetLastChild();
targetPageNumber = 1 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Document
child.AddNewChild("Document 2");
child = child.GetLastChild();
targetPageNumber = 2 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Save
using (FileStream stream = File.OpenWrite(@"C:\path\here\test.pdf"))
{
    igResultDocument.Save(stream, ImGearSavingFormats.PDF, 0, 0, igResultDocument.Pages.Count, ImGearSavingModes.OVERWRITE);
}

...

private ImGearPDFDestination setNewDestination(ImGearPDFDocument igPdfDocument, int targetPageNumber, ImGearPDFBookmark targetNode)
{
    ImGearPDFAction action = targetNode.GetAction();
    if (action == null)
    {
        action = new ImGearPDFAction(
            igPdfDocument,
            new ImGearPDFDestination(
                igPdfDocument,
                igPdfDocument.Pages[targetPageNumber] as ImGearPDFPage,
                new ImGearPDFAtom("XYZ"),
                new ImGearPDFFixedRect(), 0, targetPageNumber));
        targetNode.SetAction(action);
    }
    return action.GetDestination();
}

(The setNewDestination method is a custom method that abstracts the details of adding the new destination.)

Essentially, the GetBookmark() method will allow you to get an instance representing the root of the bookmark tree, with its children being subtrees themselves. Thus, we can add a new child to an empty tree, then get the last child with GetLastChild(). Then, we can set the action for that node to be a new "GoTo" action that will navigate to the specified destination. Upon save to the file system, this should produce a PDF with the below bookmark structure…

Bookmarks example

Note that you may need to use the native Save method (NOT SaveDocument) described in the product documentation here in order to save a PDF file with the bookmark tree included. Also, you can read more about Actions in the PDF Specification.

Question

How do I remove XMP Data from my image using ImageGear .NET?

Answer

When removing XMP data in ImageGear, the simplest way to do this is to set the XMP Metadata node to null, like so:

ImGearSimplifiedMetadata.Initialize(); 
doc.Metadata.XMP = new ImGearXMPMetadataRoot();

Or, you can traverse through the metadata tree and remove each node from the tree:

// Example code. Not thoroughly tested
private static void RemoveXmp(ImGearMetadataTree tree)
{
ArrayList toRemove = new ArrayList();
foreach (ImGearMetadataNode node in tree.Children)
{
    if (node is ImGearMetadataTree)
        RemoveXmp((ImGearMetadataTree)node);

    if (node.Format != ImGearMetadataFormats.XMP)
        continue;

    toRemove.Add(node);
}

foreach (ImGearMetadataNode node in toRemove)
    tree.Children.Remove(node);
}
Question

Why do I get a “File Format Unrecognized” exception when trying to load a PDF document in ImageGear .NET?

Answer

You will need to set up your project to include PDF support if you want to work with PDF documents. Add a reference to ImageGear24.Formats.Pdf (if you’re using another version of ImageGear, make sure you’re adding the correct reference). Add the following line of code where you specify other resources:

using ImageGear.Formats.PDF;

Add the following lines of code before you begin working with PDFs:

ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
ImGearPDF.Initialize();

The documentation page linked here shows how to add PDF support to a project.

Question

When using OCR in ImageGear .NET, is there any way to distinguish between a capital/uppercase letter O and the number 0?

Answer

Not without context or a font that makes the difference clear (such as one with a slashed 0). ImageGear will properly recognize Oliver and 1530 as containing O and 0, respectively, but cannot reliably distinguish it when letters and numbers are mixed. That is, ImageGear may not reliably distinguish between 1ABO0F3 and 1AB0OF3.

ocr optical character recognition

Effective document management is now a top priority for organizations, but for many, it remains a challenge. As noted by recent AIIM survey data, companies are struggling to handle both the documents they have and the rapid uptake of new information. In fact, 43 percent said their biggest priority is effectively leveraging the structured and unstructured content they already have, while 57 percent are focused on understanding the overwhelming big data.  Optical character recognition (OCR) is a critical component of document management.

For software development firms, this poses a particular challenge. Products are no longer feature complete without critical end-user functions such as advanced optical character recognition and powerful search. However, adding this functionality is not as easy as it sounds. Developers building out this comprehensive construct from the ground up requires both time, effort, and continued maintenance, which is a large undertaking for any company.

Accusoft’s ImageGear SDK offers a way to bridge the OCR gap with comprehensive image processing and manipulation capabilities that both streamline software development and deliver on end-user expectations.*


What is ImageGear?

ImageGear easily integrates into existing applications to deliver cutting-edge document management functionality at scale. Available for both .NET and C/C++ frameworks, ImageGear allows developers to quickly deploy and white-label key features including image processing, manipulation, conversion, and PDF and document search.

This add-on OCR functionality delivers highly-accurate optical character recognition to any .NET (C#) or C/C++ application. ImageGear’s OCR add-on provides full-page character recognition for more than 100 languages — including both Western and Asian languages such as Korean, Japanese, and Chinese character sets. It’s capable of recognizing multiple languages within a single image for enhanced document management. Other OCR features include:

  • Automatic page segmentation into individual zones for processing
  • Type assignment per zone based on defined flows, tables, or graphics
  • Table detection with advanced technology to enhance data reconstruction output
  • Entire page or individual region image processing
  • Zone definition by user, existing files, or detected automatically by the OCR engine

In addition, software developers can enhance ImageGear OCR functionality by leveraging both predefined and customizable dictionaries to ensure validated results using regular expressions. 


Why Optical Character Recognition (OCR) Matters to End-Users

Advanced OCR integration makes it easier for end-users to find what they’re looking for, when they’re looking for it. Instead of forcing users to find additional apps that deliver specific services, in-app OCR delivers increased satisfaction by streamlining user search functionality.

Common use cases include:

  • Legal eDiscoveryThe eDiscovery process is a critical — and often complex — stage of legal case preparation. Firms need to quickly find key terms, phrases, and images within legal documents to ensure they meet both client expectations and compliance obligations. With many forms now scanned and stored in non-standard file formats that contain form fields, text boxes, and digital imagery, OCR is essential to help lawyers streamline the process of eDiscovery at scale.

 

  • Financial Document ProcessingClients now expect loan applications and credit card applications to be processed at scale and speed. This is especially critical as firms embrace the idea of remote work — both staff at home and those in the office need end-to-end OCR functionality to deliver complete document management.

 

  • Insurance Documentation Assessment Insurance claims are both complex and comprehensive, requiring complete documentation from clients, contractors, and compliance agencies. As insurance firms move to tech-first frameworks to enhance document processing, speed, and accuracy, OCR makes it easy for staff to find specific data and ensure documentation is complete. 

Integrating OCR

Advanced OCR functionality won’t deliver expected outcomes if integration is cumbersome and complex. ImageGear streamlines this process with easy SDK implementation for both .NET and C/C++.

ImageGear .NET can be easily deployed on multiple platforms. These .NET deployments include ASP.NET functions such as image display, thumbnail display, annotation support, and cloud capture along with WPF printing and annotation support. ImageGear for C/C++, meanwhile, offers support for several platforms as well. Check out the developer resources section to see an updated list.


How Your Clients Use Optical Character Recognition (OCR)

PDFs remain the go-to file format for many industries, offering both standardized image and text conversion along with the ability to easily set or restrict document permissions. The problem? PDFs are notoriously difficult to search, making it hard for end-users to quickly find the text or data they need.

ImageGear makes it easy to OCR PDFs using the ImGearRecPage.Recognize Method, which leverages the zone list of the image to deliver accurate OCR — or, if this list is empty, automatically calls the page-layout decomposition process (auto-zoning) to complete the OCR process.

C# supports OCR to PDF.


using System.IO;
using ImageGear.Core;
using ImageGear.Formats;
using ImageGear.Evaluation;
using ImageGear.Recognition;

namespace ImageGearTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize evaluation license.
            ImGearEvaluationManager.Initialize();
            ImGearEvaluationManager.Mode = ImGearEvaluationMode.Watermark;

            // Initialize the Recognition Engine.
            ImGearRecognition igRecognition = new ImGearRecognition();

            // ImageGear assemblies require explicit initialization at application startup.
            ImGearCommonFormats.Initialize();

            // Open a FileStream for our output document.
            using (FileStream outputStream = new FileStream(@"c:\temp\outputDoc.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                // Open a FileStream for our source multi-page image.
                using (FileStream multiPageDocument = new FileStream(@"c:\temp\test.tif", FileMode.Open))
                {

                    // Load every page of the multi-page document. Starting at page 0 and loading the range of spaces specified.    
                    // Since the range is -1, that specifies that all pages shall be loaded.     
                    ImGearDocument doc = ImGearFileFormats.LoadDocument(multiPageDocument, 0, -1);

                    // Determine the amount of pages in the multi-page image.
                    int numPages = ImGearFileFormats.GetPageCount(multiPageDocument, ImGearFormats.UNKNOWN);

                    // Recognize each page of the multi-page document and add the results to outputStream.
                    for (int pageNumber = 0; pageNumber < numPages; pageNumber++)
                    {

                        // Cast the current page to a raster page and import that page.
                        using (ImGearRecPage igRecPage = igRecognition.ImportPage((ImGearRasterPage)doc.Pages[pageNumber]))
                        {

                            // Preprocess the page.
                            igRecPage.Image.Preprocess();

                            // Perform recognition.
                            igRecPage.Recognize();

                            // Add OCR results to the outputStream.
                            igRecognition.OutputManager.WriteDirectText(igRecPage, outputStream);

                        }
                    }
                }

            }
            // Dispose of objects we are no longer using.
            igRecognition.Dispose();
        }
    }
}

 


OCR Access and Analysis

Advanced OCR isn’t enough in isolation — developers must also empower end-users to quickly access and analyze OCR output. ImageGear offers multiple options to help streamline this process, such as:

  • Storage of Output as Code Pages
  • Export to Text Format
  • Export to PDF
  • Export to MRC PDF
  • Export to a Formatted Document

Find Your Best Fit

ImageGear OCR makes it easy for end-users to quickly search critical documents, find the data they need, and analyze optical character recognition output, but don’t take our word for it. Seeing is believing. Test ImageGear in your own environment and discover the difference of advanced OCR. 

*Optical character recognition is an ImageGear add-on and must be requested upon purchase of a license.

 

Question

How do I remove XMP Data from my image using ImageGear .NET?

Answer

When removing XMP data in ImageGear, the simplest way to do this is to set the XMP Metadata node to null, like so:

ImGearSimplifiedMetadata.Initialize(); 
doc.Metadata.XMP = new ImGearXMPMetadataRoot();

Or, you can traverse through the metadata tree and remove each node from the tree:

// Example code. Not thoroughly tested
private static void RemoveXmp(ImGearMetadataTree tree)
{
ArrayList toRemove = new ArrayList();
foreach (ImGearMetadataNode node in tree.Children)
{
    if (node is ImGearMetadataTree)
        RemoveXmp((ImGearMetadataTree)node);

    if (node.Format != ImGearMetadataFormats.XMP)
        continue;

    toRemove.Add(node);
}

foreach (ImGearMetadataNode node in toRemove)
    tree.Children.Remove(node);
}
Question

Why do I get a “File Format Unrecognized” exception when trying to load a PDF document in ImageGear .NET?

Answer

You will need to set up your project to include PDF support if you want to work with PDF documents. Add a reference to ImageGear24.Formats.Pdf (if you’re using another version of ImageGear, make sure you’re adding the correct reference). Add the following line of code where you specify other resources:

using ImageGear.Formats.PDF;

Add the following lines of code before you begin working with PDFs:

ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
ImGearPDF.Initialize();

The documentation page linked here shows how to add PDF support to a project.