Technical FAQs

Question 1

Why do I get a “Non-supported resolution” exception when trying to import a page into ImageGear's recognition engine?

Question

I encounter an Unhandled Exception error, as shown below, in ImageGear when trying to load a page into the recognition engine.

Error Message: An unhandled exception of type
‘ImageGear.Core.ImGearException’ occurred in ImageGear22.Core.dll

Additional information: IMG_DPI_WARN (0x4C711): Non-supported
resolution. Value1:0x4C711

What is causing this and how can I fix it?

Answer

This is probably because the original image used to create the page didn’t have a Resolution Unit set.

To fix this, check if the page has a Resolution Unit set. If it does not, set it to inches. You should also set the DPI of the image as those values were probably not carried over from the original image since the Resolution Unit wasn’t set. The following code demonstrates how to do this.

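// Assumes recognitionEngine is an already-initialized ImGearRecognition instance.
// Open file and load page.
using (var inStream = new FileStream(@"C:\Path\To\InputImage.jpg", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    // Load first page (page indexes are zero-based).
    const int firstPage = 0;
    ImGearPage igPage = ImGearFileFormats.LoadPage(inStream, firstPage);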

    if (igPage.DIB.ImageResolution.Units == ImGearResolutionUnits.NO_ABS)
    {
        igPage.DIB.ImageResolution.Units = ImGearResolutionUnits.INCHES;
        igPage.DIB.ImageResolution.XNumerator = 300;
        igPage.DIB.ImageResolution.XDenominator = 1;
        igPage.DIB.ImageResolution.YNumerator = 300;
        igPage.DIB.ImageResolution.YDenominator = 1;
    }

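    // The recognized output is plain text, so write it to a text file.
    // FileMode.Create truncates any existing file instead of leaving stale bytes behind.
    using (var outStream = new FileStream(@"C:\Path\To\Output.txt", FileMode.Create, FileAccess.ReadWrite))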
    {
        // Import the page into the recognition engine.
        using (ImGearRecPage recognitionPage = recognitionEngine.ImportPage((ImGearRasterPage)igPage))
        {
            // Preprocess the page.
            recognitionPage.Image.Preprocess();

            // Perform recognition.
            recognitionPage.Recognize();

            // Write the page to the output file.
            recognitionEngine.OutputManager.DirectTextFormat = ImGearRecDirectTextFormat.SimpleText;
            recognitionEngine.OutputManager.WriteDirectText(recognitionPage, outStream);
        }
    }
}

Question 2

How do I remove XMP Data from my image using ImageGear .NET?

Answer

The simplest way to remove XMP data in ImageGear is to replace the XMP metadata node with a new, empty root, like so:

ImGearSimplifiedMetadata.Initialize();
doc.Metadata.XMP = new ImGearXMPMetadataRoot();

Or, you can traverse through the metadata tree and remove each node from the tree:

// Example code. Not thoroughly tested.
// Requires: using System.Collections;
private static void RemoveXmp(ImGearMetadataTree tree)
{
    // Collect matches first so the collection is not modified while it is enumerated.
    ArrayList toRemove = new ArrayList();
    foreach (ImGearMetadataNode node in tree.Children)
    {
        // Recurse into subtrees before testing the node itself.
        if (node is ImGearMetadataTree)
            RemoveXmp((ImGearMetadataTree)node);

        if (node.Format != ImGearMetadataFormats.XMP)
            continue;

        toRemove.Add(node);
    }

    foreach (ImGearMetadataNode node in toRemove)
        tree.Children.Remove(node);
}

Question 4

In ImageGear, why am I running into AccessViolationExceptions when I run my application in parallel?

Answer

This issue can occur when ImGearPDF is initialized only once, on the main thread, before the parallel code runs. To use ImGearPDF in a multi-threaded program, you must initialize it on each thread that uses it. For example, if you have something like this:

ImGearPDF.Initialize();
Parallel.For(...)
{ 
    // OCR code
}
ImGearPDF.Terminate();

Change it to this:

Parallel.For(...)
{
    ImGearPDF.Initialize();
    // OCR code
    ImGearPDF.Terminate();
}

The same logic applies to other ImageGear classes, such as ImGearPage instances or the ImGearRecognition class – you should create one instance of each class per thread, rather than creating a single instance and accessing it across threads. In the case of the ImGearRecognition class, you’ll have to use the createUnique parameter to make that possible e.g.:

ImGearRecognition recEngine = new ImGearRecognition(true);

instead of

ImGearRecognition recEngine = new ImGearRecognition();
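
The per-thread rule can be illustrated without any ImageGear calls. The following is a minimal sketch in plain Java, assuming a hypothetical ThreadBoundSession class that stands in for a native session (it is not an ImageGear type). Each parallel task opens its own session, uses it, and closes it on the same thread, which mirrors moving Initialize() and Terminate() inside the parallel loop.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical stand-in for a native session that must be opened and closed
// on the same thread that uses it. This is NOT an ImageGear class.
final class ThreadBoundSession implements AutoCloseable {
    private final Thread owner = Thread.currentThread();

    void process(int item) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("session used from a foreign thread");
        }
        // Real per-item work would go here.
    }

    @Override
    public void close() {
        // Real cleanup (the "Terminate" step) would go here.
    }
}

final class PerThreadInitDemo {
    // Processes every item in parallel; each task opens and closes its own session.
    static int processAll(int itemCount) {
        Set<Integer> done = ConcurrentHashMap.newKeySet();
        IntStream.range(0, itemCount).parallel().forEach(i -> {
            try (ThreadBoundSession session = new ThreadBoundSession()) {
                session.process(i);
                done.add(i);
            }
        });
        return done.size();
    }

    public static void main(String[] args) {
        System.out.println("items processed: " + processAll(8));
    }
}
```

Because the session is created inside the task, it is never shared between threads, and try-with-resources guarantees the cleanup step runs even if processing throws.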

Question 5

How can I re-order PDF pages using ImageGear .NET?

Question

I want to re-arrange the page order of a PDF. I’ve tried the following…

var page = imGearDocument.Pages[indx].Clone();
imGearDocument.Pages.RemoveAt(indx); // Exception: "One or more pages are in use and could not be deleted."
imGearDocument.Pages.Insert(newIndx, page);

But an exception is thrown. Somehow, even though the page was cloned, the exception states that the page can’t be removed because it’s still in use.

What am I doing wrong here?

Answer

If you’re using an older version of ImageGear .NET, you may run into this exception when you clone the page. Some of the resources between the original and the clone are still shared, which is why this happens.

Starting with ImageGear .NET v24.8, this no longer happens, and the above code should work fine.

If you still need to use an earlier version, you can use the InsertPages method instead.

Question 6

How do I insert or modify PDF Bookmarks using ImageGear .NET?

Question

I am combining multiple PDF documents together, and I need to create a new bookmark collection, placed at the beginning of the new document. Each bookmark should go to a specific page or section of the new document.
Example structure:

Section 1
- Document 1
Section 2
- Document 2

How might I do this using ImageGear .NET?

Answer

You are adding section dividers to the result document. So, for example, if you are to merge two documents, you might have, say, two sections, each with a single document, like so…

Section 1
- Document 1
Section 2
- Document 2

…The first page will be the first header page, and then the pages of Document 1, then another header page, then the pages of Document 2. So, the first header page is at index 0, the first page of Document 1 is at index 1, the second header is at 1 + firstDocumentPageCount, etc.
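
The index arithmetic can be sketched on its own, without any ImageGear calls. The example below is plain Java; the SectionLayout class and the page counts of 3 and 5 are assumptions made for illustration. It assumes one divider page is placed directly before each merged document.

```java
import java.util.ArrayList;
import java.util.List;

final class SectionLayout {
    // Returns {headerIndex, firstDocumentPageIndex} for each merged document,
    // assuming one divider (header) page directly precedes each document.
    static List<int[]> layout(int[] documentPageCounts) {
        List<int[]> result = new ArrayList<>();
        int next = 0; // index of the next page to be appended
        for (int pageCount : documentPageCounts) {
            int headerIndex = next;
            int firstPageIndex = headerIndex + 1;
            result.add(new int[] { headerIndex, firstPageIndex });
            next = firstPageIndex + pageCount;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page counts: Document 1 has 3 pages, Document 2 has 5.
        List<int[]> sections = layout(new int[] { 3, 5 });
        for (int i = 0; i < sections.size(); i++) {
            System.out.println("Section " + (i + 1) + " header at index " + sections.get(i)[0]
                + ", Document " + (i + 1) + " starts at index " + sections.get(i)[1]);
        }
    }
}
```

With a three-page first document, the second header lands at index 4 (1 + 3) and Document 2 starts at index 5 (2 + 3), which matches the indexes described above.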

The following code demonstrates adding some blank pages to igResultDocument, inserting pages from other ImGearPDFDocuments, and modifying the bookmark tree such that it matches the outline above, with "Section X" pointing to the corresponding divider page and "Document X" pointing to the appropriate starting page number…

// Create new document, add pages
ImGearPDFDocument igResultDocument = new ImGearPDFDocument();
igResultDocument.CreateNewPage((int)ImGearPDFPageNumber.BEFORE_FIRST_PAGE, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igFirstDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);
igResultDocument.CreateNewPage(igFirstDocument.Pages.Count, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igSecondDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);

// Add first Section
ImGearPDFBookmark resultBookmarkTree = igResultDocument.GetBookmark();
resultBookmarkTree.AddNewChild("Section 1");
var child = resultBookmarkTree.GetLastChild();
int targetPageNumber = 0;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add first Document
child.AddNewChild("Document 1");
child = child.GetLastChild();
targetPageNumber = 1;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Section
resultBookmarkTree.AddNewChild("Section 2");
child = resultBookmarkTree.GetLastChild();
targetPageNumber = 1 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Document
child.AddNewChild("Document 2");
child = child.GetLastChild();
targetPageNumber = 2 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Save
using (FileStream stream = File.OpenWrite(@"C:\path\here\test.pdf"))
{
    igResultDocument.Save(stream, ImGearSavingFormats.PDF, 0, 0, igResultDocument.Pages.Count, ImGearSavingModes.OVERWRITE);
}

...

private ImGearPDFDestination setNewDestination(ImGearPDFDocument igPdfDocument, int targetPageNumber, ImGearPDFBookmark targetNode)
{
    ImGearPDFAction action = targetNode.GetAction();
    if (action == null)
    {
        action = new ImGearPDFAction(
            igPdfDocument,
            new ImGearPDFDestination(
                igPdfDocument,
                igPdfDocument.Pages[targetPageNumber] as ImGearPDFPage,
                new ImGearPDFAtom("XYZ"),
                new ImGearPDFFixedRect(), 0, targetPageNumber));
        targetNode.SetAction(action);
    }
    return action.GetDestination();
}

(The setNewDestination method is a custom method that abstracts the details of adding the new destination.)

Essentially, the GetBookmark() method returns an instance representing the root of the bookmark tree, with its children being subtrees themselves. Thus, we can add a new child to an empty tree, then get the last child with GetLastChild(). Then, we can set the action for that node to be a new "GoTo" action that navigates to the specified destination. When saved to the file system, this should produce a PDF with the bookmark structure outlined above.

Note that you may need to use the native Save method (not SaveDocument), as described in the product documentation, to save a PDF file with the bookmark tree included. You can also read more about Actions in the PDF Specification.

Question 7

How do I ensure temp files are deleted when closing ImageGear .NET?

Answer

All PDF objects are based on underlying low-level PDF objects that are not controlled by the .NET resource manager or garbage collector. Because of this, each PDF object that is created from scratch should be explicitly disposed of using that object’s Dispose() method.

Also, any ImGearPDEContent object obtained from an ImGearPDFPage should always be released using ImGearPDFPage.ReleaseContent().

This should cause all temp files to be cleared when the application is closed.

Question 8

MRC Compression with ImageGear

Answer

As more and more companies embrace digitization of paper documents, they’re looking for ways to minimize the virtual storage space necessary for these new files. They’re already benefiting from increased office space and improved workflow efficiency, but they also need data compression techniques to make access to digital information quicker and less costly.

One standard method is mixed raster content compression (MRC), which greatly reduces the size of stored PDF documents. Since the vast majority of data being digitized and archived is in PDF format, MRC compression is an ideal tool for converting high volumes of paper documents into compressed, searchable files.

MRC: The Basics

Mixed raster content compression works by breaking down an image (typically a full page of a PDF document) into its various components, mostly text and color images. This allows the most effective compression algorithm available to be applied to each part of the page.

As a result, even though the color images are highly compressed into their core elements (background, color, grayscale, etc.) they retain vivid detail and can be displayed subsequently with essentially no loss of quality, and any text from the original page remains searchable.

Putting High-Grade Compression to Use

A real estate firm, for instance, can use MRC compression to efficiently store a database of home listings, each with potentially dozens of color images and a searchable street address, with little to no compromise in file fidelity.

Consider these numbers: MRC can compress a 20 MB .TIFF color image file (which is part of a source PDF) to anywhere between 96 and 170 KB, based on the level of compression desired. This range suggests, of course, a tradeoff between file size and image quality, but is a significant reduction from the source image in any case – with differences in the displayed file barely discernible.
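
To put those figures in perspective, the sizes can be turned into compression ratios. This is a small arithmetic sketch in plain Java; the CompressionRatio class is hypothetical, and it assumes 1 MB = 1024 KB.

```java
final class CompressionRatio {
    // Ratio of original size to compressed size, both in kilobytes.
    static double ratio(double originalKb, double compressedKb) {
        return originalKb / compressedKb;
    }

    public static void main(String[] args) {
        double originalKb = 20 * 1024; // 20 MB, assuming 1 MB = 1024 KB
        System.out.printf("%.0f:1%n", ratio(originalKb, 170)); // larger output file
        System.out.printf("%.0f:1%n", ratio(originalKb, 96));  // smaller output file
    }
}
```

Under that assumption, the quoted range corresponds to ratios of roughly 120:1 to 213:1.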

The most recent MRC tools also make integration into existing applications simple, usually requiring just a few added lines of code to make such dramatic compression ratios a reality. For all the talk of companies creating fully digitized records archives, MRC compression is one technique that’s making it more feasible than ever.

Let Us Help

If you’d like to learn more about mixed raster content compression, including how our ImageGear features highly refined MRC functionality, contact us and our experts will get in touch. We’re here to help developers make applications that put organizations on the forefront of digital content management.

Question 9

What dependencies are required to register the PDF component of ImageGear for C and C++ ActiveX?

Question

I am trying to deploy my ImageGear Pro ActiveX project and am receiving an error stating

The module igPDF18a.ocx failed to load

when registering the igPDF18a.ocx component. Why is this occurring, and how can I register the component correctly?

Answer

To register your igPDF18a.ocx component, run the following command:

regsvr32 igPDF18a.ocx

If you receive an error stating that the component failed to load, then that likely means that regsvr32 is not finding the necessary dependencies for the PDF component.

The first thing you will want to check is that you have the Microsoft Visual C++ 10.0 CRT (x86) installed on the machine. You can download this from Microsoft’s site here:

https://www.microsoft.com/en-us/download/details.aspx?id=5555

The next thing you will want to check for is the DL100*.dll files. These files should be included in the deployment package generated by the deployment packaging wizard if you included the PDF component when generating the dependencies. These files must be in the same folder as the igPDF18a.ocx component in order to register it.

With those dependencies, you should be able to register the PDF component with regsvr32 without issue.
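
Before running regsvr32, it can help to confirm that the DL100*.dll files really are in the same folder as the OCX. The following is a minimal sketch in plain Java using only the standard library; the PdfOcxDependencyCheck class is hypothetical, and it checks only for files matching DL100*.dll, not for the Visual C++ runtime.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class PdfOcxDependencyCheck {
    // Lists the names of files matching DL100*.dll that sit next to the given OCX file.
    static List<String> findDl100Dlls(Path ocxFile) throws IOException {
        Path folder = ocxFile.toAbsolutePath().getParent();
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "DL100*.dll")) {
            for (Path p : stream) {
                found.add(p.getFileName().toString());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to igPDF18a.ocx as the first argument;
        // otherwise the current directory is assumed.
        Path ocx = Paths.get(args.length > 0 ? args[0] : "igPDF18a.ocx");
        List<String> dlls = findDl100Dlls(ocx);
        if (dlls.isEmpty()) {
            System.out.println("No DL100*.dll files found next to " + ocx);
        } else {
            System.out.println("Found: " + dlls);
        }
    }
}
```

An empty result suggests the deployment package was generated without the PDF component or that the files were copied to a different folder.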

Question 10

Tutorial: Create Your First ImageGear for Java PDF Project

Answer

In this tutorial, you’ll learn how to configure a Java project for a console application. You’ll also learn how to open a PDF and save it as a new file.

1. Make sure that you have installed the JDK and ImageGear for Java PDF properly. See System Requirements and Installation. You will need to copy a sample.pdf file into the directory where you will be creating the tutorial sample.
2. Create a new Java file, and name it (e.g., MyFirstIGJavaPDFProject.java). Insert the following code there:

import com.accusoft.imagegearpdf.*;
public class MyFirstIGJavaPDFProject
{
   private PDF pdf;
   private Document document;
   static
   {
       System.loadLibrary("IgPdf");
   }
   // Application entry point.
   public static void main(String[] args)
   {
       boolean linearized = false;
       String inputPath = "sample.pdf";
       String outputPath = "sample_output.pdf";
       MyFirstIGJavaPDFProject app = new MyFirstIGJavaPDFProject();
       app.loadAndSave(inputPath, outputPath, linearized);
   }
   // Load and save the PDF file.
   private void loadAndSave(String inputPath, String outputPath, boolean linearized)
   {
       try
       {
           this.initializePdf();
           this.openPdf(inputPath);
           this.savePdf(outputPath, linearized);
       }
       catch (Throwable ex)
       {
           System.err.println("Exception: " + ex.toString());
       }
       finally
       {
           this.terminatePdf();
       }
   }
   // Initialize the PDF session.
   private void initializePdf()
   {
       this.pdf = new PDF();
       this.pdf.initialize();
   }
   // Open input PDF document.
   private void openPdf(String inputPath)
   {
       this.document = new Document();
       this.document.openDocument(inputPath);
   }
   // Save PDF document to the output path.
   private void savePdf(String outputPath, boolean linearized)
   {
       SaveOptions saveOptions = new SaveOptions();
       // Set LINEARIZED attribute as provided by the user.
       saveOptions.setLinearized(linearized);
       this.document.saveDocument(outputPath, saveOptions);
   }
   // Close the PDF document and terminate the PDF session.
   private void terminatePdf()
   {
       if (this.document != null)
       {
           this.document.closeDocument();
           this.document = null;
       }
       if (this.pdf != null)
       {
           this.pdf.terminate();
           this.pdf = null;
       }
   }
}

Now, let’s go over some of the important areas in the sample code in more detail. The com.accusoft.imagegearpdf package:
- Allows you to load and save native PDF documents
- Allows rasterization of PDF pages by converting them to bitmaps and adding raster pages to a PDF document
- Provides multi-page read and write support for the entire document
To use the com.accusoft.imagegearpdf package in your project, add the following import statement:
```
import com.accusoft.imagegearpdf.*;
```
To initialize and support processing of PDF files we need:
```
// Initialize the PDF session.
private void initializePdf()
{
   this.pdf = new PDF();
   this.pdf.initialize();
}
```
There is one main object that is used in this sample code: The Document that holds the entire loaded document.
```
private Document document;
…
this.document = new Document();
this.document.openDocument(inputPath);
```

You can save the loaded document using:

SaveOptions saveOptions = new SaveOptions();
// Set LINEARIZED attribute as provided by the user.
saveOptions.setLinearized(linearized);
this.document.saveDocument(outputPath, saveOptions);

See About Linearized PDF Files for more information.

Now, you can build and run your sample. Please make sure that you have a PDF file named sample.pdf in the same directory where your sample source resides, or change the inputPath in the sample code so that it points to any existing PDF file.
Now, open the terminal in the directory containing your source file and run the following commands:
1. Compile
```
javac -classpath $HOME/Accusoft/ImageGearJavaPDF1-64/java/IgPdf.jar MyFirstIGJavaPDFProject.java
```
  After running this command, you should see a file named MyFirstIGJavaPDFProject.class in your current directory.
2. Build
```
jar cfe MyFirstIGJavaPDFProject.jar MyFirstIGJavaPDFProject MyFirstIGJavaPDFProject.class
```
  After running this command, you should see a file named MyFirstIGJavaPDFProject.jar in your current directory.
3. Set the environment variable (you only need to do this once per terminal session)
```
export LD_PRELOAD=$HOME/Accusoft/ImageGearJavaPDF1-64/lib/libIGCORE18.so
```
4. Run
```
java -classpath $HOME/Accusoft/ImageGearJavaPDF1-64/java/IgPdf.jar:. MyFirstIGJavaPDFProject
```

After running your sample, you should see a new PDF file, named sample_output.pdf, in your current directory.

Question 12

Using ImageGear .NET, how can I detect the bit depth of a PDF so I may better compress my OCR output?

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
2. Loop through these elements and check whether they are of the type ImGearPDEImage.
3. Convert each image to an ImGearPage and find its bit depth.
4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output using compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Import pages
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage)
                {
                    // Create an ImGearPage from the embedded image and find its bit depth.
                    int bitDepth = (el as ImGearPDEImage).ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            // Release the content obtained from GetContent().
            pdfPage.ReleaseContent();

            if (highestBitDepth == 0)
            {
                // If no images were found, or the images are embedded deeper in containers,
                // fall back to a default bit depth of 24 to be safe.
                highestBitDepth = 24;
            }
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions() { VisibleImage = true, VisibleText = false, OptimizeForPdfa = true, ImageCompression = ImGearCompressions.AUTO, UseUnicodeText = false };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are:

1 Bit Per Pixel – ImGearCompressions.CCITT_G4
8 Bits Per Pixel – ImGearCompressions.DEFLATE
24 Bits Per Pixel – ImGearCompressions.JPEG

Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The above implementation only checks one layer into the PDF, so it will not detect images that are embedded inside containers.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

The above code will set the bit depth to 24 if it wasn’t able to detect any images in the first layer, just to be on the safe side.
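
The recursive variant mentioned above can be sketched independently of the ImageGear API. The example below is plain Java; ContentNode, ImageNode, and ContainerNode are hypothetical stand-ins for page content, not ImageGear types. It returns the highest bit depth found at any nesting level and falls back to 24 when no image is found.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for PDF page content. These are NOT ImageGear types.
interface ContentNode { }

final class ImageNode implements ContentNode {
    final int bitDepth;
    ImageNode(int bitDepth) { this.bitDepth = bitDepth; }
}

final class ContainerNode implements ContentNode {
    final List<ContentNode> children;
    ContainerNode(List<ContentNode> children) { this.children = children; }
}

final class BitDepthScan {
    static final int DEFAULT_BIT_DEPTH = 24;

    // Returns the highest image bit depth found at any nesting level, or 0 if none.
    static int highestBitDepth(List<ContentNode> nodes) {
        int highest = 0;
        for (ContentNode node : nodes) {
            if (node instanceof ImageNode) {
                highest = Math.max(highest, ((ImageNode) node).bitDepth);
            } else if (node instanceof ContainerNode) {
                highest = Math.max(highest, highestBitDepth(((ContainerNode) node).children));
            }
        }
        return highest;
    }

    // Falls back to the default bit depth when no image was found.
    static int rasterBitDepth(List<ContentNode> nodes) {
        int highest = highestBitDepth(nodes);
        return highest == 0 ? DEFAULT_BIT_DEPTH : highest;
    }

    public static void main(String[] args) {
        // A 1-bit image at the top level and an 8-bit image nested in a container.
        List<ContentNode> nested = Arrays.<ContentNode>asList(new ImageNode(8));
        List<ContentNode> page = Arrays.<ContentNode>asList(new ImageNode(1), new ContainerNode(nested));
        System.out.println(rasterBitDepth(page)); // prints 8
        System.out.println(rasterBitDepth(Collections.<ContentNode>emptyList())); // prints 24
    }
}
```

A single-layer scan of the sample page would see only the 1-bit image; the recursion is what picks up the 8-bit image inside the container.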
