Technical FAQs

Question

How do I use a Network Drive path for Image and ART storage in my ImageGear .NET web application?

Answer

In an ImageGear .NET web application, you define the location of the image and annotation directories with the storageRootPath and artStorageRootPath configuration properties.
In the current version of ImageGear .NET, storageRootPath and artStorageRootPath do not work with a network drive path such as \\SERVER-NAME\sharefilename.

The workaround is to create a symbolic link from a local directory to the network drive directory.

  • To create a symbolic link, open “Command Prompt” as Administrator and run: mklink /d "local path" \\SERVER-NAME\sharefilename
  • Pass the path of the symbolic link as the image or ART storage root path in your web.config: storageRootPath="local path" artStorageRootPath="local path"

Today’s organizations gather information from a variety of sources. Structured forms remain one of the most popular tools for collecting and processing data, and anyone who has filled out such a form recently has likely encountered the familiar bubbles or squares used to indicate some form of information. Whether these marks are used to identify marital status, health conditions, education level, or some other parameter, optical mark recognition plays an important role in streamlining forms processing and data capture.

What is Optical Mark Recognition?

Optical mark recognition (OMR) reads and captures data marked on a special type of document form. In most instances, this form consists of a bubble or a square that is filled in as part of a test or survey. After the form is marked, it can either be read by dedicated OMR software or fed into a physical scanner device that shines a beam of light onto the paper and then detects answers based on how much light is reflected back to an optical sensor. Older OMR scanners detected answers by measuring how much light passed through the paper itself using phototubes on the other side. Since the phototubes were very sensitive, #2 pencils often had to be used when filling out forms to ensure an accurate reading.

Today’s OMR scanners are much more accurate and versatile, capable of reading marks regardless of how they’re filled out (although they struggle if the mark is made with the same color as the printed form). More importantly, OMR software has made it possible to capture data from OMR forms without the need for any special equipment. This is especially helpful for processing forms information that exists in digital format, such as PDF files or JPEG images. 
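
As a rough illustration of how software-based mark detection can work, the sketch below decides whether a bubble is filled by measuring the fraction of dark pixels inside its region of a binarized scan. It is a minimal sketch, not a description of how FormFix or any particular OMR engine is implemented; the OmrSketch class, its method names, and the 0.4 threshold are hypothetical and chosen only for illustration.

```csharp
public static class OmrSketch
{
    // Returns the fraction of dark pixels inside the given region.
    // image[y, x] is true for a dark (ink) pixel in a binarized scan.
    public static double FillRatio(bool[,] image, int left, int top, int width, int height)
    {
        int dark = 0;
        for (int y = top; y < top + height; y++)
        {
            for (int x = left; x < left + width; x++)
            {
                if (image[y, x]) dark++;
            }
        }
        return (double)dark / (width * height);
    }

    // A bubble counts as marked when enough of its region is dark.
    // The default threshold of 0.4 is an arbitrary illustrative value.
    public static bool IsMarked(bool[,] image, int left, int top, int width, int height,
                                double threshold = 0.4)
    {
        return FillRatio(image, left, top, width, height) >= threshold;
    }
}
```

A real form would need a threshold tuned to its bubble outline thickness and scan quality, since the printed outline itself contributes dark pixels.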

The History of Optical Mark Recognition

One of the oldest versions of forms processing technology, OMR dates back to the use of punch cards, which were first developed in the late 1800s for use with crude “tabulating” machines. The cards typically provided simple “yes/no” information based on whether or not a hole was punched out. When fed through the tabulating machine, a hole would be registered and counted. This same basic principle would allow more complex machines to perform basic arithmetic in the early 1900s before serving as the foundation for early computer programming by mid-century. Entire computer programs were stored on stacks of punch cards, which would remain in use until well into the 1970s when more powerful machines made them obsolete.

Although OMR operates on the same principle as a punch card, it instead uses scanning technology to detect the presence of a mark made by a pencil or a pen. This form of identification was first popularized by IBM’s electrographic “mark sense” technology in the 1930s and 1940s. The concept itself was first developed by a schoolteacher named Reynold Johnson, who wanted to streamline test grading. He designed a machine that could read pencil marks on a special test paper and then tabulate the marks to generate a final score. After joining IBM in 1934, Johnson spearheaded the development of the Type 805 Test Scoring Machine, which debuted in 1938 and revolutionized test scoring in the education sector. In production until 1963, the 805 could score 800 sheets per hour when run by an experienced operator.

The 805 registered marks by using metal brushes to sense the electrical conductivity of graphite from the pencil lead. While effective, it had limitations in terms of reading speed and flexibility. When Everett Franklin Lindquist, best known as the creator of the ACT, needed a machine that could keep up with Iowa’s widespread adoption of standardized testing in the 1950s, he developed the first true optical mark reader. Patented in 1962, Lindquist’s machine detected marks by measuring how much light passed through a scoring sheet and was capable of scoring 4,000 tests per hour.

Throughout the 1960s, OMR scanning technology continued to improve and spread to a variety of industries looking for ways to rapidly process data. In education, however, the OMR market would soon be dominated by the Scantron Corporation, which was founded in 1972 to market smaller, less expensive scanners to K-12 schools and universities. After placing the scanners in educational institutions, Scantron then sold large quantities of proprietary test sheets that could be used for a variety of testing purposes. Scantron was so successful that their distinctive green and white sheets have become synonymous with OMR scanning for generations of US college students.

The next major innovation in OMR technology arrived in the early 1990s with dedicated OMR software that could replicate the drop-out capabilities of commercial scanners. Part of the reason why scanners used proprietary, pre-printed forms was so they could use colors and watermarks that would not register during scanning for more accurate reading. Thanks to OMR software, it became possible to create templated forms and then remove the form image during the reading process to ensure that only marked information remained.
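
The drop-out idea described above can be sketched in a few lines: given a binarized scan of a filled-in form and a binarized image of the blank template, keep only the pixels that are dark in the scan but not in the template. This is a simplified illustration that assumes both images are the same size and perfectly aligned; the DropOutSketch class and its DropOut method are hypothetical names, and production software would also need registration and noise cleanup.

```csharp
using System;

public static class DropOutSketch
{
    // Removes the pre-printed form from a filled-in scan.
    // Both arrays are binarized images of identical size; true means "dark".
    public static bool[,] DropOut(bool[,] filled, bool[,] template)
    {
        int height = filled.GetLength(0);
        int width = filled.GetLength(1);
        if (template.GetLength(0) != height || template.GetLength(1) != width)
        {
            throw new ArgumentException("Images must have the same dimensions.");
        }

        bool[,] result = new bool[height, width];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                // Keep a pixel only if the user added it (dark in the scan, light in the template).
                result[y, x] = filled[y, x] && !template[y, x];
            }
        }
        return result;
    }
}
```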

Take Control of OMR Forms with Accusoft SDKs

Accusoft’s FormFix forms processing SDK features powerful production-level OMR capabilities. It not only detects the presence of check or bubble marks, but can also detect markings in form fields, which is particularly useful for determining whether or not a signature is present on a document. Capable of reading single or multiple marks at 0, 90, 180, and 270 degree orientations, FormFix can also recognize checkboxes and be programmed to accommodate a variety of bubble shapes. Its form drop-out and image cleanup features also help to ensure the highest level of accuracy during OMR reading.

For expanded forms functionality, including optical character recognition (OCR) and intelligent character recognition (ICR), developers can also turn to FormSuite for Structured Forms. Featuring a comprehensive set of forms template creation tools and data capture capabilities, FormSuite can streamline forms processing workflows and significantly reduce the costs and errors associated with manual data entry and extraction.

Find out what flexible OMR functionality can do for your application with a fully-featured trial of the FormSuite SDK. Get started with some functional sample code and explore FormFix’s features to start planning your integration.

Question

How do I remove XMP Data from my image using ImageGear .NET?

Answer

The simplest way to remove XMP data in ImageGear is to replace the XMP metadata node with a new, empty root, like so:

ImGearSimplifiedMetadata.Initialize(); 
doc.Metadata.XMP = new ImGearXMPMetadataRoot();

Alternatively, you can traverse the metadata tree and remove each XMP node:

// Example code. Not thoroughly tested.
// Requires: using System.Collections;
private static void RemoveXmp(ImGearMetadataTree tree)
{
    // Collect matching nodes first; removing them while enumerating
    // tree.Children would invalidate the enumerator.
    ArrayList toRemove = new ArrayList();
    foreach (ImGearMetadataNode node in tree.Children)
    {
        // Recurse into subtrees so nested XMP nodes are removed as well.
        if (node is ImGearMetadataTree)
            RemoveXmp((ImGearMetadataTree)node);

        if (node.Format != ImGearMetadataFormats.XMP)
            continue;

        toRemove.Add(node);
    }

    foreach (ImGearMetadataNode node in toRemove)
        tree.Children.Remove(node);
}

Question

ImageGear .NET v24.6 added support for viewing PDF documents with XFA content. I’m using v24.8, and upon trying to open an XFA PDF, I get a SEHException for some reason…

Why might this be happening?

Answer

One likely cause is that XFA support has not been enabled. Execute the following lines after initializing the PDF component and before loading an XFA PDF:

// Allow opening of PDF documents that contain XFA form data.
IImGearFormat pdfFormat = ImGearFileFormats.Filters.Get(ImGearFormats.PDF);
pdfFormat.Parameters.GetByName("XFAAllowed").Value = true;

This will enable XFA PDFs to be opened by the ImageGear .NET toolkit.

Question

In ImageGear, why am I running into AccessViolationExceptions when I run my application in parallel?

Answer

This issue can occur if ImGearPDF is initialized once, earlier in the application, and then used from multiple threads. To use ImGearPDF in a multi-threaded program, it needs to be initialized on a per-thread basis. For example, if you have something like this:

ImGearPDF.Initialize();
Parallel.For(..., i =>
{
    // OCR code
});
ImGearPDF.Terminate();

Change it to this:

Parallel.For(..., i =>
{
    ImGearPDF.Initialize();
    // OCR code
    ImGearPDF.Terminate();
});

The same logic applies to other ImageGear classes, such as ImGearPage instances or the ImGearRecognition class: you should create one instance of each class per thread, rather than creating a single instance and accessing it across threads. In the case of the ImGearRecognition class, you’ll have to use the createUnique parameter to make that possible, e.g.:

ImGearRecognition recEngine = new ImGearRecognition(true);

instead of

ImGearRecognition recEngine = new ImGearRecognition();
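
One way to make the per-thread pattern harder to get wrong is to wrap it in a small helper that always pairs initialization with termination, even when the work throws. The sketch below is library-independent: the PerIterationScope class and its Run method are hypothetical names, and the initialize and terminate delegates stand in for whatever setup and teardown calls your component needs.

```csharp
using System;
using System.Threading.Tasks;

public static class PerIterationScope
{
    // Runs body(i) for each i in [0, count) in parallel. initialize() is called
    // before and terminate() after every iteration, on the same thread that
    // executes the body, so no initialized state is shared across threads.
    public static void Run(int count, Action initialize, Action<int> body, Action terminate)
    {
        Parallel.For(0, count, i =>
        {
            initialize();
            try
            {
                body(i);
            }
            finally
            {
                // Runs even if body(i) throws, so setup and teardown stay balanced.
                terminate();
            }
        });
    }
}
```

With ImageGear, a call might look like PerIterationScope.Run(fileCount, () => ImGearPDF.Initialize(), i => { /* OCR code */ }, () => ImGearPDF.Terminate()); where fileCount is a placeholder for the number of work items.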

Question

I want to re-arrange the page order of a PDF. I’ve tried the following…

var page = imGearDocument.Pages[indx].Clone();

imGearDocument.Pages.RemoveAt(indx); //// Exception: "One or more pages are in use and could not be deleted."

imGearDocument.Pages.Insert(newIndx, page);

But an exception is thrown. Somehow, even though the page was cloned, the exception states that the page can’t be removed because it’s still in use.

What am I doing wrong here?

Answer

If you’re using an older version of ImageGear .NET, you may run into this exception when you clone the page. Some of the resources between the original and the clone are still shared, which is why this happens.

Starting with ImageGear .NET v24.8, this no longer happens, and the above code should work fine.

If you still need to use an earlier version, you can use the InsertPages method instead.

Question

I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.

ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?

Answer

To do this:

  1. Use the ImGearPDFPage.GetContent() function to get the elements stored in the PDF page.
  2. Then loop through these elements and check if they are of the type ImGearPDEImage.
  3. Convert the image to an ImGearPage and find its bit depth.
  4. Use the highest bit depth detected from the images as the bit depth when rasterizing the page.

The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output with compression.

private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Import pages
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage)
                {
                    // Create an ImGearPage from the embedded image and find its bit depth
                    int bitDepth = (el as ImGearPDEImage).ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            // Release the content obtained from GetContent() once we are done with it
            pdfPage.ReleaseContent();

            if (highestBitDepth == 0)
            {
                // If no images were found, or the images are embedded deeper in
                // containers, fall back to a default bit depth of 24 to be safe
                highestBitDepth = 24;
            }

            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}

For the compression type, we recommend AUTO, which selects the compression based on the image’s bit depth. The compression types that AUTO uses for each bit depth are:

  • 1 Bit Per Pixel – ImGearCompressions.CCITT_G4
  • 8 Bits Per Pixel – ImGearCompressions.DEFLATE
  • 24 Bits Per Pixel – ImGearCompressions.JPEG

Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above only checks the first layer of the page content, so it will not detect images that are embedded inside containers.

However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images.

If no images are detected in the first layer, the code above falls back to a bit depth of 24 to be on the safe side.
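
A recursive search of the kind mentioned above might look like the following sketch. It is untested and rests on assumptions that should be checked against the ImageGear .NET API reference for your version: that these types live in the ImageGear.Formats.PDF namespace, and that ImGearPDEContainer exposes its nested content through a GetContent() method. The PdfBitDepthSketch class and FindHighestBitDepth method are hypothetical names. Images nested in other element types, such as form XObjects, are not covered.

```csharp
using ImageGear.Formats.PDF; // assumed namespace for the PDE types

public static class PdfBitDepthSketch
{
    // Returns the highest bit depth among the images in the given content,
    // descending into containers. Returns 0 when no image is found.
    public static int FindHighestBitDepth(ImGearPDEContent content)
    {
        int highest = 0;
        for (int i = 0; i < content.ElementCount; i++)
        {
            ImGearPDEElement el = content.GetElement(i);
            int depth = 0;

            if (el is ImGearPDEImage)
            {
                // Same conversion as in the non-recursive code above.
                depth = ((ImGearPDEImage)el).ToImGearPage().DIB.BitDepth;
            }
            else if (el is ImGearPDEContainer)
            {
                // Assumption: a container exposes nested content via GetContent().
                depth = FindHighestBitDepth(((ImGearPDEContainer)el).GetContent());
            }

            if (depth > highest)
            {
                highest = depth;
            }
        }
        return highest;
    }
}
```

The caller would pass pdfPage.GetContent(), treat a result of 0 as "no image found" and fall back to 24, and then call pdfPage.ReleaseContent() as usual.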

Question

I am trying to deploy my ImageGear Pro ActiveX project and am receiving an error stating

The module igPDF18a.ocx failed to load

when registering the igPDF18a.ocx component. Why is this occurring, and how can I register the component correctly?

Answer

To register your igPDF18a.ocx component, you will need to run the following command:

regsvr32 igPDF18a.ocx

If you receive an error stating that the component failed to load, then that likely means that regsvr32 is not finding the necessary dependencies for the PDF component.

The first thing you will want to check is that you have the Microsoft Visual C++ 10.0 CRT (x86) installed on the machine. You can download this from Microsoft’s site here:

https://www.microsoft.com/en-us/download/details.aspx?id=5555

The next thing you will want to check for is the DL100*.dll files. These files should be included in the deployment package generated by the deployment packaging wizard if you included the PDF component when generating the dependencies. These files must be in the same folder as the igPDF18a.ocx component in order to register it.

With those dependencies, you should be able to register the PDF component with regsvr32 without issue.

Question

How do I ensure temp files are deleted when closing ImageGear .NET?

Answer

All PDF objects are based on underlying low-level PDF objects that are not controlled by the .NET resource manager and garbage collector. Because of this, each PDF object that is created from scratch should be explicitly disposed of using that object’s Dispose() method.

Also, any ImGearPDEContent object obtained from an ImGearPDFPage should always be released using ImGearPDFPage.ReleaseContent().

This should cause all temp files to be cleared when the application is closed.
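
The two rules above can be sketched as follows. This is an illustrative, untested example that uses only calls mentioned on this page (GetContent(), ReleaseContent(), ElementCount, and disposal through a using block); the ImageGear.Formats.PDF namespace is an assumption, and the PdfCleanupSketch class and its methods are hypothetical names.

```csharp
using ImageGear.Formats.PDF; // assumed namespace for the PDF types

public static class PdfCleanupSketch
{
    // Counts the content elements on every page, releasing each page's
    // content as soon as it is no longer needed.
    public static int CountElements(ImGearPDFDocument doc)
    {
        int total = 0;
        foreach (ImGearPDFPage page in doc.Pages)
        {
            ImGearPDEContent content = page.GetContent();
            try
            {
                total += content.ElementCount;
            }
            finally
            {
                // Pair every GetContent() with ReleaseContent(), even on failure.
                page.ReleaseContent();
            }
        }
        return total;
    }

    public static void CreateAndDispose()
    {
        // A PDF object created from scratch is disposed explicitly;
        // the using block calls Dispose() when it exits.
        using (ImGearPDFDocument scratch = new ImGearPDFDocument())
        {
            // Work with the new document here.
        }
    }
}
```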
