How do I remove XMP Data from my image using ImageGear .NET?
To remove XMP data in ImageGear, the simplest approach is to replace the XMP metadata node with a new, empty root, like so:
    ImGearSimplifiedMetadata.Initialize();
    doc.Metadata.XMP = new ImGearXMPMetadataRoot();
Or, you can traverse through the metadata tree and remove each node from the tree:
    // Example code. Not thoroughly tested
    private static void RemoveXmp(ImGearMetadataTree tree)
    {
        ArrayList toRemove = new ArrayList();
        foreach (ImGearMetadataNode node in tree.Children)
        {
            if (node is ImGearMetadataTree)
                RemoveXmp((ImGearMetadataTree)node);
            if (node.Format != ImGearMetadataFormats.XMP)
                continue;
            toRemove.Add(node);
        }
        foreach (ImGearMetadataNode node in toRemove)
            tree.Children.Remove(node);
    }
In ImageGear, why am I running into AccessViolationExceptions when I run my application in parallel?
This issue can sometimes occur if ImGearPDF is being initialized earlier in the application. In order to use ImGearPDF in a multi-threaded program, it needs to be initialized on a per-thread basis. For example, if you have something like this:
    ImGearPDF.Initialize();
    Parallel.For(...)
    {
        // OCR code
    }
    ImGearPDF.Terminate();
Change it to this:
    Parallel.For(...)
    {
        ImGearPDF.Initialize();
        // OCR code
        ImGearPDF.Terminate();
    }
The same logic applies to other ImageGear classes, such as ImGearPage instances or the ImGearRecognition class – you should create one instance of each class per thread, rather than creating a single instance and accessing it across threads. In the case of the ImGearRecognition class, you’ll have to use the createUnique parameter to make that possible e.g.:
    ImGearRecognition recEngine = new ImGearRecognition(true);
instead of
    ImGearRecognition recEngine = new ImGearRecognition();
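The "one instance per thread" idea can also be sketched with the Parallel.For overload that takes per-worker setup and cleanup delegates. The FakeEngine class below is a hypothetical stand-in, not an ImageGear type, and whether ImageGear classes behave well with this overload is an assumption to verify against the product documentation. The sketch only shows the threading pattern: each worker creates its own instance, reuses it for its iterations, and cleans it up at the end.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a component that must be created and used
// on a single thread. It is NOT an ImageGear class.
sealed class FakeEngine : IDisposable
{
    public static int Created;
    private readonly int ownerThreadId = Environment.CurrentManagedThreadId;

    public FakeEngine() { Interlocked.Increment(ref Created); }

    public void Process(int item)
    {
        // Fails loudly if an instance is ever used from another thread.
        if (Environment.CurrentManagedThreadId != ownerThreadId)
            throw new InvalidOperationException("Engine used on a foreign thread.");
    }

    public void Dispose() { }
}

static class PerWorkerDemo
{
    // Processes itemCount items and returns how many were handled.
    public static int Run(int itemCount)
    {
        int processed = 0;
        Parallel.For(0, itemCount,
            () => new FakeEngine(),          // setup: runs once per worker
            (i, state, engine) =>
            {
                engine.Process(i);
                Interlocked.Increment(ref processed);
                return engine;               // reuse the same instance for the next iteration
            },
            engine => engine.Dispose());     // cleanup: runs once when the worker finishes
        return processed;
    }

    static void Main()
    {
        int processed = Run(100);
        Console.WriteLine($"{processed} items, {FakeEngine.Created} engine instance(s)");
    }
}
```

Compared with initializing and terminating inside every iteration, this keeps the setup cost proportional to the number of workers rather than the number of items.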
I want to re-arrange the page order of a PDF. I’ve tried the following…
    var page = imGearDocument.Pages[indx].Clone();
    imGearDocument.Pages.RemoveAt(indx); // Exception: "One or more pages are in use and could not be deleted."
    imGearDocument.Pages.Insert(newIndx, page);
But an exception is thrown. Somehow, even though the page was cloned, the exception states that the page can’t be removed because it’s still in use.
What am I doing wrong here?
If you’re using an older version of ImageGear .NET, you may run into this exception when you clone the page. Some of the resources between the original and the clone are still shared, which is why this happens.
Starting with ImageGear .NET v24.8, this no longer happens, and the above code should work fine.
If you still need to use the earlier version, you can use the InsertPages method instead.
I am combining multiple PDF documents together, and I need to create a new bookmark collection, placed at the beginning of the new document. Each bookmark should go to a specific page or section of the new document. Example structure:
How might I do this using ImageGear .NET?
You are adding section dividers to the result document. For example, if you merge two documents, you might have two sections, each with a single document, like so:
Section 1
    Document 1
Section 2
    Document 2
The first page will be the first header page, followed by the pages of Document 1, then another header page, then the pages of Document 2. So the first header page is at index 0, the first page of Document 1 is at index 1, the second header is at index 1 + firstDocumentPageCount, and so on.
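The index arithmetic generalizes to any number of documents. The following is a minimal sketch that assumes one header page is placed directly before each document; the SectionLayout class and its Compute method are hypothetical helpers for illustration and are not part of ImageGear.

```csharp
using System;
using System.Collections.Generic;

static class SectionLayout
{
    // For each source document, returns the zero-based index of its header
    // (divider) page and of its first content page in the merged document,
    // assuming one header page sits directly before each document.
    public static List<(int HeaderIndex, int FirstPageIndex)> Compute(IReadOnlyList<int> pageCounts)
    {
        var layout = new List<(int HeaderIndex, int FirstPageIndex)>();
        int next = 0; // index of the next page to be added to the merged document
        foreach (int count in pageCounts)
        {
            layout.Add((next, next + 1));
            next += 1 + count; // one header page plus the document's own pages
        }
        return layout;
    }

    static void Main()
    {
        // Two documents with 3 and 5 pages.
        foreach (var (header, first) in Compute(new[] { 3, 5 }))
            Console.WriteLine($"header at {header}, document starts at {first}");
        // prints:
        // header at 0, document starts at 1
        // header at 4, document starts at 5
    }
}
```

With a first document of 3 pages, the second header lands at 1 + 3 = 4 and the second document starts at 2 + 3 = 5, matching the formulas above.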
The following code demonstrates adding some blank pages to igResultDocument, inserting pages from other ImGearPDFDocuments, and modifying the bookmark tree such that it matches the outline above, with "Section X" pointing to the corresponding divider page and "Document X" pointing to the appropriate starting page number…
    // Create new document, add pages
    ImGearPDFDocument igResultDocument = new ImGearPDFDocument();
    igResultDocument.CreateNewPage((int)ImGearPDFPageNumber.BEFORE_FIRST_PAGE, new ImGearPDFFixedRect(0, 0, 300, 300));
    igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igFirstDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);
    igResultDocument.CreateNewPage(igFirstDocument.Pages.Count, new ImGearPDFFixedRect(0, 0, 300, 300));
    igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igSecondDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);

    // Add first Section
    ImGearPDFBookmark resultBookmarkTree = igResultDocument.GetBookmark();
    resultBookmarkTree.AddNewChild("Section 1");
    var child = resultBookmarkTree.GetLastChild();
    int targetPageNumber = 0;
    setNewDestination(igResultDocument, targetPageNumber, child);

    // Add first Document
    child.AddNewChild("Document 1");
    child = child.GetLastChild();
    targetPageNumber = 1;
    setNewDestination(igResultDocument, targetPageNumber, child);

    // Add second Section
    resultBookmarkTree.AddNewChild("Section 2");
    child = resultBookmarkTree.GetLastChild();
    targetPageNumber = 1 + igFirstDocument.Pages.Count;
    setNewDestination(igResultDocument, targetPageNumber, child);

    // Add second Document
    child.AddNewChild("Document 2");
    child = child.GetLastChild();
    targetPageNumber = 2 + igFirstDocument.Pages.Count;
    setNewDestination(igResultDocument, targetPageNumber, child);

    // Save
    using (FileStream stream = File.OpenWrite(@"C:\path\here\test.pdf"))
    {
        igResultDocument.Save(stream, ImGearSavingFormats.PDF, 0, 0, igResultDocument.Pages.Count, ImGearSavingModes.OVERWRITE);
    }

    ...

    private ImGearPDFDestination setNewDestination(ImGearPDFDocument igPdfDocument, int targetPageNumber, ImGearPDFBookmark targetNode)
    {
        ImGearPDFAction action = targetNode.GetAction();
        if (action == null)
        {
            action = new ImGearPDFAction(
                igPdfDocument,
                new ImGearPDFDestination(
                    igPdfDocument,
                    igPdfDocument.Pages[targetPageNumber] as ImGearPDFPage,
                    new ImGearPDFAtom("XYZ"),
                    new ImGearPDFFixedRect(),
                    0,
                    targetPageNumber));
            targetNode.SetAction(action);
        }
        return action.GetDestination();
    }
(The setNewDestination method is a custom method that abstracts the details of adding the new destination.)
Essentially, the GetBookmark() method returns an instance representing the root of the bookmark tree, whose children are subtrees themselves. Thus, we can add a new child to an empty tree, then get that child with GetLastChild(). Then we can set the action for that node to a new "GoTo" action that navigates to the specified destination. When saved to the file system, this should produce a PDF whose bookmark tree has each "Section X" entry containing the matching "Document X" entry.
Note that you may need to use the native Save method (not SaveDocument), described in the product documentation, to save a PDF file with the bookmark tree included. You can also read more about actions in the PDF Specification.
How do I ensure temp files are deleted when closing ImageGear .NET?
All PDF objects are based on underlying low-level PDF objects that are not controlled by the .NET resource manager or garbage collector. Because of this, each PDF object that is created from scratch should be explicitly disposed of using that object’s Dispose() method.
Also, any ImGearPDEContent object obtained from an ImGearPDFPage should be released using ImGearPDFPage.ReleaseContent() in all cases.
This should cause all temp files to be cleared when the application is closed.
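Releasing "in all cases" includes the case where an exception is thrown partway through. The sketch below shows one way to guarantee that with try/finally and using. FakePage and FakeContent are hypothetical stand-ins that only mimic the acquire/release shape described above; they are not ImageGear classes, and the real types should be checked against the product documentation.

```csharp
using System;

// Hypothetical stand-ins that only mimic the acquire/release shape
// described above; they are not ImageGear classes.
sealed class FakeContent { }

sealed class FakePage : IDisposable
{
    public int OutstandingContent { get; private set; }
    public bool Disposed { get; private set; }

    public FakeContent GetContent() { OutstandingContent++; return new FakeContent(); }
    public void ReleaseContent() { OutstandingContent--; }
    public void Dispose() { Disposed = true; }
}

static class CleanupDemo
{
    // Runs 'work' against the page content and always releases the content,
    // even when 'work' throws.
    public static void WithContent(FakePage page, Action<FakeContent> work)
    {
        FakeContent content = page.GetContent();
        try
        {
            work(content);
        }
        finally
        {
            page.ReleaseContent();
        }
    }

    static void Main()
    {
        var page = new FakePage();
        using (page) // Dispose() runs when the block exits, even on an exception
        {
            try
            {
                WithContent(page, _ => throw new InvalidOperationException("simulated failure"));
            }
            catch (InvalidOperationException)
            {
                // The failure is handled here; the content was already released.
            }
            Console.WriteLine(page.OutstandingContent); // prints 0
        }
        Console.WriteLine(page.Disposed); // prints True
    }
}
```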
I am trying to deploy my ImageGear Pro ActiveX project and am receiving an error stating "The module igPDF18a.ocx failed to load" when registering the igPDF18a.ocx component. Why is this occurring, and how can I register the component correctly?
To register your igPDF18a.ocx component, run the following command:
regsvr32 igPDF18a.ocx
If you receive an error stating that the component failed to load, then that likely means that regsvr32 is not finding the necessary dependencies for the PDF component.
The first thing you will want to check is that you have the Microsoft Visual C++ 10.0 CRT (x86) installed on the machine. You can download this from Microsoft’s site here:
https://www.microsoft.com/en-us/download/details.aspx?id=5555
The next thing you will want to check for is the DL100*.dll files. These files should be included in the deployment package generated by the deployment packaging wizard if you included the PDF component when generating the dependencies. These files must be in the same folder as the igPDF18a.ocx component in order to register it.
With those dependencies, you should be able to register the PDF component with regsvr32 without issue.
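Before running regsvr32, the file-placement requirement can be checked with a few lines of code. This is a rough sketch: DeploymentCheck and HasPdfDependencies are hypothetical names, and the check only confirms that igPDF18a.ocx and at least one DL100*.dll file sit in the same folder. It does not verify that every required DLL is present or that the Visual C++ runtime is installed.

```csharp
using System;
using System.IO;

static class DeploymentCheck
{
    // Returns true when igPDF18a.ocx and at least one DL100*.dll file
    // are both present in 'folder'.
    public static bool HasPdfDependencies(string folder)
    {
        bool hasOcx = File.Exists(Path.Combine(folder, "igPDF18a.ocx"));
        bool hasDl100 = Directory.GetFiles(folder, "DL100*.dll").Length > 0;
        return hasOcx && hasDl100;
    }

    static void Main(string[] args)
    {
        // Check the folder given on the command line, or the current folder.
        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        Console.WriteLine(HasPdfDependencies(folder)
            ? "igPDF18a.ocx and DL100*.dll files found."
            : "Missing igPDF18a.ocx or DL100*.dll files in " + folder);
    }
}
```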
I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.
ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?
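To do this:
1. Call ImGearPDFPage.GetContent() to get the content elements of the page.
2. Iterate over the elements and check whether each one is an ImGearPDEImage.
3. Convert each image to an ImGearPage and read its bit depth.
4. Rasterize the PDF page using the highest bit depth found.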
The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output with compression.
    private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
    {
        using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
        {
            // Import pages
            foreach (ImGearPDFPage pdfPage in doc.Pages)
            {
                int highestBitDepth = 0;
                ImGearPDEContent pdeContent = pdfPage.GetContent();
                int contentLength = pdeContent.ElementCount;
                for (int i = 0; i < contentLength; i++)
                {
                    ImGearPDEElement el = pdeContent.GetElement(i);
                    if (el is ImGearPDEImage)
                    {
                        // Create an ImGearPage from the embedded image and find its bit depth
                        int bitDepth = (el as ImGearPDEImage).ToImGearPage().DIB.BitDepth;
                        if (bitDepth > highestBitDepth)
                        {
                            highestBitDepth = bitDepth;
                        }
                    }
                }

                // Release the content obtained from GetContent()
                pdfPage.ReleaseContent();

                if (highestBitDepth == 0)
                {
                    // If no images were found, or the images are embedded deeper in containers,
                    // fall back to a default bit depth of 24 to be safe
                    highestBitDepth = 24;
                }

                ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
                using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
                {
                    recogPage.Image.Preprocess();
                    recogPage.Recognize();
                    ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                    {
                        VisibleImage = true,
                        VisibleText = false,
                        OptimizeForPdfa = true,
                        ImageCompression = ImGearCompressions.AUTO,
                        UseUnicodeText = false
                    };
                    recogPage.CreatePDFPage(outDoc, options);
                }
            }
            outDoc.SaveCompressed(sourceFile + ".result.pdf");
        }
    }
For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are:
Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you’re unfamiliar with how PDF content is structured, we have an explanation in our documentation. The implementation above only checks the first layer of the page content, so it will not detect images embedded inside containers. However, it should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. For more complex documents, you could write a recursive function that walks through the layers of the PDF to find the images. The code above sets the bit depth to 24 if it cannot detect any images in the first layer, just to be on the safe side.
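The recursive search mentioned above could take roughly the following shape. The Element, ImageElement, and ContainerElement classes are hypothetical stand-ins for a nested content tree and are not ImageGear classes; which ImageGear element types actually hold nested content, and how to read it, should be confirmed in the product documentation before adapting this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a nested content tree; not ImageGear classes.
abstract class Element { }

sealed class ImageElement : Element
{
    public readonly int BitDepth;
    public ImageElement(int bitDepth) { BitDepth = bitDepth; }
}

sealed class ContainerElement : Element
{
    public readonly List<Element> Children = new List<Element>();
}

static class BitDepthSearch
{
    // Returns the highest bit depth of any image at any nesting level,
    // or 0 when the tree contains no images.
    public static int Highest(IEnumerable<Element> elements)
    {
        int highest = 0;
        foreach (Element el in elements)
        {
            if (el is ImageElement image)
                highest = Math.Max(highest, image.BitDepth);
            else if (el is ContainerElement container)
                highest = Math.Max(highest, Highest(container.Children)); // descend one layer
        }
        return highest;
    }

    static void Main()
    {
        // A 1-bit image at the top level and an 8-bit image inside a container.
        var inner = new ContainerElement();
        inner.Children.Add(new ImageElement(8));
        var page = new List<Element> { new ImageElement(1), inner };

        int depth = Highest(page);
        if (depth == 0) depth = 24; // same fallback as the code above
        Console.WriteLine(depth);   // prints 8
    }
}
```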