I am combining multiple PDF documents together, and I need to create a new bookmark collection, placed at the beginning of the new document. Each bookmark should go to a specific page or section of the new document. Example structure:
How might I do this using ImageGear .NET?
You are adding section dividers to the result document. For example, if you merge two documents, you might have two sections, each containing a single document, like so:
Section 1
    Document 1
Section 2
    Document 2
The first page will be the first header page, followed by the pages of Document 1, then another header page, then the pages of Document 2. So, the first header page is at index 0, the first page of Document 1 is at index 1, the second header is at index 1 + firstDocumentPageCount, and so on.
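The index arithmetic above generalizes to any number of documents. The following is a minimal sketch in plain C# with no ImageGear calls; BookmarkPlan, Entry, and Build are hypothetical names introduced here for illustration. It assumes exactly one header page before each document and computes the zero-based page index that each "Section X" and "Document X" bookmark should point to.

```csharp
using System;
using System.Collections.Generic;

public static class BookmarkPlan
{
    // One planned bookmark: title, nesting level (0 = section, 1 = document),
    // and the zero-based page index in the merged document it should target.
    public sealed class Entry
    {
        public string Title { get; }
        public int Level { get; }
        public int PageIndex { get; }

        public Entry(string title, int level, int pageIndex)
        {
            Title = title;
            Level = level;
            PageIndex = pageIndex;
        }
    }

    // Assumes one header (divider) page is inserted before each document.
    public static List<Entry> Build(IReadOnlyList<int> documentPageCounts)
    {
        var entries = new List<Entry>();
        int headerIndex = 0; // index of the current section's header page

        for (int i = 0; i < documentPageCounts.Count; i++)
        {
            // The section bookmark points at the header page.
            entries.Add(new Entry("Section " + (i + 1), 0, headerIndex));
            // The document bookmark points at the page right after the header.
            entries.Add(new Entry("Document " + (i + 1), 1, headerIndex + 1));
            // Skip the header page plus every page of this document.
            headerIndex += 1 + documentPageCounts[i];
        }
        return entries;
    }
}
```

For two documents of 3 and 5 pages, this yields indexes 0 and 1 for the first section and document, and 4 and 5 for the second, matching 1 + firstDocumentPageCount and 2 + firstDocumentPageCount.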
The following code demonstrates adding some blank pages to igResultDocument, inserting pages from other ImGearPDFDocuments, and modifying the bookmark tree such that it matches the outline above, with "Section X" pointing to the corresponding divider page and "Document X" pointing to the appropriate starting page number…
// Create new document, add pages
ImGearPDFDocument igResultDocument = new ImGearPDFDocument();
igResultDocument.CreateNewPage((int)ImGearPDFPageNumber.BEFORE_FIRST_PAGE, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igFirstDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);
igResultDocument.CreateNewPage(igFirstDocument.Pages.Count, new ImGearPDFFixedRect(0, 0, 300, 300));
igResultDocument.InsertPages((int)ImGearPDFPageNumber.LAST_PAGE, igSecondDocument, 0, (int)ImGearPDFPageRange.ALL_PAGES, ImGearPDFInsertFlags.DEFAULT);

// Add first Section
ImGearPDFBookmark resultBookmarkTree = igResultDocument.GetBookmark();
resultBookmarkTree.AddNewChild("Section 1");
var child = resultBookmarkTree.GetLastChild();
int targetPageNumber = 0;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add first Document
child.AddNewChild("Document 1");
child = child.GetLastChild();
targetPageNumber = 1;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Section
resultBookmarkTree.AddNewChild("Section 2");
child = resultBookmarkTree.GetLastChild();
targetPageNumber = 1 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Add second Document
child.AddNewChild("Document 2");
child = child.GetLastChild();
targetPageNumber = 2 + igFirstDocument.Pages.Count;
setNewDestination(igResultDocument, targetPageNumber, child);

// Save
using (FileStream stream = File.OpenWrite(@"C:\path\here\test.pdf"))
{
    igResultDocument.Save(stream, ImGearSavingFormats.PDF, 0, 0, igResultDocument.Pages.Count, ImGearSavingModes.OVERWRITE);
}

...

private ImGearPDFDestination setNewDestination(ImGearPDFDocument igPdfDocument, int targetPageNumber, ImGearPDFBookmark targetNode)
{
    ImGearPDFAction action = targetNode.GetAction();
    if (action == null)
    {
        action = new ImGearPDFAction(
            igPdfDocument,
            new ImGearPDFDestination(
                igPdfDocument,
                igPdfDocument.Pages[targetPageNumber] as ImGearPDFPage,
                new ImGearPDFAtom("XYZ"),
                new ImGearPDFFixedRect(),
                0,
                targetPageNumber));
        targetNode.SetAction(action);
    }
    return action.GetDestination();
}
(The setNewDestination method is a custom method that abstracts the details of adding the new destination.)
Essentially, the GetBookmark() method will allow you to get an instance representing the root of the bookmark tree, with its children being subtrees themselves. Thus, we can add a new child to an empty tree, then get the last child with GetLastChild(). Then, we can set the action for that node to be a new "GoTo" action that will navigate to the specified destination. Upon save to the file system, this should produce a PDF with the below bookmark structure…
Note that you may need to use the native Save method (NOT SaveDocument) described in the product documentation here in order to save a PDF file with the bookmark tree included. Also, you can read more about Actions in the PDF Specification.
If you have a copy of ImagXpress, there are cases where calling certain functions will trigger the following message:
"This function is available in another edition of ImagXpress v13.00 control"
What could cause this error?
This error can occur if ImagXpress Standard Edition is licensed on a system, but you’re trying to call operations that are only available in ImagXpress Professional Edition. So, you will need to ensure you’re using the proper license.
This documentation page specifies the functions that are supported for each edition.
This can also occur if you own both Barcode Xpress and ImagXpress Professional. Barcode Xpress includes ImagXpress Standard Edition, so if you install the ImagXpress Professional license first, and then install Barcode Xpress, the included Standard license will overwrite the Professional license. The resolution in this case is to re-install the ImagXpress license to overwrite Standard with Professional.
My service is crashing in IIS, and I see an error saying w3wp.exe has crashed in my Event Viewer. Why is my service crashing?
Normally, when there is an error in the Event Viewer saying that w3wp.exe has crashed, this means that your application pool has crashed. So, by extension, anything running on IIS will stop working. There are many reasons why this could happen, so to find out the root cause, there are a few things that you can do:
These two things should give you the information you need to determine the root cause of the problem. Below is an online article explaining in more detail how to get these two things and possible causes of the crash.
http://blog.whitesites.com/Debugging-Faulting-Application-w3wp-exe-Crashes__634424707278896484_blog.htm
In some cases, when the Server Licensing Utility (SLU) is run, it may return an error similar to the following:
"Server License Utility – Auto register failed
Failed to auto-register. Extra code #0100-20(RCN=Accusoft.ULF.LicenseService.GenerateLicenseKey, RC=-56, REC=428). Contact Accusoft support. Error #1"
If, on the other hand, you manually register, you might see a message such as this:
An error has occurred: object (Accusoft.ULF.LicenseService.GenerateLicenseKey), value1 (-56), value2 (429)
What could be the cause?
A possible cause of this error is that your license has an expiration date and you have not entered the Access Key in the field on the SLU main window. Because these keys expire, our licensing needs to know which specific Access Key to use, so that it can tell this license apart from any other licenses you may have with different expiration dates, or from OEM licenses. Supplying the Access Key points the license utility to the specific license in the license pool and should resolve this error.
I am trying to deploy my ImageGear Pro ActiveX project and am receiving an error stating
The module igPDF18a.ocx failed to load
when registering the igPDF18a.ocx component. Why is this occurring, and how can I register the component correctly?
To register your igPDF18a.ocx component, you will need to run the following command:
regsvr32 igPDF18a.ocx
If you receive an error stating that the component failed to load, then that likely means that regsvr32 is not finding the necessary dependencies for the PDF component.
The first thing you will want to check is that you have the Microsoft Visual C++ 10.0 CRT (x86) installed on the machine. You can download this from Microsoft’s site here:
https://www.microsoft.com/en-us/download/details.aspx?id=5555
The next thing you will want to check for is the DL100*.dll files. These files should be included in the deployment package generated by the deployment packaging wizard if you included the PDF component when generating the dependencies. These files must be in the same folder as the igPDF18a.ocx component in order to register it.
With those dependencies, you should be able to register the PDF component with regsvr32 without issue.
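Before running regsvr32, it can help to confirm that the files described above are actually in place. The following is a minimal sketch using only System.IO; PdfComponentPreflight and Check are hypothetical names introduced for illustration. It checks only that igPDF18a.ocx and at least one DL100*.dll file sit in the same folder. It does not verify that the Visual C++ runtime is installed, and it does not call regsvr32 itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PdfComponentPreflight
{
    // Returns a list of human-readable problems. An empty list means the
    // folder contains the files this sketch knows how to look for.
    public static List<string> Check(string deployFolder)
    {
        var problems = new List<string>();

        if (!Directory.Exists(deployFolder))
        {
            problems.Add("Folder not found: " + deployFolder);
            return problems;
        }

        // The component itself must be present.
        if (!File.Exists(Path.Combine(deployFolder, "igPDF18a.ocx")))
        {
            problems.Add("igPDF18a.ocx not found in " + deployFolder);
        }

        // The DL100*.dll files must be in the same folder as the component.
        if (Directory.GetFiles(deployFolder, "DL100*.dll").Length == 0)
        {
            problems.Add("No DL100*.dll files found in " + deployFolder);
        }

        return problems;
    }
}
```

If Check returns an empty list and registration still fails, the Visual C++ runtime is the next thing to confirm.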
During the installation of ImageGear for .NET (v23.4 and above), the installer reaches out to Microsoft’s site to download the VC++ redistributable and .NET packages. Which one(s) does it download?
The ImageGear for .NET installer places the following redistributables onto a system:
In addition to this, the following .NET framework versions are installed:
So, if a system already has all of these installed on it, this should prevent the installer from trying to reach out to download them.
In ImageGear .NET, I am receiving error “API_HARDTIMEOUT_ERR” when using Recognize() to OCR a document. What is happening and how can I fix it?
Sample case: I have a large PDF I was processing page-by-page. The first 15 pages took six minutes, but the 16th page took three minutes and produced that error.
API_HARDTIMEOUT_ERR can occur when ImageGear has taken too long to process your document. This tends to happen when the OCR process spends too much time on marks it mistakes for characters (very common in bitonal documents), such as scan artifacts in damaged documents or visual distortion (e.g., the distortion in a camera picture of a computer monitor). See the bottom of this page for an example.
For scanned bitonal documents, running a Despeckle operation on the page can help reduce the amount of noise obstructing the OCR process.
ImGearRasterPage igRasterPage = p.Rasterize(1, 300, 300);
if (ImGearRasterProcessing.Verifier.CanApplyDespeckle(igRasterPage))
    ImGearRasterProcessing.Despeckle(igRasterPage, 3, 3);
Also, if converting documents to bitonal is part of your document process, ImageGear .NET has reduction methods that may produce a less damaged document, such as our Reduce method with configurable parameters. Alternatively, the color document could be OCR'd instead, likely with better results.
Some users have also had success adjusting the time-related parameters of the recognition engine. ImGearRecTradeoff and DecompMethod can be modified to trade off accuracy for speed during the actual OCR process, and Locate can be used to identify existing text before recognition.
In FormSuite for Invoices, what are the accepted characters for the Currency and CurrencyPlus field data types?
Note: FSI assumes that currency values are United States values. For fields that are marked as Currency (not CurrencyPlus), it normalizes the value by removing any punctuation except the negative sign and the decimal.
The Currency and CurrencyPlus formats are used in the FSI recognition to help determine currency values on the page.
The Currency value is of the form
dd¢ [$|€|£|¥]ddd[\{separator}ddd][\{decimal}dd][-|=][ €]
The CurrencyPlus value is of the form
dd¢ [$|€|£|¥|E|EUR]ddd[\{separator}ddd][\{decimal}dd][-|=][ €| E|EUR|DKK|DKR|NOK|NKR|SEK|SK|GBP|KR|USD]
Here, either the separator is “,” and the decimal is “.”, or the separator is “.” and the decimal is “,”.
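To make the Currency form easier to reason about, here is a rough sketch that expresses one literal reading of it as regular expressions. It is not FSI's implementation, and CurrencyFormSketch and LooksLikeCurrency are hypothetical names. It assumes that "ddd" means one to three digits, that the cents form "dd¢" stands on its own, and that the separator and decimal must follow one of the two pairings described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CurrencyFormSketch
{
    // Optional leading symbol, digit groups, optional two-digit decimal,
    // optional trailing "-" or "=", optional trailing " €".
    // Separator "," with decimal ".":
    private static readonly Regex CommaSeparated = new Regex(
        @"^[$€£¥]?\d{1,3}(,\d{3})*(\.\d{2})?[-=]?( €)?$");

    // Separator "." with decimal ",":
    private static readonly Regex DotSeparated = new Regex(
        @"^[$€£¥]?\d{1,3}(\.\d{3})*(,\d{2})?[-=]?( €)?$");

    // The stand-alone cents form, e.g. "99¢".
    private static readonly Regex Cents = new Regex(@"^\d{1,2}¢$");

    public static bool LooksLikeCurrency(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }
        return CommaSeparated.IsMatch(text)
            || DotSeparated.IsMatch(text)
            || Cents.IsMatch(text);
    }
}
```

Under this reading, strings such as "$1,234.56", "1.234,56 €", and "99¢" match, while a value that mixes the two conventions, such as "1,234,56", does not.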
I am trying to perform OCR on a PDF created from a scanned document. I need to rasterize the PDF page before importing the page into the recognition engine. When rasterizing the PDF page I want to set the bit depth of the generated page to be equal to the bit depth of the embedded image so I may use better compression methods for 1-bit and 8-bit images.
ImGearPDFPage.DIB.BitDepth will always return 24 for the bit depth of a PDF. Is there a way to detect the bit depth based on the PDF’s embedded content?
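To do this:
1. Use ImGearPDFPage.GetContent() to get the content of the PDF page.
2. Loop through the content elements and check whether each one is an ImGearPDEImage.
3. Convert each embedded image to an ImGearPage and read its bit depth.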
The code below demonstrates how to detect the bit depth of each page in a PDF document, perform OCR, and save the output with compression.
private static void Recognize(ImGearRecognition engine, string sourceFile, ImGearPDFDocument doc)
{
    using (ImGearPDFDocument outDoc = new ImGearPDFDocument())
    {
        // Import pages
        foreach (ImGearPDFPage pdfPage in doc.Pages)
        {
            int highestBitDepth = 0;
            ImGearPDEContent pdeContent = pdfPage.GetContent();
            int contentLength = pdeContent.ElementCount;
            for (int i = 0; i < contentLength; i++)
            {
                ImGearPDEElement el = pdeContent.GetElement(i);
                if (el is ImGearPDEImage)
                {
                    // Create an ImGearPage from the embedded image and find its bit depth
                    int bitDepth = (el as ImGearPDEImage).ToImGearPage().DIB.BitDepth;
                    if (bitDepth > highestBitDepth)
                    {
                        highestBitDepth = bitDepth;
                    }
                }
            }
            if (highestBitDepth == 0)
            {
                // If no images were found, or the images are embedded deeper in containers,
                // fall back to a default bit depth of 24 to be safe
                highestBitDepth = 24;
            }
            ImGearRasterPage rasterPage = pdfPage.Rasterize(highestBitDepth, 200, 200);
            using (ImGearRecPage recogPage = engine.ImportPage(rasterPage))
            {
                recogPage.Image.Preprocess();
                recogPage.Recognize();
                ImGearRecPDFOutputOptions options = new ImGearRecPDFOutputOptions()
                {
                    VisibleImage = true,
                    VisibleText = false,
                    OptimizeForPdfa = true,
                    ImageCompression = ImGearCompressions.AUTO,
                    UseUnicodeText = false
                };
                recogPage.CreatePDFPage(outDoc, options);
            }
        }
        outDoc.SaveCompressed(sourceFile + ".result.pdf");
    }
}
For the compression type, I would recommend setting it to AUTO. AUTO will set the compression type depending on the image’s bit depth. The compression types that AUTO uses for each bit depth are:
Disclaimer: This may not work for all PDF documents because of how some PDFs are structured. If you're unfamiliar with how PDF content is structured, we have an explanation in our documentation. The above implementation only checks one layer into the PDF, so it will not detect images that are embedded inside containers. However, this should work for documents created by scanners, as the scanned image should be embedded in the first PDF layer. If you have more complex documents, you could write a recursive function that goes through the layers of the PDF to find the images. The above code will set the bit depth to 24 if it wasn't able to detect any images in the first layer, just to be on the safe side.
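The recursive search mentioned above could be shaped roughly like the sketch below. It is deliberately written against a generic element type rather than ImageGear classes, so it makes no claims about the ImageGear API. BitDepthSearch, FindHighest, and the two delegates are hypothetical names. You would supply one delegate that returns an image's bit depth (or null for non-image elements) and another that returns an element's nested elements (or null if it has none).

```csharp
using System;
using System.Collections.Generic;

public static class BitDepthSearch
{
    // Walks every layer of a content tree and returns the highest image bit
    // depth found, or the fallback (24 by default) if no image was found.
    public static int FindHighest<T>(
        IEnumerable<T> elements,
        Func<T, int?> imageBitDepth,
        Func<T, IEnumerable<T>> childElements,
        int fallback = 24)
    {
        int highest = Walk(elements, imageBitDepth, childElements);
        return highest == 0 ? fallback : highest;
    }

    private static int Walk<T>(
        IEnumerable<T> elements,
        Func<T, int?> imageBitDepth,
        Func<T, IEnumerable<T>> childElements)
    {
        int highest = 0;
        foreach (T element in elements)
        {
            // Image elements report a bit depth; everything else reports null.
            int? depth = imageBitDepth(element);
            if (depth.HasValue)
            {
                highest = Math.Max(highest, depth.Value);
            }

            // Container-like elements expose nested content; recurse into it.
            IEnumerable<T> children = childElements(element);
            if (children != null)
            {
                highest = Math.Max(highest, Walk(children, imageBitDepth, childElements));
            }
        }
        return highest;
    }
}
```

Keeping the traversal separate from the element types means the same function keeps the 24-bit fallback behavior of the code above while searching deeper than the first layer.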
The logging for the ImageGear C & C++ Deployment Packaging Wizard (DPW) has shown different output for some components since v19.3. Why is this?
In ImageGear C & C++ v19.2 and prior, the DPW had additional logging information for the ARTX component in its deployment.log:
Deploying an application that uses the ARTXGUI library of ImageGear ARTX Component requires the following merge modules to be installed:
Microsoft_VC90_CRT_x86_x64.msm
Microsoft_VC90_MFC_x86_x64.msm
But since v19.3, the logs are no longer telling me to install these modules. Is this a mistake, or are they no longer necessary?
This was an intentional change on our end, and the Deployment Packaging Wizard (DPW) is working as intended. We made some updates to the DPW in the latest release; one update is that the CRM requirements for CORE (which is required in every project) now also cover the ARTX component. If the DPW does not say you need additional components to use the ARTX component, then you do not need to install anything further.
Is the main FormSuite for Invoices (FSI) process multithreaded? If so, would more cores or more CPUs increase the performance?
Internally, we multithread where we can at the beginning of the process. Improving your hardware will help if you are processing multiple pages at once; otherwise, it will not. FormSuite for Invoices is thread-safe as long as each thread owns the FSI object being used.
Please see: https://help.accusoft.com/FSInvoices/v2.1/dotnet/webframe.html#topic5.html
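The "each thread owns its own object" rule can be followed with the standard Parallel.ForEach overload that creates one local instance per worker. The sketch below uses a placeholder class, FakeInvoiceEngine, which is hypothetical and stands in for whatever FSI object you create; it is not a FormSuite for Invoices type. Only the ownership pattern is the point here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the FSI object. NOT a real FormSuite type.
public sealed class FakeInvoiceEngine : IDisposable
{
    public string Process(string pagePath)
    {
        return "processed:" + pagePath;
    }

    public void Dispose()
    {
    }
}

public static class PerThreadEngineSketch
{
    public static IDictionary<string, string> ProcessAll(IEnumerable<string> pagePaths)
    {
        var results = new ConcurrentDictionary<string, string>();

        Parallel.ForEach<string, FakeInvoiceEngine>(
            pagePaths,
            // localInit: each worker creates and owns its own engine.
            () => new FakeInvoiceEngine(),
            // body: only the owning worker ever touches this engine.
            (path, loopState, engine) =>
            {
                results[path] = engine.Process(path);
                return engine;
            },
            // localFinally: each worker disposes the engine it created.
            engine => engine.Dispose());

        return results;
    }
}
```

Because no engine instance is shared between workers, adding cores can help when many pages are queued, which is consistent with the answer above.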
When using OCR in ImageGear .NET, is there any way to distinguish between a capital/uppercase letter O and the number 0?
Not without context or a font that makes the difference clear (such as one with a slashed 0). ImageGear will properly recognize Oliver and 1530 as containing O and 0, respectively, but cannot reliably distinguish them when letters and numbers are mixed. That is, ImageGear may not reliably distinguish between 1ABO0F3 and 1AB0OF3.
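Since mixed tokens are the unreliable case, one option is to flag them for manual review after OCR. The following is a minimal sketch in plain C# that operates on already-recognized text and is not part of ImageGear; OcrTokenCheck and IsAmbiguousOZero are hypothetical names. It treats a token as ambiguous when it mixes letters and digits and contains an uppercase O or a 0.

```csharp
using System;
using System.Linq;

public static class OcrTokenCheck
{
    // Returns true for tokens where an O/0 confusion is plausible: the token
    // contains both letters and digits, and at least one 'O' or '0'.
    public static bool IsAmbiguousOZero(string token)
    {
        if (string.IsNullOrEmpty(token))
        {
            return false;
        }

        bool hasLetter = token.Any(c => char.IsLetter(c));
        bool hasDigit = token.Any(c => char.IsDigit(c));
        bool hasOOrZero = token.IndexOf('O') >= 0 || token.IndexOf('0') >= 0;

        return hasLetter && hasDigit && hasOOrZero;
    }
}
```

With this check, Oliver and 1530 are not flagged, while 1ABO0F3 and 1AB0OF3 both are.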