Technical FAQs

Question

We are saving files to the PDF/A standard and are running into a few cases where the file cannot be saved as PDF/A by ImageGear .NET. Why is this, and how do we do it properly?

Answer

First, determine whether a PDF document can be converted to PDF/A by creating an ImGearPDFPreflight object from your document, and generating an ImGearPDFPreflightReport object from it:

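// Declared outside the using block so the report can be inspected afterwards.
ImGearPDFPreflightReport report;
using (ImGearPDFPreflight preflight = new ImGearPDFPreflight((ImGearPDFDocument)igDocument))
{
    report = preflight.VerifyCompliance(ImGearPDFPreflightProfile.PDFA_1A_2005, 0, -1);
}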

The first argument of the VerifyCompliance() method is the standard of PDF/A you want to use. ImageGear .NET is currently able to convert documents to adhere to the PDF/A-1A and PDF/A-1B standards:

PDF/A-1 Standard

ImageGear and PDF/A

There are parts of the PDF/A-2 and PDF/A-3 standards which may allow for more documents to be converted, but ImageGear .NET currently does not support those. This could possibly be why your document cannot be converted in ImageGear .NET.

Once the report is generated, you can check its Status, which tells you whether the document is fixable. You can also check its Code, which returns Success if the document is fixed, or an error code otherwise. You can test these conditions to determine whether it’s worth attempting to convert the document:

// If the document is not already PDFA-1a compliant but can be converted
if ((report.Code == ImGearPDFPreflightReportCodes.SUCCESS) ||
    (report.Status == ImGearPDFPreflightStatusCode.Fixable))
{
    ImGearPDFPreflightConvertOptions pdfaOptions = new ImGearPDFPreflightConvertOptions(ImGearPDFPreflightProfile.PDFA_1A_2005, 0, -1);
    // Dispose the preflight object once the conversion is done.
    using (ImGearPDFPreflight preflight = new ImGearPDFPreflight((ImGearPDFDocument)igDocument))
    {
        preflight.Convert(pdfaOptions);
    }
    saveFile(outputPath, igDocument);
}

// Create error message if document was not converted.
else if (report.Status != ImGearPDFPreflightStatusCode.Fixed)
{
    // file: an open StreamWriter for the error log, supplied by the caller.
    printAllRecordDescriptions(file, report);
    throw new ApplicationException("Given PDF document cannot be converted to PDFA-1a standard.");
}

If you want more information on why a document may not be convertible, you can inspect the preflight report’s records and codes. A preflight report’s "Records" member is a recursive list of preflight reports: each report has a list of reports under Records, and each of those reports may have more reports, and so on. You can recursively loop through them as seen below to output every reason a document is not convertible:

    private static void printAllRecordDescriptions(StreamWriter file, ImGearPDFPreflightReport report)
    {
        foreach (ImGearPDFPreflightReport rep in report.Records)
        {
            file.WriteLine(rep.Description);
            file.WriteLine(rep.Code.ToString() + "\r\n");
            printAllRecordDescriptions(file, rep);
        }
    }

Ultimately, whether a document can be converted to PDF/A cannot be reliably predicted in advance. While some compliance failures can be corrected individually, in combination they may not be correctable. Therefore, the unfortunate answer is that the only way to determine whether a document can be converted is to attempt the conversion.

Question

If I have a PDF document that only has an embedded image in it (no text objects, etc.), can PrizmDoc Viewer take it and create a searchable PDF file from it?

Answer

Yes. PrizmDoc’s Content Conversion Services can take an image-only PDF and create a searchable PDF file from it. This can be done by modifying the input.dest.pdfOptions.ocr options object; see our documentation here.

If you are attempting to make a searchable PDF from an existing PDF document, please note that the source PDF file should be an image-only PDF. PrizmDoc will not create a searchable file from already-existing vector content.

This feature was introduced in PrizmDoc 13.1. Please see our Release Notes for more information.
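
As a rough illustration of the options object mentioned above, the following sketch builds a JSON request body that nests an ocr object under input.dest.pdfOptions. Only that nesting comes from this answer. The sources, fileId, and format fields, the "english" language value, and the SearchablePdfRequest helper itself are assumptions made for illustration and should be checked against the PrizmDoc documentation.

```csharp
using System.Text.Json;

public static class SearchablePdfRequest
{
    // Builds a JSON body that requests PDF output with OCR enabled.
    // Assumptions (not confirmed by this FAQ): the "sources", "fileId" and
    // "format" fields and the "english" language value.
    public static string Build(string fileId)
    {
        var body = new
        {
            input = new
            {
                sources = new[] { new { fileId } },
                dest = new
                {
                    format = "pdf",
                    pdfOptions = new { ocr = new { language = "english" } }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Here fileId stands for the identifier of a previously uploaded image-only PDF; the resulting string would be sent as the body of the conversion request.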

The simultaneous development of Pfizer and Moderna’s safe and effective COVID-19 vaccines in less than a year stands as one of the great feats of recent medical science. Now that the vaccines are available, however, the healthcare industry and government authorities must take on the new challenge of distributing doses to the population quickly and effectively. In some respects, this logistical feat will be every bit as daunting as developing the vaccines themselves.

Fortunately, the use of barcoding in healthcare supply chains and patient records will prove incredibly helpful in overcoming some of the key difficulties in vaccine distribution. Medical barcodes are already being used in many essential applications. For organizations that have yet to fully embrace the potential of digital transformation, barcode processing integrations can help them quickly expand their capabilities to meet the growing demands of vaccine delivery.

4 Ways Medical Barcodes Solve Vaccine Delivery Challenges

1. Better Supply Chain Accuracy Means Less Waste

Given the high costs of manufacturing and distributing the vaccines, there is justifiable concern over the potential for waste. Both versions of the vaccine need to be kept at low temperatures for shipping after manufacture (approximately -90 degrees Fahrenheit for Pfizer and about -10 degrees for Moderna). Once they’re moved to a refrigerator for administration, they cannot be refrozen. While the Moderna vaccine can last for up to 30 days refrigerated (provided the vial is not punctured), the Pfizer vaccine must be discarded after a mere six hours. Further complicating matters, each Pfizer thermal shipping container can potentially hold up to 975 multidose vials (4875 individual doses), whereas each box of Moderna vaccine contains 10 vials (100 doses).

Without accurate inventory and shipment tracking, healthcare providers could easily end up with too much supply in one location and not enough elsewhere. In a worst case scenario, unused doses might even go to waste because they can’t be redirected to another site quickly enough. By incorporating medical barcode scanning throughout the supply chain, healthcare organizations can ensure more efficient distribution during the shipping process. They can also verify that delivery sites have the appropriate storage capacity ahead of time to avoid the possibility of doses going to waste due to lack of freezer space.

2. Improved Dosage Records

One of the key challenges with distributing the currently approved vaccines is that they require multiple doses. Although the doses are identical from a chemical composition and dosage standpoint, the problem is that they must be administered after a specified interval. According to the FDA, that interval is approximately 21 days for the Pfizer vaccine and 28 days for the Moderna vaccine. As healthcare providers work to deliver the vaccine effectively, they must keep accurate records to show who has received the first dose and how much supply of each vaccine shipment should be designated for second doses.

The ability to read and print barcodes helps providers quickly track where patients are in the vaccination process and ensure that second doses will be available at the appropriate time. This is especially important considering that the vaccines are not interchangeable. Once someone has received the first Pfizer dose, for instance, they should not receive the Moderna vaccine for their second dose (except in exceptional circumstances). By generating a specific barcode after the initial dose and including it with a patient’s health records, providers can quickly and easily match people with the correct vaccine and make sure they have available doses on hand.

3. Keeps Essential Medical Equipment On-Hand

Vaccine distribution involves more than just shipping the doses themselves. Many different accessories are required to administer the vaccine, including protective equipment, vials, rubber stoppers, syringes and needles, and alcohol swabs. Healthcare supply chains were already under significant strain throughout the pandemic, so it should not be taken for granted that providers will have everything they need when the vaccine arrives. Furthermore, as the overall pace of vaccinations increases, it will be important to keep an accurate count of available equipment, especially if a provider does a lot of off-site vaccinations.

Barcoding in healthcare is critical to establishing connections between different elements of the supply chain. By using medical barcode integrations, providers can track and coordinate every piece of equipment needed for vaccine delivery in near-real time. Incorporating the same barcodes into patient records also gives a more up-to-date inventory count as doses are administered, ensuring that hospitals and healthcare facilities don’t run out of essential equipment when they need it most.

4. Expands Distribution Beyond Traditional Supply Chain

Distributing the vaccine in major population centers is difficult enough, but extending delivery into underserved rural areas presents a different set of challenges. These areas often lack the supply chain infrastructure to accommodate the rapid and widespread transfer of medical products. Healthcare providers will need technology tools that allow them to set up remote distribution and treatment centers capable of coordinating with local communities in order to extend their reach into these areas.

While barcoding in healthcare may provide the visibility organizations need into vaccine logistics and patient records, certain regions will also require mobile medical barcode integrations that can put more power and control into the hands of field workers. Rugged, reliable barcode integrations capable of reading broken or damaged barcodes using any mobile device will be essential for overcoming the limitations of rural digital infrastructure.

Unlock the Potential of Barcoding in Healthcare with Barcode Xpress

Accusoft’s Barcode Xpress SDK integration helps healthcare applications read, write, and detect more than thirty different barcode types, even if those images are damaged, broken, or incomplete. With the ability to read multiple barcodes at speeds of up to 1,000 pages per minute, Barcode Xpress can help medical providers take control of their supply chains and manage patient records more efficiently. That same functionality can be extended even further thanks to Barcode Xpress Mobile, which can turn any iOS or Android device into a powerful barcode scanner.

Distributing COVID-19 vaccine doses is one of the great logistical undertakings of the 21st century. By expanding the usage of barcoding in healthcare, providers can create greater transparency into their supply chains to reduce waste and deliver the vaccine more efficiently to the patients who need it most. Find out how Accusoft’s Barcode Xpress can help the medical industry upgrade its infrastructure to meet the challenge of restoring a sense of normalcy to people’s lives and overcoming the pandemic. Try a hands-on demo of our barcode SDK today.

OCR form

An automated forms processing solution can significantly improve accuracy and efficiency when it comes to managing large quantities of documents containing structured content. Whether an organization needs to digitize existing records or is continuously processing new documents within application workflows, having a versatile optical character recognition (OCR) component working to identify and extract text from multiple languages allows them to capture data more effectively.  Solid OCR form capture is critical.

Although a good OCR engine operates quickly and efficiently, the process of recognizing and extracting text is a highly complex undertaking that can be impacted by a variety of factors. Under optimal conditions, for example, the OCR component within Accusoft’s FormSuite can generate results quickly and accurately, with the ability to read several languages from around the world. However, if an application’s forms processing workflow is not set up efficiently or overlooks a few important considerations, recognition performance may suffer in terms of speed and accuracy.

6 Ways to Achieve the Best Results with the Accusoft OCR Component in FormSuite

1. Pay Attention to Image Resolution

As a general rule, OCR components should be provided with high resolution images so the recognition engine is able to distinguish the details that would otherwise be missed on low resolution images. This helps them to recognize the differences between “l” and “i” or “O” and “0” (zero), which results in better, more accurate results.

However, there could be a problem if the image resolution is too high. These images require much more time to process without delivering any benefits since the required letter properties are clearly distinguishable in a lower resolution.

To strike a balance between speed and accuracy, it’s better to scan all images in a range of 150 to 400 dots per inch (DPI). This allows the recognition engine to identify all possible letter properties without being bogged down by analyzing too much data at once.

2. Don’t Lose Image Properties While Preparing to Recognize

To achieve the best results, it’s important to provide the recognition engine with a few helpful hints. In some cases, resolution properties may be lost while an image is being prepared for recognition, leading to worse than expected results. This happens most frequently when working with the System.Drawing.Image or System.Drawing.Bitmap classes directly during operations like clipping, merging, or reducing the bit depth.

In this case, the best solution is to make sure that HorizontalResolution and VerticalResolution properties are set correctly and reflect initial image resolution values. The ScanFix component within FormSuite can perform this task automatically and is designed to be compatible with the OCR component to help achieve better recognition results.
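
For code that manipulates System.Drawing bitmaps directly, a minimal sketch of keeping the resolution intact during a crop might look like the following. The ResolutionPreservingCrop helper is hypothetical and is not part of FormSuite; it assumes System.Drawing is available (on modern .NET this means the System.Drawing.Common package on Windows).

```csharp
using System.Drawing;

public static class ResolutionPreservingCrop
{
    // Crops 'source' to 'region' and copies the source DPI onto the result.
    // A bitmap created with new Bitmap(width, height) does not inherit the
    // source resolution, so it is set explicitly here.
    public static Bitmap Crop(Bitmap source, Rectangle region)
    {
        var cropped = new Bitmap(region.Width, region.Height);
        cropped.SetResolution(source.HorizontalResolution, source.VerticalResolution);
        using (Graphics g = Graphics.FromImage(cropped))
        {
            // Pixel-to-pixel copy of the requested region.
            g.DrawImage(source,
                        new Rectangle(0, 0, region.Width, region.Height),
                        region,
                        GraphicsUnit.Pixel);
        }
        return cropped;
    }
}
```

The caller owns the returned bitmap and should dispose it when recognition is finished.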

3. Clean Up Underlined Text Before Recognition

Specks, dirt, and other imperfections within the source image can significantly reduce recognition quality. Sometimes, however, even a seemingly good image can be recognized incorrectly when there are underlined words like URLs, emails, or specifically formatted generic text. 

From the software’s point of view, this kind of text isn’t very different from other types of image distortion. ScanFix’s LineRemovalOptions can clean up the text by eliminating lines that could interfere with recognition. The API also features special parameters that ensure characters with low-hanging elements (such as “j” or “y”) are restored after line removal to avoid another potential recognition problem.

4. Use Long-Living Objects to Avoid Recognition Performance Drop

Creating a new instance requires OCR engine initialization and loading neural network data suitable for specific recognition parameters. This process is not resource free because of the data complexity and may cause delays from ~200 msec to 2 sec depending on the hardware and recognition properties. 

Existing Accusoft OCR instances may be reused to recognize other images with different properties. This will speed up the overall process because initialization will be done only once during the first AnalyzeField call and subsequent calls will be much cheaper in terms of computing resources.
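
The effect of reusing one instance can be sketched with a stand-in class. StubOcrEngine below is hypothetical and only models a one-time initialization cost; it is not the FormSuite API, and its Recognize method is an illustrative placeholder for a call such as AnalyzeField.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an OCR engine; not the FormSuite API.
public sealed class StubOcrEngine
{
    // Counts how many times the expensive setup has run across all instances.
    public static int InitializationCount;
    private bool _initialized;

    public string Recognize(string imageName)
    {
        if (!_initialized)
        {
            InitializationCount++;   // models the one-time engine setup
            _initialized = true;
        }
        return "text from " + imageName;
    }
}

public static class ReuseDemo
{
    // One engine is created and reused, so setup runs once for the whole batch.
    public static List<string> RecognizeAll(IEnumerable<string> imageNames)
    {
        var engine = new StubOcrEngine();
        var results = new List<string>();
        foreach (string name in imageNames)
        {
            results.Add(engine.Recognize(name));
        }
        return results;
    }
}
```

Creating a new engine inside the loop instead would pay the setup cost once per image.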

5. Assign Instances to Their Own Worker Threads

Objects are thread safe and can be called from different threads. However, assigning an object to its own thread can avoid extra locking. One of the simplest ways to do this is to use a C# Parallel.ForEach loop together with a ConcurrentQueue of pre-allocated objects.

When the degree of parallelism is capped at the number of available CPUs, the number of worker threads will not exceed that limit. Any available instance can then be automatically assigned to recognize images in its own thread, while any extra threads wait until a busy instance becomes free.

Other common patterns are producer-consumer and map-reduce, which are more complex to implement but provide better flexibility when managing input data.
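
The Parallel.ForEach and ConcurrentQueue approach described above can be sketched as follows. StubEngine is a hypothetical stand-in, not the FormSuite API, and capping MaxDegreeOfParallelism at Environment.ProcessorCount is an assumption about how the thread limit would be enforced.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an OCR engine; not the FormSuite API.
public sealed class StubEngine
{
    public string Recognize(string imageName) => "text from " + imageName;
}

public static class PooledRecognition
{
    public static ConcurrentDictionary<string, string> Run(IEnumerable<string> imageNames)
    {
        // Pre-allocate one engine per allowed worker.
        int workers = Environment.ProcessorCount;
        var pool = new ConcurrentQueue<StubEngine>();
        for (int i = 0; i < workers; i++)
        {
            pool.Enqueue(new StubEngine());
        }

        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        Parallel.ForEach(imageNames, options, name =>
        {
            StubEngine engine;
            // With as many engines as workers a free one should always be
            // available; yield defensively if it is not.
            while (!pool.TryDequeue(out engine))
            {
                Thread.Yield();
            }
            try
            {
                results[name] = engine.Recognize(name);
            }
            finally
            {
                pool.Enqueue(engine);   // hand the engine back for the next image
            }
        });
        return results;
    }
}
```

Each engine is only ever used by one thread at a time, so no locking around the recognition call itself is needed.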

6. Dispose Objects to Avoid High Memory Consumption

It is a general rule in C# to call Dispose on objects that use unmanaged resources. FormSuite’s OCR component uses an external recognition engine, so it is highly recommended to call Dispose when the instance is no longer required. This avoids situations where memory is unavailable to other parts of the application, especially when a large amount of data is held for post-processing or available memory is low because of other processes running in parallel.
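
A minimal sketch of the disposal pattern, using a hypothetical StubNativeEngine class in place of the real OCR component (it is not the FormSuite API), could look like this:

```csharp
using System;

// Hypothetical stand-in for a component that holds unmanaged resources.
public sealed class StubNativeEngine : IDisposable
{
    public bool IsDisposed { get; private set; }

    public string Recognize(string imageName)
    {
        if (IsDisposed) throw new ObjectDisposedException(nameof(StubNativeEngine));
        return "text from " + imageName;
    }

    public void Dispose()
    {
        // A real component would release its native memory here.
        IsDisposed = true;
    }
}

public static class DisposeDemo
{
    // 'used' is returned only so callers can observe that disposal happened.
    public static string RecognizeOnce(string imageName, out StubNativeEngine used)
    {
        using (var engine = new StubNativeEngine())
        {
            used = engine;
            return engine.Recognize(imageName);
        }   // Dispose runs here, even if Recognize throws
    }
}
```

For a long-lived instance that is reused across many images, the same idea applies: call Dispose once, when the application no longer needs the instance.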

Get Accurate OCR Data Capture Results with FormSuite

When properly configured and incorporated into a forms processing workflow, the FormSuite OCR component can accelerate automated data capture and reduce manual errors. Its zonal field recognition capabilities allow it to home in on predefined field types to improve processing speed and accuracy. Developers can also adjust confidence values for recognition results to determine how frequently manual review is necessary.

To get a hands-on look at how FormSuite incorporates OCR seamlessly into its collection of forms processing tools, schedule a free trial today.

Question

We are converting emails into PDFs using PrizmDoc. When the PDF is viewed in PrizmDoc, if you hover over the names in the email header, you see a mailto link that provides the email address.

Is there a way to remove those links during the conversion process? We wish to ensure there are no email addresses present in the PDFs.

Answer

To work around this issue, you can first convert the email (MSG) to a TIFF file. This will remove the links and just keep the name of the email recipient. Then convert the TIFF file to a searchable PDF.

This workaround requires that your PrizmDoc license has the OCR option enabled to create the searchable PDF. If you do not need to make the text searchable, then you can just convert the TIFF to a PDF.

On March 10, 2021, Accusoft announced the arrival of the free-to-use Accusoft PDF Viewer, the latest addition to its family of PDF solutions. An entirely client-side integration with no complicated server dependencies, this lightweight JavaScript PDF viewer also features a responsive UI for out-of-the-box mobile support.

“We’re excited to offer this free version of the Accusoft PDF Viewer to developers,” says Jack Berlin, CEO of Accusoft. “Our team worked hard to build a viewer that’s a step above what you can get from open source offerings. We think it’s going to solve a lot of the problems developers typically encounter with existing PDF libraries.”

Accusoft PDF Viewer integrates into an application quickly and easily with just a few snippets of code. It runs entirely within the browser to deliver an optimized viewing experience across all devices. The intuitive UI controls allow users to zoom, pan, jump to page, navigate thumbnails, and pinch-to-zoom on mobile screens with ease. And thanks to lightning fast full-text search, locating essential information is easier than ever.

“Accusoft PDF Viewer is great for developers because it allows them to maintain complete control over documents without having to set up any cumbersome server infrastructure,” says Mark Hansen, Product Manager. “Having a responsive UI that adapts to mobile displays will also increase their flexibility tremendously.”

The free version of Accusoft PDF Viewer allows developers to quickly add powerful viewing capabilities to their web applications. We’re currently working on additional features (such as annotation and eSignature) that will be included in an upgraded paid version.

To learn more about Accusoft PDF Viewer or download it for a first-hand look, please visit our website.

About Accusoft:
Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.

PrizmDoc Hybrid Viewing

Today’s customers expect more out of their software applications. No one wants to waste time juggling between multiple platforms every time they need to open a simple document. They want applications to provide a streamlined user experience that allows them to interact with various file formats quickly and easily, with minimal performance issues.

Most software developers turn to third party integrations like Accusoft’s PrizmDoc to incorporate document processing capabilities into their applications. Since developers are frequently pressed for time and resources, it doesn’t make sense to build document lifecycle features from scratch when they can easily deploy a proven, scalable solution that provides all the tools they need. An API-based integration like PrizmDoc can quickly add industry-leading viewing, editing, collaboration, conversion, and assembly features to an application, which allows developers to focus on other features that will help their software stand out from competitors.

Pros and Cons of Server-Side Viewing

All that document processing power has to come from somewhere, and in the case of solutions like PrizmDoc, most processing is handled by a dedicated server. The server may be self-hosted on the developer’s local infrastructure, a dedicated private cloud, or a public cloud that’s shared by multiple customers.

There are plenty of advantages to this model. Scalable infrastructure is available for the heaviest document processing workloads, but customers only have to pay for the resources they actually use. A dedicated server also makes it easy for applications to manage document storage and avoid version confusion.

Server-side resources can also pose challenges for some applications. If the server is constantly being used to prepare and render documents for viewing, customers may find themselves utilizing more processing resources than expected. Scaling viewing capabilities for multiple users can increase resource usage because each session places additional processing requirements on the server, especially if users need to make annotations, redactions, or other alterations to files.

Viewing multiple, lengthy files server-side can also impact performance. PrizmDoc’s HTML5 viewer, for instance, converts and renders documents in SVG format. While this format offers outstanding quality and flexibility, load times may be longer, and the converted files also take up server storage space.

Introducing PrizmDoc Hybrid Viewing

The new PrizmDoc Hybrid Viewing feature solves these challenges by offloading the processing work for viewing in PDF format to the end user’s device. This is a hybrid approach between server-side processing and client-side processing, with all of the viewing capabilities handled by the client-side device. This drastically reduces the server resources needed to prepare files for viewing, which translates into cost saving and improved performance. Since all viewing is handled by the browser on the client-side device, Hybrid Viewing offers much greater responsiveness for a better overall user experience.

For files not already in PDF format, users can take advantage of the new viewing package, which converts any file format to PDF. This not only allows documents to be viewed more quickly in the future, but also reduces server load and storage requirements.

5 Key Benefits of PrizmDoc Hybrid Viewing

The Hybrid Viewing feature works within PrizmDoc’s existing viewing package infrastructure, making it a simple and streamlined solution for both new and existing customers. Shifting viewing processing from the server to client-side devices provides applications with several important benefits.

1. Cost Savings

Transferring the processing work required for document viewing to an end user’s device reduces server workloads. Since customers pay for the server resources their applications utilize, minimizing server requirements for viewing can deliver significant cost savings over time.

2. Better Resource Management

All file types can be used with this new Hybrid Viewing feature. The new PDF viewing package pre-converts all file types into PDF only, rather than creating SVG files with large amounts of data. This saves both processing time and storage resources. Developers can take advantage of this flexibility and resource savings to implement additional application features that leverage PrizmDoc’s capabilities.

3. Increased Productivity

Shifting document viewing workloads to client-side devices allows applications to process, view, and manage multiple documents faster. This helps end users to do their jobs more efficiently and get greater value out of their applications.

4. Enhanced Performance

Hybrid viewing not only requires fewer resources, but files can be viewed and manipulated faster with enhanced responsiveness. For applications that need to provide editing features such as annotations, offloading processing to client-side devices minimizes load times and lag for a better overall user experience.

5. Scalable Document Viewing

By handling document viewing capabilities on local devices instead of the server, scaling capacity becomes far less resource intensive. File conversion only needs to be performed once, so adding more users doesn’t increase the overall server workload.

What Hybrid Viewing Means for PrizmDoc Users

The new Hybrid Viewing feature allows PrizmDoc users to get more out of their integration than ever before. For customers who have long relied on desktop-based PDF.js viewers due to concerns about server workload or performance, the Hybrid Viewing feature provides a localized viewing solution that streamlines their tech stack and leverages the full capabilities of PrizmDoc. By minimizing server requirements, developers can unlock the potential of their applications to scale their document lifecycle features without worrying about runaway costs.

Hybrid Viewing is available for PrizmDoc Server v13.15 or greater and can be used for self-hosted, private cloud-hosted, or public cloud-hosted deployments. To learn more about how it can provide the flexibility your application needs to scale with user demands, talk to one of our PrizmDoc specialists today.

Question

I am trying to retrieve documents and files to view in PrizmDoc Viewer. The files are located on a NAS device. The file server is available via an HTTP link, but I would prefer not to use the HTTP PUT method.

Answer

NAS is short for Network Attached Storage. Typically, accessing these devices is no different from accessing a shared network drive on a server.

You can set up Prizm Application Services (PAS) to point to a NAS device, provided there are actual file shares set up on that device. A key point to remember is that both PrizmDoc and PAS should be running under a domain account that has full access to that share so that the services can access the files when called.

For instance, suppose you have a folder on the NAS device called PrizmFolders, shared as the network share \\mynasdevice\PrizmFolders. You can modify the pcc.win.yml file to point to that root folder by updating documents.path as outlined below. Keep in mind that backslashes have to be escaped, so you will need an extra backslash for each backslash in the path:

documents.path: "\\\\mynasdevice\\PrizmFolders"
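
The escaping rule can be expressed as a small helper. YamlPathEscaper is a hypothetical name used only for illustration; it doubles every backslash in a share path and wraps the result in double quotes.

```csharp
public static class YamlPathEscaper
{
    // Doubles each backslash and wraps the value in double quotes, producing
    // the escaped form expected inside a double-quoted YAML string.
    public static string ToYamlValue(string uncPath)
    {
        return "\"" + uncPath.Replace("\\", "\\\\") + "\"";
    }
}
```

Passing the share path with its two leading backslashes, written in C# as @"\\mynasdevice\PrizmFolders", yields the quoted value with four leading backslashes and two before PrizmFolders.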

Once this is done, when posting a viewing session through PAS, you can simply specify the subfolder\filename. For instance, if there was a folder called northregion containing a file called metrics.pdf (\\mynasdevice\PrizmFolders\northregion\metrics.pdf), you would be able to specify northregion\metrics.pdf in the POST request.
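
As a hedged sketch of what the body of such a POST might look like, the helper below serializes a viewing session request for a path relative to documents.path. The source, type, and fileName field names and the ViewingSessionBody helper are assumptions for illustration and are not taken from this answer, so verify them against the PAS documentation.

```csharp
using System.Text.Json;

public static class ViewingSessionBody
{
    // Builds a JSON body that refers to a file relative to documents.path.
    // Field names are assumptions; the serializer escapes backslashes itself.
    public static string Build(string relativePath)
    {
        var body = new
        {
            source = new { type = "document", fileName = relativePath }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Calling ViewingSessionBody.Build(@"northregion\metrics.pdf") produces a body in which the backslash appears doubled, as JSON string escaping requires.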