Technical FAQs

Question

We are saving files to the PDF/A standard and are running into a few cases where the file cannot be saved as PDF/A by ImageGear .NET. Why is this, and how do we do it properly?

Answer

First, determine whether a PDF document can be converted to PDF/A by creating an ImGearPDFPreflight object from your document, and generating an ImGearPDFPreflightReport object from it:

ImGearPDFPreflightReport report;
using (ImGearPDFPreflight preflight = new ImGearPDFPreflight((ImGearPDFDocument)igDocument))
{
    // Verify the document against the PDF/A-1a profile.
    report = preflight.VerifyCompliance(ImGearPDFPreflightProfile.PDFA_1A_2005, 0, -1);
}

The first argument of the VerifyCompliance() method is the PDF/A profile you want to verify against. ImageGear .NET can currently convert documents to the PDF/A-1a and PDF/A-1b standards:

PDF/A-1 Standard

ImageGear and PDF/A

There are parts of the PDF/A-2 and PDF/A-3 standards which may allow for more documents to be converted, but ImageGear .NET currently does not support those. This could possibly be why your document cannot be converted in ImageGear .NET.

Once the report is generated, you can check its Status, which tells you whether the document is fixable. You can also check its Code, which returns Success if the document is fixed (has no outstanding issues), or an error code otherwise. Check these conditions to decide whether it is worth attempting to convert the document:

// Attempt conversion if verification succeeded or the reported issues are fixable.
if ((report.Code == ImGearPDFPreflightReportCodes.SUCCESS) ||
    (report.Status == ImGearPDFPreflightStatusCode.Fixable))
{
    ImGearPDFPreflightConvertOptions pdfaOptions = new ImGearPDFPreflightConvertOptions(ImGearPDFPreflightProfile.PDFA_1A_2005, 0, -1);
    using (ImGearPDFPreflight preflight = new ImGearPDFPreflight((ImGearPDFDocument)igDocument))
    {
        preflight.Convert(pdfaOptions);
    }
    // saveFile is your own helper for writing the document to disk.
    saveFile(outputPath, igDocument);
}
// Otherwise, log every reported issue and raise an error.
else if (report.Status != ImGearPDFPreflightStatusCode.Fixed)
{
    // logFile is a StreamWriter opened by the caller.
    printAllRecordDescriptions(logFile, report);
    throw new ApplicationException("Given PDF document cannot be converted to PDFA-1a standard.");
}

If you want more information on why a document may not be convertible, you can inspect the preflight report's records and codes. A preflight report's Records member is a recursive list of preflight reports: each report has a list of reports under Records, and each of those reports may have more reports of its own. You can loop through them recursively, as shown below, to output every reason a document is not convertible:

    private static void printAllRecordDescriptions(StreamWriter file, ImGearPDFPreflightReport report)
    {
        foreach (ImGearPDFPreflightReport rep in report.Records)
        {
            file.WriteLine(rep.Description);
            file.WriteLine(rep.Code.ToString() + "\r\n");
            printAllRecordDescriptions(file, rep);
        }
    }

Ultimately, whether a document will convert to PDF/A cannot always be predicted in advance. While some compliance failures can be corrected individually, in combination they may not be correctable. Therefore, the unfortunate answer is that the only way to know for certain whether a document can be converted is to attempt the conversion.

Question

If I have a PDF document that only has an embedded image in it (no text objects, etc.), can PrizmDoc Viewer take it and create a searchable PDF file from it?

Answer

Yes. PrizmDoc’s Content Conversion Services can take an image-only PDF and create a searchable PDF file from it. This can be done by setting the input.dest.pdfOptions.ocr options object in the conversion request; see our documentation for the available settings.

If you are attempting to make a searchable PDF from an existing PDF document, please note that the source PDF file should be an image-only PDF. PrizmDoc will not create a searchable file from already-existing vector content.

This feature was introduced in PrizmDoc 13.1. Please see our Release Notes for more information.
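
As a rough illustration, the body of such a conversion request might be assembled as in the Python sketch below. Only the input.dest.pdfOptions.ocr path comes from the answer above; the "language" field and its "english" value are assumptions to confirm against the Content Conversion Service documentation, and build_searchable_pdf_request is a hypothetical helper name.

```python
import json


def build_searchable_pdf_request(file_id: str, language: str = "english") -> dict:
    """Build a contentConverters request body that asks for OCR on PDF output.

    Assumption: the ocr object accepts a "language" field; check the
    documentation for the options your PrizmDoc version supports.
    """
    return {
        "input": {
            "sources": [{"fileId": file_id}],
            "dest": {
                "format": "pdf",
                "pdfOptions": {"ocr": {"language": language}},
            },
        }
    }


if __name__ == "__main__":
    # "abc123" is a placeholder WorkFile id for an image-only PDF.
    print(json.dumps(build_searchable_pdf_request("abc123"), indent=4))
```

The resulting JSON would be posted to the contentConverters endpoint in the same way as any other conversion request.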

Anyone who has watched a thriller about government secrecy probably has an image in mind about what it means to redact a document. That picture usually involves piles of classified pages with entire paragraphs blotted out with black marker. At some point, a character holds a sheet up to a light and finds a spot where the redacted text is just barely visible enough to provide them with the next clue that moves the story forward. They may even use some special form of scanner that allows them to see the hidden material.

Such scenes reveal the fundamental problem with text redaction. As long as the content remains present, there might be some way of making it visible again, which presents serious problems in terms of privacy and security. The transition to purely digital documents should have made these concerns a thing of the past. Unfortunately, too many people fail to take advantage of PDF redaction tools and leave their confidential material dangerously exposed.

PDFs Are Not Like Physical Documents

In 2016, Democrats in the U.S. House of Representatives made the embarrassing mistake of releasing a cache of documents that contained improper redactions. Journalists easily found what was hidden beneath the black markings by copying the PDF text and pasting it into another document, which instantly revealed the redacted material.

This was not the first time government officials, or other organizations, released improperly redacted documents. Part of the reason why this mistake keeps happening is that people frequently apply the same practices used with physical documents to digital documents. It’s a simple matter to use shapes or drawing tools to obscure text in a PDF, but doing so only hides the content from view rather than removing it altogether.

As the “copy and paste” trick described above shows, it’s often trivially easy to bypass such “redactions.” That’s because a PDF document is not like a physical, printed document, even though it resembles one in a viewer. A PDF consists of multiple layers, as well as extensive metadata that isn’t visible. Adding a black box over text simply adds another layer to the document. Accessing the layer of text information underneath is quite simple, even with relatively basic software tools.

Redacting Content from Electronic Documents

The first step in true redaction involves the removal of selected content entirely. This ensures that even if someone is able to extract the text layer from the document, the redacted portions will not become visible when pasted elsewhere.
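
The difference between covering text and removing it can be shown with a deliberately simplified Python model. This is a toy sketch, not the PDF object model or the PrizmDoc API; the Page class and both helper functions are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page: text runs plus drawn shapes. Not the real PDF object model."""
    text_runs: list = field(default_factory=list)
    shapes: list = field(default_factory=list)

    def extract_text(self) -> str:
        # Text extraction (copy and paste) ignores shapes drawn on top.
        return " ".join(self.text_runs)


def cover_with_box(page: Page, index: int) -> None:
    """Improper 'redaction': draw a black box but leave the text in place."""
    page.shapes.append(("black_box", index))


def redact(page: Page, index: int) -> None:
    """True redaction: remove the content itself, then mark the spot."""
    page.text_runs[index] = "[REDACTED]"
    page.shapes.append(("black_box", index))


if __name__ == "__main__":
    covered = Page(["Account", "12345", "closed"])
    cover_with_box(covered, 1)
    print(covered.extract_text())   # the number is still extractable

    redacted = Page(["Account", "12345", "closed"])
    redact(redacted, 1)
    print(redacted.extract_text())  # the number has been removed
```

In the first case the viewer shows a black box, yet the extracted text still contains the number; in the second, the number no longer exists anywhere in the page data.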

However, even removing the visible text itself may not be enough to protect confidential information. That’s because the document may still contain data describing how to render the redacted portions. While it would be possible to avoid this problem by converting a PDF to a bitmap image, removing the portions to be redacted, and then building an entirely new document using OCR, this process is time-consuming and difficult to scale.

Using PDF Redaction Tools in PrizmDoc Viewer

A much more efficient approach would be to utilize dedicated PDF redaction tools like those built into PrizmDoc Viewer. Thanks to a sophisticated and intuitive API, PrizmDoc allows users to perform a number of redaction functions within its easy-to-use HTML5 viewer:

  • Add individual redactions by selecting text, applying a redaction rectangle, or marking out the whole page.
  • Perform a search for specific terms and apply redactions to each instance.
  • Add redaction layers to a document that can be saved and edited during preparation.
  • Apply redaction reasons to explain why certain content has been removed.

When integrating PrizmDoc Viewer into their applications, developers can also customize the HTML5 viewer to apply predefined redactions, preload entire redaction layers, or create unique redactions programmatically. This is especially useful for high-volume document workflows that need to identify and remove commonly used private data like Social Security numbers, contact information, and financial information.

PrizmDoc Viewer’s redaction API strips out all information associated with the redacted material from the document. That means removed content is not merely hidden from view; it also can’t be highlighted, copied, searched, or indexed because it’s no longer present in any way. Remaining text content, however, is still readily available. Even better, sharing documents through the HTML5 viewer also hides metadata that could contain sensitive information.

When redactions are made, PrizmDoc Viewer allows users to indicate the reasons for these removals. This is especially important for transparency purposes when working with government documents. The redaction API supports single and multiple redaction reasons for improved clarity.

Of course, most organizations still need to retain access to unredacted documents for internal use. That’s why PrizmDoc Viewer retains an unaltered version of the document safely uploaded to the server. The actual redacted document is a new file with all redacted content removed. Users can then use PrizmDoc Viewer’s sharing controls to further manage access to the file.

Redact Your Documents the Right Way

Today’s applications can’t afford to take redaction lightly. Whether they’re building the next generation of government technologies or LegalTech applications, developers need to provide their customers with the ability to easily screen documents to protect sensitive and private information from being exposed. By integrating viewing and document editing solutions with PDF redaction tools, they can help organizations take control over document security and avoid embarrassing redaction mistakes that could expose them to severe liability.

PrizmDoc Viewer’s versatile HTML5 viewing capabilities leverage powerful APIs to easily incorporate document redaction into application workflows. With just a simple API call, users can quickly locate and remove information from documents before sharing them with anyone outside the organization. To see PrizmDoc Viewer’s PDF redaction tools first hand, check out our interactive online demo today.

On August 3, 2021, Accusoft announced the release of the paid Professional version of Accusoft PDF Viewer. Initially released in March of 2021, the Standard version of Accusoft PDF Viewer is a free-to-use, lightweight JavaScript PDF library featuring a responsive UI for out-of-the-box mobile support. The new Professional version adds enhanced PDF tools and document functionality without introducing any complex server dependencies that could impact application security or performance.

“We’ve received tremendous feedback so far regarding the Standard version of Accusoft PDF Viewer,” says Jack Berlin, CEO of Accusoft. “With the release of the paid Professional version, customers now have a clear upgrade path that allows them to add new features without having to rethink their application architecture.”

Key Accusoft PDF Viewer Professional features include:

  • Multiple Annotation Types
  • Customizable UI
  • White Labeling
  • Electronic Signature

As an entirely client-side integration, Accusoft PDF Viewer can be incorporated into any web application with just a few lines of code. The paid Professional version features the same intuitive UI controls that provide an optimized viewing experience across all screen types, making it ideal for web apps that need to run on both desktop and mobile devices.

“We did a lot of research to determine which features are most important to developers,” says Mark Hansen, Product Manager at Accusoft. “The ability to markup and electronically sign documents without having to rely on external servers or backend processing is going to be a gamechanger for a lot of applications.”

To learn more about the latest Accusoft PDF Viewer features, please visit our website.

About Accusoft: 

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version release, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.


Question

We are converting emails into PDFs using PrizmDoc. When the PDF is viewed in PrizmDoc, if you hover over the names in the email header, you see a mailto link that provides the email address.

Is there a way to remove those links during the conversion process? We wish to ensure there are no email addresses present in the PDFs.

Answer

To work around this issue, you can first convert the email (MSG) to a TIFF file. This will remove the links and just keep the name of the email recipient. Then convert the TIFF file to a searchable PDF.

This workaround requires that your PrizmDoc license has the OCR option enabled to create the searchable PDF. If you do not need to make the text searchable, then you can just convert the TIFF to a PDF.
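
The two conversions can be sketched in Python as a pair of "dest" objects for the Content Conversion Service. The helper name email_conversion_steps is hypothetical, and the pdfOptions.ocr "language" field is an assumption to check against the documentation for your PrizmDoc version.

```python
def email_conversion_steps(ocr_licensed: bool) -> list:
    """Return the dest objects for the two conversions, in order."""
    # Step 1: rasterize the email to TIFF, which drops the mailto links.
    to_tiff = {"format": "tiff"}
    # Step 2: convert the TIFF to PDF.
    to_pdf = {"format": "pdf"}
    if ocr_licensed:
        # Assumption: OCR is requested through pdfOptions.ocr with a language field.
        to_pdf["pdfOptions"] = {"ocr": {"language": "english"}}
    return [to_tiff, to_pdf]


if __name__ == "__main__":
    for licensed in (True, False):
        print(licensed, email_conversion_steps(licensed))
```

Without the OCR option the second step still produces a PDF, but its text will not be searchable.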

On March 10, 2021, Accusoft announced the arrival of the free-to-use Accusoft PDF Viewer, the latest addition to its family of PDF solutions. An entirely client-side integration with no complicated server dependencies, this lightweight JavaScript PDF viewer also features a responsive UI for out-of-the-box mobile support.

“We’re excited to offer this free version of the Accusoft PDF Viewer to developers,” says Jack Berlin, CEO of Accusoft. “Our team worked hard to build a viewer that’s a step above what you can get from open source offerings. We think it’s going to solve a lot of the problems developers typically encounter with existing PDF libraries.”

Accusoft PDF Viewer integrates into an application quickly and easily with just a few snippets of code. It runs entirely within the browser to deliver an optimized viewing experience across all devices. The intuitive UI controls allow users to zoom, pan, jump to page, navigate thumbnails, and pinch-to-zoom on mobile screens with ease. And thanks to lightning fast full-text search, locating essential information is easier than ever.

“Accusoft PDF Viewer is great for developers because it allows them to maintain complete control over documents without having to set up any cumbersome server infrastructure,” says Mark Hansen, Product Manager. “Having a responsive UI that adapts to mobile displays will also increase their flexibility tremendously.”

The free version of Accusoft PDF Viewer allows developers to quickly add powerful viewing capabilities to their web applications. We’re currently working on additional features (such as annotation and eSignature) that will be included in an upgraded paid version.

To learn more about Accusoft PDF Viewer or download it for a first-hand look, please visit our website.

Question

What are the technical details/process of “Flattening” a PDF document?

Answer

It is possible to “Flatten” PDF documents in PrizmDoc Viewer. You can do this by converting the document to a raster format (TIFF is recommended for PDF conversion) using PrizmDoc’s Content Conversion Service, and then converting it back to PDF format. This will result in a PDF with a single layer and no hidden objects. However, this will usually lower the quality and increase the file size of PDFs that are largely text.

Here is an example workflow using the WorkFile API and the Content Conversion Service API:

1. Create a WorkFile from PDF

POST {{pccisUrl}}/PCCIS/V1/WorkFile
Content-Type: application/octet-stream

{{file bytes}}

2. Initiate Conversion to TIFF

POST {{pccisUrl}}/v2/contentConverters
Content-Type: application/json

{
    "input": {
        "sources": [
            {
                "fileId": "{{fileId}}"
            }
        ],
        "dest": {
            "format": "tiff"
        }
    }
}

3. Poll until response["state"] === "complete"

GET {{pccisUrl}}/v2/contentConverters/{{processId}}

4. Initiate Conversion from TIFF back to PDF

POST {{pccisUrl}}/v2/contentConverters
Content-Type: application/json

{
    "input": {
        "sources": [
            {
                "fileId": "{{fileId_from_Step3_output}}"
            }
        ],
        "dest": {
            "format": "pdf"
        }
    }
}

5. Poll again

GET {{pccisUrl}}/v2/contentConverters/{{processId}}

6. Download

GET {{pccisUrl}}/PCCIS/V1/WorkFile/{{fileId}}?ContentDispositionFileName={{desiredFileNameWithExtension}}
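
The polling in steps 3 and 5 could be wrapped in a small helper like the Python sketch below. The HTTP call is injected as a function so the loop can be exercised without a server; poll_until_complete and its parameters are hypothetical names, and treating an "error" state as a failure is an assumption about the API's responses.

```python
import time
from typing import Callable


def poll_until_complete(
    fetch_status: Callable[[str], dict],
    process_id: str,
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
) -> dict:
    """Call fetch_status(process_id) until the reported state is "complete".

    fetch_status stands in for GET {{pccisUrl}}/v2/contentConverters/{{processId}}
    and must return the parsed JSON response as a dict.
    """
    for _ in range(max_attempts):
        response = fetch_status(process_id)
        state = response.get("state")
        if state == "complete":
            return response
        if state == "error":  # assumption: failures are reported as "error"
            raise RuntimeError(f"conversion {process_id} failed: {response}")
        time.sleep(interval_seconds)
    raise TimeoutError(f"conversion {process_id} did not complete in time")


if __name__ == "__main__":
    # Fake transport that reports "processing" twice, then "complete".
    replies = iter([
        {"state": "processing"},
        {"state": "processing"},
        {"state": "complete"},
    ])
    result = poll_until_complete(lambda _pid: next(replies), "demo", interval_seconds=0)
    print(result["state"])
```

In a real integration, fetch_status would issue the GET request with your HTTP client of choice and return the decoded JSON body.
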

Question

How can I improve the performance and memory usage of scanning/recognition in Barcode Xpress?

Answer

Barcode Xpress supports a number of optimization settings that can improve recognition performance, sometimes by up to 40%, and reduce memory usage. The best way to optimize Barcode Xpress is to fine-tune the properties of the Reader class to match your application’s requirements.

BarcodeTypes

  • The best way to increase performance is to limit which barcodes Barcode Xpress should search for. By default, BarcodeTypes is set to UnknownBarcode which targets all 1D barcodes.

MaximumBarcodes

  • This property will instruct Barcode Xpress to halt searching after finding a specified number of barcodes. The default value is 100.

Area & Orientation

  • If you know the location or orientation of your barcodes in your image, specifying an orientation (such as Horizontal) and area can prevent Barcode Xpress from searching for vertical or diagonal barcodes, or in places where barcodes would not exist.

ScanDistance

  • Raising this value improves performance by skipping rows of the image during the search, which is a looser recognition technique. However, higher values increase the chance that a barcode goes undetected.
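
The ScanDistance trade-off can be pictured with a toy Python model in which only every Nth row of an image is examined. This is an illustration only, not Barcode Xpress's actual algorithm or API: the function names are hypothetical, and a real decoder needs more than a single scan line crossing a barcode.

```python
def rows_scanned(image_height: int, scan_distance: int) -> range:
    """Rows visited when only every scan_distance-th row is examined."""
    return range(0, image_height, scan_distance)


def barcode_is_hit(image_height: int, scan_distance: int,
                   barcode_top: int, barcode_height: int) -> bool:
    """True if at least one scanned row falls within the barcode's rows."""
    barcode_rows = range(barcode_top, barcode_top + barcode_height)
    return any(row in barcode_rows
               for row in rows_scanned(image_height, scan_distance))


if __name__ == "__main__":
    # A 1000-row image with a short barcode occupying rows 503 through 508.
    for distance in (1, 5, 10, 20):
        visited = len(rows_scanned(1000, distance))
        hit = barcode_is_hit(1000, distance, barcode_top=503, barcode_height=6)
        print(f"distance={distance:2d} rows visited={visited:4d} barcode hit={hit}")
```

Larger distances visit far fewer rows, but once the gap between scanned rows exceeds the height of a barcode, the barcode can fall between them and be missed.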

Finally, the Barcode Xpress Professional edition does not impose a 40 page-per-minute limit on processing.
