How PDF Redaction Tools Can Secure Documents

Anyone who has watched a thriller about government secrecy probably has an image in mind of what it means to redact a document. That picture usually involves piles of classified pages with entire paragraphs blotted out in black marker. At some point, a character holds a sheet up to a light and finds a spot where the redacted text is just visible enough to provide the next clue that moves the story forward. They may even use a special scanner that reveals the hidden material.

Such scenes reveal the fundamental problem with text redaction: as long as the content remains present, there may be some way of making it visible again, which is a serious privacy and security risk. The transition to purely digital documents should have made these concerns a thing of the past. Unfortunately, too many people fail to take advantage of PDF redaction tools and leave their confidential material dangerously exposed.

PDFs Are Not Like Physical Documents

In 2016, Democrats in the U.S. House of Representatives made the embarrassing mistake of releasing a cache of documents that contained improper redactions. Journalists easily found what was hidden beneath the black markings by copying the PDF text and pasting it into another document, which instantly revealed the redacted material.

This was not the first time government officials, or other organizations, released improperly redacted documents. Part of the reason why this mistake keeps happening is that people frequently apply the same practices used with physical documents to digital documents. It’s a simple matter to use shapes or drawing tools to obscure text in a PDF, but doing so only hides the content from view rather than removing it altogether.

As the “copy and paste” trick described above shows, it’s often trivially easy to bypass such “redactions.” That’s because a PDF document is not like a physical, printed document, even though it resembles one in a viewer. A PDF page is built from separate content objects (text, images, and vector shapes) that stack much like layers, along with extensive metadata that a viewer never displays. Adding a black box over text simply adds one more object on top; the text underneath stays in the file. Accessing that text is quite simple, even with relatively basic software tools.
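The overlay problem can be illustrated with a minimal Python sketch. It uses a toy page model rather than a real PDF parser: `TextRun`, `FilledRect`, and `extract_text` are hypothetical names invented for this illustration, and the only assumption is that text extraction reads text objects and ignores drawn shapes, which is roughly how copy and paste behaves in a viewer.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextRun:
    """A piece of text positioned on the page (hypothetical toy model)."""
    x: float
    y: float
    width: float
    height: float
    text: str


@dataclass
class FilledRect:
    """An opaque black rectangle drawn on the page (hypothetical toy model)."""
    x: float
    y: float
    width: float
    height: float


PageItem = Union[TextRun, FilledRect]


def extract_text(page: List[PageItem]) -> str:
    """Mimic copy and paste: collect every text run and ignore drawn shapes."""
    return " ".join(item.text for item in page if isinstance(item, TextRun))


page: List[PageItem] = [
    TextRun(72, 700, 90, 12, "Witness name:"),
    TextRun(166, 700, 60, 12, "Jane Doe"),
]

# "Redact" the name the wrong way: draw a black box with the same bounds on top.
page.append(FilledRect(166, 700, 60, 12))

leaked = extract_text(page)
print(leaked)  # the covered name is still part of the extracted text
```

The black rectangle changes what a reader sees on screen, but the extraction step never consults it, so the name comes back intact.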

Redacting Content from Electronic Documents

The first step in true redaction involves the removal of selected content entirely. This ensures that even if someone is able to extract the text layer from the document, the redacted portions will not become visible when pasted elsewhere.

However, even removing the visible text itself may not be enough to protect confidential information. That’s because the document may still contain leftover data describing how to render the redacted portions. While it would be possible to avoid this problem by converting a PDF to a bitmap image, removing the portions to be redacted, and then building an entirely new document using OCR, this process is time-consuming and difficult to scale.
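The removal step described above can be sketched as a filter that drops content instead of covering it. This is a simplified illustration under stated assumptions, not how any particular product works: the `Box`, `TextRun`, `overlaps`, `redact`, and `extract_text` names are hypothetical, a text run is removed whole if it touches the redaction region, and leftover rendering data and metadata are not modeled at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical toy model)."""
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class TextRun:
    """A piece of text and the area it occupies (hypothetical toy model)."""
    box: Box
    text: str


def overlaps(a: Box, b: Box) -> bool:
    """Return True when the two rectangles share any interior area."""
    return (
        a.x < b.x + b.width and b.x < a.x + a.width
        and a.y < b.y + b.height and b.y < a.y + a.height
    )


def redact(page: List[TextRun], region: Box) -> List[TextRun]:
    """Build a new page that omits every text run touching the region."""
    return [run for run in page if not overlaps(run.box, region)]


def extract_text(page: List[TextRun]) -> str:
    """Mimic copy and paste by joining whatever text remains."""
    return " ".join(run.text for run in page)


page = [
    TextRun(Box(72, 700, 90, 12), "Witness name:"),
    TextRun(Box(166, 700, 60, 12), "Jane Doe"),
]

# The region covers only the name, so the label before it survives.
redacted_page = redact(page, Box(164, 698, 64, 16))
remaining = extract_text(redacted_page)
print(remaining)
```

A production tool would also need to split a run when only part of it falls inside the region and to clean up any related rendering data, which this sketch leaves out.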

Using PDF Redaction Tools in PrizmDoc Viewer

A much more efficient approach would be to utilize dedicated PDF redaction tools like those built into PrizmDoc Viewer. Thanks to a sophisticated and intuitive API, PrizmDoc allows users to perform a number of redaction functions within its easy-to-use HTML5 viewer:

  • Add individual redactions by selecting text, applying a redaction rectangle, or marking out the whole page.
  • Perform a search for specific terms and apply redactions to each instance.
  • Add redaction layers to a document that can be saved and edited during preparation.
  • Apply redaction reasons to explain why certain content has been removed.

When integrating PrizmDoc Viewer into their applications, developers can also customize the HTML5 viewer to apply predefined redactions, preload entire redaction layers, or create unique redactions programmatically. This is especially useful for high-volume document workflows that need to identify and remove common types of private data, such as Social Security numbers, contact information, and financial information.
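As an illustration of pattern-based redaction in a high-volume workflow, the following Python sketch finds strings shaped like U.S. Social Security numbers in already-extracted text and replaces them. It does not call the PrizmDoc Viewer API; `SSN_PATTERN`, `find_ssn_spans`, and `apply_redactions` are hypothetical names, and the pattern only matches the common `NNN-NN-NNNN` layout.

```python
import re
from typing import List, Tuple

# Matches the common NNN-NN-NNNN layout only; other formats are not covered.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for every SSN-shaped match."""
    return [(m.start(), m.end()) for m in SSN_PATTERN.finditer(text)]


def apply_redactions(text: str, spans: List[Tuple[int, int]],
                     placeholder: str = "[REDACTED]") -> str:
    """Rebuild the text with each span replaced, so the original digits are gone.

    Assumes the spans do not overlap, which holds for regex matches.
    """
    parts = []
    cursor = 0
    for start, end in sorted(spans):
        parts.append(text[cursor:start])
        parts.append(placeholder)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


sample = "SSN 123-45-6789 on file; alternate 987-65-4321."
spans = find_ssn_spans(sample)
cleaned = apply_redactions(sample, spans)
print(cleaned)
```

Separating the search step from the replacement step mirrors the workflow in the list above: matches can be reviewed as proposed redactions before anything is removed.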

PrizmDoc Viewer’s redaction API strips out all information associated with the redacted material from the document. That means any removed content isn’t merely hidden from view; it also can’t be highlighted, copied, searched, or indexed because it’s no longer present in any way. Remaining text content, however, is still readily available. Even better, sharing documents through the HTML5 viewer also hides metadata that could contain sensitive information.

When redactions are made, PrizmDoc Viewer allows users to indicate the reasons for these removals. This is especially important for transparency purposes when working with government documents. The redaction API supports single and multiple redaction reasons for improved clarity.
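One way to picture single and multiple redaction reasons is as a list attached to each redaction. The sketch below is a hypothetical data model, not PrizmDoc Viewer’s actual schema: the `Redaction` class, its fields, and the semicolon-joined label are assumptions made for illustration. The sample reason strings are modeled on U.S. FOIA exemption codes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Redaction:
    """A redacted region plus the reasons for removing it (hypothetical model)."""
    page: int
    x: float
    y: float
    width: float
    height: float
    reasons: List[str] = field(default_factory=list)

    def label(self) -> str:
        """Text to show on the redacted area; joins multiple reasons."""
        return "; ".join(self.reasons) if self.reasons else "Redacted"


single = Redaction(page=1, x=166, y=700, width=60, height=12,
                   reasons=["(b)(6)"])
multiple = Redaction(page=2, x=72, y=540, width=200, height=12,
                     reasons=["(b)(6)", "(b)(7)(C)"])

labels = [single.label(), multiple.label()]
print(labels)
```

Keeping the reasons as a list rather than a single string makes it straightforward to report or filter redactions by reason later.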

Of course, most organizations still need to retain access to unredacted documents for internal use. That’s why PrizmDoc Viewer keeps the unaltered original document safely stored on the server. The actual redacted document is a new file with all redacted content removed. Users can then use PrizmDoc Viewer’s sharing controls to further manage access to the file.

Redact Your Documents the Right Way

Today’s applications can’t afford to take redaction lightly. Whether developers are building the next generation of government technologies or LegalTech applications, they need to give their customers the ability to easily screen documents and keep sensitive and private information from being exposed. By integrating viewing and document editing solutions with PDF redaction tools, they can help organizations take control of document security and avoid embarrassing redaction mistakes that could expose them to severe liability.

PrizmDoc Viewer’s versatile HTML5 viewing capabilities leverage powerful APIs to easily incorporate document redaction into application workflows. With just a simple API call, users can quickly locate and remove information from documents before sharing them with anyone outside the organization. To see PrizmDoc Viewer’s PDF redaction tools firsthand, check out our interactive online demo today.