PII Security: How to Redact Personally Identifiable Information From Documents

In today’s information-rich digital world, every organization must think of itself as a data company in order to build sustainable success. Without strong controls and software in place to safeguard sensitive data, companies risk unintentionally exposing private data to the public. Applications with document redaction capabilities are an important tool for improving PII security and safeguarding privacy by removing personal and other sensitive information from documents.

What Is PII?

Any information that can be used to accurately identify a specific individual is classified as personally identifiable information (PII). Classic forms of PII include Social Security numbers, mailing addresses, email addresses, and phone numbers. The category continues to expand as organizations collect additional forms of information, such as account log-in IDs, biometric records, and geolocation data.

There are two broad categories of PII: sensitive and non-sensitive. Sensitive PII consists of information that could directly identify a person on its own. Non-sensitive PII refers to data that would need to be combined with other sources to identify someone. Examples of non-sensitive PII include commonly available details such as date of birth, gender, race, or zip code.

Managing PII

Personal information is vital for delivering quality services in many industries, whether it’s medical information stored in electronic health record (EHR) systems, financial data held by financial services organizations, or personal data used by insurance underwriters to set rates. Organizations are legally obligated to have the controls in place to manage PII and prevent it from being exposed to unauthorized access.

In recent years, sweeping legislation like the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) has sought to establish clear restrictions on the gathering and use of personal information. Existing compliance regimes like the Health Insurance Portability and Accountability Act (HIPAA) and the various System and Organization Controls (SOC) standards also require organizations to establish the necessary controls and procedures to manage personal data as safely as possible.

Privacy protection is a fundamental element of cybersecurity, so it’s no surprise that most applications have features in place to keep sensitive customer data secure. Companies can strengthen those safeguards by ensuring that employees are well-educated about their responsibilities when it comes to PII. With more organizations shifting to a predominantly remote workplace, these efforts are even more important. Even the most secure platform won’t be able to keep private data secure if an employee unwittingly downloads or emails a document containing sensitive personal information.

PII Security in Documents

Much of the data organizations collect from customers is drawn from documents, specifically from structured forms. From loan applications and health records to tax returns and insurance claims, people frequently enter their personal information into various types of forms that are then collected and processed by software applications. Once that data is gathered via forms processing or optical character recognition (OCR), it finds its way into other databases and workflows.

Between the original forms and the new documents generated within an application (such as contracts, patient records, and insurance policies), PII ends up appearing in a variety of files. So long as the documents remain stored within the application and access privileges are restricted, they don’t represent an additional security risk. Additional steps need to be taken when those documents are either removed from that environment or shared with someone who is not authorized to view the sensitive data they contain.

That’s when document redaction tools come into play.

How NOT to Redact PII From Documents

Organizations have long relied on redaction to protect privacy, but the transition to predominantly digital documents has brought with it a number of challenges. Redacting a physical document typically involves little more than covering the sensitive text with black marker. Covering text in an electronic document is just as simple, but it only hides the text from view and leaves the underlying data in place, so it does little to protect privacy.

Consider how two commonly used “redaction” techniques fall well short of security requirements:

Black Boxes

Most document annotation tools give users the ability to draw solid shapes like rectangles. Placing a black box over PII may appear to remove the sensitive text, but it only covers it up. Simply opening the document in another application could give someone the ability to remove the annotation element. Even worse, since the text is still present underneath the annotation, it’s possible to copy the text and paste it into a fresh document. This simple process has been used by journalists to reveal information in poorly redacted government records. Just because the text is hidden doesn’t mean it’s gone!

Background Matching

Another ineffective redaction method involves altering the color of sensitive text to match the color of the document background. For a typical document, simply changing the text color to white renders it invisible on screen. After the change is made, the document is then saved into another format (usually a PDF). Like the black box method, this approach assumes that the user will not be able to manipulate the text after the color is changed. Unfortunately, nothing could be further from the truth. Since the actual text is still present, it can be easily extracted and made visible again. The same copy/paste method is often all that’s needed to expose the obscured information.
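To make the weakness of both techniques concrete, here is a minimal Python sketch built on a toy document model. The `TextRun`, `Box`, and `Page` classes and both functions are hypothetical names invented for this illustration, and they do not correspond to any real file format or library. The point is only that what a reader sees and what a text extractor returns are two different things.

```python
from dataclasses import dataclass, field


@dataclass
class TextRun:
    text: str
    color: str = "black"


@dataclass
class Box:
    # Index of the text run this opaque rectangle is drawn over (toy model).
    covers_run: int


@dataclass
class Page:
    background: str = "white"
    runs: list[TextRun] = field(default_factory=list)
    annotations: list[Box] = field(default_factory=list)


def visible_text(page: Page) -> str:
    """What a reader sees: covered or background-colored runs are hidden."""
    covered = {box.covers_run for box in page.annotations}
    parts = []
    for i, run in enumerate(page.runs):
        hidden = i in covered or run.color == page.background
        parts.append("" if hidden else run.text)
    return "".join(parts)


def extracted_text(page: Page) -> str:
    """What copy/paste returns: every run, regardless of styling or overlays."""
    return "".join(run.text for run in page.runs)


page = Page(runs=[
    TextRun("Name: "), TextRun("Jane Doe"),
    TextRun(" SSN: "), TextRun("123-45-6789", color="white"),  # background matching
])
page.annotations.append(Box(covers_run=1))  # black box drawn over the name

print(repr(visible_text(page)))    # the name and SSN do not appear
print(repr(extracted_text(page)))  # the name and SSN are both still present
```

In this sketch neither technique touches the stored text, which is why extraction recovers everything.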

The Value of PII Redaction Software

Most document redaction mistakes are caused by a failure to consider the nature of digital files. An electronic document is not like a piece of physical paper. It stores the underlying text, along with metadata, separately from how the page looks, so the text can still be recovered even when it is visually obscured. That’s why true PII redaction requires specialized software capable of completely removing sensitive data from a document before applying the familiar redaction marks.

Although organizations sometimes use word processing solutions to obscure sensitive text and then convert the document into a flattened PDF, this is only effective if the document is being printed. An optical character recognition (OCR) solution could be used to extract the visible text from this document and create a new file, but this could quickly result in version confusion and would not provide any indication of why text was redacted.

A much more effective strategy involves the use of an HTML5 viewer with a dedicated redaction API capable of programmatic search and text removal. There are four key benefits to this approach:

Flexible Redaction Options

Individual redactions can be made by simply selecting text to apply a redaction rectangle or by selecting the entire page. The text is completely removed from the document and replaced with a redaction object (typically a black box).

Robust Search Tools

Search tools can locate specific terms throughout a document and programmatically redact every instance of them. This is especially useful for lengthy documents.
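As a rough illustration of search-based redaction, the Python sketch below works on plain text rather than a real document format. The pattern set, the `redact` function, and the `[REDACTED]` token are assumptions made for this example and are not the API of any particular redaction product. The patterns are deliberately simple and would miss many real-world variants.

```python
import re

# Illustrative patterns only; production PII detection needs far broader rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def redact(text: str, token: str = "[REDACTED]") -> tuple[str, dict[str, int]]:
    """Replace every pattern match with a fixed token.

    The matched characters are removed from the returned string, not just
    hidden. Returns the redacted text and a count of hits per pattern.
    """
    counts: dict[str, int] = {}
    for kind, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(token, text)
        counts[kind] = n
    return text, counts


sample = "Reach Jane at jane.doe@example.com or 555-123-4567. SSN: 123-45-6789."
redacted, counts = redact(sample)
print(redacted)
print(counts)
```

A fixed-length token is used here so that the redaction mark does not reveal how long the removed value was.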

Redaction Layers

Multiple redaction layers can be added, saved, and edited by multiple collaborators while the document is being prepared.
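One way to picture redaction layers is as separate lists of marked spans, one per collaborator, that are combined only when the document is finalized. The Python sketch below assumes that model; the `Layer` class, the helper functions, and the merge-on-finalize behavior are all hypothetical and are not drawn from any specific viewer.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    author: str
    # (start, end) character offsets marked for redaction by this author.
    spans: list[tuple[int, int]] = field(default_factory=list)


def span_of(text: str, needle: str) -> tuple[int, int]:
    """Offsets of the first occurrence of needle in text."""
    i = text.index(needle)
    return (i, i + len(needle))


def merge_layers(layers: list[Layer]) -> list[tuple[int, int]]:
    """Combine spans from all layers, merging any that overlap or touch."""
    spans = sorted(s for layer in layers for s in layer.spans)
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def apply_spans(text: str, spans: list[tuple[int, int]], token: str = "[REDACTED]") -> str:
    """Remove each span, working right to left so earlier offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + token + text[end:]
    return text


text = "Patient Jane Doe, DOB 01/02/1980, policy 99-1234."
legal = Layer("legal", [span_of(text, "Jane Doe")])
privacy = Layer("privacy", [span_of(text, "Doe"), span_of(text, "01/02/1980")])

merged = merge_layers([legal, privacy])
result = apply_spans(text, merged)
print(result)
```

Keeping each layer separate until the final step means one reviewer's marks can be edited or dropped without disturbing another's.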

Redaction Comments

Every redacted element can be marked with a redaction reason that indicates why it was removed from the document. This makes it much easier to identify where PII has been redacted as opposed to other sensitive information like proprietary data.
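The idea of attaching a reason to each redaction can be sketched in Python as follows. The `Redaction` class, the reason labels, and the `apply_with_reasons` function are hypothetical names chosen for this example, and the sketch assumes the marked spans do not overlap.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Redaction:
    start: int
    end: int
    reason: str  # e.g. "PII" or "Proprietary"


def apply_with_reasons(text: str, redactions: list[Redaction]) -> tuple[str, Counter]:
    """Remove each span and leave a mark that states why it was removed.

    Spans are applied right to left so earlier offsets stay valid.
    Returns the redacted text and a tally of redactions per reason.
    """
    for r in sorted(redactions, key=lambda r: r.start, reverse=True):
        text = text[:r.start] + f"[REDACTED: {r.reason}]" + text[r.end:]
    return text, Counter(r.reason for r in redactions)


text = "Contact Jane Doe about Project Falcon pricing."
name_at = text.index("Jane Doe")
project_at = text.index("Project Falcon")
marks = [
    Redaction(name_at, name_at + len("Jane Doe"), "PII"),
    Redaction(project_at, project_at + len("Project Falcon"), "Proprietary"),
]

redacted, summary = apply_with_reasons(text, marks)
print(redacted)
print(dict(summary))
```

The tally records only the reasons and never the removed text, so it can be shared alongside the redacted document.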

Using a redaction API within an HTML5 viewer ensures that redacted PII cannot be recovered. All data associated with the redacted text is stripped out of the file, which makes it impossible for anyone to highlight, copy, search, or index sensitive material. While remaining text content can still be viewed and searched normally, the redacted content is simply gone.