Technical FAQs

Question

I want to re-arrange the page order of a PDF. I’ve tried the following…

var page = imGearDocument.Pages[indx].Clone();

imGearDocument.Pages.RemoveAt(indx); // Exception: "One or more pages are in use and could not be deleted."

imGearDocument.Pages.Insert(newIndx, page);

But an exception is thrown. Somehow, even though the page was cloned, the exception states that the page can’t be removed because it’s still in use.

What am I doing wrong here?

Answer

If you’re using an older version of ImageGear .NET, you may run into this exception when you remove a page after cloning it. The original page and its clone still share some resources, so the original is reported as in use.

Starting with ImageGear .NET v24.8, this no longer happens, and the above code should work fine.

If you still need to use an earlier version, you can use the InsertPages method instead.

Question

When using OCR in ImageGear .NET, is there any way to distinguish between a capital/uppercase letter O and the number 0?

Answer

Not without context or a font that makes the difference clear (such as one with a slashed 0). ImageGear will properly recognize Oliver and 1530 as containing O and 0, respectively, but cannot reliably tell the two apart when letters and numbers are mixed. For example, it may not reliably distinguish between 1ABO0F3 and 1AB0OF3.
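The contextual rule described above can be approximated in a post-processing step on the recognized text. The following is a minimal sketch in plain Python, not an ImageGear feature; the function name resolve_o_zero is hypothetical, and the rule (trust the surrounding characters only when they are all digits or all letters) is an assumption made for illustration.

```python
def resolve_o_zero(token: str) -> str:
    """Rewrite ambiguous O/0 characters when the rest of the token gives clear context."""
    ambiguous = {"O", "0"}
    others = [c for c in token if c not in ambiguous]
    if not others:
        return token  # nothing but O/0: no context to lean on
    if all(c.isdigit() for c in others):
        return token.replace("O", "0")  # numeric context, e.g. 153O
    if all(c.isalpha() for c in others):
        return token.replace("0", "O")  # alphabetic context, e.g. 0liver
    return token  # letters and digits mixed: leave the OCR result untouched
```

A token such as 1ABO0F3 falls through to the last branch unchanged, which mirrors the limitation described in the answer: mixed strings carry no usable context.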

In part one of our series on how APIs are empowering a new generation of LegalTech solutions, we looked at some of the technology obstacles facing today’s legal organizations. We also covered the basic principles of how API integration works and how it can bridge the gap between legacy systems and new applications. In part two, we’ll be taking a closer look at some specific API integration use cases and explain why they’re an essential part of a successful firm’s LegalTech toolkit.

4 Benefits of APIs for Legal Teams

Before going into more detail about how LegalTech API integration works, it’s worth highlighting the broad benefits APIs can deliver to a law firm. 

1. Streamlined Workflows

The average legal department relies on more than one software solution to meet its business needs. While larger firms that provide a broad range of legal services typically require more specialized platforms, even smaller legal teams deploy different software applications to address different business needs.

Thanks to API functionality, these diverse LegalTech solutions can be integrated into a single, platform-agnostic portal that eliminates the workflow disruption caused by constantly switching back and forth between programs. 

2. Remote Functionality

API integrations also make it possible for lawyers to access an assortment of LegalTech tools from any location, even if they can’t physically be at their offices. This capability is more important than ever as the legal profession continues to grapple with the impact of the COVID-19 pandemic.

Many lawyers are still working from home and communicating with their clients and colleagues remotely. In some states, virtual court proceedings might remain in use even after the pandemic. If law practices aren’t able to function effectively in a remote context, they will struggle to deliver quality legal services to their clients.

3. Competitive Advantage

Managing multiple technology resources and facilitating remote collaboration isn’t just about making work easier for legal teams. Streamlining workflows results in greater efficiency, which means lawyers can spend more time doing high-value work for their clients rather than sorting out technical issues or tracking down hard-to-find documents and files. 

It also translates into reduced costs, since key administrative functions can be automated and carried out both faster and more accurately. Law firms that invest in technology integration can deliver better services to their clients at lower costs than their competition while still retaining the flexibility to adapt to future disruptions.

4. Enhanced Security

By its very nature, the legal industry ends up handling a great deal of sensitive information. Financial records, contracts, protected health data, and private correspondence are frequently relevant to legal proceedings of all kinds. There’s also the matter of attorney-client privilege, which greatly restricts what information can be shared outside the firm. 

Without a way to securely manage files and documents, law firms leave themselves exposed to significant liability. Thanks to API integrations, attorneys can use their existing LegalTech solutions to access, share, and edit essential files safely and securely. 

PrizmDoc Viewer: LegalTech API Integration in Practice

For a better understanding of how API integration can enhance the performance of LegalTech applications, it’s instructive to look at some specific examples. Accusoft’s PrizmDoc Viewer uses a powerful collection of REST APIs to provide HTML5 document viewing functionality through a single interface. It not only allows LegalTech developers to quickly and easily integrate document viewing capabilities into their applications, but it also delivers several additional features that are particularly relevant to the legal industry’s eDiscovery process.

Document Conversion

The digitization of the discovery process has made it easier for legal organizations to share documents and back up important data. Unfortunately, it’s also created a huge glut of electronically stored information (ESI) in a variety of formats. In addition to the large number of commonly used file formats (such as DOCX, PDF, and JPEG), firms must also deal with a variety of proprietary file formats and case-specific formats (like DICOM for healthcare clients). PrizmDoc Viewer uses an array of APIs to convert more than 100 file formats for easy presentation within a browser-based HTML5 viewer. It can also convert image-based documents into searchable PDFs or editable text files with a built-in OCR engine. Thanks to this integration, attorneys can quickly share and view documents internally or with clients and the court without having to download and install specialized applications.

Annotation

The ability to annotate and mark up documents is essential for any collaborative legal process. Although many platforms make it easy to insert comments and edits into documents, these programs often support only a handful of file types and alter the original file when making annotations. PrizmDoc Viewer’s annotation functionality supports over 100 file types and allows multiple users to make layered edits that can be easily shown or hidden. More importantly, all markups exist on top of the original document, preserving the integrity of the original file to comply with state and federal preservation of data requirements. When the time comes to present documents, annotations can be burned into the file if necessary.

Redaction

Sharing documents is always a sensitive process in the legal profession. Information may be protected by attorney-client privilege, disclosure agreements, contractual obligations, or government regulations. LegalTech applications need to be able to redact sensitive data when sharing documents with outside parties. PrizmDoc Viewer’s REST API allows users to manually redact individual sections, use search features to redact specific terms, or even programmatically redact data for pre-determined reasons (such as account numbers or Social Security numbers). Redacted content is not only hidden from view, but no longer shows up in search results and cannot be copied or highlighted.
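Pattern-based redaction of the kind described above generally starts by locating the sensitive spans in a document's text. The sketch below shows that first step in plain Python. It does not use the PrizmDoc Viewer REST API; the pattern (hyphenated 3-2-4 digit Social Security numbers) and the names find_ssn_spans and redact are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Assumed pattern: U.S. Social Security numbers written as 3-2-4 digits with hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of every SSN-like match."""
    return [m.span() for m in SSN_PATTERN.finditer(text)]


def redact(text: str, mask: str = "X") -> str:
    """Replace every SSN-like match with a mask of the same length."""
    return SSN_PATTERN.sub(lambda m: mask * len(m.group()), text)
```

Masking text this way only illustrates the matching logic. A real redaction workflow also has to remove the underlying content from the output file so that it cannot be searched or copied, as the paragraph above notes.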

Security

As mentioned previously, security should be a key consideration for any LegalTech solution. Firms need to strictly control who has access to data and confidential documents, whether that consists of sensitive client information or internal litigation strategies. PrizmDoc Viewer provides a few key features to help LegalTech applications maintain high levels of security.

  • DRM: Digital rights management (DRM) controls can manage who has access to documents and what functions they can use (such as printing, downloading, or viewing). This makes it easy to restrict how files are shared and track any document leaks back to their source to hold the responsible parties accountable.
  • Watermarking: PrizmDoc Viewer can hard-code identifying information into documents to prove ownership and prevent the unauthorized reproduction of documentation.
  • Encryption: With so many people working remotely from potentially unsecured Internet connections, file encryption is absolutely essential for any LegalTech application. PrizmDoc Viewer uses 256-bit AES content encryption to ensure that documents remain secure throughout the collaboration process.

Transform Your LegalTech Strategy with API Integration

As we covered in part one, many legal organizations cling to outdated processes and technology due to familiarity and deeply ingrained status quo bias. But familiar doesn’t always mean functional. Overreliance on manual processes exposes firms to increased human error and a range of potential data security risks, to say nothing of undercutting productivity. Advanced APIs offer a new tactical toolkit, a way to select best-fit code that solves specific issues and helps legal firms improve operational outcomes. Learn more about how Accusoft’s PrizmDoc Viewer can unlock the full potential of your LegalTech applications today.

When it comes to downloading or viewing documents over the internet, PDFs have long served as a de facto standard for most organizations. Since PDFs are not a proprietary file format, there’s rarely any risk that someone will be unable to open them. However, just because PDFs have become so commonplace doesn’t mean that they all share the same characteristics. For anyone who has ever wondered why some PDFs seem to take so much longer to load than others, the answer often has less to do with connection and processing speeds than it does with the way the PDF’s content is organized.

More specifically, it’s a matter of whether or not the document is a linearized PDF.

What Is a Linearized PDF?

Sometimes called “fast web view,” linearization is a special way of saving a PDF file that organizes its internal components to make them easier to read when the file is streamed over a network connection. While a standard, non-linearized PDF stores information associated with each page across the entire file, linearized PDFs use an object tree format to consolidate page elements on an ordered, page-by-page basis. When a reader opens a linearized PDF, then, all of the information needed to render the first page is readily available, allowing it to load the page quickly without having to search the entire document for a specific object like an embedded font.
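A quick way to see whether a file claims to be linearized is to look for the linearization parameter dictionary, which the PDF specification requires to appear within the first 1024 bytes of the file. The sketch below is a marker check only, written in plain Python; the function name is_marked_linearized is hypothetical, and the check does not validate the hint tables or the rest of the structure.

```python
def is_marked_linearized(pdf_bytes: bytes) -> bool:
    """Return True if a linearization dictionary marker appears near the start of the file."""
    head = pdf_bytes[:1024]  # the spec places the linearization dictionary in this window
    return head.startswith(b"%PDF-") and b"/Linearized" in head
```

To use it, read a file in binary mode and pass the bytes in. Reading only the first 1024 bytes is enough for this check, so it stays cheap even for very large documents.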

Originally introduced with the PDF 1.2 standard in 1996, linearized PDFs were critical to the format’s early internet success. In order to view a non-linearized PDF, the entire document needs to be downloaded or read via HTTP request-response transactions. Given the bandwidth limitations of early internet connections (often still between 28.8k and 33.6k in 1996), this created a serious bottleneck problem when it came to document viewing. While it was possible to view a document without downloading it, the multiple HTTP requests needed to do so could easily be disrupted if the connection was lost, something that was all too common in the days before reliable broadband connections were introduced.

Non-Linearized vs Linearized PDFs

To visualize the difference between a non-linearized PDF and a linearized PDF, imagine two separate people sitting down to file their business taxes. One person has all of their receipts, invoices, and financial documents scattered across their office, with some stacked in unordered piles, others crammed into unlabeled folders, and even more stuffed into assorted drawers and file cabinets. Finding and organizing all of this documentation would take almost as much time as actually filing the taxes themselves! The second person, however, has all of the records they need stored in a neatly labeled file cabinet, allowing them to retrieve everything quickly and easily.

The first person is like a non-linearized PDF, while the second is like a linearized one, which makes it much easier for a reader to find the information it needs to render the file. Even better, since each page is organized in the same way, jumping to a different page in a multi-page PDF doesn’t require the reader to reload the entire file. It can simply read the current page and get everything necessary to display the PDF correctly.

Why Linearized PDFs Are Still Valuable

In a world dominated by high-speed internet connections, it’s fair to wonder whether or not PDF linearization is still necessary. For small PDFs that are only a few pages, linearization may not be essential, but when it comes to larger documents, linearization can still deliver substantial performance and user experience benefits.

Consider, for instance, a document that consists of several hundred, or even several thousand, pages. Loading that entire document and keeping it cached may be possible, but it’s an inefficient use of processing and bandwidth resources. With a linearized PDF, a reader typically encounters a linearization directory and hint tables at the top of the document, which provide it with instructions on where to locate any necessary resources within the file. After loading the hint tables and the first page, the reader stops the download process rather than opening the entire file. When the user navigates to another page, the reader can quickly reference the hint tables and jump to that page.

This ensures that the reader is only ever loading the pages that actually need to be displayed, which helps to conserve memory, processing resources, and bandwidth. For mobile devices with limited file and cache storage, linearized PDFs are much easier to manage than their non-linearized counterparts. They also provide some protection against network interruptions, which could make it difficult to download and view an entire document.

How to Linearize PDFs

Although the linearization process is well laid out in the current PDF standards documentation, many PDFs are created using software that doesn’t automatically linearize the content. More importantly, some linearized PDFs are “broken” by a process called incremental saving, which saves minor updates at the end of the file rather than changing the existing structure. Over time, too much incremental saving can undermine the effectiveness of a linearized PDF.
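Incremental saving can often be spotted from the linearization dictionary itself. Its /L entry records the total file length, and the PDF specification says that a mismatch with the actual length means the file should be treated as an ordinary, non-linearized PDF. The sketch below applies that rule in plain Python. The function name linearization_intact is hypothetical, and the regular expression is a simplification that assumes the /L key is followed by whitespace and a plain integer.

```python
import re


def linearization_intact(pdf_bytes: bytes) -> bool:
    """Return True if the /L entry of the linearization dictionary matches the file size."""
    head = pdf_bytes[:1024]
    if b"/Linearized" not in head:
        return False  # not marked as linearized at all
    # "/L" followed by whitespace, so the key "/Linearized" itself is not matched.
    match = re.search(rb"/L\s+(\d+)", head)
    return match is not None and int(match.group(1)) == len(pdf_bytes)
```

When an update is appended to the end of the file, the stored /L value no longer equals the file size, so the function returns False and the file is a candidate for being re-saved in linearized form.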

The best way to resolve such problems and linearize the PDF is to save a new, linearized version of the file using PDF editing and conversion tools.

Take Control of PDFs with PrizmDoc

Accusoft’s PrizmDoc provides a broad range of document functionality that allows applications to more effectively create, convert, and compress PDF files.

For a closer look at PrizmDoc and to see its powerful document processing capabilities in action, download a free trial today.

Continuous innovation has allowed Accusoft to build sustained success over the course of three decades. Much of that innovation comes from talented developers creating novel solutions to everyday problems, many of which go on to become patented technologies that provide the company with an edge over competitors. 

Others, however, are the byproduct of looking at problems from a different perspective or using existing technologies in unique ways. Accusoft supports both approaches by hosting special “hackathon” events each year. These events encourage developers to spend time working on their own unique projects or try out ideas they think may have potential but have never been implemented.

For this year’s hackathon, I took a closer look at how our SmartZone SDK could be implemented as part of an automation solution within a .NET environment without creating an entire application from the ground up. What I discovered was that PowerShell modules offer a quick and easy way to deploy character recognition for limited, unique use cases.

.NET and PowerShell

One of the underestimated abilities of the .NET infrastructure is its support for loading and executing assemblies out of the box from the command line using a shell module. Although there are many shell variants available, PowerShell comes preinstalled on most Windows machines and is the only tool required to write the scripts and keep them running. PowerShell also runs on Linux and macOS, which makes it a true cross-platform task automation solution for inventive developers who crave flexibility in their scripting tools.

Incorporating the best features of other popular shells, PowerShell consists of a command-line shell, a scripting language, and a configuration management framework. One of the unique features of PowerShell, however, is that unlike most shells which can only accept and return text, it can do the same with .NET objects. This means PowerShell modules can be used to build, test, and deploy solutions as well as manage any technology as part of an extensible automation platform.

Implementing SmartZone Character Recognition

Accusoft’s SmartZone technology allows developers to incorporate advanced zonal character recognition to capture both machine-printed and hand-printed data from document fields. It also supports full page optical character recognition (OCR) and allows developers to set confidence values to determine when manual review of recognition results is necessary.

Implementing those features into an application through a third-party integration is the best way to incorporate recognition capabilities, but there are some use cases where they might need to be used for general tasks outside of a conventional workflow. A number of Accusoft customers, for instance, had inquired about simple ways to use some of SmartZone’s features in their existing process automation software without having to spend weeks of development time integrating those capabilities on a larger scale.

Thanks to the versatility of PowerShell, there’s no reason to build such an application from scratch. SmartZone’s zonal recognition technology can easily be incorporated into any .NET environment with just a few snippets of code. PowerShell syntax is not difficult to learn, and Windows Notepad is enough for a quick start, but we recommend using your favorite integrated development environment (IDE) for a better experience.

Getting Started

First, you need to download SmartZoneV7.0DotNet-AnyCPU.zip from the Accusoft SmartZone download page and unpack it to any suitable directory. This bundle contains all required binaries to run SmartZone.

Create a Simple.ps1 file inside the unpacked directory and start typing your script:


using namespace System.Drawing
using namespace System.Reflection
using namespace Accusoft.SmartZoneOCRSdk

# Load assemblies.
Add-Type -AssemblyName System.Drawing
$szPath = Resolve-Path ".\bin\netstandard2.0\Accusoft.SmartZoneOCR.Net.dll"
[Assembly]::LoadFrom($szPath)

# Create a SmartZone instance.
$szObj = [SmartZoneOCR]::new()
$szAssetsPath = Resolve-Path ".\bin\assets"
$szObj.OCRDataPath = $szAssetsPath.Path

# Licensing
# $szObj.Licensing.SetSolutionName("Contact Accusoft for getting the license.")
# $szObj.Licensing.SetSolutionKey(+1, 800, 875, 7009)
# $szObj.Licensing.SetOEMLicenseKey("https://www.accusoft.com/company/legal/licensing/");

# Load test image.
$bitmapPath = Resolve-Path ".\demos\images\OCR\MultiLine.bmp"
[Bitmap] $bitmap = [Image]::FromFile($bitmapPath.Path)

# Recognize the image and print the result.
# The finally block frees the resources even if recognition throws.
try {
    $result = $szObj.Reader.AnalyzeField($bitmap)
    Write-Host $result.Text
}
finally {
    $bitmap.Dispose()
    $szObj.Dispose()
}


This simple code snippet allows you to use SmartZone together with PowerShell in task automation processes like recognizing screenshots, email attachments, and images downloaded by the web browser. It can also be deployed in other similar cases where the advantages of PowerShell modules and cmdlets can help to achieve results faster than writing an application from scratch.

Another Hackathon Success

Identifying a new way to deploy existing Accusoft solutions is one of the reasons why the hackathon event was first created. This script may not reinvent the wheel, but it will help developers save time and money in a lot of situations, which means fewer missed deadlines and faster time to market for software products. Developing unique approaches to existing problems can be difficult with deadlines and coding demands hanging over a developer’s head, so Accusoft’s hackathons are incredibly important for helping the company stay at the forefront of innovation. 

To learn more about how that innovation can help your team implement powerful new features into your applications, talk to one of our solutions experts today!

Question

How do I ensure temp files are deleted when closing ImageGear .NET?

Answer

All PDF objects are based on underlying low-level PDF objects that are not controlled by the .NET resource manager and garbage collector. Because of this, each PDF object that is created from scratch should be explicitly disposed of using that object’s Dispose() method.

Also, any ImGearPDEContent object obtained from ImGearPDFPage should always be released using ImGearPDFPage.ReleaseContent().

This should cause all temp files to be cleared when the application is closed.

Question

How do I remove XMP Data from my image using ImageGear .NET?

Answer

When removing XMP data in ImageGear, the simplest way to do this is to replace the XMP metadata node with a new, empty root, like so:

ImGearSimplifiedMetadata.Initialize(); 
doc.Metadata.XMP = new ImGearXMPMetadataRoot();

Or, you can traverse through the metadata tree and remove each node from the tree:

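// Example code. Not thoroughly tested.
// Requires "using System.Collections;" for ArrayList.
private static void RemoveXmp(ImGearMetadataTree tree)
{
    // Collect matches first; removing nodes while enumerating would invalidate the iterator.
    ArrayList toRemove = new ArrayList();
    foreach (ImGearMetadataNode node in tree.Children)
    {
        // Recurse into subtrees so nested XMP nodes are removed too.
        if (node is ImGearMetadataTree)
            RemoveXmp((ImGearMetadataTree)node);

        if (node.Format != ImGearMetadataFormats.XMP)
            continue;

        toRemove.Add(node);
    }

    foreach (ImGearMetadataNode node in toRemove)
        tree.Children.Remove(node);
}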

Few document formats are more common than XLSX spreadsheet files. Although many alternatives are available, most enterprises continue to rely on the broad (and familiar) functionality of Microsoft Excel when it comes to their spreadsheet needs. However, few organizations take the appropriate steps to ensure Excel spreadsheet security, which could leave their private data and formula assets exposed to substantial risk.

As a third-party dependency, Excel represents an obvious security gap that could easily be exploited. Any time a file travels outside a secure application environment, there is a potential risk of data theft and version confusion. In any situation where files are travelling between separate applications, there is also an opportunity for malicious files to slip into unsuspecting workflows. By focusing on ways to shore up their Excel spreadsheet security, organizations can minimize risk and protect their sensitive data.

Excel Spreadsheet Security Risk #1: Malicious File Extensions

Most organizations are aware that opening a file attached to an email is one of the most common ways to introduce malware into a system. What they may not realize, however, is just how pervasive the problem is or how well those files are masked. It’s easy to identify a malicious email attachment when its name is a jumble of letters and it has an unfamiliar file extension. The real threat comes when it actually resembles something familiar and potentially legitimate.

Unfortunately, XLSX spreadsheet files are frequently used to distribute malware. According to a comprehensive cybersecurity study conducted by Cisco in 2018, Microsoft Office file extensions (such as DOCX and XLSX) were used by 38 percent of malicious email attachments, higher than any other format. These extensions are attractive to cybercriminals precisely because they’re so widely used. Someone working in a financial services organization, for instance, is usually quite accustomed to sending and receiving spreadsheets via email, so they are more likely to open an XLSX file out of curiosity.

Of course, this raises a separate question about basic cybersecurity. No organization today should be relying on poorly secured channels like email to share sensitive documents in the first place. By integrating native XLSX viewing and editing capabilities directly into their web applications, developers can provide the tools necessary to share spreadsheets without the risk of exposing collaborators to malicious file extensions. Embedding spreadsheet files into the application allows for easy access, but also keeps the file safely within a secure environment. Once users become accustomed to accessing spreadsheets this way, they’ll be less likely to fall prey to a malicious XLSX extension in their email. 
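Applications that do accept uploaded spreadsheets can at least confirm that a file with an .xlsx extension is structurally what it claims to be. An XLSX file is a ZIP package, so the minimal sketch below, in plain Python, checks for the ZIP container and two parts that a typical Excel-produced workbook contains. The function name looks_like_xlsx is hypothetical, the part names are an assumption about typical files rather than a full format validation, and passing this check says nothing about whether the content is safe.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if the bytes form a ZIP package containing the usual XLSX parts."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = set(package.namelist())
    except zipfile.BadZipFile:
        return False  # not a ZIP container, so not an XLSX package
    # Assumed layout: the content-types manifest plus the workbook part.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

A file that fails this check despite its extension is a strong signal that the extension is being used as a disguise.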

Excel Spreadsheet Security Risk #2: Insufficient Access Control

Spreadsheets can contain a great deal of information. Not only do they make it easy to reference data and carry out complex calculations in seconds, but there’s also a lot happening behind the scenes that may not be immediately obvious to the average user. Spreadsheet cells typically incorporate highly detailed (and often proprietary) formulas that help organizations to estimate costs, assess risk, and adjust revenue forecasts. For many industries, there’s simply no software that can compete with the extensive capabilities of spreadsheets.

But that versatility comes with a cost. Any user with a rudimentary knowledge of spreadsheets can easily reveal hidden information and examine the formulas behind the document’s calculations. And once they’ve downloaded their own copy of the spreadsheet, there’s nothing to prevent them from using it elsewhere, which can be a serious problem for any organization that depends upon its proprietary formulas to drive business success.

The root problem in this case comes down to who has control over the spreadsheet. When an XLSX file is shared, it can then be copied or even altered without the knowledge or permission of its original owner. The best way to maintain control over spreadsheets is to integrate native XLSX viewing capabilities directly into a web application. This allows developers to control which elements of the spreadsheet are being shared and prevents anyone from downloading a copy without permission. Since users can only interact with the spreadsheet on the terms set by the file’s owner, they can’t peek “under the hood” to obtain proprietary assets like cell formulas.

Secure Your Spreadsheets with PrizmDoc Cells

Accusoft’s PrizmDoc Cells is a powerful API integration that allows developers to provide dynamic spreadsheet viewing and editing capabilities within their web application environment. Far more versatile than traditional viewer integrations that offer only a static “print preview” image of a spreadsheet, PrizmDoc Cells makes it possible to scroll both vertically and horizontally and even enter information into cells to perform calculations. It’s the most secure way to provide access to spreadsheet resources without sacrificing control over editing permissions. And since the XLSX file never has to travel beyond a secure application environment, there’s no need to worry about malicious file extensions when sharing spreadsheets.

Developers can use PrizmDoc Cells’s whitelabeling features to customize its look and functionality within their application. From editing cell content and format to embedding graphics, they retain complete control over the way viewers interact with spreadsheet files to maximize security and protect vital proprietary information. To learn more about how PrizmDoc Cells can enhance Excel spreadsheet security within your application, visit our product page to explore this powerful integration’s features.

TAMPA, Fla. – On September 22, 2020, Accusoft announced its latest SDK, ImageGear PDF. This integration enables developers to add a variety of PDF functionalities into an application.

“We are proud to add ImageGear PDF as the latest addition to our product portfolio,” says Jack Berlin, CEO of Accusoft. “We recognized a need in the market for a more robust PDF solution that developers could use to enhance their products. Using our proprietary technology, I knew we could bridge that gap.”

ImageGear PDF gives end-users the ability to merge multiple PDFs, split a PDF into multiple PDFs, rearrange pages within a PDF, add pages or remove pages in a PDF, and more. The SDK adds programmatic annotation capabilities as well as compression, signature, comparison, and data capture.

“ImageGear PDF is a great tool for developers looking to enhance their application,” says Mark Hansen, Sr. Product Manager of SDKs. “Accusoft has a variety of different PDF solutions, but we wanted to add a more robust SDK that solves PDF pain points more efficiently.”

ImageGear PDF is available with an optical character recognition (OCR) add-on feature, which programmers can use to search for specific characters within a document, highlight different sections, and mark up the output for easier viewing and collaboration. To learn more about ImageGear PDF, please visit our website at accusoft.com/products/imagegear-collection/imagegear-pdf/.

About Accusoft:

Founded in 1991, Accusoft is a software development company specializing in content processing, conversion, and automation solutions. From out-of-the-box and configurable applications to APIs built for developers, Accusoft software enables users to solve their most complex workflow challenges and gain insights from content in any format, on any device. Backed by 40 patents, the company’s flagship products, including OnTask, PrizmDoc™ Viewer, and ImageGear, are designed to improve productivity, provide actionable data, and deliver results that matter. The Accusoft team is dedicated to continuous innovation through customer-centric product development, new version releases, and a passion for understanding industry trends that drive consumer demand. Visit us at www.accusoft.com.