Technical FAQs

Question

I have a PDF of a form that I’m sending to PrizmDoc to have it auto-detect, but PrizmDoc does not find any fields in the document. What would cause this?

Answer

Currently, PrizmDoc can auto-detect fields only in PDF files with embedded AcroForms. If the PDF document merely contains an embedded image of a form, PrizmDoc will not find any results from auto-detection.

Redacting documents is critically important for legal departments and government agencies. By removing sensitive information from a digital file before sharing it publicly, it’s possible to protect private data or classified materials from being exposed. 

In the days before digital documents, redaction involved a simple, if crude, process of covering text with a black marker. Since redactions were done by hand, it was easy for mistakes to be made, which could range from using insufficiently dark ink to leaving portions of text exposed. The development of high-powered photo enhancement has rendered this approach all but useless, as even inexpensive image processing technology can distinguish blacked-out text.

With the transition to digital documents, organizations finally have access to true redaction capabilities. Unfortunately, they still tend to make mistakes when it comes to flattened PDFs that could leave redacted content exposed and vulnerable.

What Is a Flattened PDF?

A modern PDF file consists of multiple layers, each of which can contain separate elements. One layer might feature text, another an image, and yet another a fillable form. The flattening process removes all interactive elements from form fields and combines all of the document’s elements into a single layer.

Organizations frequently use this process to “lock in” form content to prevent anyone from altering the information after a user completes the forms. It also removes elements like dropdown selections within form fields and can burn in other annotations or markups, making them a permanently visible element of the document.

Flattened PDF Redactions

Unfortunately, simply flattening a PDF is usually not sufficient to securely redact a document. That’s because obscured elements are still present in the document; they’re just not visible when the file is viewed and printed. 

Recovering improperly redacted content is actually quite trivial in many cases. Two of the most infamous recent examples include information released during the investigation of political campaign chairman Paul Manafort in 2019 and court documents related to Facebook’s use of personal data in 2017. In both cases, journalists were able to copy redacted text from PDF files and paste it into a text editor to reveal the obscured content.

There are typically two ways that improper redactions occur:

  1. Covering Text with Boxes: This frequent mistake occurs when people try to treat a digital document like a physical piece of paper. They place annotations over the sensitive content, usually in the form of a black box, and then save a flattened version of the PDF thinking that no one will be able to separate the text from the annotation element. As the Manafort and Facebook cases demonstrate, however, getting around these “redactions” is usually quite easy.
  2. Changing the Color of Text: Another common redaction error involves altering the color of the sensitive text to match the document background. Changing the text color to white, for instance, might make it invisible to the human eye, but it does nothing to alter the content itself. The text can be made visible again by using the copy/paste trick described above or by altering the background characteristics in another program. 

The only way to make these methods viable for true redactions would be to actually print the documents with the content hidden and then scan them back into digital form, where OCR could be used to reconstruct a new file. But even in this case, there’s a chance that a powerful OCR engine might be able to pick up the hidden elements.
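The box-over-text mistake described above can be illustrated with a minimal Python sketch. It assumes a simplified, uncompressed page content stream (real PDFs usually compress streams and may encode text in font-specific ways), and the witness name is a made-up example. The stream draws a line of text and then fills a black rectangle over it; a basic text extraction still recovers the string because the rectangle is only an additional drawing instruction.

```python
import re

# A simplified, uncompressed PDF page content stream (illustrative only).
# BT ... ET shows text with the Tj operator; "re" defines a rectangle and
# "f" fills it with the current color (black, set by "0 0 0 rg").
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Witness: Jane Doe) Tj ET
0 0 0 rg
70 695 160 16 re f
"""

def extract_text(stream: bytes) -> list[str]:
    """Return every literal string that is shown with the Tj operator."""
    matches = re.findall(rb"\(([^)]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]

# The black box hides the text visually but does not remove it.
print(extract_text(CONTENT_STREAM))
```

Nothing in the fill instruction touches the text operator, which is why copy and paste can reveal content that appears to be blacked out.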

Using Proper Redaction Prior to Flattening with PrizmDoc Viewer

In order to redact documents securely, applications need to have access to specialized redaction tools that are capable of actually removing content from the document itself before applying redaction indicators. PrizmDoc Viewer’s redaction API can find and extract key text while also providing single or multiple reasons for the removal. 

This not only allows organizations to redact documents quickly, but it also ensures that the redacted information won’t be exposed later because it no longer even exists within the document. More importantly, the output document is entirely new, so there is no deleted information to recover.

While most people are familiar with the distinctive black bars that indicate redacted content, even this leaves behind significant context clues that could provide hints of what was removed. Consider, for instance, a document involving multiple parties where the names of conversation participants have been redacted.

The following information:

[Figure: PDF redaction example]

The length of the redaction, then, would at least indicate when the redaction did not involve one person or the other. There are also many instances involving government documents where the length of the redacted information in classified material might suggest its relevance or importance.

When it comes to GovTech applications that need to remove large portions of information for security reasons, it often helps to perform redaction BEFORE turning a document into a flattened PDF. The PrizmDoc Viewer redaction API can be used to quickly extract text from a document and then redact it as a plain text file.

Unlike a static PDF document, plain text accounts for width variations, so all redactions can be replaced with a standardized <Text Redacted> marker that makes it impossible to know the length of the redacted content. The text could then be converted into a PDF after the redaction process is complete.
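As a rough illustration of the standardized marker idea, the Python sketch below replaces each sensitive term in already-extracted plain text with the same fixed-length marker. It does not use the PrizmDoc Viewer API; the function name, the sample transcript, and the participant names are assumptions made for the example.

```python
import re

MARKER = "<Text Redacted>"

def redact_terms(text: str, terms: list[str], marker: str = MARKER) -> str:
    """Replace every whole-word occurrence of each term with one fixed marker."""
    if not terms:
        return text
    # Try longer terms first so a short term never splits a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(marker, text)

# Hypothetical transcript with one short name and one long name.
transcript = (
    "Al: I spoke with Bartholomew yesterday.\n"
    "Bartholomew: Al is mistaken."
)
print(redact_terms(transcript, ["Al", "Bartholomew"]))
```

Because both names collapse to the same marker, the output no longer reveals which participant had the short name and which had the long one.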

Take Control of PDFs with PrizmDoc Viewer

As a fully-featured HTML5 viewer, Accusoft’s PrizmDoc Viewer delivers powerful viewing, annotation, and conversion functionality to your web application. It provides a broad range of redaction capabilities that allow legal, financial, and government organizations to keep their sensitive data secure and protect their customers. 

By integrating these complex features into your applications, you can focus your development efforts on building the tools that set your solution apart from the competition while our proven technology powers your customers’ viewing and redaction needs. To learn more about PrizmDoc Viewer’s powerful capabilities, download a free trial and test how it can support and enhance your application.

Information is critically important for organizations of all sizes, but it’s especially vital for large enterprises. Without access to accurate data, it can be difficult for separate departments to coordinate efforts or for leadership to make informed decisions. Important files can quickly be lost in a complex web of IT systems, some of which may not even be able to directly communicate with each other. Developers have worked hard to address these challenges by building content management platforms that integrate various technology resources into a single system and provide a primary source of digital information.

What Is an Enterprise Content Management System?

Today’s enterprises have massive amounts of information at their disposal. Much of that data, however, is scattered across the organization in different repositories, folders, archives, and file shares. A great deal of valuable insights could be found there, including information about customers, market trends, and product feedback, but so long as it remains spread across different locations, it can be difficult to access and view in totality.

Enterprise content management (ECM) systems help organizations to create a more workable structure for business knowledge. By implementing document automation and data capture tools, they can quickly assess and process information flowing into the enterprise to identify its value and route it to the proper destination.

A typical ECM system uses a few key steps when processing incoming information. These steps form the basis of the enterprise’s document or content lifecycle:

  • Capture: First, the information needs to enter the system in some way. This usually takes the form of document files or images being uploaded into the ECM.
  • Manage: Documents and other files need to be identified and labeled for accurate storage and easy access. Simply uploading content into the system without doing anything to organize it quickly results in content chaos. 
  • Storage: Whether the ECM utilizes physical, on-premises storage or cloud-based storage (or some combination of the two), the system needs to use a clearly defined structure when saving content so it can be easily located in the future. A database should contain all the necessary metadata to indicate where each file is stored.
  • Retrieval: Without some way of easily retrieving the right information when it’s needed, an ECM system isn’t going to be able to reach its full potential. Stored documents and files need to be accessible quickly and easily so they can help to inform key business decisions.
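The storage and retrieval steps above can be sketched with a small metadata index. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database standing in for the ECM database; the table layout, document IDs, labels, and storage paths are all hypothetical.

```python
import sqlite3

def open_index() -> sqlite3.Connection:
    """Create an in-memory metadata index (a stand-in for the ECM database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " doc_id TEXT PRIMARY KEY,"
        " label TEXT NOT NULL,"
        " location TEXT NOT NULL)"
    )
    return conn

def store(conn: sqlite3.Connection, doc_id: str, label: str, location: str) -> None:
    """Manage and store: record how a file is labeled and where it lives."""
    conn.execute(
        "INSERT INTO documents (doc_id, label, location) VALUES (?, ?, ?)",
        (doc_id, label, location),
    )

def retrieve(conn: sqlite3.Connection, label: str) -> list[str]:
    """Retrieve: look up the storage locations of all files with a label."""
    rows = conn.execute(
        "SELECT location FROM documents WHERE label = ? ORDER BY doc_id",
        (label,),
    )
    return [row[0] for row in rows]

conn = open_index()
store(conn, "doc-001", "invoice", "s3://finance/2021/doc-001.pdf")
store(conn, "doc-002", "contract", "/mnt/legal/doc-002.pdf")
print(retrieve(conn, "invoice"))
```

The files themselves can live on-premises or in the cloud; the index only needs to record enough metadata to find them again.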

Avoiding Content Chaos with Barcodes

Without some way of effectively tracking documents through an ECM, organizations can quickly fall prey to “content chaos,” in which there is an abundance of information available but no easy way to access the right content at the appropriate time. This can be particularly frustrating for an enterprise that already has effective data capture and file conversion capabilities in place because without an effective retrieval mechanism, a great deal of valuable information will often go unused or even unnoticed.

Fortunately, ECM developers can provide a simple solution to this problem by utilizing barcode recognition technology. Although barcodes have been a mainstay of inventory management for decades across many industries, they’re finding a new use case in document management systems.

Rather than manually indexing documents with alphanumeric account number strings, organizations can create barcodes and apply them to documents at the point of capture so that each file is automatically routed to the proper storage destination. Once the barcode is scanned, key information about the file is uploaded into the ECM database so it can be easily located and retrieved in the future.

Another key benefit of barcodes is their ability to link documents that need to be associated with one another as part of the same batch. When documents are captured and converted into a digital format, one or more barcodes can be assigned to them to indicate connections to other file types. That information will be uploaded into the ECM database when the barcodes are scanned, instantly creating a traceable record of where files are located. 

This is especially important for situations where different information types could be stored in different locations. For instance, architectural drawings for a project may be stored in one location, but financial documents related to the same project may be stored elsewhere. When one of the files is accessed, the ECM’s database will indicate that there are related files in other locations and provide a link to them. This is particularly important for large enterprises with content spread across multiple departments that could easily be overlooked.
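One way to picture this linking is a small index keyed by a batch identifier carried in each barcode. The Python sketch below starts from barcode values that have already been decoded (it does not perform barcode recognition or call Barcode Xpress), and the "BATCH:DOCTYPE" payload format, class names, and file paths are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedFile:
    batch_id: str
    doc_type: str
    location: str

class BatchIndex:
    """Groups files by the batch identifier found in their barcode value."""

    def __init__(self) -> None:
        self._by_batch: dict[str, list[IndexedFile]] = defaultdict(list)

    def register(self, barcode_value: str, location: str) -> IndexedFile:
        # Hypothetical payload format: "<batch id>:<document type>".
        batch_id, doc_type = barcode_value.split(":", 1)
        entry = IndexedFile(batch_id, doc_type, location)
        self._by_batch[batch_id].append(entry)
        return entry

    def related(self, entry: IndexedFile) -> list[IndexedFile]:
        """Return the other files that share this file's batch."""
        return [e for e in self._by_batch[entry.batch_id] if e != entry]

index = BatchIndex()
drawing = index.register("PRJ-1042:drawing", "/plans/site-a.tiff")
index.register("PRJ-1042:budget", "/finance/site-a.xlsx")
index.register("PRJ-2000:drawing", "/plans/site-b.tiff")
print([e.location for e in index.related(drawing)])
```

Opening the architectural drawing surfaces the financial document from the same project even though the two files are stored in different locations.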

Build a Better Enterprise Content Management System with Barcode Xpress

Accusoft’s Barcode Xpress SDK provides powerful barcode support that’s designed to address the specific needs of document barcodes. While some software is oriented toward retail or supply chain applications, Barcode Xpress is optimized for document management, which makes it ideal for ECM systems. The SDK’s barcode reader can accurately locate and decode multiple barcodes on each page at incredibly high speeds.

With support for over 30 unique barcode types, Barcode Xpress provides tremendous flexibility when it comes to content management. Developers can also generate and detect both 1D and 2D barcodes to create a diverse content ecosystem within their ECM platform. Barcode Xpress can easily identify and recognize barcodes no matter where they’re located (and oriented) on the page. It can even accurately register incomplete barcodes from just a few intact lines.

To learn more about how Barcode Xpress can enhance your enterprise content management system, download our detailed fact sheet for a closer look at the barcode SDK’s capabilities.

As digital processes become more commonplace, it’s more important than ever for organizations to have the tools in place to manage electronic documents effectively. The evolution of PDF viewing technology continues to provide new levels of flexibility for software applications. Now that HTML5 is capable of rendering PDF data within a conventional browser, developers are looking for new ways to make the viewing experience even more seamless. By embedding PDFs in HTML, they can continue to streamline document viewing and reduce the need for external software.

Why Embed a PDF in HTML?

Sharing a PDF online is far easier to do today than it was just a decade ago. For many years, the two most commonly used options were providing a link to download the file directly from a server or sending it as an attachment in an email. Once the file was downloaded, it could be opened and viewed with PDF reader software installed on a computer. This, of course, introduced numerous security risks that are associated with downloadable files and email attachments.

The widespread adoption of cloud storage has made it very convenient to share a PDF file and even manage who has access to it. And since most modern browsers can view PDFs without needing to download the file, providing a link is typically all that’s necessary to pass the file along.

While this solution is usually sufficient for the personal needs of an individual user, it’s not a practical option for even a small-scale business when it comes to public-facing document management. Organizations want to retain control over their files with respect to how they’re accessed and displayed. By embedding PDFs in HTML, they can keep their documents within their secure application environment where they have full control over how they’re managed, shared, and viewed. For developers looking to provide a seamless user experience, building options for embedded PDFs into their software is critically important.

The Value of an Integrated PDF Viewer

Since most modern browsers can utilize HTML5 to render PDF files, developers could lean on those capabilities without building a dedicated PDF viewer for their application. That decision will very quickly lead to some unpleasant complications, however. In the first place, they are leaving a lot to chance in terms of the viewing experience. Not every browser renders PDF files the same way, so it’s very possible that two different users could have two very different experiences when viewing a document. In some cases, that could mean nothing more than a missing font that’s replaced with an alternative. But in other cases, it could mean that the document doesn’t open at all or is missing important graphical elements.

This approach also forces users to make do with whatever PDF functionality is incorporated into their browser’s viewer. In most cases, that will mean subpar search performance, a lack of responsive mobile controls, and no annotation features. The browser may also have trouble with some of the less common PDF specifications, making it impossible for some users to even view a document.
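For context, the browser-native approach discussed above usually comes down to a single HTML element that points at the file. The Python sketch below generates such a snippet server-side with a download link as fallback content; the function name, URL, and title are hypothetical, and how the PDF actually renders is left entirely to the browser's built-in viewer.

```python
from html import escape

def pdf_embed_html(url: str, title: str, height_px: int = 600) -> str:
    """Build an <object> element that asks the browser to render a PDF inline."""
    safe_url = escape(url, quote=True)
    safe_title = escape(title, quote=True)
    return (
        f'<object data="{safe_url}" type="application/pdf" '
        f'width="100%" height="{height_px}" aria-label="{safe_title}">\n'
        # Fallback content is shown when the browser cannot display the PDF.
        f'  <p>Unable to display the PDF. '
        f'<a href="{safe_url}">Download {safe_title}</a>.</p>\n'
        f"</object>"
    )

print(pdf_embed_html("/docs/annual-report.pdf", "Annual Report"))
```

The markup controls only the size of the frame and the fallback text. Search, mobile controls, and annotation behavior all depend on whichever viewer the browser supplies.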

By embedding a JavaScript-based PDF viewer into their application, developers can ensure that documents will display the correct way every time. Since the viewing is handled through a viewer embedded into the web application by default, it will be the same no matter what kind of browser or operating system is being used. A customizable viewer also allows developers to adjust the interface to permit or hide certain features, such as downloading or markup tools.

The open-source PDF.js library is a popular choice for many web applications, but it comes with a number of well-documented shortcomings. In addition to lacking key features like annotation, it also doesn’t support the entire PDF standard and does not provide a responsive UI for mobile devices. For developers looking to add more robust features, working with PDF.js often entails quite a bit of additional coding and engineering to build those capabilities from the ground up.

Embed PDFs in HTML with Accusoft PDF Viewer

Accusoft PDF Viewer takes the foundation of PDF.js and provides robust enhancements to meet the viewing needs of today’s applications. In addition to incredibly fast text search, expanded PDF standard support, and optimization for high-resolution displays, this lightweight SDK is also equipped with a responsive UI that adapts automatically to mobile screens. Developers can integrate essential mobile features like pinch to zoom quickly and easily, with no additional integrations or engineering required.

With no external dependencies or complicated server configurations, Accusoft PDF Viewer integrates into a web-based application with fewer than 10 lines of code. Once the viewer is in place, developers can embed PDFs in HTML and easily render them to provide a state-of-the-art PDF viewing experience regardless of the browser or device users have at their disposal. And since the UI can be customized to your application’s needs, there’s no reason to sacrifice control for the sake of viewing convenience.

Accusoft PDF Viewer is a JavaScript SDK that you can incorporate into your application environment quickly and easily to provide much greater viewing control and functionality than is possible with a standard browser viewer or base PDF.js library. If you’re planning to embed PDFs in HTML as part of your software solution, taking just a few moments to integrate versatile and responsive viewing tools can ensure a high-quality viewing experience. Download Accusoft PDF Viewer Standard Version today at no cost to see how easily it can transform your application’s HTML5 viewing potential.

For additional features like annotation, eSignature, and UI customization, contact one of our solutions experts to upgrade to Professional Version.


In government offices and data centers, document archives and data repositories are highly effective for organizing information that remains at rest. For those occasions when information workers simply need to read or review the contents of a file, repositories like SharePoint are ideal. But reading or reviewing is not the whole range of government business processes.

Even in repositories that are integrated with business applications addressing functions like human capital management, Freedom of Information (FoI) requests, or asset management, documents rarely remain unmodified.

Yet when a public sector employee needs to modify, move, or manipulate a document to reflect an event that occurs in the real world, the task requires a combination of workflow and analytics that exceeds the capabilities of document filing and storage applications.

Document repositories provide limited value-add for government organizations that need to serve thousands of constituents, manage annual budgets in the billions of dollars, and run ongoing infrastructure projects that keep people and economies moving.

Custom development of these functions into your business applications demands significant time and technical resource investment. Pre-built supported APIs and SDKs reduce these overhead demands.

Here are five workflow functions which government organizations should add to the document management functionality of their business applications to get the best value from their historical and day-forward data.

1. Document Viewing, Annotation, and Redaction

Accusoft’s PrizmDoc Viewer enables users to view, annotate, and redact information without needing third-party applications like Word or Adobe Acrobat Reader. This enhances application security and facilitates project-related document collaborations.

Most businesses can draw a bright line between internal and external documents. Internal document permissions can usually be applied based on role and privileged access. As time passes, document viewers eliminate concerns over incompatible document versions.

Government organizations, however, have transparency requirements. Many documents need to be made public, but citizen privacy could be compromised in some cases, requiring that sensitive information be redacted for those without proper clearances.

2. Document Lifecycle Management

Creating, editing, sharing, approving, and converting documents to searchable file types can take up a lot of time for government employees across departments including legal, operations, and public works. ImageGear simplifies these processes.

By standardizing these application-embedded document activities to a single interface, government organizations can accelerate workflow-related activities like eSignature approvals, annotations, and OCR of scanned files.

3. Enhanced Application-Based Document Routing and Approvals

Government organizations which use Citizen Management apps negotiate contracts with redlines and redactions, which is easier to do in one application. Documents won’t get lost in email chains, and alerts enable effortless digital signatures. Workflows are visually trackable, so if one approver isn’t available, a document can be rerouted. Workflows don’t get stuck in a big data bottleneck.

4. Barcoding for Images and Physical Records

In government facilities, physical records are often stored in folders or boxes before digital conversion, and damage to the record may come in the form of blurry text or ink blotches. Barcodes offer a better way to store and process this data, but building custom barcode recognition for your business application is cumbersome and may cause frustration.

Accusoft’s Barcode Xpress SDK enables users to collect critical document information with ease. This barcode reader can detect even the most damaged or broken barcodes for a variety of industries and there’s even a mobile scanner available.

5. Government Forms Processing

Public sector organizations rely on vast numbers of forms to expedite the intake of data from their citizens. Court cases, license applications, and invoices are only a few examples of forms with standardized fields that can aid in the conversion process. Accusoft’s FormSuite enables developers to customize form field detection and bring the extracted data into their applications.

Are you looking for ways to increase the administrative productivity of your information workers? Want to increase the speed of document processing, discovery, and approval processes?

Accusoft offers a wide spectrum of document workflow solutions for government organizations which are proven, supported, and require minimal development effort to enhance your existing application ecosystem. Contact us to discuss your unique requirements today.

The world of investment technology moves almost as quickly as the investment markets themselves. Without the right FinTech tools, today’s individual investors are likely to be left behind the latest financial trends. That’s why FinTech investment solutions are once again becoming a major point of emphasis for developers looking to expand access to key financial services.

The History and Impact of FinTech Investment Solutions

As a subset of the FinTech industry, “invest-tech” is sometimes used to refer to a wave of innovative investment management technologies that are helping to connect aspiring investors to the information and financial services they need to capitalize on new opportunities. Like many other FinTech applications, investment software tools have played a pivotal role in expanding access to financial markets and helping consumers take direct control of their investment decisions.

Much of the early FinTech investment market was driven by “robo-advisor” services that used sophisticated algorithms to provide customers with investment guidance. The boom reached its peak in the mid-2010s, with a record 81 new invest-tech solutions hitting the market in 2014. Since then, the number of launches has dwindled as established incumbents in the financial services sector moved in to acquire some of the most promising firms.

In many instances, those acquisitions were made to expand existing digital capabilities or to secure a new base of established investment customers. Since the typical FinTech investment user was younger and possessed fewer assets, the profit margins for many start-ups were simply too low and the costs of customer acquisition too high. This dynamic has gradually shifted the industry’s focus toward the B2B market, although crowdsourced investment platforms remain quite popular among many retail investors. 

The Current State of FinTech Investment Technology

FinTech investment platforms roared back into the public consciousness following the COVID-19 pandemic as the combination of work-from-home mandates and accumulated savings caused a rise in retail investment. Individual investors made up 19.5 percent of stock market activity in the first half of 2020, an increase of nearly five percentage points from the previous year. On a particularly busy day of trading, individual investors can constitute a whopping 25 percent of market activity.

Thanks to mobile FinTech apps from startups and established players in the financial services industry, more people than ever before have access to investment opportunities, which has caused significant disruption to the market. The controversial rush on GameStop stock in early 2021, for instance, demonstrated just how much impact these easy-to-access platforms could have on investment trends.

This resurgence in retail investment could very well spark another wave of interest in FinTech investment apps, especially from established firms looking to expand their digital capabilities and capitalize on the growing market.

Enhancing the FinTech Investment Experience

For developers building the latest iterations of FinTech applications, there are a few key features worth focusing on to deliver a better investment experience. 

Sharing Data and Portfolios

While being able to access investment portfolio data on demand is valuable, customers are understandably concerned about the security of that data. Whether they’re building a retail investment app or a managed digital vault, developers need to provide a way of viewing private information securely. This is especially critical for digital documents. Relying on an external application for viewing or even just using the default browser viewer could potentially expose information to unauthorized users. By integrating secure, native viewing features, developers can ensure that investment portfolio data remains within a protected application environment.

Protecting Proprietary Research

One of the key benefits of working with an investment firm is having access to their market research when making financial decisions. In many cases, financial projections are calculated using proprietary formulas embedded within spreadsheets. Unfortunately, spreadsheets pose a number of security and compatibility problems. Even if a workbook is shared securely, there’s often little to stop someone from copying the proprietary formulas embedded within the cells and using them for other purposes. FinTech developers need ways to make those spreadsheets available without also compromising the valuable formulas developed over years of painstaking research.

Improving Data Capture

Making the right investment is all about having the right information. That data could come from a variety of sources, and in many instances it will need to be collected and analyzed before it can be of any use. Automating the data capture process can help to get that information into a customer’s hands faster. For example, customer information can be updated quickly by automatically extracting data from structured forms like tax filings. Scanned documents can also be converted into searchable PDFs using Optical Character Recognition (OCR), which makes it easier for AI-powered tools to sift through data in search of trends and potential opportunities.
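As a simplified picture of structured form extraction, the Python sketch below pulls labeled values out of text that has already been produced by OCR. It is not FormSuite or ImageGear code; the field labels, sample text, and "Label: value" layout are assumptions, and production forms processing typically relies on template matching rather than a single regular expression.

```python
import re

# Matches lines of the form "Label: value", tolerating stray whitespace.
FIELD_PATTERN = re.compile(r"^\s*(?P<label>[A-Za-z ]+?)\s*:\s*(?P<value>.+?)\s*$")

def extract_fields(form_text: str, wanted: set[str]) -> dict[str, str]:
    """Return the requested fields, keyed by lowercase label."""
    fields: dict[str, str] = {}
    for line in form_text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match and match["label"].lower() in wanted:
            fields[match["label"].lower()] = match["value"]
    return fields

# Hypothetical OCR output from a tax form.
ocr_text = """
Taxpayer Name: Jordan Lee
Filing Status : Single
Total Income: 84,200.00
Notes: see attached schedule
"""
wanted_fields = {"taxpayer name", "filing status", "total income"}
print(extract_fields(ocr_text, wanted_fields))
```

Lines with labels outside the requested set, such as the notes line here, are ignored, so only the fields the application cares about reach the customer record.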

Choosing the Right FinTech Investment Integrations

Building a successful FinTech application requires developers to build innovative tools that set them apart from the competition while also implementing everyday functionality that often lies outside their experience or expertise. Features like document viewing, annotation, and file conversion may be integral components of their platform, but they take both time and development resources to build from scratch. By turning to SDKs and APIs, developers can quickly roll out new features without detracting from their primary software development goals.

Accusoft has been working with FinTech investment platforms for many years, helping developers to build powerful FinTech applications without sacrificing the viewing and image processing technology that customers expect.

  • PrizmDoc Viewer: Adds secure HTML5 viewing, annotation, conversion, and redaction capabilities to web-based applications, allowing developers to control every aspect of the viewing experience without compromising privacy.
  • PrizmDoc Cells: Provides full XLSX support for applications, making it possible to securely upload and share Excel workbooks without exposing the source file or allowing users to access and copy proprietary formulas.
  • FormSuite: A versatile forms SDK that allows developers to add form template identification and data extraction to their application, making it easier than ever to automate and streamline workflows.
  • ImageGear: In addition to conversion and compression tools, it also provides full-page OCR for converting scanned documents into searchable text.

Learn more about how Accusoft is helping FinTech developers to drive the next generation of investment technology platforms.

The financial industry has made significant investments in document lifecycle management solutions to enhance its productivity, accuracy, and flexibility. There is broad recognition that paper-based processes are a huge source of waste and inefficiency, but simply transitioning away from paper often isn’t enough on its own to achieve true digital transformation. That’s because performing a digital-based process manually still presents many of the same problems. In order to leverage the true benefits of digital document management, FinTechs need to implement data capture and document generation capabilities as part of a broader process automation solution.

A Quick History of Data Capture & Document Generation

To understand how FinTechs can use data capture and document generation technology to enable their digital transformation, it’s helpful to take a moment to understand the history of these tools and how they’ve developed since their origins.

Data Capture

The financial industry was an early innovator in data capture technology with the development of the specialized OCR-A font in the 1960s. This simple monospace font is still used today for the account and routing numbers on an ordinary bank check. Early data capture technology relied on pattern recognition, so an exact pixel match was needed to read the characters electronically and match them to a corresponding character in a font library. While this worked well enough for scanning printed bank checks into a computer system to track transactions, reading anything else on the check with an automated system required further developments in data capture tools.
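The exact-match requirement of early pattern recognition can be shown with a toy Python example. The 3x5 glyph bitmaps below are invented for illustration and are not the real OCR-A character shapes; the point is only that a lookup against a font library succeeds on a pixel-perfect glyph and fails as soon as a single pixel differs.

```python
from typing import Optional, Sequence

# Toy font library: each glyph is a 3x5 bitmap where "#" is an inked pixel.
FONT_LIBRARY = {
    ("###", "#.#", "#.#", "#.#", "###"): "0",
    (".#.", "##.", ".#.", ".#.", "###"): "1",
    ("###", "..#", "###", "#..", "###"): "2",
}

def recognize(glyph: Sequence[str]) -> Optional[str]:
    """Return the matching character, or None without an exact pixel match."""
    return FONT_LIBRARY.get(tuple(glyph))

clean = ["###", "..#", "###", "#..", "###"]
smudged = ["###", "..#", "###", "#.#", "###"]  # one stray pixel in row four

print(recognize(clean))    # matches the library entry for "2"
print(recognize(smudged))  # no exact match, so nothing is recognized
```

A controlled font printed in a fixed position keeps glyphs close to pixel-perfect, which is why this approach suited the numbers on printed checks but not arbitrary text.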

Modern character recognition technology utilizes a more sophisticated feature detection approach that uses the component elements of each character to distinguish them from one another. An “A,” for example, usually consists of the same basic elements (two angular lines that come to a point with a horizontal line crossing them) regardless of the font used. Breaking characters down into their component elements has even made it possible for software to read handwritten characters as well as machine-printed text.

Document Generation

Document generation technology emerged in the 1970s in the form of document assembly, which was originally used by lawyers to streamline contract creation. Contracts are highly structured and rules-oriented, which made it easy to build a decision-tree logic that could be understood by the software tools of that era. Early document assembly programs used a collection of document templates that incorporated conditional fields the software could replace automatically each time it generated a contract.

Modern document assembly is typically used as part of a more robust document automation solution. Software extracts information from a database and inserts it into a template to generate unique documents quickly, easily, and accurately. These programs are much more sophisticated and flexible than early document assembly tools, allowing organizations to programmatically generate a wide range of documents without ever having to look at the contents prior to the final review process.

Data Capture & Document Assembly in FinTech Today

Despite being an early innovator in OCR technology, the financial industry has been slow to implement more robust data capture capabilities throughout its operations. According to a recent study, 63% of banks are still collecting information from documents manually, a process that’s not only time-consuming, but also highly prone to error. Banks have been slightly faster to adopt document generation, with 49% still relying on manual processes to create documents.

Ironically, FinTech organizations are even more dependent upon manual practices than traditional banks. When it comes to data capture, 75% of FinTechs are reviewing documents and entering their data manually rather than using an automated solution. The story is largely the same for document generation, as 79% of them are still creating documents manually.

Understandably, most of these organizations are planning to implement some form of automated data capture and document generation solution within the next two to three years. That’s because they recognize that it will be difficult to achieve true digital transformation without them.

Why Data Capture and Document Generation Are So Important for FinTech

FinTech companies have developed a wide range of innovative financial tools that allow consumers to take better control of their finances and help organizations manage their resources more efficiently. In order to deliver those streamlined solutions, however, FinTechs need to have the capabilities in place to make their own processes more efficient.

Data capture and document generation work together to help these organizations maximize the value and potential of their document management systems. Financial information can be submitted in many different formats, ranging from digital forms and fillable PDFs to images, flattened PDFs, and scanned documents. Extracting information from each of these formats requires a sophisticated understanding of data capture that few software developers possess. 

Once that data is extracted, it can be routed anywhere it’s needed by workflow automation tools. That could be a new document that’s being generated, but more often it will be sent to a database. When the time comes to generate a new document, previously captured information can be inserted wherever it’s needed programmatically. Multiple documents (or just sections of them) can also be merged or split apart to create entirely new ones filled with information drawn from several sources.
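
As a rough illustration of the capture-then-generate flow described above, the sketch below fills a document template from a previously captured record using Python's standard string.Template. The template text, field names, and record values are invented for the example and do not come from any Accusoft product.

```python
from string import Template

# Hypothetical template; the placeholder names are assumptions for this sketch.
LETTER_TEMPLATE = Template(
    "Dear $customer_name,\n"
    "Your application $application_id for $amount has been received."
)


def generate_document(template: Template, record: dict) -> str:
    """Insert captured field values into a template.

    substitute() raises KeyError when a field is missing, so an incomplete
    record fails loudly instead of producing a half-filled document.
    """
    return template.substitute(record)


# A record as it might come back from a data capture step or a database.
captured = {
    "customer_name": "A. Example",
    "application_id": "APP-0001",
    "amount": "12,500 USD",
}

print(generate_document(LETTER_TEMPLATE, captured))
```

Failing on a missing field is a deliberate choice here: in a financial workflow, a document with a blank where a value should be is usually worse than no document at all.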

All of this can be done in a matter of seconds with the right software integrations, which saves a tremendous amount of time for FinTech teams who have many other priorities to focus on. By incorporating robust data capture and document generation capabilities into their platforms, they can provide faster, better functionality to their customers. Rather than uploading a document and waiting for it to be processed, information can be extracted and routed wherever it’s needed instantly to facilitate faster reviews and resolutions.

Another key benefit of data capture and document generation is accuracy. Between manually reviewing information, entering it by hand into a system, and then retrieving it to create new documents, there are plenty of opportunities for mistakes to be made. In a financial context, those errors often have the potential to be systemic, creating additional errors that are time-consuming and expensive to remediate. Automated extraction and assembly greatly reduce the risk of human error, which enables FinTechs to accelerate and scale their processes more effectively.

Integrating Data Capture and Document Generation with Accusoft

For over 30 years, Accusoft has been a pioneer in building software integrations that expand application functionality. We provide a variety of data capture and document generation solutions that meet the needs of today’s FinTech platforms. Whether you’re incorporating functionality directly into your application with an SDK or deploying a cloud-based solution that connects to one of our APIs, we have the flexibility to help you integrate the features you need to complete your digital transformation.

To learn more about how Accusoft can enhance your FinTech application with data capture and document generation, talk to one of our solutions experts today.

 

While collection and storage solutions have evolved to help manage the document deluge, problems emerge when it comes to personnel. As noted by TechRepublic, staff now cycle through 35 applications and perform more than 130 cut-and-paste actions per day as they attempt to view key resources and complete critical tasks. Big data is taking over the world, but it’s not being managed effectively. Documents of all shapes, sizes, files, and formats are everywhere.

The solution to this application overload in a document-dominated world? Customizable APIs — like those powering Accusoft’s industry-leading PrizmDoc™ Viewer — that allow developers to tackle everything from quick integrations to basic interface adjustments and advanced customization. Ready to do more with documents and get staff back on track? Here’s how PrizmDoc API customization can help.


Level One: Integration

As noted by the SD Times, APIs are the ideal solution for new-decade deployments because they offer the critical advantage of easy integration. Instead of taking the long road of designing applications, identifying interdependencies, and regulating resource calls from the ground up, customizable API solutions offer the shortcut of easy integration with existing apps, allowing staff to stay under the umbrella of familiar functions while adding value-driven features.

By simply adding the jQuery plugin to existing web applications, teams can use the full feature set of PrizmDoc Viewer  — which includes multi-format document viewing, search, annotation, redaction, and conversion — without the need for complex customization.

 


Level Two: Customizable Configuration Parameters

For many organizations, minor customizations to basic functions like tab display and localization help viewer APIs align with user expectations and existing application frameworks. Using the jQuery namespace plugin, developers can customize basic UI elements and set specific initialization parameters. Teams can choose to hide or display tabs, specify the size of the viewer, and set the mode of comparison tools.

Worth noting? Modifying PrizmDoc Viewer via the jQuery plugin requires no modification to actual code, allowing powerful customization with minimal effort and ensuring the viewer is always compatible with future release versions.  

 


Level Three: Interface API Customization

Need to do more with your customizable API? PrizmDoc Viewer is designed using an open markup approach, meaning all HTML and CSS code is fully open and customizable. By modifying HTML templates or injecting your own code, you can create a completely redesigned interface that aligns with existing application formats, or you can use the API’s unminified, unobfuscated JavaScript library to edit the business logic and behavior of the viewer.

For example, PrizmDoc Viewer offers total control over the configuration and customization of its eSignature, allowing you to modify existing parameters or build your own from the ground up, complete with programmatic field fill-in.


Level Four: Completely Customize Your Document Viewer

Want complete customization control? Use PrizmDoc Viewer as sample code and build your own viewer from the ground up. Our Developer Guide provides insight on using the Viewer API to modify or augment application behavior and the configuration of PrizmDoc application services (PAS) and the PrizmDoc server to enhance both viewing functionality and automated document processing.

With effective document management now critical to business success, full-featured viewer integration and customization is required to help combat application overload, align software functionality, and improve end-user access. From out-of-the-box support to building from scratch, the customizable API of PrizmDoc Viewer gives your team total control.

If the people using your web application need to view, search, redact, or annotate documents right in their browser, PrizmDoc Viewer is an amazing option. It lets you present Office, PDF, TIFF, email, and many other kinds of documents as part of your web application. Check out some of the demos if you’ve never seen it in action.

To make all of this possible, there are basically two sides to the PrizmDoc Viewer architecture:

  • The HTML viewer itself, running in the browser
  • A powerful backend which converts documents, page by page, to SVG for viewing in the browser

Your web server sits between these two, acting as a proxy through which the viewer asks the backend for the pages it needs to display.

One of the advantages of this architecture is that we can deliver the first page of the document as soon as it’s ready, even while the rest of the document is still being converted. However, setting up and maintaining the backend is not trivial.

Fortunately, Accusoft can handle all of that for you with PrizmDoc Cloud. Sign up, get an API key, and simply connect your web application to our already-running, fully-managed PrizmDoc Viewer backend. It’s a great option, especially if you’re just getting started with PrizmDoc Viewer.

But, of course, using an Accusoft-hosted backend may not work for your business. Maybe you are not allowed to ever let documents leave your network, even temporarily. If that’s the case, you’ll need to host and manage the backend yourself. As customers start looking into what it takes to do that, we get a lot of questions about how load balancing works. How is the compute workload spread across the servers? How are HTTP requests routed to the correct machines? What sort of load balancer(s) should I be using? Those are the kinds of questions we’ll cover in this post.

To do that, though, we first need a more detailed picture of the backend. For more on PrizmDoc load balancing, check out the rest of my article here.

 


 

Adam Cooper, Software Architect, PrizmDoc

Adam joined Accusoft in 2010 and works as a software architect for the PrizmDoc family of products. He focuses primarily on API design, customer experience, and internal tooling to support product development. Prior to Accusoft, Adam developed software for a variety of organizations, mostly focusing on .NET, web development, and automated testing. Outside of work, Adam enjoys photography, music composition and engraving, discussing the Bible, and spending time with his family and church.

ViewerJS

Full-featured document viewer outperforms open source alternative

There are various HTML5 document viewers available for embedding in web and mobile applications. Typically, developers have a few minimum requirements for a viewer: it must display PDFs and Microsoft Office documents, and it should embed easily into their front-end HTML/CSS/JavaScript. Sometimes more is needed, such as support for additional file formats, signatures, annotation, or redaction.

ViewerJS is an open source product powered by PDF.js, a viewer created by Mozilla for its Firefox browser. ViewerJS appeals to developers due to its cost (zero), its ease of integration, and the simplicity of its embedded viewer. It’s written in JavaScript and can display PDF and ODF (Open Document Format) files, but it doesn’t support Microsoft Office documents. As a result, it can’t display common files like Word documents (.doc) and Excel spreadsheets (.xls), so many projects will need to evaluate alternatives.


Complete Functionality – More Than Just Document Viewing

PrizmDoc is a far more powerful alternative, offering a number of features unavailable in ViewerJS. PrizmDoc handles over 50 different file formats, including Microsoft Office documents, CAD files, PDFs and major image types. It also offers:

  • Search: Find a text string and see all instances of the text in the document for easy reference.
  • Annotation: For collaborative work on documents, this allows users to highlight areas of a document, inserting comments or instructions or emphasizing key passages.
  • Redaction: Cover sensitive information in the displayed files – a key consideration for government, legal and financial institutions.
  • eSignature: This enables readers to sign documents, agreeing to their terms.

Compare PrizmDoc and ViewerJS

PrizmDoc goes far beyond open source options in providing a comprehensive alternative for embedded document viewing for applications:

Features compared between PrizmDoc and ViewerJS:

  • Embedded viewer
  • PDF viewing
  • MS Office documents
  • CAD files
  • Vector and raster images
  • Search
  • Annotation
  • Redaction
  • eSignature
  • Engineering support
  • Regular updates
  • Works with any programming language

PrizmDoc: Fully Supported & Dependable For The Long Haul

Accusoft offers complete support with PrizmDoc, with our engineers always ready to assist with implementation and integration. And PrizmDoc is always being improved, with new versions released regularly to make the most of new innovations. ViewerJS offers none of these advantages.

View this complete set of product demos to try out PrizmDoc, or contact us here with any questions you might have. And use the trial download link below to test PrizmDoc out in your own applications.

Free Trial

One of the new additions to our recent PrizmDoc v11.0 release was a developer preview of our document pre-conversion feature. This is an exciting new addition, given many of our clients work with thousands—even millions—of large documents and can’t afford to waste even seconds waiting for files to process.

Pre-conversion lets you convert documents and images before they are requested for viewing. For example, you can select certain files to be converted and rendered ahead of time. The rendered viewing packages are stored in cache. When a document is requested for viewing, PrizmDoc checks whether it has already been converted and, if so, delivers the cached viewing package to the viewer. This can dramatically improve performance, since the documents and images are already converted and ready for viewing before they are requested.

Although the feature is still in late-stage development and is slated to be production-ready in an upcoming release, you can download and use the current developer preview to test and evaluate the functionality. First we’ll give a quick PrizmDoc overview, and then cover how to get started with pre-conversion.

 

Understanding PrizmDoc

PrizmDoc is a powerful, scalable suite of APIs and JavaScript that use HTML5 standard technologies to convert, view, search, annotate, and redact documents in dozens of formats in a zero-footprint viewing client. At its core, the basic concept of PrizmDoc is fairly straightforward: your web application passes a document to an HTTP service that converts it into SVG and returns it to the browser. So long as the browser supports SVG (which all modern browsers now do), the document is viewable without needing to install any software on the browsing device. This model works well across PCs, tablets, and even smartphones. Every device can view all standard document types without downloading or installing any extra software.

PrizmDoc client layout
The above diagram provides an overview of the core components of the PrizmDoc Client and Server

 

PrizmDoc Application Services

PrizmDoc Application Services (PAS) is installed “ready to run” via any of our four web tier samples (C#, MVC, Java, PHP).

 

Pre-converting documents in Application Services

When viewing large documents, a user can experience a delay when viewing later pages in the document. The pre-conversion API avoids this delay by fully converting the document before a viewing session is created. You may choose to pre-convert all documents over a certain file size, or documents that are viewed frequently, to tailor the viewing experience.

 

How to Create a Viewing Package by Pre-converting Documents

Pre-conversion is available by using the Pre-conversion API. (For detailed information, refer to PrizmDoc Application Services RESTful Viewing Package Creators API.) Documents are pre-converted using the following steps:

 

Step 1

Issue a POST request whose body contains a JSON-formatted ‘source’ object. The source.type property can be “document”, “url”, or “upload”. This example uses “document”.

POST http://localhost:3000/v2/viewingPackageCreators

viewingPackageCreator POST Body

Content-Type: application/json
{
    "input": {
        "source": {
            "type": "document",
            "fileName": "sample.doc",
            "documentId": "unT67Fxekm8lk1p0kPnyg8",
            . . .
        },
        "viewingPackageLifetime": 2592000
    }
}

A successful response to the above POST provides ‘processId’ in the response body.

200 OK
Content-Type: application/json
{
    "input": {
        "source": {
            "type": "document",
            "fileName": "sample.doc",
            "documentId": "unT67Fxekm8lk1p0kPnyg8",
            . . .
        },
        "viewingPackageLifetime": 2592000
    },
    "expirationDateTime": "2015-12-09T06:22:18.624Z",
    "processId": "khjyrfKLj2g6gv8fdqg710",
    "state": "processing",
    "percentComplete": 0
}
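
The Step 1 request can be assembled programmatically. The sketch below is one possible way to do so with Python's standard library, using the URL and property names from the example above. The function names are hypothetical, and the fields elided with ". . ." in the example are left out rather than guessed.

```python
import json
import urllib.request

# Base URL taken from the example above; adjust for your PAS deployment.
PAS_BASE_URL = "http://localhost:3000"

# 30 days expressed in seconds: 2592000, the lifetime used in the example.
THIRTY_DAYS_IN_SECONDS = 30 * 24 * 60 * 60


def build_creator_request(document_id: str, file_name: str,
                          lifetime_seconds: int = THIRTY_DAYS_IN_SECONDS,
                          base_url: str = PAS_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request from Step 1."""
    body = {
        "input": {
            "source": {
                "type": "document",
                "fileName": file_name,
                "documentId": document_id,
            },
            "viewingPackageLifetime": lifetime_seconds,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/viewingPackageCreators",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_process_id(response_body: str) -> str:
    """Pull 'processId' out of the JSON response body."""
    return json.loads(response_body)["processId"]
```

Sending the request is a single call to urllib.request.urlopen(request). It is left out here so the sketch runs without a server.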

 

Step 2

Using the ‘processId’ obtained in step 1, query the pre-conversion process for the status.

GET http://localhost:3000/v2/viewingPackageCreators/khjyrfKLj2g6gv8fdqg710

A successful response body contains the JSON properties ‘state’ and ‘percentComplete’. The ‘state’ value is either ‘processing’ or ‘complete’, and ‘percentComplete’ reports how much of the conversion has finished.

Poll the status by repeatedly issuing a GET request to the above URL. Use short intervals between the first few requests. If the process is still not complete, the document is probably large and needs more processing time, so lengthen the interval between requests to avoid flooding the network with status requests. At 100% completion, the response body includes, among other information, an ‘output’ object with a ‘packageExpirationDateTime’ property.

200 OK
Content-Type: application/json
{
    "input": {
       "source": {
            "type": "document",
            "fileName": "sample.doc",
            "documentId": "unT67Fxekm8lk1p0kPnyg8",
            . . .
       },
       "viewingPackageLifetime": 2592000
    },
    "output": {
        "packageExpirationDateTime": "2016-01-09T06:22:18.624Z"
    },
    "expirationDateTime": "2015-12-09T06:22:18.624Z",
    "processId": "khjyrfKLj2g6gv8fdqg710",
    "state": "complete",
    "percentComplete": 100
}
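
The polling advice in Step 2 (short intervals first, longer ones later) could be sketched as follows. This is only one way to do it: the doubling factor, the delay limits, and the attempt cap are assumptions chosen for illustration, and fetch_status stands in for whatever code issues the GET request and parses the JSON body.

```python
import time
from typing import Callable


def wait_for_package(fetch_status: Callable[[], dict],
                     initial_delay: float = 0.5,
                     max_delay: float = 30.0,
                     backoff_factor: float = 2.0,
                     max_attempts: int = 20,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until 'state' is 'complete', lengthening the wait each time.

    fetch_status is any callable that performs the GET request from Step 2
    and returns the parsed JSON body as a dict.
    """
    delay = initial_delay
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("state") == "complete":
            return status
        sleep(delay)
        # Short waits at first, then progressively longer ones for large documents.
        delay = min(delay * backoff_factor, max_delay)
    raise TimeoutError("viewing package was not ready after polling")
```

Passing sleep as a parameter keeps the function easy to exercise without real waiting, and the attempt cap keeps a stuck process from being polled forever.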

 

How to Obtain Information About the Converted Viewing Package

(For detailed information about the converted viewing Package, refer to PrizmDoc Application Services RESTful Viewing Package Creators API.)

When the status is 100% complete, details can be obtained about the converted package by issuing the following request:

GET http://localhost:3000/v2/viewingPackages/unT67Fxekm8lk1p0kPnyg8

Response Body

200 OK
Content-Type: application/json
{
    "input": {
        "source": {
            "type": "document",
            "fileName": "sample.doc",
            "documentId": "unT67Fxekm8lk1p0kPnyg8",
            . . .
        },
        "viewingPackageLifetime": 2592000
    },
    "state": "complete",
    "packageExpirationDateTime": "2016-01-09T06:22:18.624Z"
}

 

How to Delete a Previously Converted Package

(For detailed information about deleting converted viewing Package, refer to PrizmDoc Application Services RESTful Viewing Package Creators API.)

A previously converted package can be deleted by issuing a DELETE request.

DELETE http://localhost:3000/v2/viewingPackages/unT67Fxekm8lk1p0kPnyg8

This request marks the package for asynchronous deletion. A successful response is as follows:

204 (No Content)
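
For reference, the GET and DELETE calls on a viewing package might be built like this in Python. The helper names are hypothetical. The URL pattern and the 204 status come from the examples above.

```python
import urllib.request

# Base URL taken from the examples above; adjust for your PAS deployment.
PAS_BASE_URL = "http://localhost:3000"


def package_request(document_id: str, method: str = "GET",
                    base_url: str = PAS_BASE_URL) -> urllib.request.Request:
    """Build a GET (info) or DELETE request for one viewing package."""
    if method not in ("GET", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    return urllib.request.Request(
        url=f"{base_url}/v2/viewingPackages/{document_id}",
        method=method,
    )


def was_deleted(status_code: int) -> bool:
    """A 204 (No Content) status means the package was marked for deletion."""
    return status_code == 204
```

Because deletion is asynchronous, a 204 response only confirms that the package has been marked for deletion, not that its files are already gone.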

 

Viewing Packages

PrizmDoc Application Services 11.0 introduces the Viewing Packages feature. A Viewing Package is a cached version of a document that the PrizmDoc Viewer will use when displaying a document. Viewing a document from a Viewing Package will significantly reduce the load on PrizmDoc Server and will allow you to serve many more users per minute than you would otherwise be able to.

A Viewing Package can be created through Pre-Conversion or by using On-Demand Caching.
This topic covers storage, configuration, Pre-Conversion, and On-Demand Caching.

 

Storage

Viewing Packages are stored in both the filesystem and configured database. If using multiple instances of PAS, you must use a shared database and NAS (Network Attached Storage). The Running PrizmDoc Application Services on Multiple Servers topic can provide more information for configuring PAS in Multi-Server Mode.

By default, storage is configured in the following way:

Config key (storage provider): description

  • viewingPackagesData (database): Data about a Viewing Package. This is the data that can be retrieved from GET /v2/viewingPackages.
  • viewingPackagesProcesses (database): Data about a Viewing Package creator process. This is the data that can be retrieved from GET /v2/viewingPackageCreators.
  • viewingSessionsData (database): Data about a Viewing Session. When creating a Viewing Session, an entry is added to this table.
  • viewingSessionsProcessesMetadata (database): Data about processes for a Viewing Session. This is currently used for content conversion and markup burner processes.
  • viewingPackagesArtifactsMetadata (database): Metadata for a viewing package artifact. This is used to find specific artifacts for a package and contains the artifact type and the file name in the filesystem, among other important information.
  • viewingPackagesArtifacts (filesystem): Artifacts for a Viewing Package. These include SVG and raster content for every page, the source document, and other artifacts the Viewing Client will likely request.

 

Configuration

Viewing Packages are opt-in and require special configuration to work properly. At a minimum, your configuration should include the following:

# Feature toggles
feature.packages: "enabled"

# Database configuration
database.adapter: "sqlserver"
database.host: "localhost"
database.port: 1433
database.user: "pasuser"
database.password: "password"
database.database: "PAS"

# Default timeout for the duration of a viewing session
defaults.viewingSessionTimeout: "20m"

viewingPackagesData.storage: "database"
viewingPackagesProcesses.storage: "database"
viewingSessionsData.storage: "database"
viewingSessionsProcessesMetadata.storage: "database"

viewingPackagesArtifactsMetadata.storage: "database"
viewingPackagesArtifacts.storage: "filesystem"
viewingPackagesArtifacts.path: "/usr/share/prizm/Samples/viewingPackages"
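
Before starting PAS, it can help to check that the settings above are actually present. The sketch below is a minimal checker, not a PAS tool: it parses only flat "key: value" lines (it is not a full YAML parser), and the REQUIRED_KEYS list is an assumption drawn from the minimum configuration shown above.

```python
# Assumed minimum set of keys, taken from the sample configuration above.
REQUIRED_KEYS = (
    "feature.packages",
    "database.adapter",
    "viewingPackagesData.storage",
    "viewingPackagesProcesses.storage",
    "viewingPackagesArtifactsMetadata.storage",
    "viewingPackagesArtifacts.storage",
    "viewingPackagesArtifacts.path",
)


def parse_flat_config(text: str) -> dict:
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_package_settings(settings: dict) -> list:
    """List required keys that are absent, plus the feature toggle if it is off."""
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if settings.get("feature.packages", "enabled") != "enabled":
        missing.append('feature.packages (must be "enabled")')
    return missing
```

An empty list from missing_package_settings means every key in the assumed minimum set was found and the feature toggle is enabled.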

 

Pre-Conversion

Creating a Viewing Package through Pre-Conversion provides a way to generate packages whenever it makes the most sense for your application to do so. It allows you to make use of down-time for Pre-Conversion to reduce load in high traffic periods. Pre-Conversion does all the work of creating a Viewing Package whenever it is requested. It starts a process that will begin downloading content and allow you to poll for progress.

We recommend maintaining a queue of Pre-Conversions so that you don’t overload the server and so that each package is turned around faster. Run at most five Pre-Conversion processes at a time per PrizmDoc Application Services and PrizmDoc Server instance. This allows packages to be created quickly while maintaining a sustainable load.
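
One simple way to respect the five-at-a-time limit is a bounded worker pool. In the sketch below, preconvert_one is a hypothetical callable that starts a pre-conversion for one document, waits for it to finish, and returns a result. Only the limit of five comes from the recommendation above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

MAX_CONCURRENT_PRECONVERSIONS = 5  # the limit recommended above


def preconvert_all(document_ids: Iterable[str],
                   preconvert_one: Callable[[str], str],
                   max_workers: int = MAX_CONCURRENT_PRECONVERSIONS) -> List[str]:
    """Run preconvert_one for every document, at most max_workers at a time.

    Documents beyond the limit wait in the executor's internal queue.
    Results come back in the same order as document_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(preconvert_one, document_ids))
```

Threads suit this job because each worker spends most of its time waiting on HTTP responses rather than using the CPU. If you run several PAS and PrizmDoc Server instances, the limit applies per instance, so the pool size would need to be adjusted accordingly.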

 

On-Demand Caching

Creating a Viewing Package through On-Demand Caching is a seamless process through the Viewing Session API. On-Demand Caching allows you to trigger a Viewing Package creation process in the background and use the resulting Viewing Package when it is ready. This feature is designed to allow immediate viewing using PrizmDoc Server while caching a package for subsequent views of the same document.

As an example, consider this request for a Viewing Session:

POST /ViewingSession
{
   "source": {
       "documentId": "PdfDemoSample-a1b0x19n2",
       "type": "document",
       "fileName": "PdfDemoSample.pdf"
   }
}

This request will always return a viewingSessionId regardless of the status of the matching Viewing Package. If a Viewing Package does not currently exist with the given documentId, PrizmDoc Server will handle document viewing while a background process creates a Viewing Package. Once the background process is complete, PrizmDoc Application Services will handle all further viewing sessions until the Viewing Package expires (24 hours by default).
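
The decision described above can be modeled in a few lines. The class below is a toy in-memory model written for illustration only. It is not PAS code, its names are hypothetical, and it ignores package expiration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OnDemandCacheModel:
    """Toy model of which component serves a viewing session."""
    ready_packages: Set[str] = field(default_factory=set)
    in_progress: Set[str] = field(default_factory=set)

    def start_session(self, document_id: str) -> str:
        """Return which component would serve this viewing session."""
        if document_id in self.ready_packages:
            return "viewing package (PAS)"
        # No package yet: view immediately via PrizmDoc Server. Using a set
        # means repeated requests record only one background process.
        self.in_progress.add(document_id)
        return "PrizmDoc Server"

    def finish_background_process(self, document_id: str) -> None:
        """Mark the background package creation as complete."""
        self.in_progress.discard(document_id)
        self.ready_packages.add(document_id)
```

In this model, the first sessions for a document are served by PrizmDoc Server, and every session after finish_background_process is served from the viewing package.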

PrizmDoc viewing session

 

Conclusion

Given the increasing digital nature of document management, the pre-conversion services in PrizmDoc are an exciting addition to our already-robust suite of features. Pre-converting, rendering, and storing documents in cache will provide a more seamless experience, allowing users to immediately call and view documents and images. We believe it’s a game-changer that will allow many of our clients to significantly streamline their processes.