Technical FAQs

Question

During the installation of ImageGear for .NET (v23.4 and above), the installer reaches out to Microsoft’s site to download the VC++ redistributable and .NET packages. Which one(s) does it download?

Answer

The ImageGear for .NET installer places the following redistributables onto a system:

In addition to this, the following .NET framework versions are installed:

  • Microsoft .NET Framework 2.x
  • Microsoft .NET Framework 3.0
  • Microsoft .NET Framework 3.5
  • Microsoft .NET Framework 4.0

So, if a system already has all of these installed on it, this should prevent the installer from trying to reach out to download them.
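To check ahead of time whether a machine already has these framework versions, one option is to read the registry values Microsoft documents for .NET Framework detection. The following is a minimal sketch under that assumption. It does not reflect how the ImageGear installer itself performs the check, and the names `REQUIRED`, `missing_versions`, and `detect_installed` are hypothetical helpers introduced for illustration.

```python
import sys

# Assumption: registry locations documented by Microsoft for detecting each
# framework version listed above (subkey under NDP, value name to read).
NDP_ROOT = r"SOFTWARE\Microsoft\NET Framework Setup\NDP"
REQUIRED = {
    "2.x": (r"v2.0.50727", "Install"),
    "3.0": (r"v3.0\Setup", "InstallSuccess"),
    "3.5": (r"v3.5", "Install"),
    "4.0": (r"v4\Full", "Install"),
}


def missing_versions(installed):
    """Return the required versions not present in `installed`, in list order."""
    return [version for version in REQUIRED if version not in installed]


def detect_installed():
    """Read the registry (Windows only) and return the set of versions found."""
    if sys.platform != "win32":
        raise OSError("registry detection only works on Windows")
    import winreg  # imported here because the module exists only on Windows

    found = set()
    for version, (subkey, value_name) in REQUIRED.items():
        try:
            path = NDP_ROOT + "\\" + subkey
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
                value, _ = winreg.QueryValueEx(key, value_name)
                if value == 1:
                    found.add(version)
        except OSError:
            pass  # key or value absent: treat the version as not installed
    return found
```

On a Windows machine, `missing_versions(detect_installed())` would list the versions that are not yet present. An empty list suggests the .NET portion of the download should not be needed.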

Question

After applying a new license/evaluation license through the license utility on Linux, the following error appears in the logs:

{"gid":"","name":"OCS","time":"2019-01-3T18:26:39.368Z","pid":36875,"level":50,"tid":36875,"taskid":8,"FATAL ERROR":"MSO feature is active, but 'fidelity.msOfficeCluster.host' and 'fidelity.msOfficeCluster.port' are not configured, going to 'Unhealthy' state"}

What could cause this issue to occur, and how can it be fixed?

Answer

As you are running on Linux, the MSO switch on the license assumes that there are additional settings configured:

fidelity.msOfficeCluster.host and fidelity.msOfficeCluster.port

These settings are meant to point to a Windows server which has Microsoft Office 2013 or 2016 installed alongside PrizmDoc with MSO enabled. This is required for MSO functionality to be enabled.

If you wish to use the license with MSO enabled but do not have a separate Windows server, you can do the following to set the PrizmDoc service to run using LibreOffice:

  1. Make a backup of the /usr/share/prizm/prizm-services-config.yml file.
  2. Edit the file in the text editor of your choice and find the following line: fidelity.msOfficeDocumentsRenderer: auto
  3. Remove the hash (#) and leading space in front of the line, then change the value from auto to libreoffice:
    fidelity.msOfficeDocumentsRenderer: libreoffice
  4. Restart the service by running /usr/share/prizm/scripts/pccis.sh restart
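The backup and edit steps above can also be scripted. The following is a minimal sketch, assuming the setting appears on a single line that may still be commented out with a hash. The function names `set_libreoffice_renderer` and `update_config` and the `.bak` backup suffix are hypothetical choices, not part of PrizmDoc. The restart in step 4 is left as a manual step.

```python
import re
import shutil
from pathlib import Path

# Path taken from the steps above.
CONFIG_PATH = Path("/usr/share/prizm/prizm-services-config.yml")

# Matches the setting whether or not it is still commented out with "# ".
_RENDERER_LINE = re.compile(
    r"^[ \t]*#?[ \t]*fidelity\.msOfficeDocumentsRenderer:.*$", re.MULTILINE
)


def set_libreoffice_renderer(config_text):
    """Return the config text with the renderer line uncommented and set to libreoffice."""
    new_text, count = _RENDERER_LINE.subn(
        "fidelity.msOfficeDocumentsRenderer: libreoffice", config_text
    )
    if count == 0:
        raise ValueError("fidelity.msOfficeDocumentsRenderer line not found")
    return new_text


def update_config(path=CONFIG_PATH):
    """Back up the config file (step 1), then rewrite the renderer line (steps 2-3)."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(set_libreoffice_renderer(path.read_text()))
```

Calling `update_config()` with sufficient permissions would write the backup next to the original file before changing it. Raising an error when the line is missing avoids silently leaving the file unchanged.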

Question

We are running PrizmDoc on a Windows operating system and we noticed that our ms-office-conversion-service remained unhealthy even through a restart of the service. We also noticed an error in the MsOfficeConverter.log referencing the following error. What could be the cause?

“WARN – COM error occurs on 1 initialization attempt. Retrieving the
COM class factory for component with CLSID
{000209FF-0000-0000-C000-000000000046} failed due to the following
error: 80080005 Server execution failed (Exception from HRESULT:
0x80080005 (CO_E_SERVER_EXEC_FAILURE))”

Answer

The PrizmDoc MSO feature requires either Microsoft Office 2013 or 2016 to be installed in order to function properly. Based on the error, there is a Microsoft Office specific .dll file which is not registered properly.

The following process will re-register the .dll files and potentially resolve this issue (Note: for Step 2, this may vary depending on what directory you used to install Microsoft Office and the version of Office. You want to find the directory containing winword.exe):

  1. Run Command Prompt as Admin.
  2. Type cd C:\Program Files\Microsoft Office\Office15 or cd C:\Program Files\Microsoft Office\root\Office16
  3. Run winword.exe -regserver.
  4. Reboot the server.
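Because the Office directory in Step 2 varies by installation, it can help to locate winword.exe programmatically before re-registering. The following is a minimal sketch that checks only the two directories named above. The helper names `find_winword` and `regserver_command` are hypothetical, and other install locations would need to be added to the list.

```python
import ntpath
import os

# The two install directories mentioned in Step 2; yours may differ.
CANDIDATE_DIRS = [
    r"C:\Program Files\Microsoft Office\Office15",
    r"C:\Program Files\Microsoft Office\root\Office16",
]


def find_winword(dirs=CANDIDATE_DIRS, exists=os.path.isfile):
    """Return the first candidate path to winword.exe that exists, or None."""
    for directory in dirs:
        candidate = ntpath.join(directory, "winword.exe")
        if exists(candidate):
            return candidate
    return None


def regserver_command(winword_path):
    """Build the argument list for Step 3 (winword.exe -regserver)."""
    return [winword_path, "-regserver"]
```

From an elevated (Admin) session on the server, the result of `regserver_command(find_winword())` could be passed to `subprocess.run(..., check=True)`. The reboot in Step 4 still applies.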

If you don’t have the Prizm service set up to run on boot then make sure that Office applications are started by PrizmDoc, or from the command line, before being opened manually.

Implementing any technology solution within an established organization can be a monumental challenge for a developer. Doing so for a legal firm that has a strong culture and longstanding processes can be even more difficult. That’s why LegalTech developers need to take a few key factors into consideration as they work on applications for the legal industry.

Build vs. Buy

One of the first questions any firm needs to ask is whether it wants to build a specialized solution or turn to an existing LegalTech application. In many cases, this comes down to a question of resources. For larger “big law” firms or legal departments within an enterprise business, internal developers may be available to build a customized application that caters to specific organizational needs. 

If the resources and development skills are on hand, building a dedicated solution can be an effective strategy. Developers can focus narrowly on the established processes used at the firm and design technology that targets clear pain points more effectively than an “off-the-shelf” product.

More importantly, as Kelly Wehbi, Head of Product for Gravity Stack, points out, building doesn’t necessarily mean starting from nothing:

“I think a lot about how to leverage the platforms we have or could potentially purchase, but then add our own expertise and strengths on top of it. That doesn’t have to mean you have to build some entirely new interface or have to invent some new technology. It could be there’s a tool that’s out there that does exactly what you need and maybe you have to build a few customizations on top of that.”

Of course, building a solution also presents a number of challenges, especially if the project’s requirements are not well defined from the beginning. There’s a great deal of overhead involved with building new technology in terms of maintenance and ongoing support. It’s easy to fall into the trap of focusing on technology at the expense of services. But legal firms are not product companies; they need to focus instead on finding ways they can use technology to leverage their core services.

It’s that emphasis on services that drives many firms to buy the technology solutions they need rather than to build them. Existing software integrations are typically better positioned to maintain security and don’t need to be maintained as extensively. Deploying proven software integrations also helps organizations to maximize their on-premises resources and enhance their flexibility in the long term.

“I tend to default toward leveraging an existing platform when possible,” Wehbi admits. “Security ends up being a huge part of this and when you can leverage a company that’s solved that really well, that goes a long, long way. It offers you a bunch of options you wouldn’t have if you had to build it yourself,” she says. “That’s a pretty big undertaking to start from scratch.”

Getting Buy-In for LegalTech Solutions

Once the build or buy decision is finally made, there’s still the critical matter of executing and putting the new solution into practice. Getting feedback throughout the development and integration process is important, whether it’s gathered from anecdotal observations or some form of usage analytics. 

Neeraj Rajpal, CIO at Stroock & Stroock & Lavan, finds that implementations tend to go more smoothly when the development team is able to get rapid feedback from key decision makers: “The faster you get the feedback, the faster you know you’re down the right path or not. It is very frightening when the stakeholder…looks at something and says ‘This is exactly the opposite of what I expected.’ You don’t want to be in that situation.”

Ultimately, a LegalTech application’s success depends largely upon whether or not the firm as a whole embraces it. When developers are seeking to implement a solution, they need to be especially careful to take the culture of the firm into consideration. Without buy-in at the top, it will be difficult to convince anyone in the organization to commit to change. 

“If you’re trying to solve a problem because you have a deficiency in a current business process, but you’re not willing to change the process…that’s (a) disaster,” Rajpal warns. Although LegalTech solutions are designed to enhance efficiency and reduce errors, they often require people to learn how to use them or to abandon existing technology solutions.

Take, for example, a legal firm that needs to redact documents during the discovery process. The existing process likely involves printing out documents and then laboriously redacting them by hand with marker. Once that process is finished, they are scanned and saved as image-based PDFs. An HTML5 viewer with redaction capabilities could easily streamline this process to make it faster, more flexible, and more secure. Unfortunately, if the firm’s attorneys aren’t willing to adopt the new process, all of the potential efficiency benefits go to waste.

The Importance of Communication

Communication and ongoing support are critical to ensuring a successful LegalTech implementation. Developers can begin this important conversation right from the beginning when they’re designing application features. Whether they’re building everything from scratch or turning to software integrations, they need to have honest and thorough discussions to determine what specific features are needed to support legal processes. Implementing a LegalTech solution is more likely to be successful if that solution is closely aligned with the firm’s existing needs and future goals.

But the conversation doesn’t stop once the application goes live. Ongoing support and education are often necessary to help firms adopt new technology and make the most of its potential. Developers may even need to adjust some features over time as needs change. If they used third-party software integrations to add key functionality, they need to know they can count on those vendors to support them as the LegalTech application evolves.

Make Your LegalTech Implementation a Success with Accusoft

Accusoft’s family of software integrations allows LegalTech developers to quickly add the features their clients need to modernize workflows and improve efficiency. Whether it’s PrizmDoc’s extensive document redaction capabilities that make it easier to protect privacy during eDiscovery or the automated document assembly features of PrizmDoc, developers can lean on our 30 years of document processing expertise so they can focus on building the tools legal teams require.

As part of our ongoing work with the LegalTech industry, Accusoft recently sponsored a Law.com webinar on the subject of building vs buying technology solutions for legal firms. You can listen to some of the highlights with contributors Kelly Wehbi and Neeraj Rajpal along with host Zach Warren, editor-in-chief of LegalTech News, on the Law.com Perspectives podcast.

Understanding the Value of Third-Party Software Integrations
 

Today’s customers expect more of software applications than ever before. Piecemeal solutions that provide only a few noteworthy features are quickly being overtaken by more comprehensive platforms that deliver an end-to-end experience for users. This has prompted developers to incorporate more capabilities, while also building innovative features that set their solutions apart from the competition. Thanks to third-party software integrations, they’re able to meet both demands.

What is Third-Party Software Integration?

Third-party software integrations typically come in the form of SDKs or APIs that provide applications with specialized capabilities. Rather than building complex features like optical character recognition (OCR), PDF features, or image cleanup from scratch, developers can instead incorporate the necessary features directly into their software via an SDK or use an API call to access capabilities without expanding their application’s footprint.

From a user experience standpoint, third-party software integrations allow developers to build more cohesive software solutions that provide all the essential features a customer may require. Instead of pushing them into a separate application to interact with documents, provide a signature, or fill out a digital form, they can instead deliver an unbroken experience that’s easier to navigate and manage from start to finish.  

4 Key Third-Party Software Benefits

There are a number of important benefits organizations can gain from using third-party software integrations, but four stand out in particular:

1. Reduce Development Costs

When evaluating whether it makes sense to build functionality for an application in-house or buy a third-party software integration, cost is frequently one of the key considerations. There is often a tendency to think that it would be more cost-effective to have developers already working on the project simply build the capabilities they need on their own. After all, there’s no shortage of open-source SDKs and other tools that are available without having to pay licensing or product fees.

In practice, however, this approach usually ends up being more expensive in the long run. That’s because the developers working on the project often lack the experience needed to build those capabilities quickly. A software engineer hired to help build AI software, for instance, probably doesn’t know a lot about file conversion or annotation. While they might be able to find an open-source tool to build those features, they still need to do quite a bit of development work and on-the-job learning to get the new capabilities stood up and thoroughly tested. 

Focusing on these features means they’re not focusing on the more innovative aspects of their application. From a cost standpoint, that means they’re being paid to build something that’s already readily available in the market. When these internal development costs are taken into account, it’s almost always more cost effective to buy ready-to-implement software features built by an experienced third party. As the saying goes, there’s no reason to reinvent the wheel. 

2. Get to Market Faster

Software developers are always working against the clock. With new applications hitting the market faster than ever, there’s tremendous pressure to keep development timelines on track and avoid missing important deadlines. This helps projects stay within their expected budgets and prevents potential competitors from getting to market faster. Any steps that can be taken to accelerate development and potentially shorten the timeline to releasing a product could mean the difference between becoming an industry innovator or being labeled as an also-ran.

Third-party software integrations allow developers to quickly and seamlessly integrate essential capabilities into applications without compromising their project timeline. Rather than building features like forms processing, document annotation, and image conversion from scratch, teams can instead use third-party SDKs and APIs to add proven, reliable, and secure features in a fraction of the time. By keeping projects on or ahead of schedule, they can focus on delivering a better, more robust product that exceeds customer expectations. 

3. Expand Application Features & Functionality

Software development teams typically possess the experience and expertise needed to build the core architecture and innovative features of a new application. In many cases, they’re designing something novel that will provide a point of differentiation in the market. The more time they can spend on refining and expanding those capabilities, the more likely the application is to make an impact and win over customers.

What these developers often lack, however, are the skills needed to implement a variety of other features that will enhance the application’s functionality. Features like document conversion, OCR, PDF support, digital forms, eSignature, and image compression are complex and difficult to build from scratch. By integrating third-party software, developers can leverage proven, feature-rich technology to expand their application’s capabilities. This not only allows them to improve their solution’s versatility but also enhance the overall user experience by eliminating the need for external programs or troublesome plug-ins. 

4. Access Specialized Engineering Support

Incorporating features like PDF support, image conversion, and document redaction into an application poses several challenges. Some of those challenges don’t show up right away; instead, they become evident long after a software product launches. If the developers don’t have a lot of experience with the technology behind those features, minor issues can quickly escalate into serious problems that leave customers unhappy and willing to look elsewhere for alternatives. No organization wants to be caught in a situation where a bug embedded in an open-source tool renders a client’s valuable assets unusable.

By leveraging proven, tested, and secure third-party software integrations, developers gain access to support from experienced engineering teams with deep knowledge of their solutions. In addition to documentation and code samples, they can also speak directly with developers who can provide guidance on how to best integrate features and resolve issues when they emerge. The best integration providers will even work with organizations to customize their solutions to meet specific application needs, which helps create even smoother user experiences and enhances reliability.

Integrating Third-Party Software with Accusoft

For over 30 years, Accusoft has helped organizations add essential features like barcode recognition, file conversion, document assembly, and image compression to their applications through an innovative line of SDKs and APIs. Our document lifecycle technologies are backed by multiple patents and have been incorporated successfully into a wide range of applications. Our dedicated engineers provide ongoing support and work closely with customers to implement their specific use cases, ensuring that their software platform is delivering the best possible experience.

To learn more about integrating third-party software with Accusoft SDKs and APIs, talk to one of our solutions experts today.

Post-secondary schools look very different this year as colleges and universities embrace both blended learning and online-only approaches to content delivery and engagement. But this isn’t a one-off operation. Even as pandemic pressures ease, the shift to distance learning as the de facto solution for many students won’t disappear. As a result, it’s critical for schools to develop and deploy learning management systems (LMSs) that both meet current needs and can keep up with educational evolution. But what does this look like in practice? How do developers and team leaders build fully functional LMS solutions that empower student success without breaking the bank?

Learning Management Systems (LMS) Challenges

When schools first made the shift to distance learning directives, speed was of the essence. While students were barred from campus for safety reasons, they’d paid for a full semester of instruction, and schools needed to deliver. As a result, patchwork programs became commonplace. Colleges and universities combined existing education software with video conferencing and collaboration tools to create “good enough” learning models that got them through to summer break. Despite best educational efforts, however, some students still went after schools with lawsuits, alleging that the quality of instruction didn’t align with tuition totals.

So it’s no surprise that as fall semesters kick off, students aren’t willing to put up with learning management systems that barely make the grade. They want full-featured distance learning that helps them engage with instructors and connect with new content no matter how, where, or when they access campus networks. 

As a result, development teams can’t simply correct for current COVID conditions. Instead, they need to create systems that deliver both blended and purely online interactions and ensure that students who choose to continue with digital-first learning can stay connected even after returns to campus become commonplace.

How to Create a Functional LMS Framework

So what does a fully functional LMS framework look like in practice? Seven features are critical for ongoing success. Let’s explore how these features can enhance your learning management system and set your end-users up for success in the classroom and at home:

Diverse Document Viewing

As schools make the shift to distance learning, the ability to view multiple document types is critical for long-term LMS success. From standard Word documents, Excel spreadsheets, and PowerPoint presentations to more diverse image types — such as those used in medical educational programming or manufacturing courses — students and instructors need the ability to both send and view diverse document types on-demand. 

While both free and paid solutions for viewing exist outside LMS ecosystems, choosing this route creates two potential problems. Students with diverse technological and economic backgrounds may face challenges in finding and using these tools, and data security may be compromised. This is especially critical as schools handle greater volumes of students’ personal and financial information. If document viewing happens outside internal systems, privacy concerns become paramount.

In-Depth Annotations

With students now submitting assignments and exams via educational software, viewing isn’t enough. Staff also need the ability to annotate assets as they arrive. Here, professors and teaching assistants are best served by built-in tools that allow them to quickly redline papers or projects, add comments, highlight key passages, and mark up documents with specific instructions or corrections.

Without this ability, staff have two equally unappealing choices. They can either print out, manually correct, and then re-scan documents, or send all comments as separate email attachments. Both are problematic, since they limit the ability of students and teachers to easily interact with the same document.

 

Comprehensive Conversion

File conversion is critical for effective learning management systems (LMSs). Specifically, schools need ways to quickly convert multiple document types into single, searchable PDFs. Not only do PDFs offer the ability to control who can edit, view, or comment on papers or exams, they make it easy for teachers to quickly find specific content. The permissions-based nature of PDFs makes them ideal for post-secondary applications and a must-have for any education software solution. 

 

Cutting-Edge OCR and ICR

Optical character recognition (OCR) and intelligent character recognition (ICR) also form a key part of distance learning directives. With some students still more comfortable with hand-written hard copies and some classes that require students to show specific work, OCR can help bridge the gap between form and function. By integrating tools with the ability to recognize and convert multiple character types and sets, schools are better equipped to deal with any document type. Search is also bolstered by cutting-edge OCR; instead of forcing staff to manually examine documents for key data, OCR empowers digital discovery.

Complete Data Capture

Forms are a fundamental part of university and college life, but the myriad digital documents can quickly overwhelm legacy education software. Integrating tools with robust form-field detection allows schools and staff to streamline the process of complete data capture, both increasing the speed of information processing and reducing the potential for human error.

Barcode Benefits

As campuses shift to hybrid learning models, students occupy two worlds, both physical and digital. But this duality introduces complexity when it comes to tracking who’s on campus, when, and why. These are currently key metrics for schools looking to keep students safe in the era of social distancing. 

By deploying full-featured barcode scanning solutions as part of LMS frameworks, colleges and universities can get ahead of this complexity curve. From scanning ID cards to take attendance and track resource use to using barcodes as no-contact purchase points or metric measurements for ongoing analytics, barcode solutions are an integral part of LMS solutions.

 

Automation Advantages

The sheer volume of digital documents now generated and handled by post-secondary schools poses the problem of practicality. Teachers and administrators simply don’t have time to evaluate and enter data at scale and speed while also ensuring accuracy. By automating key processes including document conversion, capture, and character recognition, schools can reduce the time required to process documents, leaving more room for student engagement.

 

Building an LMS Product for Teachers & Students

The bottom line for LMS solutions? If they don’t work for end-users, they won’t work for the broader school system as a whole. Gone are the days of invisible IT infrastructure. Now, students and staff alike are school stakeholders with evolving expectations around technology.

By deploying distance learning solutions that prioritize end-user outcomes with enhanced document viewing, editing, data capture, and automation, developers can create LMS tools capable of both solving immediate issues and offering sustained student success over time. Learn more about these functionality integrations for your learning management system at accusoft.com/products.

Electronic spreadsheets have been a mainstay of business operations since their introduction four decades ago, but the way organizations use them has changed significantly during that time. Today, the financial industry needs FinTech accounting software that facilitates online spreadsheet collaboration without creating unnecessary risk or disrupting workflows. 

Spreadsheets in the Tax and Accounting Industry

Although many tax and accounting firms use dedicated software solutions to manage complex financial workflows, they still rely on conventional spreadsheets for a variety of tasks. In fact, a recent study by Deloitte found that 62% of companies are still relying heavily upon spreadsheets for business insights. The data used to inform risk analysis, growth projections, and financial modeling is often collected and sorted in individual spreadsheet files by individual employees. In many instances, that data will eventually be transferred into a more sophisticated accounting platform, either through manual entry or an API integration.

Spreadsheets also play a critical role when it comes to presenting complex financial data. Whether it’s for an internal presentation to key stakeholders within the organization or a customer-facing report designed to relay important information about their business, tax and accounting firms routinely need to create, edit, view, and share spreadsheets. 

Although Google Sheets has gained quite a bit of traction over the last few years, Microsoft Excel remains the preferred spreadsheet solution for most financial industry professionals. Practically every CRM and CMS platform allows users to easily export data into Excel’s XLSX file format for convenient viewing, making it the de facto standard for most companies. Online spreadsheet collaboration is also easier than ever before thanks to public cloud tools like Office 365.

5 Major Spreadsheet Collaboration Challenges

Unfortunately, all of that ubiquity and convenience comes with a few drawbacks. There are also some inherent shortcomings with Excel spreadsheets that pose significant challenges to tax and accounting firms in particular.

1. Version Control

One of the great benefits of spreadsheets is their ability to track data over time, with new information constantly being fed into the spreadsheet formula to generate different results. Unfortunately, that typically means that the document could be outdated the moment it’s copied, shared, or downloaded because a more current version might exist elsewhere. While cloud-based software like Google Sheets or Office 365 theoretically ensures that everyone is viewing and referencing the same document, if there are too many people making changes, errors can easily escape notice and break entire spreadsheet formulas (or possibly corrupt the file). Even then, people may clone their own version to work on independently, which creates the same version control challenge posed by Excel-dependent files.

2. Security

Familiarity has a way of breeding complacency. That’s certainly true when it comes to sharing XLSX files. People are accustomed to sending and receiving spreadsheets over email and other messaging platforms. What they may not realize, however, is that 38% of malicious email attachments disguise themselves as Microsoft Office file types. The last thing a tax or accounting firm wants is for an employee to accidentally infect their network with harmful malware by opening what they thought was a spreadsheet. At the same time, even conventional spreadsheet collaboration can pose a serious security risk. Excel files offer limited security controls, and downloaded or shared files could be easily hacked to compromise important financial data. With more people working remotely in response to the COVID-19 pandemic, FinTech accounting software needs to account for the common security risks posed by home offices while still meeting consumer demands for high-speed, low-friction digital solutions in 2020 and beyond.

3. Asset Protection

Spreadsheets often contain more than just important financial data. The spreadsheet formulas buried within the many rows and columns of cells may represent important intellectual property for a tax or accounting firm. Any time a company shares a spreadsheet, it runs the risk of those formulas being stolen and distributed. Even if these proprietary assets remain safely tucked away within the spreadsheet, there’s still the matter of anyone with a copy of the file being able to use it however they want, potentially cutting into the firm’s business.

4. Workflow Efficiency

Managing a large number of independent XLSX files can quickly become burdensome for any organization. Take, for example, a situation where a tax firm’s customers must download a spreadsheet to enter their tax information and then send that file back to the firm so the data can be entered into its FinTech accounting software. Not only does this create numerous opportunities for manual errors, but it also introduces several unnecessary (and potentially risky) steps into the process. What if a file is not attached to an email? Or if someone downloads the spreadsheet, but then misplaces it? How does the tax firm verify that the version sent back to them is the most up-to-date version? This approach to spreadsheet collaboration ends up wasting time and is highly prone to mistakes.

5. Software Dependencies

While Excel may be the most widely used spreadsheet software in the world, that doesn’t mean every organization has access to it. Smaller companies and startups are much more likely to rely upon cloud-based tools like Google Sheets due to their low cost and ease of online spreadsheet collaboration. Although Google’s Chrome browser offers extensions capable of reading, viewing, and editing XLSX files, the conversion process is often imperfect due to differences in feature sets. Transferring data back and forth between Excel and other spreadsheet programs can create formatting problems and potentially break internal formulas. 

The PrizmDoc Cells Solution

One of the best ways for FinTech accounting software developers to address these issues is to simply integrate spreadsheet viewing and editing functionality into their applications. PrizmDoc Cells is a web-based spreadsheet editor that natively supports XLSX files by storing them on a secure server and allowing users to interact with them online through an Excel-like interface. 

Secure Spreadsheet Functionality

PrizmDoc Cells provides essential spreadsheet features within a familiar UI. After opening an XLSX file, users can review and edit cell content within a secure web-based environment. Firms can also restrict features to protect spreadsheets from errors and unauthorized alterations. 

No Microsoft Dependencies

Deployed in a Docker container, PrizmDoc Cells can import, view, edit, and export XLSX files entirely within a firm’s FinTech accounting software or web-based application. No one needs a copy of Microsoft Excel to access files.

Manage End-User Access

In addition to hosting their source files securely within a proprietary server or private cloud environment, organizations can control what end-users can access within the spreadsheet. Proprietary data and spreadsheet formulas can be safely hidden from view to protect valuable IP.

Maintain Version Control

As an entirely web-based viewer, PrizmDoc Cells eliminates the need to email, copy, or download spreadsheets, ensuring that the file being viewed is always the most up-to-date version. Editing access can also be adjusted to ensure that only authorized users are able to make changes.

White Label Customization

Developers can easily remove all branding to seamlessly integrate PrizmDoc Cells with their applications and FinTech accounting software.

Say Goodbye to the Old Way of Spreadsheet Collaboration

Today’s tax and accounting firms need to work more efficiently than ever before to keep up with the demands of their clients. They can’t afford to keep relying upon outdated approaches to spreadsheet collaboration. The pressure is on for FinTech developers to build applications capable of accommodating their security, workflow, and version control requirements when it comes to spreadsheets. 

With PrizmDoc Cells, developers can build FinTech accounting software solutions that allow for true online spreadsheet collaboration without compromising the security or control organizations expect from their applications. Experience the functionality of PrizmDoc Cells firsthand by trying a demo today. To get a closer look at how PrizmDoc Cells will operate in your own development environment, sign up for a free trial.

Document image cleanup is a vital step in building an efficient and accurate processing workflow. In a perfect world, every file an organization receives would be in pristine, high-resolution condition so it could be processed quickly and easily. Unfortunately, the reality is that documents come in all sizes, conditions, and formats. Companies can receive vital information in the form of email, traditional mail, fax, or even text. Documents scanned into a crooked, low-resolution file are just as likely to be received alongside digital versions submitted entirely through a web application.

This poses a significant challenge for software developers building the next generation of automation solutions. Without some way of cleaning up document images, companies that still rely upon manual processes will struggle to read and process files. More importantly, poor image quality interferes with optical character recognition (OCR) engine accuracy, making more human interaction necessary to verify recognition results. By integrating document image cleanup tools into their applications, developers can enhance the speed and accuracy of their automated processes and help their customers leverage the full potential of digital transformation.

7 Essential Document Image Cleanup Features Your Application Needs

A few document image cleanup tools should be considered essential for any application that has to manage multiple file formats. To see these tools in action and understand why they’re so vital, let’s take a look at how these features work in ImageGear, Accusoft’s powerful document and image processing SDK.

1. Despeckling

Speckles can appear on document images for a variety of reasons. In some cases, they are unwanted image noise created during the original scanning process (the classic “salt and pepper” noise), but in other instances, they’re simply the result of dust particles on the surface of a scanned document or on the scanner itself. They are frequently encountered when converting old documents into digital form. Speckling not only interferes with OCR engine performance, but can also make it difficult to maintain image fidelity when compressing or converting files. 

ImageGear can reduce or eliminate speckling as part of the document image cleanup process. There are two ways to approach speckle removal:

  • Despeckle Method: This function removes color noise from 1-bit images by taking the average color value in a square area around the speckle and replacing its pixels with that value.
  • GeomDespeckle Method: This function uses the Crimmins algorithm to send the image through a geometric filter, reducing the undesired noise while preserving edges of the original image. This process is applied only to 8-bit grayscale images.
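The averaging idea behind the first approach can be illustrated with a minimal sketch. This is not ImageGear’s API. The function name `despeckle`, the `radius` parameter, and the list-of-rows image representation (1 = black, 0 = white) are assumptions made for illustration only.

```python
def despeckle(image, radius=1):
    """Replace each pixel with the rounded average of its square neighborhood.

    `image` is a list of rows of 0/1 ints (1 = black ink, 0 = white paper).
    This is a generic illustration, not the ImageGear implementation.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            total = count = 0
            # Visit the square window around (x, y), clipped at the image edges.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            # For 1-bit data, rounding the average amounts to a majority vote.
            result[y][x] = 1 if total * 2 > count else 0
    return result


# A single isolated black pixel on a white page is treated as a speckle.
page = [[0] * 5 for _ in range(5)]
page[2][2] = 1
cleaned = despeckle(page)
```

A simple filter like this also erodes thin strokes and corners, which is one reason production tools offer edge-preserving alternatives.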

2. Image Inversion

With so many documents being scanned, converted, and transferred between applications, there’s a greater likelihood of something going wrong along the way. One of the most frequent problems is image inversion, which swaps pixel colors and turns a standard white background with black text into a black background with white text. This mix-up can render documents completely unreadable by OCR engines.

ImageGear can be configured to automatically recognize when image inversion is necessary. The invert method can also be used to immediately change the color of each pixel contained in the entire image, turning white to black and black to white.
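As a rough sketch of what inversion and a simple detection rule might look like for 1-bit images, consider the following. The source does not describe how ImageGear detects inverted images, so the majority-black heuristic, the function names, and the `threshold` parameter are assumptions for illustration.

```python
def invert(image):
    """Flip every pixel of a 1-bit image (1 = black, 0 = white)."""
    return [[1 - pixel for pixel in row] for row in image]


def needs_inversion(image, threshold=0.5):
    """Assumed heuristic: a page that is mostly black is probably inverted."""
    total = sum(len(row) for row in image)
    black = sum(sum(row) for row in image)
    return black / total > threshold


def normalize_polarity(image):
    """Invert only when the heuristic says the page is white-on-black."""
    return invert(image) if needs_inversion(image) else image


# White text (0) on a black background (1): mostly black, so it gets flipped.
inverted_scan = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
fixed = normalize_polarity(inverted_scan)
```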

3. Deskewing

Skewed document images are both cumbersome to manage and challenging for OCR engines to read accurately. Unfortunately, manually scanned documents are often uneven, and the problem is only becoming worse now that many people are using their phone cameras as makeshift document scanners. That’s why the first step in the document image cleanup process is often deskewing, which rotates and aligns the images to enhance recognition accuracy.

The deskewing process often involves more than just rotating a document, especially where images taken by a digital camera are concerned. ImageGear’s 3D deskew feature uses a sophisticated algorithm to correct the perspective distortion that can occur whenever a document is captured with a handheld camera.

4. Blank Page Detection

Many documents converted into digital format contain information on both sides. If they are fed into a scanner along with single-sided documents, the resulting file will contain multiple blank pages. This might not seem like much of a problem, but if there is enough speckling or noise around the edge of the image, an application may try to run OCR on the page and produce spurious results. Blank page detection can quickly identify any image that is blank or mostly white and flag it for deletion.
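One simple way to think about blank page detection is as an ink-ratio test that ignores a frame around the edge of the page, where scanner noise tends to collect. The sketch below assumes a 1-bit image stored as rows of 0/1 values. The function name, the `max_ink_ratio` threshold, and the `margin` parameter are illustrative choices, not ImageGear settings.

```python
def is_blank_page(image, max_ink_ratio=0.01, margin=0):
    """Return True when the share of black pixels is at most `max_ink_ratio`.

    A frame of `margin` pixels is ignored on every side so that edge noise
    does not make an empty page look like it has content.
    """
    rows = image[margin:len(image) - margin]
    inner = [row[margin:len(row) - margin] for row in rows]
    total = sum(len(row) for row in inner)
    if total == 0:
        return True  # nothing left to inspect once the margin is removed
    ink = sum(sum(row) for row in inner)
    return ink / total <= max_ink_ratio


# An empty page with one noise pixel in the corner.
empty = [[0] * 10 for _ in range(10)]
empty[0][0] = 1

# A page with two full rows of "text".
written = [[0] * 10 for _ in range(10)]
written[4] = [1] * 10
written[5] = [1] * 10

flag_empty = is_blank_page(empty, margin=1)
flag_written = is_blank_page(written, margin=1)
```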

5. Line Removal

Although they may not seem very troublesome at first glance, lines can create a number of problems for OCR engines. When lines and printed text overlap, it can be difficult for the engine to distinguish between the two. In some instances, the engine may even misread a line as a letter or number. Removing lines from a document prior to OCR reading ensures that the remaining text will be recognized more quickly and analyzed more accurately.

ImageGear supports both solid line removal and dotted line removal. The first method automatically detects and removes any horizontal and vertical lines contained in the document (like frames or tables), while the second method determines which dotted lines to remove by measuring the number and diameter of dots.
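To make the solid line case concrete, here is a minimal sketch that treats any unbroken horizontal or vertical run of black pixels longer than a chosen length as a line and erases it. The names `remove_solid_lines` and `min_length` are hypothetical, and this is a generic run-length approach rather than a description of how ImageGear works internally.

```python
def _clear_long_runs(row, min_length):
    """Erase runs of consecutive 1s that are at least `min_length` long."""
    cleaned = row[:]
    start = None
    # The trailing 0 is a sentinel that closes a run touching the row end.
    for x, pixel in enumerate(row + [0]):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            if x - start >= min_length:
                cleaned[start:x] = [0] * (x - start)
            start = None
    return cleaned


def remove_solid_lines(image, min_length):
    """Remove long horizontal and vertical runs from a 1-bit image."""
    # Horizontal pass over the original rows.
    horizontal = [_clear_long_runs(row, min_length) for row in image]
    # Vertical pass: transpose the original, clean each column, transpose back.
    columns = [list(col) for col in zip(*image)]
    cleaned_columns = [_clear_long_runs(col, min_length) for col in columns]
    vertical = [list(row) for row in zip(*cleaned_columns)]
    # Keep a pixel only if neither pass identified it as part of a line.
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


# A table rule across row 2 and a short text stroke on row 4.
form = [[0] * 8 for _ in range(6)]
form[2] = [1] * 8
form[4][1] = form[4][2] = 1
without_lines = remove_solid_lines(form, min_length=5)
```

Note that this sketch also erases any text pixels that sit directly on a removed line, so a real cleanup tool would need to repair characters that the line crossed.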

6. Border Removal

When scanned documents don’t align properly with the boundaries of the scanner or were copied onto paper that was larger than the original image at some point, the remaining space is often filled in with black. These borders are not only unsightly, but they also interfere with other document image cleanup processes. Although they can usually be cropped out easily, the cropping process alters the proportions of the image, which could create more problems later.

Removing these large black regions is easy with ImageGear’s CleanBorders option. It focuses on the areas near the edge of the page, which typically should not contain any important image data. 
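One generic way to remove such regions without cropping is to flood-fill outward from the page edges and whiten every black pixel that is connected to them. The sketch below illustrates that idea only. The function name `remove_edge_connected_black` is hypothetical and does not represent the CleanBorders option itself.

```python
from collections import deque


def remove_edge_connected_black(image):
    """Whiten every black region that touches the edge of a 1-bit image."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    queue = deque()
    # Seed the fill with every black pixel on the outer frame.
    for y in range(height):
        for x in range(width):
            on_edge = y in (0, height - 1) or x in (0, width - 1)
            if on_edge and result[y][x] == 1:
                result[y][x] = 0
                queue.append((y, x))
    # Spread through 4-connected black neighbors, whitening as we go.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and result[ny][nx] == 1:
                result[ny][nx] = 0
                queue.append((ny, nx))
    return result


# A two-pixel-wide black band down the left side and one interior mark.
scan = [[1, 1, 0, 0, 0, 0] for _ in range(6)]
scan[3][4] = 1
trimmed = remove_edge_connected_black(scan)
```

Because the image dimensions are unchanged, the page keeps its proportions. The trade-off is that content touching the border would be removed along with it.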

7. Remove Hole Punches

Important documents were often stored in binders before they were prepared for digitization. When scanned, the blank space from the hole punch leaves a large, black dot along the edge of the document. Unfortunately, these holes sometimes overlap with text or could be picked up as filled-in bubbles by an optical mark recognition (OMR) engine.

ImageGear can identify and remove punch holes created by common hole punchers, including two-, three-, and five-hole configurations. The RemovePunchHoles method can be adjusted to account for differing hole diameters in addition to different locations.

Unlock Your Application’s Document Image Cleanup Potential with ImageGear

Although ImageGear can perform a variety of document handling functions such as viewing, conversion, annotation, compression, and OCR processing, its document image cleanup capabilities help applications overcome key content management challenges and enhance performance in other areas. Improved document image quality allows data to be extracted more quickly, enhances the viewing experience, and reduces complications when it comes to file compression and conversion.

Learn more about the ImageGear collection of SDKs to discover how they can deliver versatile document and image processing to your applications.


The scalable vector graphic (SVG) format continues to enjoy steady adoption across the web. According to data from W3Techs, SVG now accounts for 25 percent of website images worldwide. But it wasn’t always this way. In 1998, it became apparent that vector-based graphics had a future on the web, and the W3C received six different file format submissions from technology companies that year. Some were mere proposals ready for a complete revamp, while others were proprietary products that W3C wasn’t permitted to modify. Instead of forging a format from one of the submissions, however, W3C’s SVG working group decided to start from the ground up — and SVG was born.

While the file format had lofty ambitions, focusing on common use rather than specific syntax, the original iteration was cumbersome and complex. However, SVG has improved year after year. With increased support came more streamlined functionality and usable features. Now, SVG is often the first choice for meeting the evolving demands of scalable, responsive, and accessible web content.


What is a Scalable Vector Graphic (SVG) and how does it work?

Today, SVG is the de facto standard for vector-based browser graphics. But what exactly is this file format, and how does it work?

Based on XML, SVG supports three broad types of objects: 

  • Vector graphics including paths and outlines that are both straight and curved
  • Bitmap images such as .jpeg, .gif, and .png
  • Text

What sets SVG apart from bitmap-based images is that graphical objects are described as lines and curves rather than as a grid of pixels. Because bitmap images use a fixed set of pixels, scaling them up creates blurriness and jagged edges. Vector images, by contrast, store each shape as geometry that is redrawn at the target resolution, so lines and curves stay smooth no matter the image size.
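A short sketch can make the scaling behavior concrete. The example below uses Python’s standard XML library to build the same small graphic at two display sizes. The shape coordinates and the helper name `make_badge` are made up for illustration. The point is that only the `width` and `height` attributes change, while the `viewBox` and the path data stay identical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_badge(width_px, height_px):
    """Build a tiny SVG document as a string at the requested display size."""
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width_px),
        "height": str(height_px),
        # The drawing always lives in a 100x100 coordinate system.
        "viewBox": "0 0 100 100",
    })
    # A circle outline and a curved path: both are pure geometry.
    ET.SubElement(svg, f"{{{SVG_NS}}}circle", {
        "cx": "50", "cy": "50", "r": "40",
        "fill": "none", "stroke": "black",
    })
    ET.SubElement(svg, f"{{{SVG_NS}}}path", {
        "d": "M 20 70 Q 50 10 80 70",
        "fill": "none", "stroke": "black",
    })
    # Text is stored as text, not as pixels.
    label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
        "x": "50", "y": "90", "text-anchor": "middle",
    })
    label.text = "SVG"
    return ET.tostring(svg, encoding="unicode")


small = make_badge(32, 32)
large = make_badge(1024, 1024)
```

Both strings describe the same shapes, and the renderer redraws them at whatever size is requested. That is why the large version stays as crisp as the small one.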

SVG also offers the benefit of interoperability. Because it’s a W3C open standard, SVG plays well with other image formats and with web technologies including JavaScript, DOM, CSS, and HTML. This allows the format to easily support responsive design approaches that scale websites and web content based on the user device rather than defining standardized size parameters. Thanks to the curves and lines of SVG, scaling presents no problem for responsive designers looking to ensure consistency across device types.


The Benefits of SVG

While scalability is often cited as the biggest benefit of SVG, this format also offers other advantages, including:

  • Responsiveness: Images can be easily scaled up or down and modified as necessary to meet web design and development demands.
  • Accessibility: Since SVG is text-based, content can be indexed and searched, allowing both users and developers to quickly find what they’re looking for.
  • Performance: Image rendering is quick and doesn’t require substantive resources, allowing sites to load quickly and completely.
  • Use in Web Applications: Browser incompatibilities and missing functions often frustrate web design efforts, forcing developers to use multiple tool sets and spend time checking content and images for potential format conflicts. SVG, meanwhile, offers powerful scripting and event support, in turn allowing developers to leverage it as a platform for both graphically rich applications and user interfaces. The result? Better-looking sites that enhance the overall user experience.
  • Interoperability: Because SVG is based on W3C standards, the format is entirely interoperable, meaning developers aren’t tied to any specific implementation, vendor, or authoring tool. From building their own framework from the ground up to leveraging third-party SVG applications, web developers can find the approach that fits them best.

SVG in PrizmDoc Viewer

Accusoft’s PrizmDoc Viewer offers multiple ways for developers to make the most of SVG elements at scale, such as:

  • File Transformation: Conversion is critical for effective and efficient web design. If development teams need different file transformation tools for every format, the timeline for web projects expands significantly. PrizmDoc Viewer streamlines this process with support for the conversion of more than 100 file types (including PDFs, Microsoft Office files, HTML, EML, rich text, and images) into browser-compliant SVG outputs. In practice, this permits near-native document and image rendering that’s not only fast, but also accessible anytime, anywhere, and from any device.
  • HTML5 Functionality: Using SVG in PrizmDoc Viewer is made easier thanks to native HTML5 design. The HTML5-native framework not only improves load times with smaller document sizes but also means that PrizmDoc Viewer works in all modern web browsers, while dramatically enhancing document display quality.
  • Pre-Conversion: One of the biggest challenges with viewing large documents in a browser is delay. Pages toward the end of the document may take longer to load and frustrate users looking to quickly find a specific image or piece of information. PrizmDoc Viewer solves this problem with a pre-conversion API that returns the first page as an SVG while the rest of the document is being converted, allowing users to interact with documents as conversion takes place and lowering the chance that files will experience format-based delays.

SVG hasn’t always been the go-to web image format. Despite a promising start based on open, interoperable standards, the lack of early support and specific use cases for vector-based file formats saw SVG sitting on the sidelines for decades. 

The advent of on-demand access requirements and mobile-first development realities has changed the conversation. SVG is now continuously gaining ground as companies see the benefit in this scalable, streamlined, and superior-quality file format. Get the big picture and see SVG in action with our online document viewing demo, or start a free PrizmDoc Viewer trial today!