Technical FAQs
Ever feel like there aren't enough hours in the day? Here's a scenario that will resonate if you're like many of our clients. You're a senior app developer for a provider of data management solutions for the insurance industry.
There are two weeks left before you head off for vacation. Your team is working on UI and code fixes which arose out of UAT on your company’s latest major upgrade cycle. Everything was wrapping up nicely, until the product manager (PM) approached your desk with a peace offering of a double shot latte and a Philly cheesesteak sandwich.
You quickly discover that the peace offering is because the upgrade was supposed to include document and file management functionality, but that work had been left off the project epic.
The PM hands you a list of functional specs, including a mobile-friendly document viewer, markup, document capture, image compression, eSignature functionality, and PDF OCR for document discoverability.
She has that "deer in the headlights" look, as if you're about to go to DEFCON 1. Instead, you smile, thank her for breakfast, and gather the troops for a quick scrum. Your application was built for extensibility requirements like this, and RESTful APIs are your jam.
You and your team don’t have to write the base code to meet these requirements. You and your colleagues just have to buy the API licenses, download some code, access some cloud-based functionality, and work your magic on the APIs to wrap it into your platform.
Why Labor with Lumber When You Can Build with Lego?
Even if you had these requirements out of the gate, you probably would have sought out some APIs to help you build, right?
At the end of the day, the goal is to increase information worker productivity and effectiveness within your client base and to deliver the kind of value that ensures lasting relationships. You never want to reinvent the wheel or add more project scope than necessary.
Should your team rush to build custom functionality into their native application, and risk non-compliance with regulatory standards like HIPAA, ISO 27001 or SOC 2 or 3?
Not likely, especially when there are probably additional requirements for viewing and marking up raster and vector files. There might even be extra requirements for integration with SharePoint or CRMs.
Buying vs. Building Applications
API code can always be skinned with a custom user interface to provide a seamless user experience. Developers shouldn’t be discouraged from hacking the development process by leveraging existing code to accelerate deployments.
It’s the premise platforms like GitHub were built on. Instead of months of development work, an API/SDK package can exponentially accelerate time to market. If an application development team has adopted DevOps or Agile methods, strategies to ensure continuous development and continuous integration are key.
An article in CIOReview.com says it best: “By using APIs, you extend the capability of your development team. The key is to ensure the vendor, application, and the APIs all exceed your expectations.”
ImageGear and PrizmDoc
Organizations in regulated industries like government, healthcare, and financial services create, manipulate, and store a lot of electronic and physical documents. They often need functionality like viewing, conversion, and redaction that helps them stay both productive and compliant.
The combined API packages support more file types, compress files for faster loading, and offer customers a mature, secure, and proven document viewer for sensitive files. Healthcare facilities often have devices which don’t have the capacity to manage native document viewers, so browser-based viewers offer better viewing and editing capability.
Documents are often scanned in large batches and require post-capture enhancement, such as despeckling, deskewing, and OCR to enable keyword/phrase searching within individual files and images. Need to enable your customers to remain within your application, yet complete multiple tasks? Why not enable some functionalities like:
- Converting multiple file formats into one document
- Building and capturing forms
- Watermarking
- Signing a document without importing another third-party service
- Gaining quality control on a scanned image
- Comparing two similar documents such as contracts by redlining them for legal review
Building all of this functionality is a lot of heavy lifting. Our APIs are mature, QA-vetted, and ready to integrate with apps which are coded in C, C++, Java, or .NET.
For those applications which already use PrizmDoc or ImageGear, integrating the other API into production alongside it is a seamless process. Developers who have already worked with our support team will know how to get any assistance they need.
Start with Core APIs and Extend as Business Evolves
In fact, when PrizmDoc Viewer and ImageGear work together, they provide the ability to recognize more file types, search within PDFs and images, create new documents from different file types, and reorganize content to create a brand new document.
These SDK and API integrations are especially useful for businesses that process a wide variety of file types; they revamp your application and make its end users more productive and efficient.
Ready to enhance your document and imaging functionality in your applications, without complex coding? Create an Accusoft account, or use your login to get started.
One of the more challenging aspects of developing SDKs with machine learning models is deployment and productionization. TensorFlow in particular can be difficult to set up, and requires GPUs to evaluate large models. This post will share my experiences in skirting this process entirely to quickly evaluate a FasterRCNN-based model during a hackathon last year, usable on any office or CI machine.
During this hackathon, I implemented and trained a model from a paper from ICDAR 2017 on one of our physical machine learning-equipped machines. To achieve quick deliverables, rather than try to get the trained model and data off the machine, I simply used a tool called Luminoth running on the machine to expose the model’s prediction functionality. This also allowed anybody on my team to continue developing the model afterward with minimal friction, and required only a small networking shim in our codebase.
Luminoth is a Python-based tool that I like to refer to as “a command line wrapper around TensorFlow.” While the use of a YAML file to quickly set up and train some popular networks such as FasterRCNN is its main use, it also exposes a Flask-based server which allows prediction queries via a web page. As it turns out, it also exposes an (undocumented) API which is usable programmatically.
My codebase is in C++ with a C# assembly wrapping it. That being the case, I had to get my model's predictions (a number of bounding boxes) into C++ code, and fast. Figuring out TensorFlow's shaky C++ API (or even using Python-based TensorFlow) wasn't an option. The model was already trained on our machine-learning computer, and evaluating it anywhere else would have required a large setup cost and data duplication. I had my eye on a particular C++ networking library, CPR, that I had been meaning to use; so I thought, why not tackle all of these problems at once?
Let’s start by figuring out Luminoth’s API from the source and web page itself.
First, using Luminoth's server as per the documentation shows requests being made to an endpoint named `api/fasterrcnn/predict`. We can see it returns some JSON. Great: we now know it's probably possible to invoke programmatically!

Digging into Luminoth's web.py, around line 31 at the time of writing, the corresponding `/api/<model_name>/predict/` endpoint method is our ticket.
The first thing we see is an attempt to retrieve the image data from the request to predict:
try:
    image_array = get_image()
except ValueError:
    return jsonify(error='Missing image'), 400
except OSError:
    return jsonify(error='Incompatible file type'), 400
What is `get_image()`? Well, it shows that it expects a POSTed file by the name of 'image'.
def get_image():
    image = request.files.get('image')
    if not image:
        raise ValueError
    image = Image.open(image.stream).convert('RGB')
    return image
This is a Flask web server. The Flask documentation for the `files` property of the `Request` object shows that this only appears (for our purposes) in a POST request, from a `<form>`, and when an encoding of `enctype="multipart/form-data"` is given. Right, sounds like we now know how to use the endpoint programmatically. Now, how can we call this from C++ using CPR?
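Before jumping into C++, it is worth a quick sanity check of the endpoint from Python. Here is a minimal sketch using the `requests` library; the host name, file name, and exact JSON field names (`objects`, `bbox`, `prob`) are assumptions based on what we observed above and later in this post, so verify them against your own Luminoth instance.

```python
# Minimal sanity check of the Luminoth prediction endpoint (a sketch; the
# host, file name, and JSON field names are assumptions to verify).
import requests

url = "http://beast-pc:5000/api/fasterrcnn/predict/"

with open("page.png", "rb") as f:
    # Flask's request.files expects a multipart/form-data part named 'image'.
    response = requests.post(url, files={"image": f})

response.raise_for_status()
for obj in response.json().get("objects", []):
    x, y, right, bottom = obj["bbox"]
    print(f"box=({x}, {y}, {right}, {bottom}) confidence={obj['prob']}")
```

If that round-trip works, all that is left is reproducing the same request from C++.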
Let's start with the POST request. Using CPR, this is very straightforward. The required multipart/form-data encoding is handled by the `cpr::Multipart` object. At the time of writing there is a bug involving `cpr::Multipart` and in-memory data buffers, so in order to keep the hackathon moving, the image was first written to a file, reloaded, and then sent. Avoid that workaround if you can.
extern "C" __declspec(dllexport) void* SendImagePostRequest(const char* url, unsigned char* data, int data_size)
{
std::string filename = WriteTemporaryFile(data, data_size);
auto response = cpr::Post(
cpr::Url{ url },
cpr::Multipart{{ "image", cpr::File{ filename } }});
std::remove(filename.c_str());
return ParseJsonResults(response.text);
}
Here, `url` is the URL of the Luminoth endpoint we found, and `data` and `data_size` describe the image we want FasterRCNN to predict on. When used, it looks like this:
void* resultsHandle = predictTables("http://beast-pc:5000/api/fasterrcnn/predict/", image.data(), (int)image.size());
The POST request returns a JSON string, which we need to decode. Luckily, there is a superb header-only JSON library, nlohmann/json (which I think has the potential to become part of the C++ standard library; by all means use it), that we can drop right in to get a vector of RECTs and their confidences back:
static std::vector<std::pair<RECT, float>>* ParseJsonResults(const std::string& response)
{
    auto parsed = nlohmann::json::parse(response);
    std::vector<std::pair<RECT, float>>* results = new std::vector<std::pair<RECT, float>>();
    for (const auto& object : parsed["objects"])
    {
        const auto& bbox = object["bbox"];
        float confidence = object["prob"];
        results->emplace_back(RECT{ bbox[0], bbox[1], bbox[2], bbox[3] }, confidence);
    }
    return results;
}
Note that the boxes are returned in an x/y/right/bottom format. If you need x/y/width/height, it is easily convertible (see the small sketch below). From then on, the bounding boxes can be passed throughout the codebase, and the improvement of this method over the current ones can be measured.
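That conversion is a single subtraction per axis; in Python, for brevity:

```python
# Convert an [x, y, right, bottom] box into [x, y, width, height].
def to_xywh(bbox):
    x, y, right, bottom = bbox
    return [x, y, right - x, bottom - y]
```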
You'll have to excuse the use of void pointers, pointers to vector, new, and other frowned-upon constructs. The use of CPR also introduced an additional problem: the C++ codebase is built with MSVC 11.0, and CPR requires MSVC 14.0 or later. To integrate it, a separate DLL was created and loaded dynamically via LoadLibrary in the main source, which is why the C API was created. But these are implementation details, and again, this was simply the quickest way to get results.
That's about it for this post. All in all, I believe Luminoth is an underrated, but also unfinished, machine learning tool. It's a good choice when you need a quick way to train, save state, and evaluate neural networks. The API allows high-speed integration of a model into existing code in any language, after which a results analysis can determine whether or not to productionize the model further.
Financial institutions are spending on technology. As noted by IDG Connect, solutions such as AI-driven analysis and investment tools could boost revenue by 34 percent. In addition, 72 percent of senior management view artificial intelligence and machine learning (ML) as critical market advantages.
It makes sense. Banks, credit unions, and fintech firms must now meet evolving consumer expectations and satisfy emerging compliance legislation. The challenge? Ensuring existing processes — such as check image handling at ATMs and data verification during loan applications — are both streamlined and secure.
Fortunately, there’s a simple starting point: image processing.
Bridging the Data Divide
According to a recent Accenture survey, several emerging trends now inform the consumer landscape in finance. What matters most to data-driven organizations? Trust. While 67 percent of clients will now permit banks access to more personal data, 43 percent cite trust as the biggest driver of long-term loyalty. What's more, 63 percent want banks' use of personal data to drive more individualized, value-added services.
ATMs provide a key component of this data-driven strategy. For example, many ATMs use the X9.100-181 standard to store and secure .tif files. To ensure customers and bank staff have access to the right data at the right time, companies need image software capable of capturing, processing, and manipulating these images in real-time — in turn underpinning the development of agile web-based and mobile applications that engender consumer trust.
Processing, Permission, and Potential
Also critical for banks? Compliance. Consider the evolving standards of GDPR. As noted by Forbes, the regulation includes provisions for the right to access, which entitles consumers to information about how and why their data is processed by organizations.
Given the sheer volume of data now processed by financial institutions — and the growing risk of network data breaches — meeting compliance expectations is both time and resource intensive. Add in the increasing number of consumers now submitting checks via ATMs or mobile deposit software, and companies face the problem of accidental data misuse. What happens if check or loan data is shared across departments but customers haven’t specifically given their permission?
Redaction can provide the control needed to keep sensitive information secure. By combining ease of capture with straightforward redaction services, banks can ensure that check and application X9.100-181 .tif data is natively secured, in turn limiting potential compliance pitfalls.
Controlling Complexity: Not Always Black and White
In the years following the 2008 financial collapse, many financial firms drafted long-term plans designed to reduce complexity and streamline operations. According to a recent Reuters piece, however, despite ambitious plans "the level of complexity remains high for U.S. banks."
Here, consumer expectations and compliance demands conspire to increase total complexity. From cloud-based technologies to mobile initiatives and ongoing compliance evaluations, streamlining processes often takes a back seat to mission-critical operations. Check imaging and recognition is no exception. Companies need tools capable of handling color, black and white, and multi-layered images. The solution? Powerful software development kits (SDKs) that integrate with existing apps to provide on-demand functionality.
Piece by Piece
Meeting consumer expectations, satisfying compliance requirements, and reducing complexity is a multi-faceted, ongoing process for financial organizations.
Accusoft’s ImagXpress SDK provides a critical piece of this long-term goal with support for 99 percent of common financial image types, optimized compression/decompression technology to minimize wait times, and enhanced redaction and editing capabilities to empower ATM, loan application, and mobile app image processing. Learn more about Accusoft’s SDKs and APIs here.
Gerry Hernandez, Accusoft Senior Software Engineer
Test-Driven Development (TDD) is a buzzword to some, yet a way of life to others. Some say Behavior-Driven Development, or BDD, is TDD done right. Cucumber made BDD popular, promising wonderful features such as:
- Writing specifications in Gherkin, an English-like grammar, rather than code
- Allowing anyone, even non-developers, to read and write tests since they’re English-like
- Reusing code by reusing Gherkin statements
- Driving tests with data
Everyone is happy. Developers don’t waste so much time, plus other non-technical stakeholders get to participate. All of this sounds fantastic, right? So we tested quite a wide gamut of BDD frameworks, all based on the original Cucumber reference implementation: Robot, Behave, CucumberJS, and Yadda.
Accusoft’s Services team, which is responsible for a large and growing collection of microservices, determined that none of these BDD frameworks work for us. The added magic of Gherkin and Cucumber impedes the natural progression of real-world systems built with modern, microservice architecture. Naturally, we made our own BDD-like methodology, affectionately known as SURGE: Simulate User Requirements Good-Enough. We went from barely being able to maintain 50 tests, to over 700 automated functional tests with continuous, rapid growth in coverage.
We specifically chose a silly name to philosophically align with our goals for this new methodology:
- Be practical
- Be productive
- Keep it stupid simple
- Keep it minimal – “you ain’t gonna need it”
This blog post begins a series of articles that will recap our journey toward effective test automation with our microservice architecture. For part one, we will share our experiences and lessons learned from prototyping traditional BDD into our development lifecycle. Part two will focus on the methodology and philosophies associated with SURGE. Finally, part three will provide an overview of our implementation. The series will mainly focus on conceptual patterns and practices that are universal to all programming languages and runtime environments, although we chose Node for our particular implementation of SURGE.
BDD: A Great Solution for the Wrong Problem
There's no denying that BDD and Cucumber have positively influenced the software development industry and culture. Test-Driven Development is a sound idea, and BDD was the first widely established way of doing it right. For most, at least. We found the methodology crumbled as soon as we applied it to non-monolithic software with a wide set of features.
Accusoft Services is composed of an extensive and ever-growing list of independently deployable services, all of which work together to provide user-facing features. For instance, just logging into the Accusoft Services portal makes a trip through six services. Suppose we wanted to define a behavior and write tests for logging into an account. The big question, then, is "where do we put our Gherkin?"
“In theory there is no difference between theory and practice. In practice there is.” -Yogi Berra
Yogi Berra nailed it; the ideal solutions that traditional, Gherkin-driven BDD afford sound reasonable, but don’t work in the real world. Here’s what we discovered.
Global Scoping is Evil
One of the primary goals of Cucumber-like frameworks is to make all step functions available globally to all feature files. The intent is to promote code reuse. Sounds great, but it simply does not scale in any sensible way.
Initially, when we wrote just a few Gherkin features, this was working fairly well. All the magic abstraction that Gherkin provides was happening, and it was glorious. Then we added a fourth test and ran into ambiguous step definitions. This was quickly solved by rewriting Gherkin statements to be more specific so that they wouldn’t collide. Then a fifth feature was introduced and we ran into the same problem. And again with the sixth.
Eventually, it was simply unmaintainable and we couldn't work with it. Let's face it, there are only so many ways we could say "click the search button" without sounding completely unnatural, and sounding natural is the entire point of Gherkin.
Here’s where things get interesting, and where much of the community will disagree with us. BDD best practices state that no two step definitions should ever collide, that if they do, our behavior is likely ill-defined. But we challenge that with the following two example feature specifications (this is fictional, for brevity):
Feature: Able to Use a Search Engine
  Scenario: Searching on Google
    Given I visit google.com
    When I type "PrizmDoc" into the search box
    And I click the search button
    Then I see some search results
Feature: Able to Search on a Company Website
  Scenario: Searching on Accusoft.com
    Given I visit accusoft.com
    When I click the search link
    And I type "PrizmDoc" into the search box
    And I click the search button
    Then I see some search results
We feel that if a human were to read each scenario, they would understand what to do. However, Cucumber-like BDD implementations will actually map the last three steps in each of the above scenarios to the same functions. So there are two ways of dealing with this: use unique statements, or make the functions they map to smart enough to handle both interpretations of a search button.
Using truly unique statements to avoid all collisions is intractable and effectively turns English into a programming language. English is already a confusing language when code isn’t involved, so why would you ever want to use it as an abstraction layer? Most developers have a hard enough time balancing curly brackets; we have zero interest in compounding those problems with literary devices, sentence structures, and debates on the merits of the Oxford comma. I can hardly even write this blog post!
For a brief moment, we experimented with the latter approach: making the step functions smart enough to deal with both search buttons. Then we introduced a third test that needs to click a search button. Then a fourth. That step function now did four different things, depended on twelve stateful variables, and had a cyclomatic complexity higher than most functions in our application code. Any more and it would be too expensive to maintain.
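To make that concrete, here is a hedged sketch of what such a shared step tends to become, written with Python's Behave (one of the frameworks we evaluated). The page names, selectors, and `context` flags are invented for illustration, and `context.browser` stands in for whatever page-driver helper a real suite would use:

```python
# Illustrative only: one shared "click the search button" step accumulating
# a branch (and a stateful flag) for every page in the suite.
from behave import when


@when('I click the search button')
def step_click_search_button(context):
    page = getattr(context, "current_page", None)
    if page == "google":
        context.browser.click("input[name='btnK']")
    elif page == "accusoft":
        # On this page the search box hides behind a link until expanded.
        if not getattr(context, "search_panel_open", False):
            context.browser.click("a.search-toggle")
            context.search_panel_open = True
        context.browser.click("button.search-submit")
    else:
        raise AssertionError("No search button known for page: %r" % page)
```

Every new feature that mentions a search button adds another branch and another flag, which is exactly the complexity spiral described above.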
Independent Test Suites – Not Practical
At first, this may sound like a plan. Each microservice provides a small, finite set of functionality that is well defined, so why not focus on testing that?
The most obvious showstopper is the fragmentation; it doesn’t make sense to couple a test suite with just a fraction of the code it’s actually testing. Reversing the same logic, if the other five services involved in logging into your Accusoft Services account don’t have a test suite associated with the code, it simply won’t be organized, won’t be maintained properly, and likely won’t even be executed. Not to mention, this completely breaks code-reuse among functional tests since they’re quite literally separate.
Besides, if we were to limit the scope of the behaviors only to what one specific microservice is responsible for, the answer is simple: that’s what unit tests are for. Why overthink it? Be practical.
To Be Continued…
There has to be a better way. And that’s why we came up with SURGE.
The team really loves our new Test-Driven Development practices. We had an informal discussion as a team and unanimously agreed that our approach makes sense and is producing positive results. We were never this productive with the traditional BDD methodology, and it seems like our philosophies are contagious, as other teams are beginning to collaborate on our tooling. We can’t wait to share our unique spin on BDD, the SURGE methodology, in our next SURGE-series blog post.
Until then, if this stuff is exciting to you, or even if you think we’re completely wrong and know you can kick it to the next level, we’d love to hear from you.
Happy coding! 🙂
Gerry Hernandez began his career as a researcher in various fields of digital image processing and computer vision, working on projects with NIST, NASA JPL, NSF, and Moffitt Cancer Center. He grew to love enterprise software engineering at JP Morgan, leading to his current technical interests in continuous integration and deployment, software quality automation, large scale refactoring, and tooling. He has oddball hobbies, such as his fully autonomous home theater system, and even Christmas lights powered by microservices.
Jeffrey Hodges, Accusoft Senior Software Engineer
Many factors are important for generating the best possible OCR results from document images, and one of the most important is to start with the best quality image possible: OCR accuracy correlates directly with document quality. While OCR is usually performed on a black-and-white binary image, it is better to scan the document at an 8-bit or higher bit depth. This greater image depth is useful for many of the image processes needed to clean up scanner artifacts, including light and dark specks, skew, warp, and border artifacts.
Eliminate Border Artifacts
Scanned images almost always contain artifacts that affect document quality. Pages are almost never perfectly aligned in the scanner, and one common effect is a border line added to the image. This border lies outside the original page being scanned but is included in the scanned document; the same thing happens when the page is smaller than the scanner surface. These border defects are not part of the original page and should be removed by clipping the document, otherwise OCR may read erroneous data from those regions and recognition errors increase.
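As a rough illustration of the idea (using OpenCV here as a generic stand-in, not ScanFix Xpress, and assuming a light page on a dark scanner background), you can threshold the image and crop to the bounding box of the page content:

```python
# Sketch: crop away dark scanner borders by finding the bounding box of the
# light page area. The threshold value and file names are placeholders.
import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
_, page_mask = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)  # page pixels -> 255
x, y, w, h = cv2.boundingRect(cv2.findNonZero(page_mask))
cropped = img[y:y + h, x:x + w]
cv2.imwrite("scan_cropped.png", cropped)
```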
Correct Skew
Skew is a very common effect that occurs when scanning documents. It almost always needs to be corrected before performing OCR, otherwise the text lines sit at an angle to the horizontal baseline the OCR engine expects and recognition accuracy suffers.
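One common way to estimate and correct skew (sketched here with OpenCV rather than a document-imaging SDK) is to take the angle of the minimum-area rectangle around the dark text pixels and rotate the page back by that amount:

```python
# Sketch: estimate the dominant skew angle from the minimum-area rectangle
# around the text pixels, then rotate to correct it. cv2.minAreaRect's angle
# convention varies across OpenCV versions, so verify the sign on real scans.
import cv2

img = cv2.imread("scan_cropped.png", cv2.IMREAD_GRAYSCALE)
_, text_mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

angle = cv2.minAreaRect(cv2.findNonZero(text_mask))[-1]
if angle > 45:                 # normalize to a small correction angle
    angle -= 90
elif angle < -45:
    angle += 90

h, w = img.shape
rotation = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
deskewed = cv2.warpAffine(img, rotation, (w, h),
                          flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```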
Correct Perspective Warp
When images are taken with a camera or phone rather than a flatbed scanner, more distortion occurs: the camera views the page at an angle, so the page appears stretched by perspective. Perspective warp correction is required to account for this non-linear transformation across the image.
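Here is a minimal sketch of the correction itself with OpenCV, assuming the four page corners have already been found by an earlier detection step (the corner coordinates and output size below are hypothetical placeholders):

```python
# Sketch: map four detected page corners onto a flat rectangle. The corner
# coordinates and target page size are hypothetical placeholders.
import cv2
import numpy as np

img = cv2.imread("photo_of_page.jpg")

corners = np.float32([[112, 80], [1490, 95], [1520, 2090], [90, 2075]])  # TL, TR, BR, BL
width, height = 1700, 2200
target = np.float32([[0, 0], [width, 0], [width, height], [0, height]])

matrix = cv2.getPerspectiveTransform(corners, target)
flattened = cv2.warpPerspective(img, matrix, (width, height))
```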
Eliminate Noise
The most common type of noise is extra specks within the document. These specks can be either light or dark and are most likely to occur when the document is scanned in black and white. Speck removal is the elimination of these small stray marks without removing important pixels. Overaggressive speck removal negatively affects text recognition accuracy by removing legitimate objects such as periods, the dot above the letter i, or other small marks, while under-removal leaves noise that may be incorrectly recognized as text.
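As a generic illustration (OpenCV again, not ScanFix Xpress), a small median filter knocks out isolated specks while mostly preserving strokes; the kernel size is exactly the aggressiveness trade-off described above:

```python
# Sketch: a 3x3 median filter removes isolated specks; larger (odd) kernels
# remove bigger specks but risk erasing periods and the dots on i's.
import cv2

img = cv2.imread("deskewed.png", cv2.IMREAD_GRAYSCALE)
despeckled = cv2.medianBlur(img, 3)
cv2.imwrite("despeckled.png", despeckled)
```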
Summary
Most OCR is performed upon binary images to enable faster analysis, transforming the scanned document to text data. By scanning the document in a higher bit depth, advanced image processing can improve the quality of the document for further processing. Following this, binarization (the process of intelligent color detection and reduction of the bit depth to 1 bit per pixel) is performed to change the document to a black-and-white image suitable for OCR processing. Choosing the correct binarization algorithm can also smooth the background and flatten color regions.
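As a minimal sketch of that final step, Otsu's global threshold (or an adaptive threshold for unevenly lit scans) in OpenCV stands in here for the more sophisticated, content-aware binarization a document-imaging SDK provides:

```python
# Sketch: reduce a grayscale scan to a black-and-white image for OCR.
import cv2

img = cv2.imread("despeckled.png", cv2.IMREAD_GRAYSCALE)

# Otsu picks a single global threshold from the image histogram.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding copes better with uneven lighting (the block size
# and offset are illustrative and need tuning).
adaptive = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 15)
cv2.imwrite("binary.png", binary)
```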
Accusoft’s ScanFix Xpress SDK provides advanced document image processing to automatically clean up and improve document quality. Automatic image cleanup processes within ScanFix Xpress yield improved accuracy of subsequent OCR processing. These clean up processes also improve forms processing and intelligent character recognition (ICR).
Jeffrey Hodges is a Senior Software Engineer in the SDK division of Accusoft Corporation. He is an expert in document recognition technologies with over 20 years developing innovative solutions.