Visual Testing with Histograms
Testing is a critical part of software development. However, extensive testing done inefficiently can stall development. This post discusses some traps in automated tests and proposes a solution that works well in several of Accusoft's test suites.
Functional Testing and the Ground Truth Mousetrap
This post is mostly about functional testing, i.e., tests that verify the product delivers the results the user wants: a document displays properly, a photograph has been cropped, a button appears on a web page, and so on. This often means that the result should look as expected, rather than just contain specific text or parameters.
A simple approach to testing such cases is to take a screenshot of the working product and compare it to a rasterized "ground truth" (also called the "gold standard" or "reference data"): another screenshot, taken when the test was prepared and approved as "this is what we expect." For example:
Scenario: The viewer can remove sensitive content from the document
Given I have a document with sensitive content that matches pattern
When I upload the document into the viewer
Then the document shall display as groundTruth
This approach often causes more trouble than it saves, for the reasons below.
Platform and Environment Dependency
Let's consider a case with an online document viewer application, which, among other things, can remove sensitive content when displaying documents. You want to ensure such content gets successfully removed.
With the raster ground truth approach, you would load the document into the viewer, take a screenshot of the browser window, and save this as the ground truth.
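As a rough illustration of what such a comparison does, here is a minimal sketch of a strict pixel-by-pixel check. It assumes screenshots are already decoded into rows of RGB tuples; the names `count_differing_pixels` and `matches_ground_truth` are hypothetical and are not taken from any Accusoft test suite.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each component 0-255
Image = List[List[Pixel]]      # a list of rows, each row a list of pixels


def count_differing_pixels(screenshot: Image, ground_truth: Image) -> int:
    """Count pixels that differ; raise if the two images differ in size."""
    same_height = len(screenshot) == len(ground_truth)
    same_widths = all(len(a) == len(b) for a, b in zip(screenshot, ground_truth))
    if not (same_height and same_widths):
        raise ValueError("screenshot and ground truth have different dimensions")
    return sum(
        1
        for row_a, row_b in zip(screenshot, ground_truth)
        for a, b in zip(row_a, row_b)
        if a != b
    )


def matches_ground_truth(screenshot: Image, ground_truth: Image) -> bool:
    """Strict comparison: a single differing pixel fails the test."""
    return count_differing_pixels(screenshot, ground_truth) == 0


# Hypothetical 4x3 images: the screenshot differs from the reference in one pixel.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
ground_truth = [[WHITE] * 4 for _ in range(3)]
screenshot = [row[:] for row in ground_truth]
screenshot[1][2] = BLACK

print(count_differing_pixels(screenshot, ground_truth))  # 1
print(matches_ground_truth(screenshot, ground_truth))    # False
```

With a check this strict, any rendering difference at all, relevant or not, turns the test red.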
You later realize that you support three operating systems and four browsers. The screenshots differ only slightly, but you don't want to lower the comparison precision, so you have to generate twelve nearly identical screenshots. After following this approach for some time, the test repository grows enormously, and it takes a while just to download it to the test machine.
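The arithmetic behind that growth can be sketched in a few lines. The operating system and browser labels, the file naming scheme, and the figure of 200 test cases are all placeholders chosen for illustration; only the three-by-four matrix comes from the scenario above.

```python
from itertools import product
from typing import List

# Placeholder labels: three operating systems and four browsers.
OPERATING_SYSTEMS = ["os_a", "os_b", "os_c"]
BROWSERS = ["browser_1", "browser_2", "browser_3", "browser_4"]


def reference_image_names(test_case: str) -> List[str]:
    """List the ground truth files one test case needs, one per environment."""
    return [
        f"{test_case}__{os_name}__{browser}.png"
        for os_name, browser in product(OPERATING_SYSTEMS, BROWSERS)
    ]


names = reference_image_names("content_removal")
print(len(names))  # 12

# Assumed suite size, for illustration only.
TEST_CASE_COUNT = 200
total_images = TEST_CASE_COUNT * len(names)
print(total_images)  # 2400
```

Every new supported browser or operating system multiplies the total rather than adding to it.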
Dependency on Irrelevant Factors
Imagine you have a few hundred ground truth images, and one of the browsers gets an update, so its client window becomes a couple of pixels taller. Boom! Now you have to re-generate all ground truth for that browser even though nothing is wrong with your web page. Or maybe something IS wrong, and functionality got broken because of the new window dimensions? Oh no, you don't just need to re-generate ground truth, you need to review all of it!
Functional tests should be orthogonal. If a test fails, it should tell you exactly what is wrong with the product, rather than just "something is wrong." For example, if the text layout changes, comparison with raster ground truth will fail, although the content removal feature works fine. It would be much better for page layout tests to fail instead, and for content removal tests to stay successful.
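To make the taller-window case concrete, the sketch below classifies a failed comparison as a size mismatch or a pixel mismatch. The `diagnose` helper and the tiny test images are hypothetical; the point is only that a screenshot two rows taller fails a strict comparison even when every pixel it shares with the reference is identical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # a list of rows, each row a list of pixels


def dimensions(image: Image) -> Tuple[int, int]:
    """Return (width, height), assuming all rows have the same length."""
    return (len(image[0]) if image else 0, len(image))


def diagnose(screenshot: Image, ground_truth: Image) -> str:
    """Classify a comparison as 'match', 'size_mismatch' or 'pixel_mismatch'."""
    if dimensions(screenshot) != dimensions(ground_truth):
        return "size_mismatch"
    if screenshot != ground_truth:
        return "pixel_mismatch"
    return "match"


WHITE = (255, 255, 255)
ground_truth = [[WHITE] * 4 for _ in range(3)]

# Simulate a browser update: same content, but the window is two rows taller.
taller_screenshot = [row[:] for row in ground_truth] + [[WHITE] * 4, [WHITE] * 4]

print(diagnose(taller_screenshot, ground_truth))   # size_mismatch
print(taller_screenshot[:3] == ground_truth)       # True
```

The failure says nothing about whether the page itself is broken, which is why every affected image still needs a human look.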
Need for Manual Validation and Investigation
In order to approve a screenshot as a "ground truth," a team member has to review it with their own eyes. Similarly, when a test which uses ground truth fails, you have to visually investigate what exactly went wrong.
Reviewing a single screenshot is not a big deal, but multiplying this by the number of test cases and the number of supported environments makes it really boring. Moreover, boring manual work introduces the risk of oversight.
Learn more about the histogram approach in the rest of my article here.
Dmitry Shubin, Software Engineer
Dmitry Shubin graduated from Moscow State University with a Bachelor's degree in Computer Science and a focus on computer graphics. He joined Accusoft in 1996 as a software developer for the ImageGear product and contributed in many areas, including core design, graphic file format support, and image processing. Dmitry is currently working on PrizmDoc Viewer. While mostly contributing to product development, Dmitry is also passionate about sharpening the test suites. Away from work, Dmitry enjoys traveling, hiking, and playing his guitar around a campfire.