From Continuous Integration To Continuous Delivery With PrizmDoc

Written by: Cody Owens

- How quickly can your team take a code base, package it, test it, and put it into the hands of your customers?

- Can you push a button to automagically make it happen?

And, once your customers have your product, can you be confident that it will work for them out of the box?

We at the Accusoft PrizmDoc group asked ourselves those questions in 2016 and discovered a wide array of opportunities to improve how we deliver our product.

Here we share how we reduced our feedback cycle from three months to three days, enabling rapid delivery of beta PrizmDoc builds and confident, seamless delivery of release builds.

What is Continuous Delivery?

Continuous Delivery, the movement toward rapid working releases, focuses our efforts on knowing as quickly as possible when to release a change to customers. Whereas Continuous Integration focuses on taking code and packaging it, Continuous Delivery goes a step further by identifying what to do with that package before release.

A common assumption is that when code works in one environment, it should work in others. But through the lens of Continuous Delivery, we have to assume our product is guilty until proven innocent. And how do we prove its innocence? Automated testing in production-like environments. In this way, we can be confident that at release time our product won’t explode on takeoff.

Moving from testing on a small, dedicated environment to a many production-like environments can be complex. But implementing a Continuous Delivery release workflow is well worth the effort. The product will be deployable throughout its lifecycle. Everyone – not just the development team – can get automated feedback on production readiness at any time. Any version of the product can deploy to any environment on demand. And in our case, beta builds can release to customers for early feedback on bug fixes and new features. Together, we realized that these benefits far outweigh the cost of putting up with release pain year after year.

Evaluating Our Starting Point

Like most modern software teams, we believe in the value of test-driven development. We already had many thousands of unit, contract, and integration tests verifying the ability of our products to solve business needs. So, we could be confident that the product could run on some specific environment with some specific configuration. But there were a few key problems we had to address:

Spawning many production-like environments was uneconomical
We could not automatically test the GUI of our product
There were no explicit, automated performance tests against business-valued behaviors

Testing On Realistic Environments

We tackled the expense of production-like environments first. At the time, we were using Amazon Web Services EC2 instances for build agents that could test and package code. On each code change, new instances launched to run tests. While these instances were fast, reliable and cloud-based, they were uneconomical. And because spending gobs of money is effortless when spawning instances for testing or development, access was guarded. Reevaluating our needs, we realized that scalability or flexibility of the cloud wasn’t necessary for testing purposes. We knew we needed to shut off the cloud-hosted cash vacuum – but what was our alternative?

Hybrid cloud is becoming attractive as a best-of-both-worlds solution to cloud-hosting needs. Perhaps a more accurate term is “local cloud hosting” – on-prem value but with most of the features offered by the “real” cloud. To this end, we turned to OpenStack as our EC2 replacement for development builds. With OpenStack, we can still spin up instances, store VM images and snapshots, create load balancers and more without the cost associated with the cloud. A single investment in the local hardware was comparable in cost to one additional year of cloud usage. If it didn’t turn out so well, we could just switch back a year later.

After flipping the switch, we transferred our build agents to OpenStack’s hybrid cloud. Before, some tests took many hours and could only run once per day or even once per week. But with the reduction in testing costs, we now run the dailies at every commit and the weeklies every day. This difference in feedback time is monumental; developers can be confident that their new code won’t fail automated tests a week later after the user story has closed.

As we increased our hybrid cloud test agent workload, we ran into a unique problem. As opposed to running instances in the “real” cloud, we now have to deal with hardware limitations. We have a specific number of physical CPUs available. We have a specific amount of memory to use. This forced us to rethink what tests we ran and how we ran them.

Failing Fast, Failing Cheap

To optimize our resource usage, we need bad commits or configuration changes to fail fast and early. When one stage fails, the next stage(s) shouldn’t run because that build isn’t releasable. We needed a way to schedule, chain and gate test suites.

Enter Jenkins. Jenkins is a flexible build system that enables a simple pipeline-as-code setup for all sorts of purposes. In our case, we opt to use it as the platform that pulls the built product installer, installs it and runs batteries of progressively stringent tests against it. A stage can run tests against multiple nodes. We created production-like nodes that launch from our hybrid cloud and use the built-in gating functionality in Jenkins. Subsequent test stages don’t run following a test failure. Since pipelines are version controlled, we always know exactly what changes affect a given run.

Testing Like A User

By this point, our tests can run inexpensively and easily across production-like environments. This enabled us to rethink what tests we were running and build upon our coverage. At release time, we spent a sprint across two teams just to test deploying the product and pushing buttons to verify the GUI worked. The plain English test instructions were subject to interpretation by the tester, leading to nondeterministic results from release to release. This cumbersome effort was necessary to test the GUI sitting on top of our core API product.

While this manual testing process uncovered bugs nearly every release, it was unthinkable in terms of ROI per man-hour. The late feedback cycle made product GUI changes stressful. A developer might not know that the GUI component they just added is unusable on an Android device running Firefox until the release testing phase three months later. Finding bugs at release time is dangerous, as not all bugs are always resolved before the release deadline. Regressions and bugs might make their way into the product if they’re not severe, or they might postpone delivery of the product altogether.

Automating these types of manual tests improves morale, reduces feedback time and asserts that the GUI either passes or fails tests in a deterministic way. Furthermore, it opens a route to Behavior Driven Development language that centers around business-valued behaviors on the front end of the product. For instance, we use the Gherkin domain-specific language to author tests in plain English that are parseable by a testing code base into real executed test code. Non-technical members of the team can author plain English “Given [state of the product], When I [do a thing], Then [a result occurs]” feature descriptions that map 1:1 to test code.

Today, all major browsers have automation REST APIs to enable driving them in a native non-JavaScript way without a real user. To eliminate the hassle of changing test code between browsers or authoring reliable tools to talk to those automation APIs, we use Selenium WebDriver. WebDriver is available in many popular languages including Java, Python, Ruby, C#, JavaScript, Perl and PHP.
From BDD test code, we execute end-user tests with WebDriver to verify real usage of the product. Because the WebDriver APIs enable “real” user events and not JavaScript event simulations, we can be confident that mouse, touch and keyboard actions actually do what we expect across a range of platforms. On test failures, we take a screenshot and save network traffic logs from the browser to trace the failure back to a front end or microservice source. Some test authors even automatically save a video of the last X seconds leading up to the failure to investigate unexpected, hard-to-reproduce behavior.

Altogether, these new front-end tests enable us to supplant the rote work of fiddling with the product across different browsers and devices for each release. They give us rapid feedback for every commit that the product has not broken for a front-end user. Before, we couldn’t know until release testing. Development confidence goes way up and agility improves as we can guarantee that we won’t have to interrupt the next sprint to fix an issue introduced by new code.

The Value Of Manual Tests

This is not to say that manual testing should be supplanted by automated testing. Exploratory testing is necessary to cover complicated scenarios, unusual user behaviors and platforms that automated tests don’t cover. Not everything is worth the time investment of automating. Bugs found during exploratory tests can be fixed and later covered by automated tests.

Your product’s test coverage should look like a pyramid where unit test coverage is thorough, integration tests are somewhere in the middle, and product-level end user tests are broad but not deep.

As expensive as manual testing can be, authoring and maintaining end-user tests can be expensive if done poorly. Changes to the front-end of the product can break all the GUI tests, though using the Page Object design pattern can mitigate this. Browser updates can also break end-user tests. Poor product performance can lead to unexpected behavior, resulting in failed tests. And not all browser platforms support all parts of the WebDriver spec, resulting in edge cases where JavaScript does need to be run on the page on that platform to fill in the gap.

Keep end-user tests broad and don’t use them as a replacement for in-depth, maintainable integration and unit tests. If a feature is testable at the unit or integration level, test it there!
On the PrizmDoc team, we’ve freed up weeks of regression testing time at release through adding these end-user automation tests. After cursory end-user regression tests, we host a fun exploratory Bug Hunt with prizes and awards.

Who can find the most obscure bug? The worst performance bug? Who can find the most bugs using the product on an iPad? Your team can gear testing efforts towards whatever components are most important to your customers and raise the bar on quality across the board.

Automating Nonfunctional Tests

Performance and security, among other nonfunctional requirements, can be just as important to our customers as the features they’ve requested. Let’s imagine our product is a car. We know that the built car has all the parts required during assembly. We also know that the car can start up, drive, slow down, turn and more.

But we don’t know how fast it can go. Would you buy a car that can only go 20 MPH? What if the car didn’t have door locks? These concerns apply similarly to our software products.

The next step, then, is to automate tests for nonfunctional requirements. Even one bad commit or configuration change can make the product unacceptably slow or vulnerable. So far, we have added automated performance tests using Multi-Mechanize. Many similar tools can accomplish this task so there’s no need to dive into details, but the key point is configurability.

Our customers don’t all use the same hardware, so it doesn’t make sense to test on every possible environment. Instead, we focus on measuring performance over time in a subset of production-like environments. If performance goes below a particular threshold, the test fails. With configurability in mind, if a customer is evaluating whether to use PrizmDoc, we can simply deploy to a similar environment (CPUs, memory, OS type, license, etc) and gather metrics that will allow them to easily plan capacity and costs, which can often seal the deal.

And since performance tests run on every successful change, we can gauge the impact of optimizations. For example, we found that a microservice handled only two concurrent requests at a time. The fix? A one-line change to a configuration parameter. Without regular performance tests, gathering comparative performance and stability would be difficult. With regular performance tests, however, we were confident in the value of the change.

Real Impact

Continuous Delivery has improved every aspect of the PrizmDoc release cycle. Customers praise our rapid turnaround time for hotfixes or beta build requests. We now thoroughly measure the delivery value of each commit. End-user tests verify the GUI and performance tests cover our nonfunctional requirements. The built product automatically deploys to a range of affordable production-like environments. Any member of the product team can get release readiness feedback of the current version at a glance. Instead of a three month feedback cycle, developers see comprehensive test results against their changes within a day. The difference in morale has been tremendous.

If your organization is not quite there yet, we challenge you to start the Continuous Delivery conversation with your team. Hopefully our experience has shed light on opportunities for your product to make the jump. You might get there faster than you expect.

About the author

Cody Owens is a software engineer based in Tampa, Florida and contributor to continuous deployment efforts on Accusoft’s PrizmDoc team. Prior to his involvement with document management solutions at Accusoft, he has worked in the fields of architectural visualization and digital news publishing. Cody is also an AWS Certified Solutions Architect.