Effective Test Automation in a Microservice World - Part 3

Gerry Hernandez, Accusoft Senior Software Engineer

This is a continuation of our series of blog posts that shares our experience with functional test automation in a real-world microservice product base. In part three, we will share our implementation approach to SURGE: Simulate User Requirements Good-Enough. Be sure to read part one and part two before getting started.

Our Implementation of SURGE

This final blog post will be kept brief, as we’ll be covering the nitty-gritty details of our automated test framework in a future blog post. But we do want to immediately share the most important bits of how we implemented our methodology and how we chose to separate our concerns. While this is applicable to most platforms, we did write our implementation using Node.

Generalized Test Suite Management

At the time of writing this, we have over 800 tests for our product. That’s more than I can count on my fingers, so it’s imperative to have a smart way to deal with this. Obviously, if I’m only interested in a particular slice of functionality, running the whole suite would be asinine. Through a somewhat scientific approach, we have determined that shaving a yak is not a productive use of development time.

Necessity, the mother of invention, gave birth to our SQL-driven test filters. Our SURGE implementation builds a relational model in-memory and allows the invoker (build server, human, or otherwise) to specify a query that determines which tests should run.

Sound like overkill? It’s not. The thing is, we really drive the point home that SURGE is designed to deal with less than ideal situations. For example, it’s fairly safe to assume that not every test is properly tagged/annotated. We can do things like “run all tests that contain the word ‘API’ in the title,” for example, which will give us a fairly good representation of API-only functionality.
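To make the "title contains API" idea concrete, here is a minimal sketch. The `tests` table, its columns, and the sample rows are assumptions invented for this example and are not SURGE's actual schema. A small `like` helper stands in for a SQL engine so the snippet runs on its own.

```javascript
// Hypothetical rows of an in-memory "tests" table; the real schema may differ.
const tests = [
  { feature: 'viewer/zoom.feature', title: 'Zoom in with the toolbar' },
  { feature: 'api/documents.feature', title: 'API returns document metadata' },
  { feature: 'api/upload.feature', title: 'Upload via API rejects empty files' },
];

// The kind of query an invoker might pass in (illustrative only).
const query = "SELECT feature, title FROM tests WHERE title LIKE '%API%'";

// Stand-in for SQL LIKE: % matches any run of characters, _ matches one.
// Case-sensitive here; real SQL engines differ on this point.
function like(value, pattern) {
  const source = pattern
    .split('')
    .map((ch) => {
      if (ch === '%') return '[\\s\\S]*';
      if (ch === '_') return '[\\s\\S]';
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
  return new RegExp('^' + source + '$').test(value);
}

// Equivalent of running the query above against the sample rows.
const selected = tests.filter((t) => like(t.title, '%API%'));
console.log(query);
console.log(selected.map((t) => t.feature));
```

Even with no tags at all, the two API features are picked up purely from their titles.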

If you’re still not convinced, realize that we don’t have a crystal ball and are unable to determine every single possibility of how the test suite should run. What we do know is that SQL is a tried-and-true grammar for generalized manipulation of a relational model. In plain English: it works in many, many situations and everyone knows how to use it.

Suppose a user gets annoyed with using SQL for casual development. No problem; we have a “starts with” filter built into SURGE. It will run all tests whose feature file starts with a string that’s passed in as a variable. The secret is that under the covers, what it’s actually doing is a SQL query.
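A "starts with" shortcut of this kind could be sketched as a function that turns a prefix into a query. The table name `tests`, the column name `feature_file`, and the `{ sql, params }` return shape are all assumptions made for illustration.

```javascript
// Builds a parameterized query for a hypothetical "tests" table whose
// "feature_file" column holds each test's feature file path.
function startsWithQuery(prefix) {
  // Escape LIKE wildcards so the prefix is matched literally.
  const escaped = prefix.replace(/[\\%_]/g, (ch) => '\\' + ch);
  return {
    // The SQL text declares backslash as the LIKE escape character.
    sql: "SELECT * FROM tests WHERE feature_file LIKE ? ESCAPE '\\'",
    params: [escaped + '%'],
  };
}

console.log(startsWithQuery('api/'));
```

Keeping the prefix in a bound parameter, rather than pasting it into the SQL text, avoids quoting problems when a path contains an apostrophe.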

What if you can’t express what you want as a SQL query? What if, say, you wanted extremely granular control over which tests run? We have a lower-level interface called SURGE Run-Lists: an array of features, scenarios, and test cases to run. Generate it however you’d like and the framework will act accordingly. For example, you could write a script that uses a Git diff to determine which tests have changed, then run only those. Actually, this is exactly how our SQL-driven test filters work: the query is used to generate a run-list, which is then fed into SURGE.
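As a rough illustration of the Git diff idea, the sketch below turns a list of changed paths into a run-list. The entry shape (`{ feature }`) and the `.feature` extension are assumptions; a real run-list may also name scenarios and test cases.

```javascript
const { execFileSync } = require('child_process');

// Lists files changed relative to a base ref using `git diff --name-only`.
function changedFiles(baseRef) {
  const out = execFileSync('git', ['diff', '--name-only', baseRef], {
    encoding: 'utf8',
  });
  return out.split('\n').filter(Boolean);
}

// Turns changed paths into a run-list. The entry shape ({ feature }) is a
// hypothetical simplification of whatever the framework actually expects.
function runListFromPaths(paths) {
  return paths
    .filter((p) => p.endsWith('.feature'))
    .map((p) => ({ feature: p }));
}

// Example with a fixed list instead of a live repository:
const runList = runListFromPaths([
  'features/api/upload.feature',
  'lib/pages/login.js',
  'features/viewer/zoom.feature',
]);
console.log(runList);
```

Inside a repository, `runListFromPaths(changedFiles('origin/master'))` would produce the same kind of list from real changes; the ref name is only an example.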

For those of you paying attention, our “starts-with” filter generates a SQL query, which generates a run-list. This layered, generalized approach gives us extreme flexibility without compromise. Most important of all, though, all of this can be driven by automation.

Gherkish is Not Gherkin

Gherkin implies global scoping and several layers of leaky abstractions. We impose a dialect of Gherkin that we like to call Gherkish. It differs from Gherkin in the following ways:

All steps must start with Given, When, or Then keywords. The keyword And is intentionally not supported to avoid ambiguities. “You ain’t gonna need it.”
All step definition functions must be mapped using the entire Gherkish statement; the preceding keyword cannot be omitted, like in many Cucumber implementations. This ensures a one-to-one mapping between the Gherkish and the test step definition, keeping things stupid simple.
All feature files are scoped to their own step files; steps are not shared globally. We do this by a file naming convention. It doesn’t get any more obvious than that.
All scenarios begin with the “Given test case: [testCase]” step, and therefore, all rows in the data table begin with a testCase column. This is to provide a meaningful label that describes the intent of the row, which later gets reported by the framework. This keeps things practical.

With these limitations, we’ve constricted the role of the Gherkish to simply providing a specification of behavior, along with example data used to drive the tests. It should never do anything else and there should absolutely not be any more abstraction than this.
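The rules above are simple enough to check mechanically. The following sketch embeds an illustrative feature as a string and validates it; the feature wording, the `Examples:` keyword, and the pipe-delimited table layout are assumptions for this example and may not match the exact syntax the framework's parser accepts.

```javascript
// An illustrative feature; the wording and table layout are made up.
const feature = `
Feature: Document upload

Scenario: Upload [testCase]
  Given test case: [testCase]
  When the user uploads [file]
  Then the response status is [status]

  Examples:
    | testCase   | file      | status |
    | valid PDF  | a.pdf     | 200    |
    | empty file | empty.pdf | 400    |
`;

// Returns a list of violations of the Gherkish rules described above.
function checkGherkish(text) {
  const problems = [];
  let stepIndex = -1; // -1 means "not inside a scenario's steps"
  let expectHeader = false;
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line === '' || line.startsWith('Feature:')) continue;
    if (line.startsWith('Scenario:')) { stepIndex = 0; continue; }
    if (line.startsWith('Examples:')) { stepIndex = -1; expectHeader = true; continue; }
    if (line.startsWith('|')) {
      if (expectHeader) {
        // The first data-table column must label the row's intent.
        const firstColumn = line.split('|')[1].trim();
        if (firstColumn !== 'testCase') problems.push('first column must be testCase');
        expectHeader = false;
      }
      continue;
    }
    if (stepIndex < 0) continue; // free text outside a scenario
    if (!/^(Given|When|Then) /.test(line)) {
      problems.push('step must start with Given, When, or Then: ' + line);
    } else if (stepIndex === 0 && line !== 'Given test case: [testCase]') {
      problems.push('first step must be "Given test case: [testCase]"');
    }
    stepIndex += 1;
  }
  return problems;
}

console.log(checkGherkish(feature));
```

A step beginning with And, or a scenario whose first step is anything other than the test case step, shows up in the returned list.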

In our framework, we use Yadda’s Gherkin parser, along with some of our custom mixins. For the most part, that wheel did not need reinventing.

Synchronous WebDriver in Node

The entire point of Node’s concurrency model is to avoid long-running, synchronous IO. Well, we broke that rule quite heavily regarding our usage of WebDriver. Using WebDriver with Promises, callbacks, or Node Fibers is ugly, confusing, and impractical. So we use synchronous bindings via WebDriver-Sync. It makes the code far more understandable.

Those with Node experience may point out that long-running operations would totally break the Node concurrency model, as the execution context of our code is single-threaded. This leads to the “then how do you run tests concurrently” question, which is answered in the next section.

The Different Layers of Test Runners

Under the covers, we use Mocha to run the tests. Mocha BDD-style tests are programmatically generated at run-time from the Gherkish and example data. Mocha takes care of error/exception handling and other niceties. We have imposed one opinion on our framework when using Mocha: all step functions must be asynchronous, either by returning a Promise or by accepting a callback. While Mocha itself allows both synchronous and asynchronous tests, we don’t allow synchronous ones. This keeps the code very consistent and more resilient to future changes.
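Generating Mocha tests at run-time might look roughly like the sketch below. The `spec` shape and the step signatures are assumptions, and `describe` and `it` are passed in as arguments so the snippet also runs outside Mocha; under Mocha you would pass its own globals.

```javascript
// Wraps one step so Mocha always sees an asynchronous test.
// Assumed step signatures: (context) => Promise, or (context, done) => void.
function toMochaTest(step, context) {
  if (step.length >= 2) {
    // Callback style: Mocha passes `done` because this wrapper declares it.
    return function (done) { step(context, done); };
  }
  return function () {
    const result = step(context);
    if (!result || typeof result.then !== 'function') {
      throw new Error('step must return a Promise or accept a callback');
    }
    return result; // Mocha waits for the returned Promise
  };
}

// Generates one describe() per feature, one per example row, and one it()
// per step.
function generateSuite(spec, { describe, it }) {
  describe(spec.feature, () => {
    for (const row of spec.rows) {
      describe(row.testCase, () => {
        const context = { row };
        for (const { text, run } of spec.steps) {
          it(text, toMochaTest(run, context));
        }
      });
    }
  });
}

// Fake describe/it that only record titles, to show what gets generated.
const titles = [];
const fake = {
  describe: (title, body) => { titles.push(title); body(); },
  it: (title) => { titles.push('  ' + title); },
};
generateSuite({
  feature: 'Document upload',
  rows: [{ testCase: 'valid PDF' }, { testCase: 'empty file' }],
  steps: [{ text: 'When the user uploads [file]', run: async () => {} }],
}, fake);
console.log(titles.join('\n'));
```

Because each row's `testCase` value becomes a suite title, the label from the data table is what appears in the report.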

To run any number of tests concurrently, we wrote a quick-and-dirty app that simply spawns multiple processes of our test suite, orchestrating specific features to run within each process. So yes, while each Node process is single threaded and potentially blocked by our synchronous WebDriver bindings, we can run any number of processes in parallel. See? Stupid simple and good enough.
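A process-per-group orchestrator can be quite small. In this sketch, `runnerScript` is a hypothetical entry point that runs the features passed as command-line arguments, and the round-robin split is just one possible way to distribute them.

```javascript
const { spawn } = require('child_process');

// Deals features round-robin into at most `workerCount` non-empty groups.
function partition(features, workerCount) {
  const groups = Array.from({ length: workerCount }, () => []);
  features.forEach((f, i) => groups[i % workerCount].push(f));
  return groups.filter((g) => g.length > 0);
}

// Starts one Node process per group. `runnerScript` is a hypothetical entry
// point that runs the features given as command-line arguments.
function runInParallel(runnerScript, features, workerCount) {
  const children = partition(features, workerCount).map(
    (group) =>
      new Promise((resolve) => {
        const child = spawn(process.execPath, [runnerScript, ...group], {
          stdio: 'inherit',
        });
        child.on('error', () => resolve(1));
        child.on('exit', (code) => resolve(code === null ? 1 : code));
      })
  );
  // Resolves to the highest exit code, so any failure fails the whole run.
  return Promise.all(children).then((codes) => Math.max(0, ...codes));
}

console.log(partition(['a.feature', 'b.feature', 'c.feature'], 2));
```

Each child blocks only itself on synchronous WebDriver calls, so the degree of parallelism is simply the number of processes started.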

The Small Role of Step Definitions

If the rest of the SURGE methodology is followed correctly, then the step definition files end up becoming very small with almost no functional responsibility other than to keep state. Within a given feature, each step function can read and write state to a context object. Depending on the test’s context and state, it can decide what to do, which will likely either make a call to one of the shared libraries, or make an assertion. That’s it.

In a nutshell, the only thing a step definition should do is map a Gherkish statement to an appropriate action that exists in a well-designed shared library.
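A step file in this spirit could look something like the following sketch. The `uploadLibrary` object, the statement texts, and the `(context, row)` signature are invented for illustration; the point is only that each step keeps state, delegates to a shared library, or asserts.

```javascript
// Hypothetical shared library; a real one would call the product under test.
const uploadLibrary = {
  async upload(fileName) {
    return { status: fileName.startsWith('empty') ? 400 : 200 };
  },
};

// Steps are keyed by the entire Gherkish statement, keyword included.
// Each one reads or writes the shared context, then delegates or asserts.
const steps = {
  'Given test case: [testCase]': async (context, row) => {
    context.label = row.testCase;
  },
  'When the user uploads [file]': async (context, row) => {
    context.response = await uploadLibrary.upload(row.file);
  },
  'Then the response status is [status]': async (context, row) => {
    if (context.response.status !== Number(row.status)) {
      throw new Error(context.label + ': unexpected status ' + context.response.status);
    }
  },
};

// Runs the steps of one example row in order against a fresh context.
async function runRow(statements, row) {
  const context = {};
  for (const statement of statements) {
    await steps[statement](context, row);
  }
  return context;
}

runRow(Object.keys(steps), { testCase: 'empty file', file: 'empty.pdf', status: '400' })
  .then((context) => console.log(context.label, context.response.status));
```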

The Small Role of Page Object Models

We have folders full of code that just deal with finding elements on various web pages. This is where we have XPath and CSS selector mappings for buttons, text boxes, images, and all sorts of points of interest when testing our software. That is the only thing they do; they find and return elements from a page.

One page corresponds to one file. These so-called “Page Object Models” are automagically injected at runtime when they’re used, so there is no need to litter the code with countless require statements and various initializers. The framework is smart enough to initialize the models and “bind” them to the WebDriver instance being used by the test. Truly zero configuration; write code that reflects your intent and the framework will fill in the details.
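One way such lazy injection could work is with a `Proxy` that builds each model on first use. This is a sketch of the general idea and not the framework's actual mechanism; the `pageModels` registry, the selectors, and `driver.findElement` are assumptions, and the latter is not a documented WebDriver-Sync signature.

```javascript
// Hypothetical page models: each takes a driver and returns element finders.
// In a real project each would live in its own file, one page per file.
const pageModels = {
  login: (driver) => ({
    usernameBox: () => driver.findElement('#username'),
    submitButton: () => driver.findElement('button[type=submit]'),
  }),
};

// Returns a `pages` object that creates each model on first access and
// binds it to the given driver.
function injectPages(driver, models) {
  const cache = new Map();
  return new Proxy({}, {
    get(_target, name) {
      if (typeof name === 'symbol') return undefined; // ignore internal lookups
      if (!cache.has(name)) {
        if (!(name in models)) throw new Error('Unknown page model: ' + name);
        cache.set(name, models[name](driver));
      }
      return cache.get(name);
    },
  });
}

// Fake driver so the sketch runs without a browser.
const fakeDriver = { findElement: (selector) => ({ selector }) };
const pages = injectPages(fakeDriver, pageModels);
console.log(pages.login.submitButton().selector);
```

Test code then just writes `pages.login.submitButton()` with no require statements or initializers of its own.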

API Testing is Ridiculously Easy

I personally consider Node to be a RAD tool for RESTful web services. It’s just so easy to write a service or a client in Node. There’s really no trick to it. We write API clients, which are trivially easy in Node, then use those clients from the step definition functions. If you can write a Hello World in Node, you can write an API test in our test framework.
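For a sense of scale, a client plus the step that uses it might be as short as this sketch. The `/documents/:id` endpoint, the base URL, and the function names are hypothetical; `fetchImpl` defaults to the global `fetch` found in recent Node versions and is injectable so the example can run without a server.

```javascript
// A tiny client for a hypothetical REST endpoint.
function createDocumentClient(baseUrl, fetchImpl = globalThis.fetch) {
  return {
    async getDocument(id) {
      const url = baseUrl + '/documents/' + encodeURIComponent(id);
      const response = await fetchImpl(url);
      if (!response.ok) throw new Error('HTTP ' + response.status);
      return response.json();
    },
  };
}

// A step definition then only has to call the client and keep the result.
async function whenTheDocumentIsRequested(context, row) {
  context.document = await context.client.getDocument(row.documentId);
}

// Demonstration with a fake fetch that records the URL and returns JSON.
const requested = [];
const fakeFetch = (url) => {
  requested.push(url);
  return Promise.resolve({ ok: true, status: 200, json: () => ({ id: 'a b' }) });
};
const demoContext = { client: createDocumentClient('http://localhost:3000', fakeFetch) };
whenTheDocumentIsRequested(demoContext, { documentId: 'a b' })
  .then(() => console.log(requested[0], demoContext.document));
```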

Wrapping Up

Our sprint team has changed the definition of “done” for our stories to released. There are no special qualifiers for this; released means it’s in production. This means unit tests, code review, automated functional tests, and deployments. If it hasn’t been delivered to our customers with quality, it’s not done. Period. There are many cogs in the machine that make this happen, but SURGE and our test suite play a major role.

We usually do production releases between one and three times a day. Combined with other deployment tooling (which I will blog about shortly, I promise), our team is extremely confident in what we release. At the time of writing this, we have rolled back only one production deployment over the last three months. Our completion rate for sprints and stories has been quite predictable, minus an outlier or two.

But best of all, we created something that truly works for us.

The only issue is that SURGE is a victim of its own success. It began life as a prototype, but now it’s spreading like a contagious smile. That means we need to clean it up and get it ready for general consumption! Before you ask: no official comment on that, but stay tuned.

We’re always looking for talented engineers and QA analysts to help us kick it up a notch. We’re even okay with you telling us about how completely wrong we are. Whatever the case, if you have something to bring to the table, we’d love to hear from you.

Happy coding! 🙂

Gerry Hernandez began his career as a researcher in various fields of digital image processing and computer vision, working on projects with NIST, NASA JPL, NSF, and Moffitt Cancer Center. He grew to love enterprise software engineering at JP Morgan, leading to his current technical interests in continuous integration and deployment, software quality automation, large scale refactoring, and tooling. He has oddball hobbies, such as his fully autonomous home theater system, and even Christmas lights powered by microservices.
