Effective Automated Testing in a Microservice World - Part 1

Gerry Hernandez, Accusoft Senior Software Engineer

Test-Driven Development (TDD) is a buzzword to some, yet a way of life to others. Some say Behavior-Driven Development, or BDD, is TDD done right. Cucumber made BDD popular, promising wonderful features such as:

Writing specifications in Gherkin, an English-like grammar, rather than code
Allowing anyone, even non-developers, to read and write tests since they’re English-like
Reusing code by reusing Gherkin statements
Driving tests with data

Everyone is happy. Developers waste less time, and non-technical stakeholders get to participate. All of this sounds fantastic, right? So we tested quite a wide gamut of BDD frameworks, all based on the original Cucumber reference implementation: Robot, Behave, CucumberJS, and Yadda.

Accusoft’s Services team, which is responsible for a large and growing collection of microservices, determined that none of these BDD frameworks worked for us. The added magic of Gherkin and Cucumber impedes the natural progression of real-world systems built with a modern microservice architecture. Naturally, we made our own BDD-like methodology, affectionately known as SURGE: Simulate User Requirements Good-Enough. We went from barely being able to maintain 50 tests to over 700 automated functional tests, with continuous, rapid growth in coverage.

We specifically chose a silly name to philosophically align with our goals for this new methodology:

Be practical
Be productive
Keep it stupid simple
Keep it minimal – “you ain’t gonna need it”

This blog post begins a series of articles that will recap our journey toward effective test automation with our microservice architecture. For part one, we will share our experiences and lessons learned from prototyping traditional BDD into our development lifecycle. Part two will focus on the methodology and philosophies associated with SURGE. Finally, part three will provide an overview of our implementation. The series will mainly focus on conceptual patterns and practices that are universal to all programming languages and runtime environments, although we chose Node for our particular implementation of SURGE.

BDD: A Great Solution for the Wrong Problem

There’s no denying that BDD and Cucumber have positively influenced the software development industry and culture. Test-Driven Development is a sound idea, and BDD was the first widely established way of doing it right. For most, at least. We found the methodology crumbled as soon as we applied it to non-monolithic software with a wide set of features.

Accusoft Services is composed of an extensive and ever-growing list of independently deployable services, all of which work together to provide user-facing features. For instance, just logging into the Accusoft Services portal makes a trip through six services. Suppose we wanted to define a behavior and write tests for logging into an account. Since no single service owns that behavior, the big question is “where do we put our Gherkin?”

“In theory there is no difference between theory and practice. In practice there is.” -Yogi Berra

Yogi Berra nailed it; the ideal solutions that traditional, Gherkin-driven BDD affords sound reasonable, but don’t work in the real world. Here’s what we discovered.

Global Scoping is Evil

One of the primary goals of Cucumber-like frameworks is to make all step functions available globally to all feature files. The intent is to promote code reuse. Sounds great, but it simply does not scale in any sensible way.

Initially, when we wrote just a few Gherkin features, this was working fairly well. All the magic abstraction that Gherkin provides was happening, and it was glorious. Then we added a fourth test and ran into ambiguous step definitions. This was quickly solved by rewriting Gherkin statements to be more specific so that they wouldn’t collide. Then a fifth feature was introduced and we ran into the same problem. And again with the sixth.

Eventually, it was just unmaintainable and we couldn’t work with it. Let’s face it, there are only so many ways we could say “click the search button” without sounding completely unnatural, and sounding natural is the entire point of Gherkin.
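To make the failure mode concrete, here is a minimal sketch of a global step registry. It is not the API of Cucumber or any of the frameworks we tried; `defineStep`, `resolveStep`, and the file names are hypothetical and exist only to illustrate how a single shared namespace reports ambiguity once two authors write the same sentence.

```javascript
// Hypothetical global step registry (illustration only, not a real framework's API).
// Every feature file registers into, and resolves from, this one shared list.
const globalSteps = [];

// Register a step definition along with the (made-up) file it came from.
function defineStep(pattern, handler, source) {
  globalSteps.push({ pattern, handler, source });
}

// Resolve a step's text to exactly one definition, failing loudly otherwise.
function resolveStep(text) {
  const matches = globalSteps.filter((step) => step.pattern.test(text));
  if (matches.length === 0) {
    throw new Error(`Undefined step: "${text}"`);
  }
  if (matches.length > 1) {
    const sources = matches.map((step) => step.source).join(", ");
    throw new Error(`Ambiguous step: "${text}" matches definitions from ${sources}`);
  }
  return matches[0];
}

// Two feature authors independently write the same natural English sentence.
defineStep(/^I click the search button$/, () => "google button", "google.steps.js");
defineStep(/^I click the search button$/, () => "accusoft button", "accusoft.steps.js");

let message = "";
try {
  resolveStep("I click the search button");
} catch (err) {
  message = err.message;
}
console.log(message);
```

In this sketch the only ways out are to reword one of the sentences or to merge the two handlers, which are exactly the two options we kept running into.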

Here’s where things get interesting, and where much of the community will disagree with us. BDD best practices state that no two step definitions should ever collide, and that if they do, our behavior is likely ill-defined. But we challenge that with the following two example feature specifications (these are fictional and kept brief):

    Feature: Able to Use a Search Engine
        Scenario: Searching on Google
            Given I visit google.com
            When I type "PrizmDoc" into the search box
            And I click the search button
            Then I see some search results

    Feature: Able to Search on a Company Website
        Scenario: Searching on Accusoft.com
            Given I visit accusoft.com
            When I click the search link
            And I type "PrizmDoc" into the search box
            And I click the search button
            Then I see some search results

We feel that if a human were to read each scenario, the human would understand what to do. However, Cucumber-like BDD implementations will actually map the last three steps in each of the above scenarios to the same functions. So there are two ways of dealing with this: use unique statements, or make the functions that they map to smart enough to deal with both interpretations of a search button.
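The mapping can be sketched in a few lines. This is an assumption-laden illustration rather than how any particular framework is implemented: the `stepDefinitions` table, the handler names, and `handlerNameFor` are all hypothetical. It strips the Gherkin keyword from each step and looks up the first matching pattern in a table shared by both features.

```javascript
// Hypothetical step table shared by every feature (handlers are empty stubs).
const stepDefinitions = [
  { pattern: /^I visit (\S+)$/, handler: function visitSite() {} },
  { pattern: /^I click the search link$/, handler: function clickSearchLink() {} },
  { pattern: /^I type "([^"]*)" into the search box$/, handler: function typeIntoSearchBox() {} },
  { pattern: /^I click the search button$/, handler: function clickSearchButton() {} },
  { pattern: /^I see some search results$/, handler: function seeSearchResults() {} },
];

// Drop the leading Gherkin keyword, then return the name of the matching handler.
function handlerNameFor(stepLine) {
  const text = stepLine.replace(/^(Given|When|Then|And)\s+/, "");
  const found = stepDefinitions.find((def) => def.pattern.test(text));
  return found ? found.handler.name : "undefined step";
}

const googleScenario = [
  "Given I visit google.com",
  'When I type "PrizmDoc" into the search box',
  "And I click the search button",
  "Then I see some search results",
];

const accusoftScenario = [
  "Given I visit accusoft.com",
  "When I click the search link",
  'And I type "PrizmDoc" into the search box',
  "And I click the search button",
  "Then I see some search results",
];

// The last three steps of each scenario resolve to the very same handlers.
const googleTail = googleScenario.slice(-3).map(handlerNameFor);
const accusoftTail = accusoftScenario.slice(-3).map(handlerNameFor);
console.log(googleTail);
console.log(accusoftTail);
```

Both arrays come out identical, even though the search box and button on each site are entirely different pieces of UI.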

Using truly unique statements to avoid all collisions is intractable and effectively turns English into a programming language. English is already a confusing language when code isn’t involved, so why would you ever want to use it as an abstraction layer? Most developers have a hard enough time balancing curly brackets; we have zero interest in compounding those problems with literary devices, sentence structures, and debates on the merits of the Oxford comma. I can hardly even write this blog post!

For a brief moment, we experimented with the latter approach: making the step functions smart enough to deal with both search buttons. Then we introduced a third test that needed to click a search button. Then a fourth. That step function now did four different things, depended on twelve stateful variables, and had a cyclomatic complexity higher than most functions in our application code. Any more and it would be too expensive to maintain.
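A heavily simplified sketch of what such a "smart" step function looks like follows. It is not our actual test code: the `world` object, its fields, and the returned strings (which stand in for real browser-automation calls) are hypothetical. Even with only two sites, the function already branches on shared state set by earlier steps.

```javascript
// Hypothetical shared state that earlier steps ("I visit ...", "I click the search link") mutate.
const world = { currentSite: null, searchPanelOpen: false };

// One global step function that has to understand every page's search button.
// The returned strings are placeholders for real browser-automation calls.
function clickSearchButton() {
  switch (world.currentSite) {
    case "google.com":
      return "google:search-button";
    case "accusoft.com":
      // This site hides its search box behind a link, so step ordering matters.
      if (!world.searchPanelOpen) {
        throw new Error("Search panel is not open on accusoft.com");
      }
      return "accusoft:search-button";
    default:
      // Every new feature with a search button adds another case here.
      throw new Error(`No search button known for site: ${world.currentSite}`);
  }
}

// Simulate the Accusoft scenario: visit the site, open the search panel, then click.
world.currentSite = "accusoft.com";
world.searchPanelOpen = true;
console.log(clickSearchButton());
```

Each additional feature adds a case and usually another piece of state, so the complexity of this one function grows with the size of the whole test suite rather than with the behavior it describes.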

Independent Test Suites – Not Practical

At first, this may sound like a plan. Each microservice provides a small, finite set of functionality that is well defined, so why not focus on testing that?

The most obvious showstopper is the fragmentation; it doesn’t make sense to couple a test suite with just a fraction of the code it’s actually testing. Reversing the same logic, if the other five services involved in logging into your Accusoft Services account don’t have a test suite associated with the code, it simply won’t be organized, won’t be maintained properly, and likely won’t even be executed. Not to mention, this completely breaks code reuse among functional tests, since the suites are quite literally separate.

Besides, if we were to limit the scope of the behaviors only to what one specific microservice is responsible for, the answer is simple: that’s what unit tests are for. Why overthink it? Be practical.

To Be Continued…

There has to be a better way. And that’s why we came up with SURGE.

The team really loves our new Test-Driven Development practices. We had an informal discussion as a team and unanimously agreed that our approach makes sense and is producing positive results. We were never this productive with the traditional BDD methodology, and it seems like our philosophies are contagious, as other teams are beginning to collaborate on our tooling. We can’t wait to share our unique spin on BDD, the SURGE methodology, in our next SURGE-series blog post.

Until then, if this stuff is exciting to you, or even if you think we’re completely wrong and know you can kick it to the next level, we’d love to hear from you.

Happy coding! 🙂

Gerry Hernandez began his career as a researcher in various fields of digital image processing and computer vision, working on projects with NIST, NASA JPL, NSF, and Moffitt Cancer Center. He grew to love enterprise software engineering at JP Morgan, leading to his current technical interests in continuous integration and deployment, software quality automation, large scale refactoring, and tooling. He has oddball hobbies, such as his fully autonomous home theater system, and even Christmas lights powered by microservices.
