Bridging the Gap Between Automation Engineers and Manual Testers

One Team’s Take on BDD for Automation Testing & Lessons Learned

Nicoleta Timis

Nicoleta has worked at Cognizant Softvision since 2013. She started as a manual QA, then switched to mobile and web automation testing. Nicoleta is interested in bringing automation to as many people as possible while improving her own technical skills.

This article focuses on our take on BDD for automation testing and how we implemented this solution in one of our projects: what problems it solved and what challenges it introduced.

How did we get here?

Our automation team is small, only four engineers, and we struggle to keep up with automating the already existing tests while accommodating the new features that come in. At the same time, we’re running all the regression that is needed, investigating failures, and maintaining the framework.

At the request of our client, we looked for a way of involving more people from the team to help us write more tests, without requiring technical knowledge. In other words, they can’t code; the only prerequisite is knowing how the app works.

The inspiration hit when we took a closer look at the mobile automation tests, because at that time we were using Calabash to automate. It was an open source tool that worked for both iOS and Android. We try to keep parity between the mobile clients because they look and feel very similar, so it’s easier to automate and maintain. What was special about this tool was that test scenarios are written in Gherkin, the user-friendly, plain-English syntax used by Cucumber. Anyone reading the script could understand what was happening because it used plain English.

Example:
Feature: Shopping as user
Scenario: Add item to cart

Given I log in the app
When I add “item x” to cart
Then I verify that “item x” is added to cart

We took inspiration from Behaviour-Driven Development, or BDD for short. It is a software development process that helps manage software from both the business perspective and the technical point of view. It combines the general techniques and principles of TDD with a domain-specific language and, by doing so, removes what’s called “the cost of translation” – the gap between what a developer and a non-technical person understand when they communicate – because it uses plain English.

Behavior specs are written to illustrate the desired flow with realistic examples, rather than being written with abstract, generic jargon. They serve as both the product’s requirements or acceptance criteria (before development) and its test cases (after development).

Since BDD focuses on actual feature behavior, behavior specs are best for higher-level, functional, black-box tests. For example, BDD is great for testing APIs and web UIs, and it excels at acceptance testing. However, behavior specs are not well suited for unit tests, and they are also not a good choice for performance tests that focus on metrics rather than pass/fail results.

This is done via Cucumber and Gherkin. Cucumber is a software tool used by programmers for testing software that uses a plain-language parser called Gherkin. Test suites are broken down into features. The Feature section has one or more Scenario sections, each with a unique title. Each scenario is essentially a test case, and it can have any number of steps. The steps are written in plain English, but behind the scenes we use actual code that can be written in many languages, like Java, Ruby, C++, etc.

The code for the steps would look something like this.

@Given("I log in the app")
public void iLogInTheApp() {
    enterUsername();
    enterPassword();
    pressTheSignInButton();
}

For the example above, the log in step contains other steps that are called here: enter the username, enter the password, and press the sign in button.

public void enterUsername() {
    waitForElementExists(USERNAME_FIELD);
    clearTextField(USERNAME_FIELD);
    enterText(USERNAME_FIELD, USER_NAME);
}

And then enterUsername() would use lower-level functions, and so on: functions that we defined on top of the basic library for interacting with UI elements.
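To give an idea of what those lower-level wrappers might look like on the web client, here is a minimal sketch assuming Selenium WebDriver; the class name, locators, and timeout are illustrative assumptions, not our project’s actual code.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;

// Hypothetical bottom-layer wrappers over Selenium WebDriver for the web client.
public class BasePage {
    protected final WebDriver driver;
    protected final WebDriverWait wait;

    public BasePage(WebDriver driver) {
        this.driver = driver;
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    // Waits until the element identified by the locator is present on the page.
    protected void waitForElementExists(By locator) {
        wait.until(ExpectedConditions.presenceOfElementLocated(locator));
    }

    // Clears any existing text from an input field.
    protected void clearTextField(By locator) {
        driver.findElement(locator).clear();
    }

    // Types the given text into an input field.
    protected void enterText(By locator, String text) {
        driver.findElement(locator).sendKeys(text);
    }
}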

Gherkin is the language that Cucumber understands. It uses a simple syntax very similar to English. Every statement in Gherkin is preceded by one of the keywords Given, When, or Then, which are technically interchangeable. They exist mostly for grammatical sense, while also indicating the manner in which the step should be used:

  • Given: Sets the preconditions the app or user must be in before the rest of the scenario is executed – for example, “Given I am logged in” navigates through the app in order to log the user in
  • When: Refers to some action the user is performing on the app (ex: “When I add “item x” to cart”)
  • Then: Refers to the expected outcome after the action defined in the When statement (ex: “Then I verify that “item x” is added to cart”)

An even easier way to think about this would be to consider “Given” for entry data, “When” for data manipulation, and “Then” for checking the exit data.

Another very useful feature of the Gherkin language is the fact that it supports variables.

Example:
Feature: Shopping as user
Scenario: Add item to cart

Given I log in the app
When I add “item x” to cart
Then I verify that “item x” is added to cart

For the example above: when re-using the step “When I add “item x” to cart”, I can use a different value, for example “item_toy”, and it will execute correctly. Using parameters makes it easy to run the same scenario with different combinations of inputs.
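As a rough illustration of how a parameterized step might be implemented in Java with Cucumber, here is a minimal sketch; the step wording matches our example, but the class and helper method names are hypothetical.

import io.cucumber.java.en.When;

public class CartSteps {
    // The {string} placeholder captures the quoted value from the scenario,
    // so the same step definition works for "item x", "item_toy", and so on.
    @When("I add {string} to cart")
    public void iAddItemToCart(String itemName) {
        searchForItem(itemName);
        tapAddToCartButton();
    }

    // Illustrative helpers; the real implementation would call page-object methods.
    private void searchForItem(String itemName) { /* ... */ }
    private void tapAddToCartButton() { /* ... */ }
}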

Now, all the above are just English sentences; they do nothing by themselves. They all need to be implemented using some sort of programming language, and perhaps some libraries, depending on what you’re looking to do.

The client asked us to write the tests using BDD for all three platforms: web, Android, and iOS. We advised against this because we would have had to dedicate considerable amounts of time and effort to implement the code in this manner, and it was slow for test creation. Considering this from the context where we’d be the only ones writing the tests, it wouldn’t be as viable as other alternatives. We made our estimations and decided the project itself was just too complex for this to pay off. Our main concerns were:

  • Working with this technology wasn’t as easy as writing pure code (for example, one struggle was that passing data between steps needed workarounds)
  • The actual code implementation still required a high degree of technical knowledge

We shared our findings with the client, but against our advice he decided that he was willing to invest to make it work.

We started by writing down all the steps a user would need to cover the existing functionality in the app, but at a very modular level. That way, a test script could be written just by choosing the steps and putting them in the correct order to cover the feature being tested. It doesn’t require any technical knowledge; the only prerequisite is knowing how to use the app, and this is something our testers already had.

By the end of the meeting we came up with around 700 steps, which was a clear indication of the complexity we were about to face. We needed a place to store all these steps, with people having access to it. This led to the birth of our automation database: a place to store the dictionary of steps we had just created. Of course we knew what all the steps did because we created them, but what about other people? This needed proper documentation to make it usable. We wrote it as Javadoc, which we extracted and stored in the database to be used later as support when creating a test case.

Each step was documented with three major mentions: precondition – in what state does the app have to be in order to use that step?; execution – the action of the step itself; and postcondition – the state the app would be left in after execution.

Example:
Step:  Then I verify that “itemName” is added to cart
Pre-condition: The app has to be on the cart section.
Execution: Verifies whether the item sent as parameter is added in the cart
Post-condition: The app remains on the cart section    
@param itemName – the name of the item to verify if it exists in the shopping cart
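In code, that documentation might look something like the sketch below: a hypothetical step definition with the precondition, execution, and postcondition captured in its Javadoc. The class name and the cartContains helper are assumptions for illustration, not our actual implementation.

import io.cucumber.java.en.Then;
import static org.junit.jupiter.api.Assertions.assertTrue;

public class CartVerificationSteps {
    /**
     * Pre-condition: the app has to be on the cart section.
     * Execution: verifies whether the item sent as parameter is added in the cart.
     * Post-condition: the app remains on the cart section.
     *
     * @param itemName the name of the item to verify if it exists in the shopping cart
     */
    @Then("I verify that {string} is added to cart")
    public void iVerifyItemIsAddedToCart(String itemName) {
        assertTrue(cartContains(itemName), itemName + " was not found in the cart");
    }

    // Illustrative helper; the real implementation would query the cart page object.
    private boolean cartContains(String itemName) { return false; }
}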

Once the dictionary was completed, we took it for a test drive ourselves. We began writing the smoke test suites using the steps we had created, implementing the functionality behind them as we went.

The goal was to write a test case once, and be able to run it on all three platforms: web, iOS and Android.

We chose a layered architecture: the bottom layer consisted of wrapper functions over third-party libraries (Calabash for mobile and Selenium for web), the middle layer was a page object model, and the top layer was the integration between the three clients, where all three had one thing in common – the Gherkin language. For example, we had the login step on all three clients, but behind the scenes it was implemented in different ways: on Android and iOS the implementations were similar because both used Calabash, while on web we used Selenium. On mobile the code was written in Ruby, while on web we used Java.
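To make the layering concrete, here is a minimal sketch of what the web client’s middle layer might look like for the login flow. It builds on the hypothetical BasePage wrappers sketched earlier, and the locators are illustrative assumptions.

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// Hypothetical middle-layer page object for the web client.
public class LoginPage extends BasePage {
    private static final By USERNAME_FIELD = By.id("username");
    private static final By PASSWORD_FIELD = By.id("password");
    private static final By SIGN_IN_BUTTON = By.id("sign-in");

    public LoginPage(WebDriver driver) {
        super(driver);
    }

    // Called by the top-layer "Given I log in the app" step on the web client.
    public void logIn(String username, String password) {
        waitForElementExists(USERNAME_FIELD);
        clearTextField(USERNAME_FIELD);
        enterText(USERNAME_FIELD, username);
        enterText(PASSWORD_FIELD, password);
        driver.findElement(SIGN_IN_BUTTON).click();
    }
}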

The layers are separated in such a way that when problems arise or maintenance is required, the impact is minimal and the work can be done quickly. And it’s good that we used this implementation, because six months in we found out that one of the libraries we were using for mobile, specifically Calabash, was dropping support. This could have been a huge complication, but since we had this layered architecture we just rewrote the low-level functions against the new library (Appium, switching the mobile code to Java) and everything else kept working, meaning nothing from layers two and three was affected.
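As a rough sketch of what that swap might look like, here is a hypothetical mobile base class rewritten against the Appium Java client. Only this bottom layer changes; the page objects and Gherkin steps above it stay untouched. Class and method names are illustrative.

import io.appium.java_client.AppiumDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;

// Hypothetical mobile bottom layer after the Calabash-to-Appium migration.
public class MobileBasePage {
    protected final AppiumDriver driver;
    protected final WebDriverWait wait;

    public MobileBasePage(AppiumDriver driver) {
        this.driver = driver;
        this.wait = new WebDriverWait(driver, Duration.ofSeconds(10));
    }

    // Waits until the element identified by the locator exists on screen.
    protected void waitForElementExists(By locator) {
        wait.until(ExpectedConditions.presenceOfElementLocated(locator));
    }

    // Types the given text into an input field.
    protected void enterText(By locator, String text) {
        driver.findElement(locator).sendKeys(text);
    }
}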

After experimenting with the flow of creating test cases ourselves, we figured out that the process wasn’t accessible and easy to use for someone outside the automation team. So we created a tool, a web-based tool. The expected flow was like this: a user could log in and have access to all the steps we created, and the tool had three sections: test steps, test cases, and test suites.

Following the existing test cases written in our test management tool, they could automate the tests by selecting “Create New Test Case,” then finding the needed steps and putting them in the right order to cover the functionality required. They had the documentation I mentioned earlier as support. If a step was missing, they could request its creation from within the tool and we would implement it. Once the test was completed to a satisfactory level, it could be saved, and everyone going forward could use it in their suite.

As we were working on this we thought, “well why stop here? We can take this tool even further… let’s make it possible for our testers to customise the environments they need and run the suites from within the tool, have everything in a unified place.”

For web, this meant they could choose the browser they wanted from the ones we support and run the tests on any of our servers; for mobile, they could create new emulators for Android or iOS with the specific OS they needed.

We also added reporting functionality to the app, so that all the results of the runs are stored there.

To set all this up, before rolling it out to the manual QA team, we invested about a year and a half of work until everything was as automated as possible. That is a lot more than we initially anticipated, but with all the features we added, the complexity of the tool, and the functionality we ended up having, this time would later be recouped when maintaining the framework and automating the new features that come in. Our client was supportive of every idea and improvement we added.

Writing the tests in this manner presented some valuable advantages as we soon came to understand:

  • Test scenarios became easier and faster to write and automate as more steps are added, creating a snowball effect
  • Easy to turn scenarios into automated tests; the only prerequisite is knowledge of how to use the app, which the team already has
  • Code can be reused since the implementation for each step does not change
  • BDD scenarios are easy to update as the product changes. Plain language is easy to edit.
  • Modular design makes changes to automation code safer
  • Living documentation (maintenance on the tests implies also maintenance of the documentation)

Closing Thoughts
Although the benefits sound great in theory, the initial development time is quite high and it requires very skilled engineers. If done properly, it can scale for complex projects, but this needs a lot of work right from the get-go. With that in mind, be willing to actually invest the time and resources to make it work. If you don’t need this kind of transparency in the team, there are easier solutions to choose from and implement. For us, it offered what we needed at the time, with the specific goal we had: the possibility of involving the whole team, QA and devs, in writing the test cases.

And this brings me to the end of our story. Hopefully it all sounded amazing and you’re wondering how much success we had with the tool, the steps, our version of BDD. Well, we didn’t… because ultimately, to make this work at the scale that was planned, precious resources had to be set aside, namely manual testers, and the costs of everything got too high to make it all worth it in the end.

We had some people testing it out, but not full time, and although the ramp-up with the app took only two to three hours, the velocity of actually producing a test case was quite low. Priorities shifted over the year that passed, and having manual testers working on this full time or part time, as initially planned, was no longer an option.

Continuing to write the tests in this manner by ourselves was slowing us down, so we ended up dropping the framework. We migrated most of the code into a new framework that we’re still developing today. Around 90% of the code was salvaged, and we kept the web app, with the functionality of anyone being able to create test suites and run them on custom environments.

Was it a failure?

No, because it worked for all the purposes it was designed for, and even more. But it was never used in the way it was intended, meaning involving as many people as possible to help write the tests. We still ended up with this awesome app to run the tests from and a more organized way of storing and executing everything.

What could have been done differently? Looking back at the roadmap of our journey, I think the crucial step we missed came right after we finished the step dictionary.

And here’s why: at that point we had most of the steps needed to cover the functionality of the app. If we had rolled it out to the team then, asking each person, QA and devs alike, to write even one test case per week, just putting the existing steps in the correct order and flagging which steps were missing, even if the functionality behind them wasn’t yet implemented, we would have had 30 tests per week. Over the year it took us to finish the framework, that one test per person per week would have added up to almost 1,500 tests (30 people times roughly 50 weeks). This is far more than we managed to write in that timeframe, and that number would have made the difference, in my opinion.
