Practical Applications of Behaviour-Driven Development

The challenges, solutions and benefits of using BDD


What is BDD?

BDD, or behaviour-driven development, is a software development process that extends TDD (test-driven development). Its main focus is managing software development from both the business perspective and the technical point of view. BDD tries to remove what is called “the cost of translation”: the gap between what the stakeholder says and what the technical person understands. It does this through a simple domain-specific language whose plain English sentences express a behaviour and its expected outcome. Dan North, the originator of BDD, gave the following description of BDD at the 2009 “Agile specifications, BDD and Testing eXchange”:

“BDD is a second-generation, outside-in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology. It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters.”


A scenario written in this style reads as follows:

  • Given I open the app
  • And I log in
  • When I arm the system from the client app
  • Then the security system should be armed

Cucumber and Gherkin

Cucumber offers an easy-to-use language called Gherkin. Gherkin uses a simple syntax very similar to English and allows the use of variables. Every statement in Gherkin begins with one of the keywords Given, When or Then. The keywords are there mostly for grammatical sense, while also indicating how the step is to be used:

  • Given: Sets the preconditions the app or user must be in before the rest of the scenario is executed. For example, “Given I am logged in” ensures that the user is logged in by navigating through the app in order to log them in;
  • When: Refers to some action the user performs on the app (for example, “When I arm away”);
  • Then: Refers to the expected outcome of the action defined in the When statement (for example, “Then the security panel is armed”).

And and But are also available; they exist only to define more complex preconditions, actions and expectations. From a technical standpoint, all these keywords are interchangeable: once a step is defined with one of them, it can be reused with any other and will still do the same thing. Another very useful feature of the Gherkin language is that it supports variables.

For example, in the step When I log in with “” and “secretPassword”, the username and password are variables. When reusing the step, we can pass in any values we like.

All of the above are just English sentences; they do nothing by themselves. Each one needs to be implemented in some programming language, perhaps with some libraries, depending on what you’re looking to do.
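
To make the link between a sentence and its code concrete, here is a minimal sketch in plain Python. It is not how Cucumber itself is implemented; the names STEPS, step and run_step are hypothetical and exist only for this illustration. It shows a step pattern with two variables, and that the leading keyword does not affect which implementation runs.

```python
import re

# Hypothetical registry: a list of (compiled pattern, function) pairs.
STEPS = []

KEYWORDS = ("Given", "When", "Then", "And", "But")


def step(pattern):
    """Register a function as the implementation of a step sentence."""
    def decorator(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return decorator


def run_step(line, context):
    """Strip the leading keyword, find the matching step and call it."""
    keyword, _, sentence = line.strip().partition(" ")
    if keyword not in KEYWORDS:
        raise ValueError(f"unknown keyword: {keyword!r}")
    for pattern, func in STEPS:
        match = pattern.fullmatch(sentence)
        if match:
            # Captured groups become the step's variables.
            return func(context, *match.groups())
    raise LookupError(f"no step implementation for: {sentence!r}")


@step(r'I log in with "([^"]*)" and "([^"]*)"')
def log_in(context, username, password):
    # A real implementation would drive the app; this one only records the call.
    context["logged_in_as"] = username
    context["password_used"] = password


context = {}
run_step('When I log in with "demo-user" and "secretPassword"', context)
print(context["logged_in_as"])
```

The same log_in function would run if the line started with Given or And, because the keyword is removed before matching.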

Why we chose to go the BDD route

Automation testing

  • As we’ve seen above, the main idea of BDD is to have a common language between the stakeholders and the technical team. Combine this with Cucumber and the Gherkin language, where you have a precondition, an action and then a verification (an expectation), and it is easy to see that this maps very well onto app testing: the app is in a certain state, we perform some action on it and we verify the result. This is what most acceptance testing looks like. As such, we can go as far as saying that this way of describing an application could serve as documentation of the app, a high-level design document.
  • We mentioned that all these English sentences, or steps as they are called in the BDD world, have code behind them. This makes it possible to have the steps executed by a machine that goes through the app, brings it to a known state, performs some actions on it and asserts the state it is left in. In other words, automation testing.
  • To automate the app, we need libraries that can interact with the devices (for mobile automation) or the browser (for web automation). The code behind the steps can be written in whatever programming language we like; we decided on Java and Ruby, with Appium and Calabash for mobile testing and Selenium WebDriver for web testing.
  • To keep things as maintainable and scalable as possible, we went with a layered architecture. At the very bottom are functions that wrap the third-party libraries (along with other custom functions we used). On top of that are functions common to the app under test, and on top of those, the implementation of the steps themselves. Separating the layers this way means that when problems arise or maintenance is required, the impact is minimal and the work can be done quickly. One problem where this architecture helped was the switch from one library to another, specifically from Calabash to Appium, when Calabash was no longer being developed. We just rewrote the lowest layer of functions against the new library (Appium) and everything else worked.
  • We tried to keep the steps as modular and self-contained as possible, documenting each with three things: the precondition (what state the app has to be in for the step to be used), the action of the step itself and the state the app is left in.
  • We tried to avoid steps that were too general, such as Then I click “Submit” button, where the “Submit” value could be any selector that identifies that button in the app’s hierarchy. This would require non-technical users to know selector strategies such as CSS selectors or, in the worst case, XPath selectors. We also took the number of parameters into account. With too many parameters, a step becomes unstable, hard to maintain and difficult for the user to understand. With too few, a step is very rigid and we would end up with too many steps. So we settled on a maximum of three parameters per step as the magic number.
  • We still ended up with a lot of steps, given that our AUT (application under test) is quite complex and has three different clients, each with its own particularities (web, Android and iOS). We needed a way to manage all those steps that was easy to use, so we created a DB to hold them, added attributes to each step (the platform(s) to use it on, documentation, etc.) and wrote an app to interact with that DB. In time that app became much more: a tool in which users can get all the steps from the DB, create new test cases with those steps, save them to the DB, create test suites from the previously created test cases, save these to the DB as well and, most importantly, run those suites on environments that can be customized from the app.
  • We can’t say that we followed the BDD paradigm entirely, but we used what best fitted our needs: in this case, automation testing and the possibility for anyone to create new automation scripts.
  • What we ended up with was a collection of steps that anyone can read and understand, so that people without any technical background could write automated tests for the application. The only thing they needed was knowledge of how to use the app, which is something both the technical team and the stakeholders had.
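
The layered architecture described above can be sketched as follows. This is an illustration in Python rather than our Java and Ruby code, and every name in it (Driver, FakeDriver, SecurityApp, the element ids) is hypothetical. FakeDriver stands in for a real backend such as Appium or Calabash so that the example runs on its own; swapping libraries would mean writing another Driver subclass while the two upper layers stay untouched.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Bottom layer: thin wrapper over a third-party automation library."""

    @abstractmethod
    def tap(self, element_id): ...

    @abstractmethod
    def type_text(self, element_id, text): ...

    @abstractmethod
    def read_text(self, element_id): ...


class FakeDriver(Driver):
    """Stand-in backend that simulates a tiny app in memory (assumption)."""

    def __init__(self):
        self.fields = {}
        self.labels = {"panel_status": "disarmed"}

    def tap(self, element_id):
        # Only the arm button changes state in this simulated app.
        if element_id == "arm_button":
            self.labels["panel_status"] = "armed"

    def type_text(self, element_id, text):
        self.fields[element_id] = text

    def read_text(self, element_id):
        return self.labels[element_id]


class SecurityApp:
    """Middle layer: functions common to the app under test."""

    def __init__(self, driver):
        self.driver = driver

    def log_in(self, username, password):
        self.driver.type_text("username_field", username)
        self.driver.type_text("password_field", password)
        self.driver.tap("login_button")

    def arm(self):
        self.driver.tap("arm_button")

    def panel_status(self):
        return self.driver.read_text("panel_status")


# Top layer: step implementations, written only against the middle layer.
def when_i_arm_the_system(app):
    app.arm()


def then_the_security_system_should_be_armed(app):
    assert app.panel_status() == "armed"


app = SecurityApp(FakeDriver())
when_i_arm_the_system(app)
then_the_security_system_should_be_armed(app)
```

Because the step functions never touch the driver directly, a library change is confined to the bottom layer.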


Ease of use and understanding

The BDD paradigm is so much more than what we did with it, but we took it and used it in the way that fitted us best. The greatest feature of BDD is its English-like syntax, which anyone can understand. Non-technical people loved it because they could express their tests in English using our dictionary of steps through the DB and the app we wrote. Whenever new features were added to the app, the automation engineers wrote new steps for them, most of the time reusing old ones and adding very little new code. The same was the case with any requests that came from non-technical people: “Hey guys, I’d like to do this. I don’t see any steps that can handle that. Could you write them?” Then we got on it and wrote them.

Headaches and solutions

Commonality between the different clients

One of the goals of our framework was the ability to write tests that are platform agnostic, that is, tests that work for all three clients: web, Android and iOS. We wanted to give users the ability to write a test case once and run it on all three clients, provided the test verified a feature available on all three.

Let’s say we are doing some negative testing on the login of the app. All three clients have the login feature, so we wrote the steps in such a way that they were usable on any platform. This, of course, was not that easy behind the scenes when we had to implement those steps: iOS, Android and the web platform each use a different library for interacting with the respective client. On top of that, the implementations were written in different programming languages, too: Java and Ruby. So the layer of integration between these three clients was the top layer, where all three had one thing in common: the Gherkin language. In Gherkin, everything looks the same, no matter what language or library sits behind it. “Then I Log in”, for example, makes no mention of a programming language or a specific library; it just logs the user in.
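
The idea of one sentence with several platform-specific implementations can be sketched like this. In our project the implementations lived in different languages and libraries; the single Python table below, and the names IMPLEMENTATIONS, implements and run_scenario, are simplifications invented for the example.

```python
# Hypothetical table: (step sentence, platform) -> implementation.
IMPLEMENTATIONS = {}


def implements(sentence, platform):
    """Register a platform-specific implementation of a step sentence."""
    def decorator(func):
        IMPLEMENTATIONS[(sentence, platform)] = func
        return func
    return decorator


@implements("I log in", "web")
def log_in_web(actions):
    actions.append("web: submit the login form in the browser")


@implements("I log in", "android")
def log_in_android(actions):
    actions.append("android: tap the login button on the device")


@implements("I log in", "ios")
def log_in_ios(actions):
    actions.append("ios: tap the login button on the device")


def run_scenario(sentences, platform):
    """Run the same sentences against the chosen platform's implementations."""
    actions = []
    for sentence in sentences:
        key = (sentence, platform)
        if key not in IMPLEMENTATIONS:
            raise LookupError(f"step {sentence!r} is not available on {platform}")
        IMPLEMENTATIONS[key](actions)
    return actions


scenario = ["I log in"]
for platform in ("web", "android", "ios"):
    print(run_scenario(scenario, platform))
```

The scenario itself never changes; only the platform argument decides which code runs.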

Was it all worth it?


  • Scenarios are easy to turn into automated tests.
  • Code can be reused, since the implementation of each step does not change. Automation code becomes very modular.
  • Test scenarios become easier and faster to write and automate as more steps are added, creating a snowball effect.
  • BDD scenarios are easy to update as the product changes. Plain language is easy to edit. Modular design makes changes to automation code safer.

Time spent on developing a proper framework

This is very hard to estimate for each project. Whatever time is spent developing the framework is recovered later, when maintaining it and automating new features. I can say that for our project we needed about a year and a half of work until the app was as automated as possible.

Future improvements

Free Speech Component

This will help create tests using normal speech. How can this be done? One option is to use Apple’s Speech framework in combination with Apple’s NLP (Natural Language Processing). The first turns sound into text, and NLP then handles tokenization and lemmatization of the whole text, which can be done by paragraph, by sentence and even by word. This gives us actions (like arm/disarm) and subjects (camera, panel, me) without the strict sentences and vocabulary imposed by current frameworks. We think this can work with different languages, since NLP supports them.
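
The extraction step can be illustrated with a toy sketch. It does not use Apple's frameworks: the tokenizer is a simple regular expression and the lemma table, action list and subject list are small hand-written assumptions, standing in for what a real NLP library would provide after speech has already been turned into text.

```python
import re

# Hand-written lemma table (assumption); a real NLP library would supply this.
LEMMAS = {
    "arming": "arm", "armed": "arm", "arms": "arm",
    "disarming": "disarm", "disarmed": "disarm", "disarms": "disarm",
    "cameras": "camera", "panels": "panel",
}
ACTIONS = {"arm", "disarm"}
SUBJECTS = {"camera", "panel", "me"}


def tokenize(text):
    """Split free text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lemmatize(token):
    """Reduce a token to its base form, if the table knows it."""
    return LEMMAS.get(token, token)


def extract(text):
    """Pick out the known actions and subjects from a free-form sentence."""
    lemmas = [lemmatize(token) for token in tokenize(text)]
    return {
        "actions": [lemma for lemma in lemmas if lemma in ACTIONS],
        "subjects": [lemma for lemma in lemmas if lemma in SUBJECTS],
    }


print(extract("Please start arming the panel for me"))
```

The extracted actions and subjects could then be mapped onto existing steps, so the speaker would not have to reproduce a step sentence word for word.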




