An Approach to Defining Quality in Software
A Hierarchy Pyramid For Software Quality
In a lot of my projects lately, I have been trying to view software quality more holistically: it’s not something only testers (or only developers) should be concerned about. In doing this I’ve started formulating an idea that isn’t fully baked yet, but it has helped me explain software systems better to myself. I’d like to introduce it as a discussion with the team whenever test strategy planning comes up.
Much of the confusion about software quality comes from the fact that it means different things to different people, and the best definitions so far are zen-like, e.g. Weinberg’s “value to someone (who matters).” They don’t describe things like technical code quality, which I intuitively know matters even though it doesn’t directly provide value.
Lacking a holistic definition of quality, many teams measure things like bug trends, code coverage, etc. What gets measured gets optimized, so we end up optimizing things beyond the point where it makes sense. Air and food are a necessity, but having more air and food than we need does not really improve the quality of life. Similarly, technical correctness and performance are necessary, but going beyond a certain point gives us diminishing returns.
Top companies tend to make it a priority to keep their apps within the same range of performance. For example, the Amazon and AliExpress apps perform similarly; some searches or categories load in different times, but the difference is not noticeable to most users. As with any local optimization, there is a risk that we hurt the whole pipeline by working on the wrong thing. As I was explaining this comparison to a colleague, it hit me that it might be worth trying to build a parallel between Abraham Maslow’s hierarchy of needs and software quality.
Maslow’s famous pyramid lists human needs as a stack: physiological needs necessary for basic functions (such as food and water), safety (personal security, health, financial security), love and belonging (friendship, intimacy), esteem (competence, respect) and self-actualization (fulfilling potential). The premise of the hierarchy of needs is that when a lower-level need is lacking, we disregard higher-level needs. For example, when a person doesn’t have enough food, intimacy, and respect, food is the most pressing thing. Another premise is that satisfying needs on the lower levels of the pyramid brings diminishing returns after some point. Beyond that point, our quality of life improves more by satisfying higher-level needs. Eating more food than I really need could cause weight gain. More airport security than needed becomes a hassle. The key idea of the pyramid model is that once the basics are satisfied, we should work toward satisfying higher-level goals.
Maybe software quality isn’t as simple, and maybe measuring bugs actually does provide some value. Maybe we should stop trying to make it simple and instead model quality on different levels. I tried modeling this on one particular project, but it might apply to others as well.
Starting to lay a foundation for quality
The first level of Maslow’s hierarchy is physiological needs, and just as these are humans’ most basic needs, the same could be said regarding software. If the most basic things aren’t satisfied, software is completely useless. On this particular project, we identified two things: the software has to be deployable, and it has to satisfy the minimum functionality so that “it works.” Activities such as TDD, functional testing (automated and manual), post-deployment testing and similar help us prove that this category of needs is satisfied. Measuring bug counts, code coverage and so on also works on this level. And as with human physiological needs, enough is enough: after a certain point, there is no more value to be gained. As an example, newer versions of Microsoft Office have thousands of features that nobody ever uses, because even Word 95 had enough. Investing in more features, and in developing, testing, and maintaining them, is overkill.
Establishing all the non-functional requirements
Once the software “works,” the second thing we need is for it to “work well.” Looking at parallels between human security needs and what we need from software, this is where a lot of what people typically call “non-functionals” comes in: performance, reliability, security, etc. Activities such as architectural design and performance optimization belong on this level, and performance testing, penetration analysis, stress tests, etc. might prove that we have sufficiently satisfied the client’s needs. And as with human security needs, enough is enough. Recent examples of Apple iTunes Store security questions demonstrate that more features than needed in this space annoy users or waste money. Building a system that can handle millions of concurrent users when in the next year or so we’ll only have thousands is a horrible waste of time. I’m betting many companies lost a lot of development money by gold-plating a system “architecturally” instead of shipping more stuff that makes money.
Adding the next layer: Usability of the product
Provided the software performs well and is secure enough, the next level up is love and belonging – and now we cross over to the users of the software. A case in point is Twitter. Famous for its fail whale, Twitter’s second-level qualities are often just good enough, but that hasn’t stopped it from building a huge community of loyal users. Activities such as user interaction design, graphical design, community engagement and similar support software in fulfilling the needs on the level of love and belonging. Usability testing proves it. Of course, different types of software need different levels of this.
So far this is nothing revolutionary. If I look back at most of my project engagements, these three levels are where most of the investment in specifications, development, and testing went, roughly in proportion to the levels as well. Most of the investment goes into building functionality and testing it. Performance and things like that are sometimes planned for, often reactively built in, and tested less frequently. I often worked with teams that were serious about usability and invested a lot of time or money designing for it and testing it. On the concrete project I worked with, the pyramid model suggested that maybe we should start shifting investments a bit. But the real surprise came when we started looking at the top two levels.
Usability on its own can be overkill
If ever there was an IT equivalent of harakiri, this was it: back in 2009, Nokia started working on a new service, Ovi, to integrate mail, chat, IM and other current and future services together. They spent a lot of money on developing this. Nokia – a technology leader – had “reinvented itself as a service provider,” but nobody wanted their service. Don’t get me wrong, I am not saying that user interaction design is bad, or that design thinking is wrong, just that it’s not the end. It is a need to be satisfied, but it brings diminishing returns after a point. There are two more levels on the pyramid above it.
Understanding the target audience
The key thing missing from the Nokia Ovi story is the fulfillment of potential. Usability marks potential, but if nobody is using the software, do we care? The level above usability is usefulness. I recently did a study with some KPI metrics, and by looking at the log files we determined that roughly 22% of the features weren’t used enough to justify investment in further maintenance. Maybe instead of investing a lot of money in functional testing, we can invest in measuring the usefulness of software? This, I think, should be the fourth level. Many serious businesses these days define their quality more at the fourth level than at the first, which enables them to be much more productive. If the indicators expected by the business users aren’t there, the feature is taken out. Think of this as a business bug.
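To make the idea of measuring usefulness from log files a bit more concrete, here is a minimal Python sketch. It is not the analysis from the study mentioned above: the log format ('<timestamp> <user> <feature>'), the function names and the 5% default cut-off are all assumptions made up for illustration.

```python
from collections import Counter


def feature_usage(log_lines):
    """Count how often each feature name appears in the log.

    Assumes one event per line in the hypothetical form
    '<timestamp> <user> <feature>'; malformed lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) == 3:
            counts[parts[2]] += 1
    return counts


def underused_features(log_lines, known_features, min_share=0.05):
    """Return known features whose share of all events is below min_share.

    min_share is an illustrative threshold, not a recommended value.
    """
    counts = feature_usage(log_lines)
    total = sum(counts[name] for name in known_features)
    if total == 0:
        # No usage recorded at all: every known feature is a candidate.
        return sorted(known_features)
    return sorted(
        name for name in known_features if counts[name] / total < min_share
    )
```

Features that never show up in the log still appear in the result, because the check runs over the list of known features rather than over the log itself. The right cut-off is a business decision, so it is left as a parameter.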
The measurement of success
Finally, the fact that someone uses a feature doesn’t necessarily mean that the feature was right for the business in the first place. This is the last level in the hierarchy, corresponding to self-actualization. Does the software achieve what it was originally intended to? Does it save money, earn money or protect money, or achieve whatever the key business goals originally were? If not, then it doesn’t really matter that people use it, that it is usable or performant, or that all unit tests pass. The top level is really where “more is better,” with perhaps a gradual transition to “good enough is good enough” on the two levels below, and with the lowest two levels definitely falling into the “good enough” category. Yet from what I have experienced, most software teams invest, build and test only at the lowest two levels, gold-plating things without a way to explain why that is bad. Breaking things down in a visual model, such as the five-level pyramid here, helped me start thinking better about what we should really want to achieve.
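The five levels can also be written down as a small data structure, which some teams may find easier to discuss than a picture. The Python sketch below is one possible reading of the model, not a definitive one: the level names paraphrase the sections above, and the scores and “good enough” thresholds are hypothetical numbers between 0 and 1 that a team would have to define for itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    name: str
    # Score at which the level counts as "good enough".
    # None means "more is better": the level is never finished.
    good_enough: Optional[float]


# Ordered from the bottom of the pyramid to the top.
# Thresholds are illustrative assumptions, not recommendations.
PYRAMID = [
    Level("works", 0.8),        # deployable, minimum functionality
    Level("works well", 0.8),   # performance, reliability, security
    Level("usable", 0.7),
    Level("useful", 0.7),
    Level("successful", None),  # achieves the business goal
]


def next_investment(scores):
    """Return the lowest level that is not yet good enough.

    scores maps level names to a value in [0, 1]; missing levels
    count as 0. Once every lower level is satisfied, the answer is
    always the top level.
    """
    for level in PYRAMID:
        if level.good_enough is None:
            return level.name
        if scores.get(level.name, 0.0) < level.good_enough:
            return level.name
    return PYRAMID[-1].name
```

For example, `next_investment({"works": 0.9, "works well": 0.5})` points at “works well”, while a project that has passed every threshold is always directed to the top level. Pushing a lower level further past its threshold never changes the answer, which is the diminishing-returns argument in code form.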
Whatever the timeline or budget for a project, it is always better to agree on a clear quality baseline. This will help with planning the strategy in the beginning, following it through development and delivering a better, customer-oriented product in the end.