Overview

This article argues, with a few reflective asides, that performance checks belong in the integrated QA process, and it looks at proactive performance benchmarking with profiling tools. Since my own expertise is in manual iOS QA, it focuses primarily on the mobile side of things.

The most useful way for QA to take advantage of debug tools (e.g. Xcode Instruments for iOS, Android Studio Test Configurator for Android) when testing mobile apps is to measure the app’s performance. This article emphasizes the importance of testing performance with such tools, which are otherwise mostly used by developers for debugging.

Let’s say we’re at the beginning of integration week and the scrolling experience seems off to QA. The natural steps, in this case, would be: QA logs a ticket named “Scrolling performance is off on this screen” and provides little to no useful information. Then devs need to investigate and, hopefully, fix it if it’s safe to do so. But, as always, problems might occur: the fix might not be safe, or your Dev and QA teams might be in opposite time zones, so an entire day is lost on communication alone. By this stage, integration week is nearing its end and the best thing to do is to push the fix to the next release. Could all this be avoided?

On the project I’m working on (as a manual iOS QA engineer), we’ve found one possible answer to this question. Performance issues had been building up with each release to the point where scrolling became almost unbearable. Several performance tickets had been logged in the past, but they were either ignored for lack of information or abandoned because debugging takes time, the problems were old, and there was no easy way to pinpoint a single cause to fix.

That is, until we started logging performance-related tickets with trace documents (containing information gathered in Instruments on frame rate drops, CPU and memory usage). Then more devs were eager to jump in with possible fixes; several causes were identified and many of them were fixed immediately.

This reinforces the idea that more information always helps when logging a ticket (well, obviously). By also providing these trace documents, we gave the devs concrete data and a trail of clues to follow.
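To make the contrast with a bare “scrolling is off” ticket concrete, here is a minimal sketch of a ticket body built around numbers read off a profiling session. It is not the template we use on the project: the `TraceSummary` fields, the `format_ticket` helper, and every value in the example are hypothetical and only illustrate the kind of information worth attaching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TraceSummary:
    """Key numbers read off a profiling session (hypothetical field set)."""
    screen: str
    device: str
    build: str
    min_fps: float
    avg_fps: float
    peak_cpu_percent: float
    peak_memory_mb: float
    trace_file: str  # name of the attached trace document


def format_ticket(summary: TraceSummary, steps: List[str]) -> str:
    """Render a ticket body that pairs measurements with reproduction steps."""
    lines = [
        f"Title: Scrolling drops to {summary.min_fps:.0f} fps on {summary.screen}",
        f"Build: {summary.build} | Device: {summary.device}",
        "",
        "Steps to reproduce:",
    ]
    lines += [f"  {i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += [
        "",
        "Measurements:",
        f"  Frame rate: min {summary.min_fps:.0f} fps, avg {summary.avg_fps:.0f} fps",
        f"  Peak CPU: {summary.peak_cpu_percent:.0f}%",
        f"  Peak memory: {summary.peak_memory_mb:.0f} MB",
        "",
        f"Attachment: {summary.trace_file}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    example = TraceSummary(
        screen="Feed", device="iPhone 8", build="1.2.3 (45)",
        min_fps=22, avg_fps=41, peak_cpu_percent=97, peak_memory_mb=512,
        trace_file="feed_scroll.trace",
    )
    print(format_ticket(example, ["Open the Feed screen", "Scroll quickly to the bottom"]))
```

Even when the numbers come from a single run, a ticket like this tells a developer which screen, build, and device to look at and where to start in the attached trace.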

Problem

The performance of an app is, regrettably, an aspect that is either taken for granted or sacrificed as a side effect of a feature, a refactor, or any other work the project requires.

In many cases, performance doesn’t get enough attention, either because devices keep getting faster, or because the project’s development cycle is too feature-happy (i.e. adding as many features as possible, with little to no regard for the effect on performance), or both.

This may lead to a point where performance issues arise from changes that are not wrong in themselves (nothing wrong with adding features, right?). At that point, it takes a (usually) collective effort to find the changes that could have caused the performance drop, then change or remove them.

I don’t think the need to fix performance issues requires justification, but it usually becomes critical once it starts costing you money, because either the app launches unreasonably slowly or the scrolling experience is horrible.

This is when performance gets the most attention. This is the point where some of your most valuable developers, who would otherwise be implementing forward-looking features with new and exciting technologies, have to dig through the app, figure out what’s wrong, and fix it so the app can run normally.

There is a risk that a reactive approach to the app’s performance leads to frustration and dreaded crunches down the line. There are also worst-case scenarios to consider, such as finding that no fix is possible without refactoring large and essential chunks of the app, which may force compromises that consume time or money you can’t afford.

There’s nothing inherently wrong with the reactive approach if you can afford it, but even if you can, it’s an expense worth reducing as far as possible.

As an example, let’s take a development cycle that looks like this:

A feature is developed in a branch called feature_branch;
When it’s done and tested, the feature_branch is merged into the release_branch;
The release_branch is published every two weeks.

A. Feature development and testing takes up the first week
B. Integration testing takes up the second week

I am not aware of any project that has such a setup, but it’s an oversimplified development cycle model that we can grasp easily.

Let’s say the workflow looks something like this in case a performance issue is discovered after a couple of releases:

Solutions

There is no single solution to this problem, but there are some approaches to consider in order to avoid late reactions to performance issues.

1. Close(r) monitoring of each release

As stated before, there is nothing wrong with the reactive approach; it can still be useful and even cost-effective in some cases, especially if your release cycle is relatively short and you have proper metrics implemented.

Because you have proper metrics (Fabric, Google Analytics, or other logging services), you can trust your numbers, especially if you have a large user base. Because you can trust your numbers, you can confidently signal the problem without having to wait for a trend to form.
Because your release cycle is short, you can react and deliver promptly, without losing users who are in a hurry or just value performance.

Profiling tools can also be used in this case. There are probably many good tools out there, but this article will mainly touch on Xcode Instruments and Android Studio Test Configurator since they are generally readily available for iOS and Android QA teams respectively.

The most common problems you would hope to find in such a situation would be memory leaks and bottlenecking processes.

For iOS, these can be easily found using the Allocations and Time Profiler instruments. Pairing both with Core Animation makes it much easier for QA to follow along and identify where the bottleneck might be.

For Android, you can use the Test Configurator in a similar manner to monitor CPU, memory and network usage in real time.

Picking a couple of essential flows to test this way will give an idea of the potential performance difference between the current release and the last. The profiling tools won’t yield absolute, precise numbers to serve as benchmark results, but they are useful for spotting differences between release versions, provided the testing conditions stay consistent.
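As a rough illustration of that release-over-release comparison, the sketch below takes a few repeated measurements per flow for two releases and flags the flows whose median got noticeably worse. The flow names, the 10% tolerance, and the numbers are all assumptions made up for the example; the measurements themselves would be read manually from the profiling tools, on the same device and under the same conditions.

```python
from statistics import median
from typing import Dict, List


def find_regressions(
    previous: Dict[str, List[float]],
    current: Dict[str, List[float]],
    tolerance: float = 0.10,
) -> Dict[str, float]:
    """Return flows whose median measurement grew by more than `tolerance`.

    Each dict maps a flow name to repeated measurements where lower is
    better (e.g. seconds to launch, milliseconds per frame). Flows that are
    missing or empty in either release are skipped.
    """
    regressions = {}
    for flow, old_runs in previous.items():
        new_runs = current.get(flow)
        if not old_runs or not new_runs:
            continue
        old, new = median(old_runs), median(new_runs)
        if old <= 0:
            continue  # a relative change is meaningless without a positive baseline
        change = (new - old) / old
        if change > tolerance:
            regressions[flow] = change
    return regressions


if __name__ == "__main__":
    # Hypothetical readings: three runs per flow, same device, same conditions.
    last_release = {
        "cold_launch_s": [2.0, 2.1, 1.9],
        "feed_scroll_frame_ms": [16.7, 17.0, 16.9],
    }
    this_release = {
        "cold_launch_s": [2.0, 2.2, 2.1],
        "feed_scroll_frame_ms": [21.0, 22.5, 21.8],
    }
    for flow, change in find_regressions(last_release, this_release).items():
        print(f"{flow}: {change:.0%} slower than the previous release")
```

Using the median of a few runs rather than a single reading keeps one noisy run from triggering a false alarm, which matters because these tools are better at showing differences than absolute values.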

The workflow under this procedure would look something like this:

As you can imagine, this procedure has both pros and cons.

Pros:

Early results that might indicate a difference in performance between the current and previous release, leaving more time to react to a performance drop.
Greater confidence in the results to expect from the live environment.

Cons:

More time and, optionally, more manpower required.
Can seem redundant after a while, especially if no issues are encountered for some time.

2. Feature performance testing

This is a process we have slowly started to ramp up on the project I’m working on. The main idea behind it is that we should test a feature’s potential impact on performance before it gets merged into the main branch and, afterward, released to the public.

As opposed to the previous process, this one moves the performance testing phase to as early as the first week in our dev cycle model.

Obviously, at this stage, you cannot rely on any logging services to provide you with metrics, since there is no audience for the said feature yet. So, the only way to test for performance in this situation is by using debug tools.

Since this is more of a preliminary test procedure, it doesn’t imply that every new scenario needs to be tested thoroughly and analyzed carefully with these tools. That would become unfeasible for a large number of features, however much QA manpower you have. Instead, focusing on the main scenarios that are most likely to be used in the live environment, and most likely to cause problems, should suffice for most apps.
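One way to keep this lightweight is to agree on a small set of main scenarios, give each a budget, and check a feature branch against only those. The sketch below assumes such budgets exist; the `check_feature` helper, the scenario names, and the budget values are hypothetical, not something our project prescribes.

```python
from typing import Dict, List, Tuple


def check_feature(
    budgets: Dict[str, float],
    measured: Dict[str, float],
) -> Tuple[bool, List[str]]:
    """Return (safe_to_merge, problems) for the main scenarios only.

    `budgets` maps each main scenario to the worst acceptable value (lower
    is better). A scenario without a measurement counts as a problem, so a
    skipped check cannot pass silently.
    """
    problems = []
    for scenario, budget in budgets.items():
        value = measured.get(scenario)
        if value is None:
            problems.append(f"{scenario}: not measured")
        elif value > budget:
            problems.append(f"{scenario}: {value:g} exceeds budget {budget:g}")
    return (not problems, problems)


if __name__ == "__main__":
    # Hypothetical budgets and feature-branch readings.
    budgets = {"cold_launch_s": 2.5, "feed_scroll_frame_ms": 16.7}
    feature_branch = {"cold_launch_s": 2.2, "feed_scroll_frame_ms": 19.4}
    safe, problems = check_feature(budgets, feature_branch)
    print("Safe to merge" if safe else "Hold the merge:")
    for problem in problems:
        print(" -", problem)
```

The point of the fixed scenario list is that the cost of the check stays constant no matter how many features are in flight; anything outside the list is left to regular testing.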

The tools are usually easy enough for anyone to learn, so you might not even need additional QA resources. However, having a special team for this and other tasks might also come in handy.

By now, you can probably imagine what the workflow would look like:

Like everything else, this also has pros and cons:

Pros:

Early results that can dictate whether it’s safe to merge a feature or not;
Paints a more detailed picture of the overall performance impact of each tested feature;
No separate QA team is needed if the current one is willing to learn the profiling tools;
If performed successfully, it removes the need to react to performance issues in the live environment.

Cons:

Can be very time consuming if a large number of scenarios is tested for every feature;
Can seem redundant after a while, especially if no issues are encountered for some time.

Conclusion

The benefits of these procedures aren’t immediately obvious, and results can be hit or miss, especially at the beginning (and especially if your QA team is new to the tools). However, in time, with some practice and depending on the resources allocated, you might find that there are fewer performance problems in the live environment to react to and that your app rating goes up, along with your sales, because there’s nothing like a high-performing, well-designed, easily accessible app.

I will follow up with a more technical article describing how we’ve implemented each of the procedures described herein.