Sep 28, 2020 · by Oleksandr Demeniuk

SRE: The Next Big Thing in IT?

A two-part series on Site Reliability Engineering

The first part of this series defined SRE and covered how DevOps fits into this broad discipline. In this second and final article, we review the responsibilities of each role to discuss whether Test Engineers should consider moving into SRE.

Testing in SRE

If you’re a Test Engineer, it’s likely that you:

  • Are a curious person and question everything
  • Tend to break things on purpose
  • Take on the roles of users and apply them as needed
  • Treat context as your guiding light

A Test Engineer knows how the system or feature works, how it could break and how to fix it (or at least who can fix it). An SRE understands how this code fits into the bigger picture of the company’s architecture and tries to set the whole system up for success by maintaining reliability.

Over the past couple of years, software testing has shifted toward production, leaving traditional testing in non-production environments behind.

There are multiple reasons Test Engineers might see SRE as a logical evolution in their career. Traditional tests, which are more common in software development, evaluate the correctness of software offline during development and are carried out by Test Engineers. Test Engineers are often among the most astute chaos experiment designers, and they help teams be even more intentional about finding and fixing problems before anyone else does.

Test Engineers are experts at automation, at designing tests, and at imagining potential problem areas and attack vectors. All classical software testing techniques are applied and adapted for SRE in systems at scale. The amount of testing a given system needs depends on its requirements, which in our case are reliability requirements. The only thing to keep in mind is that, following the latest industry trends, all of these testing activities are performed against the production environment.

Production tests, on the other hand, are performed by SREs on a live web service to evaluate whether a deployed software system is working correctly. The Test Engineer’s goal of ensuring consistent product quality marries well with SRE goals, and their experience helps them fit into SRE teams quickly. Combine this with the slow move away from QA testing, which really came of age during the era of waterfall software development, and the move from a dying field into the new frontier of continuous testing at runtime in DevOps becomes a natural fit.
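To make the idea of a production test more concrete, here is a minimal sketch in Python. It assumes that synthetic requests have already been sent to the live service and recorded, and it only shows the evaluation step: deciding whether the observed results meet a reliability requirement. The names `ProbeResult` and `evaluate_probes`, the 99.9% target and the 500 ms latency limit are all hypothetical and chosen for illustration, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One synthetic request sent to the live service (hypothetical shape)."""
    status_code: int
    latency_ms: float


def is_good(result: ProbeResult, max_latency_ms: float) -> bool:
    """A probe is good if it is not a server error and is fast enough."""
    return result.status_code < 500 and result.latency_ms <= max_latency_ms


def evaluate_probes(results, target=0.999, max_latency_ms=500.0):
    """Return (availability, meets_target) for a batch of probe results.

    The default target and latency limit are illustrative assumptions.
    """
    if not results:
        raise ValueError("no probe results to evaluate")
    good = sum(1 for r in results if is_good(r, max_latency_ms))
    availability = good / len(results)
    return availability, availability >= target


# Illustrative data only: 998 fast successes, one slow response, one server error.
sample = (
    [ProbeResult(200, 120.0)] * 998
    + [ProbeResult(200, 900.0)]
    + [ProbeResult(503, 50.0)]
)
availability, ok = evaluate_probes(sample)
print(f"availability={availability:.3f} meets_target={ok}")
```

In a real setup the probe results would come from requests against the production endpoint, and a failed check would typically raise an alert rather than print a line. The design mirrors what a Test Engineer already does offline: define what “correct” means, then assert against it.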

Below are some key skill areas for an SRE:

  • Release engineering
  • Operating systems 
  • Databases
  • Cloud computing
  • Security
  • Troubleshooting
  • Customer support
  • Networking

Every one of these areas can be strengthened by applying testing techniques.

The traditional testing industry is slowly fading along with waterfall software development, so it makes a lot of sense for Test Engineers to move on from it and apply their insights as SREs. This is not to say that every Test Engineer should change their role to SRE, only that the option is worth considering when the “What’s next?” question arises.

SRE as a Career Opportunity

If you look across a number of job posting sites, there are thousands of open SRE positions around the world. The table below gives an idea of the numbers on some popular sites as of January 2019.

SRE was listed as one of the top 20 “Most Promising Jobs” in LinkedIn’s annual reports for 2017, 2018 and 2019.

Final Thoughts

The IT industry is full of buzzwords and trends. First it’s DevOps, then it’s Docker, Kubernetes and RPA. SRE is in a promising position to become bigger than any of those, especially since it’s more about people and process than tools (Hello, Agile). And the tooling is already on the market, so you don’t need new tools to align your development, testing and operations around the principles of Site Reliability Engineering.

If you’re interested in SRE and want to understand the roles and responsibilities further, check out the resources below.

Resources:

Google’s SRE Resources – https://landing.google.com/sre/

SRE in the spotlight – https://youtu.be/cg8wdrm-B1g

SREcon videos – https://www.usenix.org/srecon

Keeping Google up and running 24/7 – https://youtu.be/yXI7r0_J29M

SRE at Dropbox – https://youtu.be/ggizCjUCCqE

SRE at Netflix – https://youtu.be/koGaH4ffXaU

DevOps Handbook – https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations/dp/1942788002

The Phoenix Project – https://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/1942788290/

The Unicorn Project – https://www.amazon.com/Unicorn-Project-Developers-Disruption-Thriving/dp/1942788762/

A Practical Guide to Testing in DevOps – https://leanpub.com/testingindevops

Accelerate – https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations/dp/1942788339/

Oleksandr Demeniuk

Quality Engineer