While DevOps teams and site reliability engineers (SREs) have gained prominence in IT circles, the similarities and differences between the two are not always well understood. They are closely linked in the services they provide to the business, but there are clear distinctions in the roles they play, the toolsets they use, and the way they are incentivized, both organizationally and intrinsically. Here is a brief overview:
Where are they focused?
Broadly, everything pre-production is DevOps, while everything post-production is SRE. DevOps focuses primarily on the ability to build and release applications, whereas SREs are much more focused on the stability and reliability of the platform once it is in production.
What tools do they use?
Given the differences in their goals, the tools they use also differ. DevOps teams are more focused on IT workflow and automation tools such as Jenkins, Chef, Puppet and Harness. They also rely on cloud engineering and infrastructure-as-code platforms such as Ansible, HashiCorp's Terraform and Pulumi.
SREs are more focused on monitoring through Datadog, Prometheus and similar platforms. They are always on call, so PagerDuty or similar tools are crucial to them. They should also be familiar with tools for defining service level objectives (SLOs) and service level indicators (SLIs), such as Blameless or Nobl9. Used in combination, these tools give them the information they need to define these indicators, track them and report on them.
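To make the SLI and SLO terms concrete, here is a minimal Python sketch of the arithmetic that such tools typically automate. It is not taken from Blameless, Nobl9 or any other product; the function names, the 99.9% target and the request counts are all illustrative assumptions.

```python
"""Minimal SLI / SLO sketch; every name and number here is an illustrative assumption."""


def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the SLI: the fraction of requests that were served successfully."""
    if total_requests <= 0:
        # No traffic in the window: treat the service as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the unspent share of the error budget (1.0 = untouched, below 0 = SLO missed)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # failure rate the SLO tolerates
    actual_failure = 1.0 - sli          # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical 30-day window: 1,000,000 requests, 500 of which failed.
    sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
    remaining = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

In this sketch the SLI is the measured value, the SLO is the target it is compared against, and the error budget is the gap between the two. A team might alert or slow down releases as the remaining budget approaches zero, though the exact policy varies by organization.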
Which is more difficult technically?
In terms of the training required and the overall technicality of the role, DevOps engineers are likely to be more hands-on technically, given their need to know how to build a pipeline and maintain it in a way that meets the needs of a wide range of stakeholders.
SREs need to be better versed in software engineering. Being able to diagnose problems and route them to the right people is crucial in their world. Although SREs do not need to know the details of infrastructure provisioning, they do need to be able to recognize when latency first appears in a particular part of the cloud infrastructure, and why.
How did they get there?
When people are just starting their careers, they need to be flexible and may not have a strong voice when joining a new organization. What they know and how well they can demonstrate it will determine their roles. Either they are experts in platform engineering and know a lot about how to build cloud platforms, or they know about usability monitoring. If their background is in system administration, DevOps is probably the better fit. It is a natural progression from setting up a Linux VM to automating the process. If, on the other hand, bringing some order to the chaos is their thing, SRE is probably the way to go.
What are their bad days?
So what is a bad day for DevOps or SRE? For SRE, it is fire after fire after fire. Especially in large organizations, SREs are in many cases the first line of defense. They are on call. They do triage. They roll things back and do whatever they have to do to restore the service. When everything is burning and you don’t even know who to escalate to, it’s a bad day for the SRE team.
For DevOps, it’s a bad day when Jenkins is down and the pipelines are broken. Someone rolls out a new change or migration and then realizes that a critical service along the way has not yet been migrated, so the affected team yells at DevOps. When engineering teams can’t do their job because of something DevOps did as part of a migration process, it’s a very bad day.
What are their great days?
The best thing that can happen to an SRE is recognition of net business value. When someone’s boss says, “Okay, we saved $5 million in staff hours this quarter because we had 70% fewer outages and 50% of our outages were automatically resolved because of the runbooks we put in,” that’s a good day for SRE.
A great day for DevOps is a quiet day. When people provision their infrastructure, deploy things and everything works the way it should, it’s a good day. When people can do what they have to do, the pipelines work and everything runs like a well-oiled machine, it’s a good day for a DevOps engineer.
Add value every day
The last few years have produced hundreds, if not thousands, of new roles, terms, acronyms, platforms and organizations, all pursuing the same goal: excellence and speed in software delivery. The term DevOps, introduced more than a decade ago, means something very different today than it did then. Site reliability engineering, a newer but also rapidly changing role, is gaining in popularity. Regardless of how they intersect and diverge, and how that changes within a single company (and it does), these two roles are at the heart of the software lifecycle in their organizations and are becoming more valuable and strategic as time goes on.
https://www.informationweek.com/software/where-devops-and-site-reliability-engineers-intersect-and-diverge