DevOps/SRE Survey Part 1: The People

Josh @ Overseer
Overseer Engineering Blog
7 min readMay 4, 2017

--

The Survey

DevOps is a term that’s been thrown around in the software world for some time now (since 2008, apparently). A related term that has been around longer, and is a bit more specialized in scope, is Site Reliability Engineering. There has been a lot of discussion generated on the culture and ideas associated with these movements. However, what gets discussed a little less often is what it looks like to be someone practicing DevOps/SRE. What does the day-to-day look like? What are the main concerns/joys practitioners have? What do the organizations/codebases these individuals work with look like? And finally, what does the landscape of tooling for DevOps look like at present?

To shed some light on these questions, I put together a survey and posted it to a couple of the favorite haunts of software junkies- reddit (/r/programming and /r/devops), and Hacker News. Just over 200 people responded, resulting in lots of interesting data on the life of engineers doing DevOps/SRE. In this post, I’ll cover the responses to questions about the individuals themselves and their teams. I plan to follow this up with a couple more posts discussing the respondent’s daily concerns, their favorite parts of their jobs, and finally some discussion of their codebases/toolchains.

DevOps is not a job title… except when it is

One of the core struggles in the DevOps movement has been the drive to have it understood as a culture and a methodology. The very word DevOps is crafted as the combination of development and operations into one happy whole, epitomized in the statement by Amazon CTO Werner Vogels: “The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it.” To help cement this understanding and culture, some in the DevOps world have urged people not to use DevOps in a job title (though others may disagree).

“You build it, you run it”

Controversy aside, I tried to find out what’s actually happening out in the world with question #1:

*from 203 responses

Welp. It looks like a chunk of the people hanging out at /r/devops and friends do self-identify DevOps as a part of their job title, despite the broad cultural resistance. However, as we’ll see in some responses (particularly in the next post), in many cases there’s a disconnect between the people doing the work and the rest of the organization regarding what DevOps means and the value that it adds. So this could just be a reflection of the fact that DevOps job titles are imposed from above, and not chosen by the employees.

As for the results hiding in that “Other” category — they cross a broad range from the highly specialized “Database SRE” to the all-powerful “CTO.”

Soloists, two-pizza violators, and many in between

The next question addressed the size of the survey-taker’s team. Let’s face it, the size of your team plays a huge role in your day-to-day. For this question, I painted with a broad brush and gave people just three options: 1, 2–9, or 10+. In my mind, this breaks things somewhat cleanly into three distinct team dynamics:

1: There is no “team dynamic”! Everything you do related to DevOps/SRE is done alone
2–9: You have a team where everyone can easily keep track of what everybody else is doing and therefore easily coordinate efforts
10+: It is definitely no longer easy to keep track of who’s doing what, communication is kind of a burden, and you’ve violated Jeff Bezos’ rule that you should be able to feed a team with 2 pizzas.

There were 202 answers that went into this chart. If my standard error stats are right, a larger sample could easily have changed the percentages in the software engineer column by 10 percentage points or so, so it’s probably best not to draw too many conclusions from it.

As you can see, most people are on teams in that middle range of 2–9. Two pizzas, no problem! That is, assuming the people on the “2” end get personal pizzas and those on the “9” end are not very hungry 😉. However, there are still a non-trivial number of people on very large teams, as well as people doing their work solo. SREs are more likely to be in the sweet spot of 2–9 than those with DevOps in their title. As mentioned in the caption, there weren’t a lot of answers in the “software engineer” bucket, so take those numbers with a hefty grain of salt.

Teams of teams

After asking how many people were on the survey-taker’s team, it made sense to ask just how many teams as a whole were dedicated to DevOps and/or site reliability. Here we enter another realm of controversy, related to the job title controversy. If DevOps is a culture, and there shouldn’t be anyone with a DevOps job title, how could there be a whole team dedicated to DevOps?

Perhaps one of the best resources for understanding the relationship of DevOps responsibilities to team structures is this page on DevOps topologies. It does contain some structures where there is a dedicated DevOps team, but (spoiler) most of these are considered anti-patterns. The preferred structures either have DevOps responsibilities being shared between dev-oriented teams and ops-oriented teams (ex: types 1 and 2), or have some cross-functional people embedded in each team (ex: types 3, 6 or 9). Where does site reliability fit into this mix? The DevOps topology page considers SRE to be a team that serves as intermediary between Dev and Ops, with tasks traditionally considered “DevOps” falling as shared responsibilities between the Dev and SRE teams (type 7).

Again, let’s table the controversy and take a look at what’s “in-the-wild”:

202 people answered this question

Only 7 percent of respondents said none of their organization’s teams are dedicated to DevOps/SRE, indicating that the “no dedicated team” ideal is far from realized at most organizations.

For the remainder of organizations that do have dedicated teams for DevOps and site reliability, the largest chunk have only one such team, but the amount of organizations with tons of teams (10+) is not insignificant, either. When we break down the number of teams by the size of a given team, we see that the pool of survey takers spans organizations in a wide range of categories — from the 11 respondents who are the single individuals on a one-person DevOps/SRE team (you guys are the real MVP) to the 11 respondents who are one of roughly 100+ (10+ teams of ~10+ people) employees dedicated to DevOps/SRE.

A matter of time

With the structure of people out of the way, the next thing to address was the structure of the people’s time. The exact question was:

Treating a “Never do” as a 0 and a “Do Constantly” as a 3, the results were:

0 equates to “Never do” and 3 to “Do constantly”. For you statistically inclined readers, the error bars shown are standard deviations, not standard errors. So they illustrate the spread of the data, not the uncertainty in the mean. About 200 results went into each bar, which translates to an uncertainty of around 0.05 in the means given these standard deviations.

There was clearly a lot of variation in each category, but the overall story is that working on continuous deployment tooling was the No. 1 time sink for the survey takers. Tasks not addressed in this question came in second, and a tight race for third/fourth were diagnosing production issues *shiver*, and watching system metrics *snooze*. Feature work came in dead last, another sad indicator that many of the respondents are not closely integrated with the dev team.

It’s worth mentioning that I tried filtering the data by job title to see if any particular titles were more closely related with any particular kind of work. For the most part, the values looked mostly the same regardless of title, with a couple notable exceptions: Self-described general “software engineers” were less likely to code/debug Continuous Delivery tooling than average (1.8 versus 2.1) and more likely to code/debug features than average (2.1 versus 1.3, the largest job-title dependent variation). SREs were more likely to diagnose production and watch monitoring than average (2.0 and 1.9 versus 1.7 and 1.6, respectively). Despite this, the difference between a random two participants eclipsed the variation between any two job titles.

Get you some data! + conclusion

I told the respondents that I would make the results public if I got >100 responses. We easily met that goal, so here it is!

Join us next time for the heart of the survey — the daily concerns and joys of DevOps & SRE practitioners. After that, it’s code and tool time!

Shameless plug: I would imagine nobody likes spending a lot of time debugging production issues, nor can it be fun to keep an eye on 100s or 1000s of metrics. Overseer Labs is working to help reduce the time people spend on these tasks as much as machine learning can muster! If you want to learn how we’ve worked with Rainforest QA to reduce their root cause analysis, check out this case study. Want to talk to us directly? Shoot me a line at josh@overseerlabs.io .

--

--

I love the craft of software engineering, the rigor of science as a means of discovering truth, and the fun of board games with good friends.