The DevOps tool arsenal: Results from ~100 DevOps/SRE surveys

The survey

DevOps is associated with a multitude of ideas and a multitude of tools. For this reason, when some people think “DevOps,” they think about Docker, AWS, Jenkins, etc. But doing DevOps well is no more about using tools than creating a winning football team is about using helmets and cleats. What makes a good football team is people, working in different roles in a tightly integrated way to accomplish a common goal. The same is true for DevOps — it’s how an organization defines roles for its employees and teams and sets them up to work in concert that determines whether they’ve scored a DevOps touchdown (and earned 99.999 points 😉). So if you want to know what DevOps is about, you can’t just look at the tools; you have to talk to people.

In order to shed some light on the DevOps culture (and the related Site Reliability Engineering world), I asked engineers practicing DevOps/SRE if they’d fill out a survey covering their day-to-day. I posted this request to Hacker News and the subreddits /r/devops & /r/programming. I’ve been blogging about the results of the survey, which covered the roles & teams, the daily concerns & favorites, infra & architectures, and finally the tools of DevOps. Despite the fact that DevOps is not about the tools, tools are still an absolutely critical part of success (going back to our football analogy, I doubt a team could win the Super Bowl wearing flip-flops, no matter how skilled and organized its members are), so today we’ll talk about the Nikes and Under Armours of the DevOps world. Disclaimer: I don’t really follow sports, but I liked the football thing so I ran with it!

Cooking up some code

CI/CD seemed like a good place to start asking about tools since it’s something almost any organization doing DevOps has and it’s where stuff from Dev starts its journey to Ops, as far as code is concerned. So the first “tools” question was:

Which of the following are used in your build/CI?
you can select multiple options
<options here>

If you checked other, please specify
<write-in option here>

I normalized the free-form text (fixing spelling mistakes, consolidating JetBrains TeamCity with TeamCity, etc.) and folded the write-in answers in with the preset options. Here are the results:

The tools used in CI/CD from 84 answers. Blue bars are options that were listed in the survey; orange bars were written in by respondents. Everything from “Django” down got one response. The top axis shows the number of respondents selecting an option, the bottom axis shows the percentage.
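The cleanup step itself isn't shown in this post, so here is a minimal sketch of how write-ins could be folded into the preset options and tallied. The alias table, the function names, and the sample responses are all made up for illustration; the real survey data and cleanup rules may well have differed.

```python
from collections import Counter

# Hypothetical alias table: lowercase write-in spellings -> canonical tool name.
ALIASES = {
    "jetbrains teamcity": "TeamCity",
    "teamcity": "TeamCity",
    "jenkins": "Jenkins",
}


def normalize(answer: str) -> str:
    """Collapse whitespace, then map known spellings to a canonical name."""
    cleaned = " ".join(answer.split())
    return ALIASES.get(cleaned.lower(), cleaned)


def tally(responses):
    """Count each tool at most once per respondent (multi-select question)."""
    counts = Counter()
    for preset, write_ins in responses:
        # A set removes duplicates such as a preset tick plus a matching write-in.
        tools = {normalize(tool) for tool in list(preset) + list(write_ins)}
        counts.update(tools)
    return counts


# Made-up responses: (preset options ticked, free-form write-ins).
responses = [
    (["Jenkins"], ["JetBrains TeamCity"]),
    (["Jenkins", "TeamCity"], ["jetbrains  teamcity"]),
    ([], ["Teamcity "]),
]

counts = tally(responses)
for tool, n in counts.most_common():
    print(f"{tool}: {n} of {len(responses)} respondents")
```

Because each respondent counts at most once per tool, the percentages are shares of respondents, and they can add up to more than 100% on a multi-select question.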

Ok… so some of this is comparing apples and oranges. For instance, Sonar (presumably SonarQube) is a static analysis tool that plugs into your CI pipeline manager (like Jenkins). But since respondents could choose multiple options for this and other questions, and pretty much all of these can be used at some point in the CI process, we can still make some interesting comparisons (plus each percentage is interesting in and of itself). For instance, Jenkins is literally a fork of Hudson, but it’s pretty clear that Jenkins won that war (sorry Oracle). I was also impressed by how large a share of respondents (68% +/- 5%) use Jenkins.
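A note on the +/- figures: this post doesn't say how they were computed. They are consistent with one standard error of a binomial proportion, sqrt(p(1 - p)/n), so the sketch below assumes that formula. Treat it as an inference rather than the documented method. The function name is made up, and the sample size of 64 for the orchestration figure assumes it was computed over the container-question respondents.

```python
import math


def standard_error(p: float, n: int) -> float:
    """One standard error of a proportion p observed among n respondents."""
    return math.sqrt(p * (1 - p) / n)


# (label, proportion quoted in the post, number of respondents to that question)
quoted = [
    ("Jenkins in CI/CD", 0.68, 84),
    ("Docker in provisioning", 0.54, 82),
    ("Terraform in provisioning", 0.11, 82),
    ("Orchestration among container users", 0.59, 64),  # n is an assumption
]

for label, p, n in quoted:
    se = standard_error(p, n)
    print(f"{label}: {100 * p:.0f}% +/- {100 * se:.0f}%")
```

With samples this small, a gap of a few percentage points between two tools is well within the noise, which is worth keeping in mind when reading the charts.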

For those of you scratching your heads about how “Django” made it on the list, the raw text answer was “Django with custom on host rpms to pull configs.” This respondent also said they’re using Puppet, so perhaps they’re in transition from a home-brew solution to something more standard.

Cooking up some infrastructure

Preparing infra for production and preparing code for production seemed related enough, so the next question was about provisioning tools:

Which of the following are used in your provisioning?
<multiple options, write-in box>

Tools used for provisioning from 82 responses. Top axis shows number of respondents, the bottom shows the percentage. Everything from AMI down had only one response.

Looks like Docker is the winner for provisioning at 54% +/- 6%! Lest you protest me putting Docker into provisioning in the first place (since a container technology doesn’t necessarily have provisioning as its primary aim), I would contend that defining a Docker image meets most of the same goals as defining, say, a Puppet manifest. Ansible, a more “traditional” provisioning option, came in a close second. I’m sure they’ll be #1 as soon as they tack on the faster-than-light communication capabilities their name implies (shout out to all you Ender’s Game fans 😁). HashiCorp would be happy to know that they showed up three times on this list, with Terraform as the highest for a write-in at 11% +/- 3% (Packer and Vagrant are the other HashiCorp offerings).

One final thing worth mentioning here is that one respondent (the Elastic Beanstalk dude) said that they’re mostly serverless, which negates the need for provisioning. A few other respondents mentioned serverless, but only in the “favorite tool” section, and one of those said their organization wasn’t very open to it. Only time will tell whether this is the next big thing or a fad that goes the way of program design language (don’t worry, I’m too young to remember when it was a big thing too — apparently it hit its stride in the mid-80s).

Docker wins again…

Speaking of hot topics, containers are all the rage of late. So the next question was:

If you use a container technology, which one do you use?
<multiple options, write-in box>

Container technology used from 64 respondents. Top axis is number of respondents, bottom axis is percentage. Everything from Rocket down was only selected once.

It looks like Docker took the cake pretty soundly! All the others combined total 43 compared to Docker’s 51. If we exclude Amazon’s container service, Docker has more than 3 times the total of the remaining options. Aside from container services explicitly associated with cloud providers, the next most popular option (admittedly only by 1 vote) is straight-up native Linux containers (LXC, on which Docker was originally built). Also, those three cloud-provider container technologies are all really just management tools built up around Docker. I guess Docker was right to choose a whale as their logo since they’re certainly the biggest player here!

As for orchestration, I didn’t dig in too deeply other than to ask whether those using containers were using some sort of orchestration tech. Of those using containers, 59% +/- 6% were. I suppose the great whale requires some shepherding.

Amazon’s turn for world domination

From containers, I went ahead and moved onto the cloud, with the question:

If you use a cloud provider, which do you use?
<multiple options, write-in box>

Cloud providers from 81 respondents. Top axis is number of respondents choosing that option, bottom is the percentage of respondents. Everything from the VMware provider to Heroku had one response. Joyent and CenturyLink were listed as options on the survey, but were not selected.

If the container world is so dominated by Docker that Docker’s symbol is a whale, then the cloud world is so dominated by AWS EC2 that Amazon’s symbol should be… that giant space worm from Star Wars? 82% of survey-takers using a cloud were using AWS EC2, which is a solid 4 times Azure’s roughly 20%.

One thing that jumped out at me as I was going through these results was the fact that several respondents are using multiple clouds. Doing a little Googling, I found this excellent survey from RightScale, which found, from more than 1,000 participants, that enterprises are on average:
- Using 1.8 public and 2.3 private clouds
- Experimenting with 1.8 public and 2.1 private clouds

My goodness, that totals about 8! As for why organizations are using multiple clouds, this Rackspace post speculates from conversations with their customers that it boils down to (1) incremental adoption of different cloud technologies and (2) leveraging the strengths and weaknesses of different providers for different workloads.

Yo dawg, I heard you like measuring stuff…

We’ve covered a lot of tech (in this post and previous ones), and much of it should be measured, be it the memory usage of your EC2 instances or the number of requests hitting your microservices. Unsurprisingly, there are a lot of tools associated with monitoring. Here’s how people responded to the question:

Which of the following do you use for your monitoring?
<multiple options, write-in box>

Monitoring tools used by 82 respondents. Top is number of respondents, bottom is percentage. From Wavefront to Pingdom, each option was chosen only once. SignalFx was a listed choice, but wasn’t chosen.

Count ’em. That’s 35 different monitoring tools listed from 82 people. Apart from the top 5 (CloudWatch, New Relic, Elasticsearch/Kibana, Graphite/Grafana, and Datadog), the difference between each tool and the next most popular isn’t more than one or two people. Why so many tools, and so much more of an even spread than we’ve seen in previous sections? It’s hard to say, but one explanation is that there are just a lot of different things that need measuring. CloudWatch, for instance, will monitor your AWS resources, while Graylog will help you monitor your log data. That can’t be the whole story, however, as most of these are general-purpose monitoring tools.

Personally, my best guess is that monitoring is both a complex and a very important problem. Collecting data from disparate sources and making sense of the onslaught of all that data are both challenging problems. Different tools will solve these challenges in different ways, with one tool working better in some situations, another tool working better in others. But you can’t just ignore these complexities and avoid facing these challenges: the cost of not having the right kind of visibility into your systems is just too high. Those are my two cents, but I’d love to hear yours!

Where the rubber meets the road

Of course you’re not doing all that monitoring only because that Grafana dashboard is just so gosh-darned pretty — you want to know how your system is behaving so you can keep it behaving the way you want. And when it isn’t behaving that way, you need to know it, and you need to do something about it. So the next question was:

Which of the following do you use in incident management?
<multiple options, write-in box>

Incident management tools used by 75 respondents. Top axis is number of respondents, bottom is percentage. Everything from Everbridge to Pushover was only selected once. The remaining three were provided options, but were not selected.

Here the top two tools are very different in nature. Slack is about team communication, whereas PagerDuty is more strictly about the incident lifecycle, from alert to postmortem. Based on the high percentage of Slack usage and the high prevalence of other communication-oriented tools (e.g., email, HipChat, people coming to my cube and screaming), we can clearly see that communication is an important part of resolving problems when they arise.

I find that a nice message on which to end this series about DevOps survey results. At the end of the day, our tools are only there to empower us, whether to do things more effectively or to do more things than we could before. There will always be some things that a person or a well-organized team of people can do better than any tool … until the singularity arrives, then all bets are off 😉.

Overwhelmed by the amount of data your team is measuring with all your monitoring tools, and unable to leverage that data to its fullest? Check us out @ overseerlabs.io. Don’t worry, we’re not another dashboard UI to add to your mix. We snuggle up between your metric collection and your existing dashboards/alerting channels, derive insights from your metrics using machine-learning, feed those insights back into your existing tools, and get out of the way to let you and your team shine! We’re currently looking for private beta partners — if you’re interested, give me a shout at josh@overseerlabs.io.
