Site Reliability Engineer

Datadog

(New York, New York)
Full Time
Job Posting Details
About Datadog
Datadog is the leading service for cloud-scale monitoring. It is used by IT, operations, and development teams who build and operate applications that run on dynamic or high-scale infrastructure. Because Datadog collects metrics and events from 100+ different technologies and services out of the box, including other monitoring tools, you can monitor your entire stack in one place, without any gaps.
Responsibilities
* Keep our service reliable, available and fast as a member of the operations team.
* Respond to, investigate and fix service issues, whether they be deep in the OS kernel or in the application code.
* Design, build and maintain the infrastructure we need to support orders of magnitude more customers.
Ideal Candidate
**Who you must be**
* You have a BS/MS/PhD in a scientific field
* You have a track record as an engineer in the operations of a large site
* You value correctness and efficiency; you leave no stone unturned when diagnosing production issues
* You handle infrastructure with code because automation lets you focus on the more difficult and rewarding problems
* You have production experience with distributed compute/storage tools, e.g. ZooKeeper, Cassandra, Postgres, Kafka, Elasticsearch, Redis

**Bonus Points**
* You have submitted bug fixes to the aforementioned projects
* You are fully fluent in Python, Ruby and Go
