Site Reliability Engineer, Data Center

New Relic

(Portland, Oregon)
Full Time
Job Posting Details
About New Relic
New Relic is a Software Analytics company that makes sense of billions of metrics across millions of apps. We help the people who build modern software understand the stories their data is trying to tell them.
Responsibilities
* Lead the development of a robust set of micro REST APIs to help create agile and robust infrastructure management and reporting workflows.
* Improve and build upon our existing automation tools for systems provisioning and management.
* Independently learn new technologies and master the New Relic infrastructure so that you can provide 'full stack' diagnostics, when necessary, to help determine the root cause of internal problems.
* Communicate effectively with fellow SREs and other engineering teams, and describe problems succinctly and in enough detail that you can hand off an ongoing problem to another team or a peer for completion.
* Strategize with fellow SREs and other engineering teams on complex problems, and make decisions and recommendations about systems improvements after analyzing possible courses of action.
* Perform periodic on-call duty as part of a global team maintaining the availability and performance of the New Relic site and the APIs used by third-party services, as well as the internal services and systems that these core interfaces depend on.
* Physical requirements: Ability to lift 50 lbs repeatedly, with or without accommodation.
Ideal Candidate
* Proficiency in one of the following languages is expected: Perl, Python, Ruby, or Go. Ruby and/or Go experience is strongly preferred.
* Experience with Linux systems administration and tuning.
* Solid understanding of TCP/IP networking and switching.
* Troubleshooting skills that range from diagnosing low-level hardware and software issues to large-scale failures.
* Ability to work independently.
* Experience with monitoring, trending, and logging tools such as Nagios, Kibana, and Cacti.
* Demonstrated success resolving multiple interrupt-driven priorities simultaneously.
* Experience with incident management.
* Experience with load balancing, storage, and clustering technologies.

