Central Site Reliability Engineer

New Relic

(Portland, Oregon)
Full Time
Job Posting Details
About New Relic
New Relic is a Software Analytics company that makes sense of billions of metrics across millions of apps. We help the people who build modern software understand the stories their data is trying to tell them.
Responsibilities
The Central Site Reliability Engineering (CSRE) Team serves as Site Engineering’s “frontend”, where developers turn for hardware (physical or virtual), network configs, Docker setup, or anything else needed to run code in production. Our goal is to ensure that devs can do ops with ease, and that their experience with the Site Engineering team is something they would write home about.

Being on the CSRE Team means working with a group of supportive, talented, friendly people who share a desire to deliver great service, implement highly reliable systems, automate all the things, and grow as engineers. If you get a kick out of delighting customers and helping other people, you’ll enjoy this job. If you have an eye for simplifying tools and processes, you’ll fit right in.

Good candidates will love many of the things that we love, such as: deploying new core functionality, configuring production services for rock-solid reliability, helping teammates, automating processes, streamlining delivery, and building great tools that are a joy to use. Having a solid Linux development and operations background is very helpful.

We’re a very fast-growing software company and we care about our culture. We value work/life balance, personal respect, code ownership by engineers, and experimentation. If all this sounds like your dream job, read on.

Responsibilities:
- Deliver the infrastructure and configurations that engineering teams need, both by provisioning systems directly and by coordinating with other teams.
- Improve and automate our tools for provisioning, monitoring, trending, and configuration management.
- Explain and document our tools and processes, so that developers can own and self-serve their operational needs wherever possible.
- Communicate effectively with SRE teammates and developer “customers”. Provide clear status on all requests, so that developers always know where their requests stand, and so that you can hand off an ongoing problem to another team or a peer for completion.
- Advise engineering teams on how to configure systems for high reliability.
- Be motivated to continuously learn and apply new technologies.
- Participate in a periodic on-call rotation as part of a global team maintaining the availability and performance of the New Relic site and the APIs used by third-party services, as well as the various internal services and systems that these core interfaces depend on.
Ideal Candidate
**Requirements:**
- 1+ year relevant working experience.
- Ability to work as part of a team.
- Demonstrated troubleshooting skills.
- Proven success resolving multiple interrupt-driven priorities simultaneously.
- A friendly, positive customer support presence.
- BA/BS degree in Computer Science or related field.

**Experience in some or all of the following areas:**
- Linux systems administration and tuning
- Config management tools such as Puppet, Ansible, and/or Chef
- Monitoring, trending, and logging tools such as Nagios, Kibana, Grafana, and Cacti
- TCP/IP networking
- AWS and the Command Line Interface
- Docker or other containerization tools
- Programming in Ruby, Python, Go, or Java
- Load balancing, storage, and clustering technologies

