Senior Site Reliability Engineer

MZ

(Palo Alto, California)
Full Time
Job Posting Details
About MZ
MZ, previously known as Machine Zone, is a privately held technology company based in Palo Alto, California, founded in 2008. Its flagship product, RT platform, is a real-time messaging and analytics platform for managing real-time data streams.
Summary
The Senior Site Reliability Engineer plays a major role across the Operations team and MZ overall. You’ll be tasked with maintaining our complex infrastructure and optimizing our environment for maximum uptime. You’ll also monitor and build out our systems to ensure health and scalability in a fast-paced environment, and you’ll have a strong say in our infrastructure decisions moving forward.
Responsibilities
* Create, monitor, and scale our operations efforts through innovative automation approaches and configuration management
* Develop and monitor our global infrastructure as we scale internationally
* Build custom tools and instrumentation that ensure maximum system uptime and health
* Research new techniques and explore the newest technologies
* Play with: Puppet, Python, SaltStack, Splunk, MySQL, Redis, Nginx, Graphite, Nagios, Go
Ideal Candidate
**Who you are:**

* Strong knowledge of system architecture, performance tuning concepts, and web applications
* Passionate about automation and configuration management (Puppet, SaltStack, Chef, etc.)
* Experience with software development, the application stack, and network level issues
* Scripting and programming mastery across a variety of languages (Python, PHP, C/C++, or Java)
* Expertise in large scale, high volume operations environments
* Strong foundation with relational database technologies and caching techniques
* 5+ years of experience with Unix/Linux and system administration related tasks
* 2+ years of application development
* 2+ years of experience with networking systems and technologies

**Bonus Points:**

* You think of infrastructure and automation as code
* You handle large services and applications in high traffic environments
* You enjoy working at scale
* You understand server & network failures and how to handle them
* You like coding challenges and thrive on efficient and fast code
* You are passionate about what you do and often explore new tools and technologies that make automation and scale a reality
