Site Reliability Engineer

Zoosk

(San Francisco, California)
Full Time
Job Posting Details
About Zoosk
Zoosk is a leading online dating company founded by engineers, with a solid history of iterating on what works best for our customers to help them fall in “like” and eventually love. Our smart, passionate employees collaborate across departments to achieve our mission: empowering everyone to lead a more fulfilling love life. With over 35 million active members in 80+ countries and top dating apps for both Android and iOS, we’re a lot more than just a fun place to work.
Summary
Zoosk's Technical Operations team is looking to add a seasoned Site Reliability Engineer to our growing team. In this role, you'll be a key member of the team that keeps us up and running, serving over 38 million active members in 80 countries. You should have a focus on automation, scaling, and monitoring. We maintain a balance of self-hosted and cloud-based solutions. As a member of the team, you will have a direct impact on design and feature enhancements that keep our platform running smoothly.
Responsibilities
* Work closely with platform and other engineering teams to set level expectations for projects.
* Establish best practices for OS tuning and hardware management, which will allow us to scale efficiently and securely with optimal hardware requirements.
* Manage backup and restore, maintaining run books and planning for disaster recovery.
* Performance analysis and tuning for services.
* Write tools to help automate and conduct functional tests of infrastructure.
* Familiarity with MySQL replication, configuration, and backup strategies.
* Familiarity with TCP/IP networking and routing for small and large networks.
Ideal Candidate
**Qualifications:**

* Solid understanding of Linux system administration, including configuration, troubleshooting, automation, and security.
* Experience working in a high-capacity, fault-tolerant, and horizontally scalable environment.
* Experience with load balancer technologies and best practices (F5, NetScaler, SteelApp).
* Experience with at least one scripting language (Python, Perl, Ruby), as well as shell scripting.
* Experience with common monitoring tools such as Nagios, Ganglia, Cacti, Splunk, and New Relic.
* Familiarity with building hosts (kickstart, PXE boot) and configuration management systems (CFEngine, Puppet, Chef).
* Experience in a 24x7 on-call rotation.

**Pluses:**

* Experience handling critical production systems under structured change management guidelines (publishing tickets and design and implementation documents, followed by run books that the rest of the Ops team can use to triage production issues).
* Experience with MySQL High Availability solutions.
* Experience managing instances in an AWS environment.
* Experience implementing active-active topologies across data centers.
* Strong understanding of backup solutions.
