Senior Systems Engineer, Cluster Services

Quantcast

(San Francisco, California)
Full Time
Job Posting Details
About Quantcast
Quantcast helps create a more personalized digital world. We have built one of the world’s most sophisticated data-intelligence platforms, using big data and machine learning to solve the biggest challenges in marketing and create more rewarding experiences across the digital landscape.
Summary
Quantcast operates some of the largest custom-developed data processing infrastructure in the world, storing and processing tens of petabytes of data daily. This includes a fault-tolerant, space-efficient distributed file system (QFS), along with a custom map/reduce implementation that is four times faster than open-source alternatives. In any system that contains thousands of nodes, low-probability failures become common occurrences. The Cluster Services team owns the development and operation of software systems that not only identify problems across the stack (network, hardware, OS, services) but also automatically correct and compensate for them. The team also owns a custom resource allocation system that has been used to achieve massive service co-tenancy across the compute infrastructure, scaling the distributed processing and storage platforms into a highly available service that supports analytics and modeling across the company. The ideal candidate has hands-on experience with large-scale distributed systems (HDFS, Hadoop, Cassandra, etc.), configuration management (Puppet, Chef), databases (MySQL, PostgreSQL), and Amazon Web Services. They should be comfortable working in an event-driven environment while also developing code that scales the management of our distributed storage and compute platform.
Responsibilities
* Mentor and grow the more junior engineers on the team
* Drive operational excellence through automation, monitoring, and incident analysis
* Provide technical input into product roadmaps for the team
* Develop tools that scale the management of the distributed storage and compute platform
* Maintain and enhance the services that support the distributed storage and compute platform
* Work to make our platform more elastic and fault-tolerant
* Guide the development of systems that integrate our data centers with Amazon Web Services
Ideal Candidate
* BS in computer science or equivalent experience
* Experience with large-scale distributed systems
* Proficiency in one or more programming languages
* Linux system administration/automation experience
* Track record of driving operational excellence
* Excellent communication and interpersonal skills
* Strong written communication and documentation skills
* Organized, detail-oriented personality