Senior Site Reliability Engineer

Leanplum

(San Francisco, California)
Full Time
Job Posting Details
About Leanplum
Leanplum is the most complete mobile marketing platform, designed for intelligent action. Its integrated solution delivers meaningful engagement across messaging and the in-app experience. Leanplum offers Messaging, Automation, App Editing, Personalization, A/B Testing, and Analytics.
Summary
Our Site Reliability Engineers are a hybrid of software and systems engineers. We code our way out of operational problems and into chocolate chip cookies. Our current mission is to design the next version of Leanplum’s core infrastructure. We are responsible for reliability, scalability, and automation, while keeping an eye on latency, performance, and capacity. We are seeking extraordinary talent to help fuel our distributed applications, which serve over 1 billion mobile devices and track over 6 billion analytical events/day, equating to over 17,000 requests/second and ultimately generating over 1.5 TB/day of data.
Responsibilities
* Monitor and alert on various components across our infrastructure.
* Automate the server provisioning process across API, Cassandra, and Spark, with over 400 nodes.
* Influence and create new designs and architectures for a growing number of distributed systems (multi-region cloud environment).
* Plan and execute configuration management and monitoring of our platform as it grows.
* Design the system and processes that engineers use to deploy their software into production.
* Design, write, and maintain software to improve the availability, scalability, latency, and efficiency of Leanplum’s services, incorporating cloud and open source tools when available and writing software of your own when nothing else fits the bill.
* Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks and provisioning new hardware as necessary.
* Run software performance analysis and system tuning.
* Participate in rotating on-call duties.
Ideal Candidate
**You’re Good At**
* Fluency in one or more of: Java, Python, or Scala
* Familiarity with algorithms, data structures, and complexity analysis
* Experience working with Unix/Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols
* Nice to have: experience with network protocols and theory (TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, load balancing, etc.)
* A systematic approach to problem solving
**You Might Also Be Good At**
* Designing, analyzing, and troubleshooting large-scale distributed systems
* In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
* Strong sense of ownership and drive
* Experience with AWS, GCP, or Microsoft Azure
* Experience with performance tuning (Spark, Cassandra, Google App Engine apps)
Compensation and Working Conditions
Benefits included

Additional Notes on Compensation

Competitive salaries; health, vision, and dental insurance.
