Site Reliability Engineer

Medallia

(Palo Alto, California)

Full Time

Job Posting Details

About Medallia

Medallia is the customer experience management company. Founded in 2001, the company is trusted by the world’s leading brands including Verizon, Macy’s, Sephora, Honeywell, Four Seasons, Sodexo, and Mercedes to improve customer experiences. Medallia’s Software-as-a-Service (SaaS) application enables companies to capture customer feedback across Web, social, mobile, and contact center channels, understand it in real time, and take action everywhere.

Summary

Site Reliability Engineering at Medallia creates the systems that power a highly reliable, agile, and efficient global SaaS platform. We are building a next-generation global data center operating system that spans on-premises and cloud-based infrastructure, leveraging some of the most exciting new open source technologies. We work closely with product and platform engineering to make the world's best customer experiences even better. SREs own the reliability of key components of the application and infrastructure stack at Medallia, and ensure that they continue to scale with our rapidly growing business.

Responsibilities

**As a Site Reliability Engineer, you may:**

* Build and own an infrastructure component within our systems foundation (compute, storage, network, etc.).
* Instrument and build testing automation to prove that our infrastructure is delivering a world-class experience.
* Debug and solve complex problems that may span the full service stack.
* Automate the provisioning and updates of components of the Medallia SaaS stack.
* Proactively monitor and manage the availability of infrastructure and applications.
* Optimize performance of components across the full service.
* Develop capacity planning models and translate workload forecasts into capacity requirements.  
* Be a part of an engineering team on-call rotation for escalations.

**Our Engineering Culture**

* We don’t expect to be perfect, but we are always proactively seeking out ways to help ourselves and our teams to minimize pain points within our infrastructure and code base.  
* We love technology: we follow the latest developments and share what we learn.
* We are not afraid of failing when we experiment with different technologies, development methodologies, and tooling.
* We develop strong relationships with team members around the globe.

Ideal Candidate

* BS or equivalent experience in Computer Science or other technical specialty.
* 5+ years of experience in hands-on engineering for Internet-scale services.
* Ability to code or script in at least one language (Java, Groovy, Python, Go, Ruby, etc.) on Linux-based platforms.
* Deep experience in at least one infrastructure component (operating systems, compute, storage, networking, data center, distributed systems, big data, cloud, etc.) and a solid understanding of the rest and how they impact services.
* Experience building, configuring, and maintaining operational monitoring and reporting tools.
* Solid understanding of infrastructure and application performance metrics, including capacity planning.
* Familiarity with cluster management tools such as Aurora, Mesos, Borg, Marathon, Tupperware, and Kubernetes.
* Familiarity with relational databases, particularly PostgreSQL.
