DevOps Engineer
PubNub
San Francisco, California
PubNub runs a globally distributed “Data Stream Network”, a cloud service that developers use to build and scale large real-time apps. We have thousands of customers and process billions of real-time messages each month. We power a variety of real-time applications, including multiplayer games, ride dispatch services, social events, online auctions, education apps, telecom infrastructure and more. We develop software close to the bare iron and optimize performance in microseconds.
The Engineering team is responsible for designing, developing, operationalizing, sustaining and scaling PubNub’s Data Stream Network. This includes our secure, distributed messaging bus as well as all add-on services and data pipelines, including Storage/Playback, Presence, Access Management, Push Gateways and more. We are a strong team of Engineers and Architects who are low on drama and high on results.
Our mission is to use uptime, performance and scale as tools to extend the fabric of real-time possibilities, and to stay true to the trust and confidence our customers place in us to deliver delightful user experiences. We believe that teams that balance linear progression and non-linear innovation achieve the best results. Consequently, we place the team ahead of the individual when solving problems and celebrating achievements. If you are seeking a team whose norm is to swarm and perform, we are your destination!
As a DevOps Engineer, you would be directly responsible for championing tools and technologies, and for adopting and adapting the frameworks and services used or needed by current and forward-looking features. In net terms, you’ll serve as a productivity multiplier for other developers on the team.
Responsibilities
- Collaborate with engineering teams, product owners and other stakeholders to understand tooling needs for Agile development and Continuous Integration/Deployment (CI/CD) practices
- Devise automation strategies for upcoming releases while continuously modernizing existing systems
- Create and manage local development environments for complex software systems
- Work closely with Architects and Engineers to define build pipelines, administer artifact repositories and automate test tooling
- Champion best practice methodologies for packaging and distributing web-scale applications
- Assist with the execution of performance, stress and security test plans
- Assess release readiness of features from an operational perspective
- Ensure smooth and optimal production rollouts of provisioning, deployment, configuration, monitoring and other day-to-day operational activities
- Manage and own testing, staging and production infrastructure and cloud resources (DNS, firewalls, bastion hosts, proxies, load balancers, etc.)
- Manage ongoing operations of SQL and NoSQL databases like MySQL, Redis and Cassandra
- Spearhead implementation of predictive service monitoring solutions to proactively detect problems, identify root causes and manage SLAs
- Promote operational best practices for infrastructure dynamism, including automated capacity modeling, service discovery, containerization and auto scaling
Requirements
- Strong automation design skills
- Strong understanding of cloud infrastructure providers (AWS, GCP, Rackspace, DigitalOcean, etc.)
- Strong understanding of networking concepts, protocols, and security (TCP/IP, UDP, HTTP, NTP, DNS, TLS, etc.)
- Hands-on experience with vendor-agnostic infrastructure automation and configuration management technologies such as Terraform and Ansible
- Significant experience with web-scale operation of virtualized Linux farms and infrastructure components such as proxies, reverse proxies, in-memory caches, distributed filesystems, NoSQL databases, etc.
- Significant experience with open source and commercial application performance monitoring and log analysis stacks
- Experience implementing and using continuous delivery design patterns and toolchains
- Expert level knowledge of shell scripting
- Working knowledge of Python/Go
- Experience administering and using code review tools such as Crucible/Gerrit and artifact repositories such as Artifactory
- Attention to detail and ability to work independently on complex problems
- 3-5 years of experience in a Site Reliability role
- BS or MS in Computer Science or a related technical field
Benefits
Additional Notes on Compensation
Competitive salary and pre-IPO stock options. Generous paid medical/dental/vision coverage, plus medical and dependent FSA. Catered lunch 3x a week. Fully stocked break room with unlimited drinks and snacks. Team outings and holiday party.
- AWS
- Cloud
- Agile Development Process
- Linux
- MySQL
- Networking
- Python
- SQL
- Continuous Integration
- DNS
- Go
- NoSQL
- Rackspace
- Redis
- TCP/IP
- Computer Science
- HTTP
- Containerization
- Ansible
- Crucible
- Auto Scaling
- Terraform
- Code Review
- In-Memory
- Infrastructure Automation
- Scale Applications
- Open Source
- Digital Ocean
- Artifactory
- CI/CD
