We’re looking for a Staff Operations Engineer – Ceph Storage to support our storage team in the Cloud Platform division. We operate at global scale: transactions run 24x7 across our data centers worldwide, and millions of requests are evaluated across our exchange every second. To achieve our mission, global efficiency and reliability are absolutely key, because every millisecond quite literally counts in our business.
What We’re Looking For:
- Facilitator: It’s essential that you can relay information and ideas effectively both within and across teams. While we place a huge premium on technical skill, we value your ability to work with other people just as much.
- Adaptable: Our industry moves fast and we need you to keep up and adapt. As an Operations Engineer, you can prioritize tasks, often against competing scope and timelines.
- Technical: A strong foundation in Operations, and experience in solving complex problems and building solutions (including CI/CD, real-time monitoring, production issues, etc.).
- Rigorous: We design and manage massive, globally distributed systems that handle billions of transactions a day, so your approach and solutions need to be thorough, scalable and ironclad.
Here’s What You’ll be Doing:
- Design, build and operate a highly scalable, performant and resilient storage layer that runs at planet scale.
- Develop and maintain automation to handle logging, monitoring and maintenance of the storage layer.
- Work with technologies such as Hadoop, Spark, Aerospike and Kafka to enhance and optimize existing systems.
- Participate in complex security system designs and mentor junior teammates.
- Act as a senior contributor to the team who takes ownership of large projects and components.
- Champion improvements to processes and procedures for the team and for the division.
- Influence the team’s direction and foster accountability, trust, and focus on goals.
- Live the values and promote those internally and externally.
Here’s What You Need:
- Experience with building, maintaining and troubleshooting open-source distributed storage solutions such as Ceph and storage orchestrators such as Rook, in a highly automated environment and at scale.
- Experience with IaC and configuration management tools such as Salt, Ansible, Puppet or Terraform.
- Experience with storage-level replication technologies.
- Strong experience with capacity planning, disaster recovery and monitoring.