We are looking to hire an accomplished Data Engineer to join our US team and help us build and maintain a data platform that supports diverse uses, including data analysis, exploration, aggregation, user modeling, and scalable training systems. As a key member of our small, nimble, internationally distributed Data Team, you will drive significant impact across the company's data technology and business.
We currently use Python, Java, and Scala to develop tools with Spark, Kafka, Airflow, MySQL, Druid, Spinnaker, and Kubernetes. We run in both GCP and AWS, but primarily work with Dataproc, Dataflow, BigQuery, and Bigtable. You will have the opportunity to join us in exploring new technologies and using them to design, deploy, and operate highly performant systems.
In this role you will be expected to work to the high standards of a professional data engineer, handle huge volumes of business-critical data (petabytes), and contribute across the full spectrum of responsibilities, from architecture to operations.
Impact you will make:
- Develop high-quality, reliable data pipelines that convert data streams into valuable information
- Design, implement, and deploy both real-time and batch data processing pipelines for internal and external customers
- Develop tools to monitor, debug, analyze and operate our data infrastructure
- Design and implement data technologies that can scale for hundreds of millions of users
- Collaborate with our product and business teams to deliver valuable new features and functions
Who you are:
- BS in Computer Science or related technical discipline or equivalent experience
- 2+ years of professional experience in data engineering environments
- 2+ years of experience with SQL and programming in a high-level language such as Python, Java, or Scala
- Experience with data pipelines processing more than 10 TB of data is a plus
- Experience working in cloud environments, ideally with GCP or AWS
- Strong experience improving the performance of queries and data jobs, and scaling systems for exponential growth in data volume and traffic
- Expert debugging skills and enthusiasm for automation that delivers high-quality, reliable systems
- Comfortable with modern development tools such as Git and Confluence, and with working in a distributed agile team environment that offers both high autonomy and regular collaboration