Our client is looking for Data Engineers who have seen their fair share of messy data sets and have been able to structure them for building useful AI products.
You will be writing frameworks for building real-time and batch pipelines that ingest and transform events (10^8 scale) from hundreds of applications every day. ML and software engineers consume these events to build data products such as personalization and fraud detection. You will also help optimize the feature pipelines for fast execution and work with software engineers to build event-driven microservices.
You will get to put cutting-edge tech into production, with the freedom to experiment with new frameworks and try new ways to optimize, and the resources to build the next big thing in fintech using data!
- You have previously worked on building serious data pipelines that ingest and transform more than 10^6 events per minute and terabytes of data per day.
- You are passionate about producing clean, maintainable and testable code as part of a real-time data pipeline.
- You understand how microservices work and are familiar with concepts of data modelling.
- You can connect different services and processes together, even if you have not worked with them before, and can follow the flow of data through various pipelines to debug data issues.
- You have worked with Spark and Kafka before, have experimented with or heard about Flink/Druid/Ignite/Presto/Athena, and understand when to use one over the other.
- On a bad day, maintaining ZooKeeper and bringing up a cluster doesn’t bother you.
- You may not be a networking expert, but you understand the issues with ingesting data from applications in multiple data centres across geographies, both on-premises and in the cloud, and will find a way to solve them.
- You are proficient in Java/Scala/Python/Spark.