We are seeking a skilled Data Engineer with a strong background in Python and experience managing data at scale. The ideal candidate will be proficient in AWS services such as S3 and EC2, and will have proven expertise in building, optimizing, and maintaining petabyte-scale data pipelines. Familiarity with video processing workflows is a plus.
Responsibilities
- Data Pipeline Development: Design, build, and maintain scalable, efficient data pipelines in Python.
- AWS Ecosystem: Leverage services like S3 and EC2 for data storage, retrieval, and processing in production environments.
- Big Data Handling: Develop and optimize systems to handle petabyte-scale datasets with a focus on performance, reliability, and cost-effectiveness.
- Monitoring & Reliability: Implement robust monitoring, alerting, and logging to ensure smooth data flow and quickly troubleshoot issues.
- Collaboration: Work cross-functionally with data scientists, software engineers, and product teams to understand data needs and deliver optimized solutions.
- Video Processing (Preferred): Process and manage video data for analytics, quality control, and other use cases.
Required Qualifications
- Python Proficiency: Strong coding skills in Python (including familiarity with libraries for data manipulation and analysis).
- AWS Expertise: Hands-on experience using core AWS services (S3 and EC2; experience with Lambda, EMR, or ECS is a plus).
- Big Data Skills: Demonstrated ability to work with large-scale datasets (petabyte-level), ensuring high performance and scalability.
- Database & Storage: Familiarity with SQL and NoSQL databases, plus data lakes and distributed data storage practices.
- Automation & Scripting: Comfortable building CI/CD pipelines and automating repetitive tasks.
Nice to Have
- Video Processing: Experience handling or transforming video data (e.g., transcoding, extracting metadata).
- Machine Learning Pipelines: Familiarity with ML workflows or frameworks (TensorFlow, PyTorch, etc.).
- Orchestration Tools: Knowledge of Airflow, Luigi, or other workflow orchestration frameworks.
- Security Best Practices: Understanding of AWS IAM, encryption, and compliance standards.
What We Offer
- An opportunity to work with massive datasets and cutting-edge cloud technologies.
- A collaborative environment with a talented, diverse team of engineers and data experts.
- Competitive compensation and benefits with room for career growth and professional development.
- A chance to influence and shape high-performance data-driven applications.