Senior Software Engineer, Platform (SRE)

Kitchener/Waterloo, Ontario · Computer/Software

Platform / Site Reliability Engineer (SRE)

Our client is transforming industries through cutting-edge technology. Their platform leverages AI, automation, and scalable systems to solve complex real-world problems.

As a Platform / Site Reliability Engineer (SRE), you will play a key role in establishing and enhancing the engineering platform. You’ll help ensure the reliability, scalability, and efficiency of our systems while developing tools that improve engineering productivity.

You will help define and shape the platform strategy, set best practices, and drive initiatives that enhance developer experience, system performance, and operational efficiency.


What You’ll Be Doing

  • DevOps & Infrastructure: Design, implement, and maintain scalable infrastructure to support engineering needs.

  • CI/CD Optimization: Improve continuous integration and deployment pipelines using AWS CDK, and define requirements for deployment and database migration tooling.

  • Release Tracking & Deployment: Establish visibility into release cycles, implement automation to streamline deployments, and ensure smooth rollouts.

  • Site Reliability & Observability: Implement monitoring, logging, and alerting systems to ensure high availability and performance.

  • Internal Tooling: Build and maintain tools that improve developer efficiency, automate repetitive tasks, and enhance productivity.

  • Security & Compliance: Ensure infrastructure and deployments align with security best practices, with attention to SOC, ISO, and GDPR standards.


Experience

  • 7+ years of technical experience, including 5+ years as an SRE or in a similar role. Startup experience is a plus.

  • Deep expertise in AWS, including Fargate and Kubernetes for container orchestration.

  • Strong experience with CI/CD pipelines, particularly using AWS CDK.

  • Proficiency with observability tools (Datadog, Prometheus, Grafana).

  • Strong knowledge of scaling strategies and highly available architectures.

  • Proficiency in scripting/automation with Python, Bash, or TypeScript.

  • Familiarity with security best practices and compliance frameworks (SOC, ISO, GDPR).

  • Strong collaboration skills and ability to work cross-functionally.


Our Tech Stack

  • Infrastructure: AWS, Fargate, Redis, PostgreSQL, SQS, CDK, GitHub, Retool

  • Backend: Django REST framework, Celery

  • Frontend: Next.js, Tailwind CSS

  • LLM Integrations: OpenAI, Claude, AWS Bedrock
