sitetracker

Site Reliability Engineer

Canada, CA full time 24d ago

About this role

Country: CA
All locations: Canada
Commitment: Full-Time
Department: Technology
Location: Canada
Team: Engineering

Within 90 Days, You'll:

Fully onboard and partner with the engineers currently managing reliability to review and revise the existing operational plan.
Operationalize high-leverage items to transition the team out of reactive firefighting and into a more stable, proactive state.
Establish a baseline for current system behavior by identifying the most critical user journeys that require immediate SLI/SLO definitions.

Within 180 Days, You'll:

Independently drive the revised reliability plan, ensuring SLIs/SLOs are in place and actively used to guide engineering decisions.
Standardize the incident response structure, including severity definitions, Incident Commander roles, and a cadence for blameless postmortems.
Measurably reduce paging volume and ensure that every alert that pages an engineer is backed by a clear, effective runbook.

Within 365 Days, You'll:

Establish a mature reliability practice where production-readiness reviews and error-budget conversations are default parts of the development lifecycle.
Define a clear, evidence-based tooling roadmap for the next phase of our evolution, such as Terraform, service mesh, or multi-region expansion.
Serve as an organizational multiplier, having built the observability and culture necessary for other engineers to reason about reliability without constant supervision.