Site Reliability Engineer
Canada, CA | Full-time | Posted 24d ago
About this role
country: CA
all locations: [Canada]
commitment: Full-Time
department: Technology
location: Canada
team: Engineering
Within 90 Days, You'll:
Fully onboard and partner with the engineers currently managing reliability to review and revise the existing operational plan.
Operationalize high-leverage items to transition the team out of reactive firefighting and into a more stable, proactive state.
Establish a baseline for current system behavior by identifying the most critical user journeys that require immediate SLI/SLO definitions.
Within 180 Days, You'll:
Independently drive the revised reliability plan, ensuring SLIs/SLOs are in place and actively used to guide engineering decisions.
Standardize the incident response structure, including severity definitions, Incident Commander roles, and a cadence for blameless postmortems.
Measurably reduce paging volume and ensure that every alert that pages an engineer is backed by a clear, effective runbook.
Within 365 Days, You'll:
Establish a mature reliability practice where production-readiness reviews and error-budget conversations are default parts of the development lifecycle.
Define a clear, evidence-based tooling roadmap for the next phase of our evolution, covering areas such as Terraform, a service mesh, or multi-region expansion.
Serve as an organizational multiplier, having built the observability and culture necessary for other engineers to reason about reliability without constant supervision.