Site Reliability Engineer II

Bangalore, IN on-site full time mid Apr 21, 2026

About this role

At Advisor360°, we build technology that transforms how wealth management firms operate, scale, and serve their clients. As a leading SaaS platform in the fintech space, we’re trusted by some of the largest independent broker-dealers and RIAs to power the full advisor and client experience, from portfolio construction and reporting to compliance and client engagement.

What sets us apart? It’s not just the tech (though it’s best-in-class). It’s the people, the purpose, and the passion behind everything we do. We’re a team of builders, thinkers, and doers who believe that great companies are defined by the stories they tell and the experiences they create, internally and externally. We bring deep industry expertise, a collaborative spirit, and a commitment to innovation as we reshape what’s possible in wealth management.

As we grow, we’re looking for teammates who are ready to roll up their sleeves, think big, and help elevate our brand in a way that reflects the bold ambitions we have for our company and the clients we serve. Join us, and be part of a company that’s not only moving fast but making it count.

Site Reliability Engineer II

We are seeking a highly motivated Site Reliability Engineer (SRE) to join our team and drive operational excellence across our systems. In this role, you will serve as a key steward of reliability, scalability, and performance for our mission-critical SaaS platform. You will operate at the intersection of software engineering and operations, applying SRE principles to improve system reliability, reduce operational toil through automation, and enhance observability across the platform. As an SRE, you will play a critical role in maintaining the health of production environments, proactively identifying risks, and ensuring rapid and effective incident response.
If you are passionate about automation, operational excellence, and building highly reliable systems at scale, and you thrive in fast-paced, high-impact environments, this role is for you.

Key responsibilities

- Implement, maintain, and support infrastructure and system environments across cloud and hybrid platforms
- Design and implement monitoring, alerting, and observability solutions (e.g., Dynatrace, Grafana, Datadog)
- Build dashboards and alerting frameworks that provide actionable insights and reduce mean time to detection (MTTD)
- Define and manage SLIs, SLOs, and error budgets to establish measurable reliability targets and drive data-driven decisions
- Lead incident response efforts, including troubleshooting, root cause analysis (RCA), and post-incident improvements
- Implement, automate, and manage effective OS patching and upgrade processes to ensure security, stability, and compliance
- Develop and maintain automation for deployment, scaling, recovery, and operational tasks using Python, Go, Bash, or PowerShell
- Proactively identify risks, bottlenecks, and reliability gaps, and drive remediation efforts
- Collaborate with engineering teams to improve application reliability, performance, and scalability
- Partner with cross-functional teams (Engineering, Security, Platform, Support) to embed reliability practices
- Create and maintain runbooks, playbooks, and operational documentation with an automation-first mindset
- Participate in on-call rotations and contribute to a sustainable, balanced on-call culture
- Mentor junior engineers and advocate for SRE best practices across the organization

Required Skills & Qualifications

- 5+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Systems Engineering
- Proven experience operating and supporting large-scale distributed systems in production environments
- Strong understanding of SRE principles and practices, including SLIs/SLOs, error budgets, and reliability engineering methodologies
- Strong programming and automation skills in Python, Go, Bash, or similar scripting languages
- Demonstrated experience automating and executing OS patching and upgrade processes
- Deep expertise in monitoring and observability platforms such as Dynatrace, Splunk, Prometheus, Grafana, and the ELK stack
- Well versed in all aspects of incident management, including troubleshooting, root cause analysis (RCA), and post-incident improvements
- Experience working with relational and NoSQL databases in high-availability, production-grade environments
- Understanding of networking fundamentals, including TCP/IP, DNS, load balancing, CDN, and firewalls
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and associated managed services

Why You’ll Love Working Here

It’s not just about work; it’s about building a career and enjoying the ride! Here’s what you can expect:

- We believe in recognizing and rewarding performance. Our compensation package includes competitive base salaries, annual performance-based bonuses, and the chance to share in the equity value you and your colleagues create during your time with the company.
- We offer comprehensive health benefits, including dental, life, and disability insurance.
- We trust our employees to manage their time effectively, which is why we offer an unlimited paid time off program to help you perform at your best every day.

Join us on this journey.

Advisor360° is an equal opportunity employer committed to a diverse workforce. We believe diversity drives innovation and are therefore building a company where people of all backgrounds are truly welcome and included. Everyone is encouraged to bring their unique, authentic selves to work each and every day. The way we see it, we are here to learn from each other.

Locations: Bangalore, India