Lead Site Reliability Engineer - Infrastructure

Work from home · Full-time role · Hiring

Job Description

We are seeking a Lead Site Reliability Engineer (Infrastructure) to act as technical lead for our Infrastructure SRE team in a fast-moving VSaaS engineering organization. In this role, you will own the team's technical direction and execution across reliability, scalability, and operability of our shared platform and production systems, combining hands-on technical leadership with responsibility for team outcomes. You will define SRE strategy and guide architecture across our GCP and Kubernetes ecosystem, setting standards for reliability, scalability, GitOps, and observability. You will also mentor senior and staff engineers, and lead incident response and high-impact operational work, contributing hands-on when needed.

Role Overview

Site Reliability Engineer - Infrastructure

In this role, you will translate product and business needs into scalable infrastructure and clear technical direction. With a system-wide view of the platform, you will guide architectural decisions, surface non-obvious risks, and drive long-term improvements to system reliability and operability. Working closely with product and platform teams, you will shape the developer experience and ensure engineering teams can ship with speed and confidence. You will set engineering standards and continuously evolve our GitOps and observability practices.

This role requires strong expertise in cloud infrastructure, distributed systems, and CI/CD, along with hands-on experience in Golang and/or Python to support automation and long-term system reliability.

Responsibilities

As a Lead Site Reliability Engineer, you will:

  • Team Leadership & Execution Ownership: Own technical direction and execution of the Infrastructure SRE team. Translate platform goals into actionable plans, ensuring alignment on priorities, reliability outcomes, and operational excellence across production systems.
  • Production Operations & Incident Management: Operate and evolve large-scale distributed systems in production, proactively identifying failure modes and mitigating risk. Own day-to-day operations including monitoring, alerting, incident response, coordination, post-incident analysis, and continuous improvement.
  • Architecture, Standards & Platform Governance: Provide architectural leadership across platform and infrastructure changes, identifying scalability constraints, system design risks, and long-term reliability gaps. Define and enforce engineering standards for GCP, Kubernetes, and ArgoCD, ensuring consistent, secure, GitOps-based delivery.
  • Reliability Engineering & Observability: Lead strategy for monitoring, alerting, and system observability, driving a shift from reactive incidents to proactive reliability engineering.
  • Enablement, CI/CD & Collaboration: Guide CI/CD and cloud-native delivery practices at scale to ensure safe, scalable releases. Mentor senior and staff engineers, conduct high-impact design and code reviews (Golang/Python), and partner with product and engineering teams to embed system-level thinking across development.
  • Hands-on Technical Contribution: Provide hands-on technical contribution where needed, including debugging production issues, reviewing and contributing to code, and supporting critical incident resolution to ensure system reliability and team effectiveness.
  • Other Duties: Other duties as assigned, within the scope of the ownership and operational responsibilities above.

Minimum Qualifications

  • Leadership & Experience: 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering, including demonstrated experience leading technical engineering teams, driving roadmaps, and owning delivery of large-scale production systems.
  • Cloud & Distributed Systems Expertise: Deep experience with cloud-native architectures and distributed systems at scale, particularly in GCP and Kubernetes environments. Ability to reason about system design, identify failure modes, and evaluate scalability and reliability risks.
  • GitOps & Delivery Engineering: Strong experience with GitOps-based delivery workflows, particularly ArgoCD, and CI/CD pipeline design. Ability to ensure safe, repeatable, and observable production deployments.
  • Infrastructure & Automation: Strong hands-on background in infrastructure-as-code (Terraform preferred), automation, and operational tooling. Proficiency in Golang and/or Python for building and reviewing production systems. Strong Linux systems knowledge and production troubleshooting experience.
  • Observability & Reliability Engineering: Experience designing or operating observability systems (logging, monitoring, alerting) and applying SRE principles such as SLOs, incident management, postmortems, and reliability engineering practices.
  • Technical Oversight & Engineering Quality: Ability to review and critique system design and production code, ensuring engineering quality across backend systems and infrastructure components.
  • Communication & Leadership Influence: Ability to influence technical direction, communicate trade-offs to stakeholders, and drive alignment across product and engineering teams on reliability and platform priorities.

Why Milestone?

Milestone offers not only great benefits but also a great culture. Employees here have flexible work environments, opportunities for further education, and the ability to effect change in our organization directly.

The annual salary for this position ranges from $160,000 to $180,000. Pay is based on the level, location, complexity, responsibility, and job duties of the specific position and is just one component of Milestone's total compensation package. Additionally, we offer an attractive benefits package that includes medical/dental benefits, FSA or HSA, 401k with 6% Safe Harbor employer match, paid parental leave, generous PTO (20 days' vacation, 10 days paid sick time, and 12 company holidays), fully paid Short Term disability policy, fully paid Long Term disability policy, and Life Insurance. If you are selected for an interview, please feel welcome to speak to our Talent Partner about our compensation philosophy.

All employees must complete a background check. Employees in fiscal roles are also required to undergo a credit check. All information obtained during these checks is handled confidentially and shared only with authorized personnel.

Milestone is committed to creating a diverse and inclusive workplace and is proud to be an equal opportunity employer.

Contact and application

Please apply at our website: www.milestonesys.com

We are looking forward to receiving your application.
