Philadelphia, PA, USA
3 days ago
Director of SRE - Fully Remote

A local company is seeking a Director of Site Reliability Engineering (SRE) to lead and enhance its Azure-based infrastructure. The role is fully remote, with occasional office visits to Florida. It is ideal for a seasoned SRE leader with deep expertise in Azure Cloud, Kubernetes, and observability tools.

Responsibilities
- Architect, scale, and optimize Azure cloud environments to ensure reliability and performance.
- Lead Kubernetes operations, including cluster management and automation.
- Implement and manage Datadog and PagerDuty for monitoring, alerting, and incident response.
- Define and enforce SRE best practices to improve system resilience and operational efficiency.
- Collaborate with engineering teams to streamline CI/CD pipelines and infrastructure automation.
- Drive incident management, post-mortems, and reliability improvements.

Requirements
- Proven experience leading SRE teams in an Azure-focused environment.
- Strong expertise in Kubernetes, including deployment, scaling, and troubleshooting.
- Hands-on experience setting up and managing Datadog and PagerDuty.
- Deep understanding of cloud infrastructure, automation, and observability tools.
- Experience with CI/CD, infrastructure as code (Terraform, Bicep), and scripting.
- Excellent problem-solving and leadership skills.

We are not accepting H-1B applicants at this time.