About the Role
We are seeking a skilled Sr Software Engineer – Infrastructure Telemetry and Site Reliability Engineer (SRE) to join our dynamic platform team. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while leveraging telemetry data to enhance monitoring and observability. This role is critical in maintaining our high service standards and continuously improving our infrastructure.
Key Responsibilities
Lead the design, development, and implementation of monitoring, logging, and alerting solutions to ensure system reliability and performance.
Use telemetry data to identify and troubleshoot issues, optimize system performance, and enhance overall observability.
Collaborate with development and operations teams to ensure seamless integration of monitoring and alerting tools.
Write and maintain scripts for infrastructure management and automation (e.g., Python, PowerShell, Bash).
Automate repetitive tasks to improve efficiency and reduce manual intervention.
Automate deployment pipelines using CI/CD tools such as Jenkins, GitHub Actions, or Azure DevOps.
Participate in on-call rotations and incident response, providing timely resolution of system outages and performance issues.
Develop and maintain documentation for system architecture, processes, and procedures related to telemetry and site reliability.
Design and implement cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform, AWS CloudFormation, or Azure Resource Manager.
Collaborate with cross-functional teams to design and implement scalable and resilient infrastructure solutions.
Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
Drive the adoption of best practices in site reliability engineering and telemetry within the organization.
Qualifications
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
5+ years of experience in software engineering with a focus on site reliability engineering, DevOps, IaC, and cloud infrastructure, or a related area.
Strong knowledge of monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK stack, Splunk, New Relic).
Proficiency in programming and scripting languages (e.g., Python, Go, Bash).
Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
Strong understanding of Linux/Unix systems and networking concepts.
Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
Experience with configuration management and automation tools (e.g., Terraform, Ansible, Puppet, Chef).
Strong communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI) is a plus.
Preferred Qualifications
Experience with site reliability engineering practices and principles, such as error budgets and service level objectives (SLOs).
Knowledge of data analytics and the ability to interpret and visualize telemetry data.
Experience with incident management and post-incident analysis.
Understanding of IT security best practices and tools.
Strong problem-solving skills and attention to detail.
Effective communication and collaboration skills.
Salary Range by Location:
Alaska: Anchorage: Min: $53.90, Max: $91.78
Alaska: Kodiak, Seward, Valdez: Min: $56.19, Max: $95.67
California: Humboldt: Min: $56.19, Max: $95.67
California: All Northern California - Except Humboldt: Min: $63.04, Max: $107.34
California: All Southern California - Except Bakersfield: Min: $56.19, Max: $95.67
California: Bakersfield: Min: $53.90, Max: $91.78
Idaho: Min: $47.96, Max: $81.67
Montana: Except Great Falls: Min: $43.40, Max: $73.89
Montana: Great Falls: Min: $41.11, Max: $70.00
New Mexico: Min: $43.40, Max: $73.89
Nevada: Min: $56.19, Max: $95.67
Oregon: Non-Portland Service Area: Min: $50.25, Max: $85.56
Oregon: Portland Service Area: Min: $53.90, Max: $91.78
Texas: Min: $41.11, Max: $70.00
Washington: Western - Except Tukwila: Min: $56.19, Max: $95.67
Washington: Southwest - Olympia, Centralia & Below: Min: $53.90, Max: $91.78
Washington: Tukwila: Min: $56.19, Max: $95.67
Washington: Eastern: Min: $47.96, Max: $81.67
Washington: South Eastern: Min: $50.25, Max: $85.56
Why Join Providence?
Our best-in-class benefits are uniquely designed to support you and your family in staying well, growing professionally, and achieving financial security. We take care of you, so you can focus on delivering our Mission of caring for everyone, especially the most vulnerable in our communities.
Accepting a new position at another facility that is part of the Providence family of organizations may change your current benefits. Changes in benefits, including paid time-off, happen for various reasons. These can include changes of Legal Employer, FTE, Union, location, time-off plan policies, and availability of health and welfare benefit plan offerings, among others.