Seattle, WA, US
12 hours ago
Principal TPM, Resilience Infrastructure, Incident Prevention
The Resilience Infrastructure and Solutions team reduces the number and duration of customer-impacting availability events. As part of its mission, the team performs disruptive testing on AWS infrastructure and services in non-production environments to identify potential availability gaps and to verify that past findings have been addressed. Join a team of systems experts who operate at high velocity, bring limitless curiosity, and focus on helping AWS prevent customer-impacting availability events.

Key job responsibilities
- Build and operate a program for executing large-scale, disruptive tests on non-production, Region-scale AWS infrastructure
- Partner with AWS service and infrastructure technical leaders on test planning, execution, and follow-ups
- Dive deep into technical findings, incident response, and follow-ups with AWS service teams
- Review test strategy and results with senior AWS leaders as part of quarterly reviews
- Guide automation and test strategies that maximize utilization of test infrastructure while reducing builder toil