Home Office, Home Office, USA
1 day ago
Site Reliability Engineer
REQ#: RQ193031
Public Trust: Other
Requisition Type: Regular

Your Impact

Own your opportunity to work alongside federal civilian agencies. Make an impact by providing services that help the government ensure the well being of U.S. citizens.

Job Description

Cloud Developer Sr Advisor

GDIT is looking to hire a lead Site Reliability Engineer (SRE) to help take a cloud team to the next level. You will work with the government and other team members to identify and assist in enhancing the reliability of this agency's core cloud infrastructure.  

 
As an SRE, you will act as an Account Manager for core AWS accounts, responsible for overseeing the services running in this agency's infrastructure AWS accounts.
 
HOW A SITE RELIABILITY ENGINEER WILL MAKE AN IMPACT 
 

You will need to develop a deep understanding of how systems inter-operate within the infrastructure, including upstream and downstream dependencies. 

Review all AWS infrastructure deployments to identify upstream and downstream impacts and ensure test processes fully validate features and integrations.

Ensure that monitoring, logging, and alerting for services running in core infrastructure accounts are properly configured and provide actionable information. 

In collaboration with government stakeholders, develop and maintain a logging and monitoring strategy for the infrastructure platform. 

Conduct and coordinate 5 Whys and other blameless post-mortem activities in the event of an incident.

Participate in continuous improvement activities, such as technical debt analysis, and contribute to the reliability standards and practices of the team.

Work with team DevOps engineers to improve deployment process and introduce automated testing. 

Audit resources in accounts under your responsibility; identify areas for improvement or technical debt and collaborate with program and government partners to prioritize. 

Assist the cloud infrastructure team and other teams in troubleshooting wide-area integration issues.

Commit changes to our infrastructure codebase as necessary 

WHAT YOU’LL NEED TO SUCCEED: 

Required Experience: 10+ years of AWS infrastructure design and deployment. 3+ years in an SRE role working with complex systems.

Required Technical Skills: IaC background including CDK or CloudFormation. Lead experience configuring and using logging and monitoring systems including CloudWatch, Splunk or Instana. 

Required Skills and Abilities: Ability to analyze infrastructure dependencies. Experience overseeing infrastructure deployments including developing testing procedures. Strong communication skills. Ability to work with government stakeholders. Prior experience in a cross-cutting SRE role. 

Preferred Skills: AWS Solutions Architect Professional or DevOps Engineer Professional Certification. 

Location: Remote with on-site client meetings 

GDIT IS YOUR PLACE: 

Full-flex work week to own your priorities at work and at home 

401K with company match 

Comprehensive health and wellness packages 

Internal mobility team dedicated to helping you own your career 

Professional growth opportunities including paid education and certifications 

Cutting-edge technology you can learn from 
