SRE Implementation Engineer
Advantage Solutions
**Summary**
The SRE Implementation Engineer will be responsible for driving the adoption of SRE principles across the engineering organization. You will work closely with development, operations, and infrastructure teams to implement and integrate reliability-focused practices such as monitoring, automation, capacity planning, and incident management into the development lifecycle. Your work will help scale and automate critical systems while ensuring that they remain highly available and performant.
**SRE Principles Implementation:**
+ Assist in the implementation of Site Reliability Engineering principles across the organization, ensuring the team adheres to key tenets such as automation, reliability, scalability, and proactive monitoring.
+ Help define and enforce Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to align system performance with business requirements.
+ Collaborate with engineering teams to embed reliability and performance best practices into the development and deployment pipelines.
**Automation & Infrastructure Management:**
+ Implement and maintain infrastructure automation tools and processes using technologies such as Terraform, Ansible, CloudFormation, and Kubernetes.
+ Automate the deployment, monitoring, and scaling of applications to ensure consistency, repeatability, and reliability.
+ Design and implement self-healing infrastructure that can recover from common failures without manual intervention.
**Monitoring & Observability:**
+ Implement monitoring, alerting, and observability solutions for applications and infrastructure using tools like OpenTelemetry (OTel), Cloud Monitoring, Datadog, or New Relic.
+ Ensure metrics, logs, and traces are captured effectively to enable proactive monitoring, detection, and diagnosis of issues.
+ Work with developers and operations teams to define effective monitoring strategies and dashboards that provide deep visibility into production systems.
**Incident Management & Troubleshooting:**
+ Implement and refine processes for incident response, including automated detection, alerting, and remediation of production issues.
+ Develop and maintain runbooks, playbooks, and documentation to facilitate quick and effective incident resolution.
+ Collaborate with on-call engineering teams to ensure rapid response to service disruptions and improve mean time to detection (MTTD) and mean time to recovery (MTTR).
**Capacity Planning & Scaling:**
+ Work with engineering teams to implement capacity planning and scaling strategies that ensure systems can handle growth while maintaining performance and cost efficiency.
+ Implement tools and processes to manage resource usage, including autoscaling, provisioning, and optimization in cloud environments.
**Collaboration & Knowledge Sharing:**
+ Collaborate with development, operations, and other cross-functional teams to implement SRE practices throughout the SDLC.
+ Educate and mentor engineering teams on SRE principles, reliability best practices, and incident management.
+ Contribute to a culture of continuous improvement, driving efficiency, automation, and reliability across the organization.
**Education & Training Experience**
+ Experience with microservices architecture and managing highly distributed systems.
+ Knowledge of advanced observability tools such as distributed tracing (Jaeger, Zipkin), log management (Splunk, ELK), and APM tools.
+ Prior experience in an SRE, DevOps, or infrastructure engineering role, especially in a cloud-native or containerized environment.
+ Familiarity with DevSecOps practices, integrating security into the SRE workflow.
+ Strong experience with cloud platforms (AWS, GCP, Azure), including provisioning and managing cloud-based infrastructure.
+ Hands-on experience with containerization (Docker) and container orchestration (Kubernetes).
+ Proficiency in infrastructure-as-code tools such as Terraform, CloudFormation, or Ansible.
+ Experience with CI/CD pipelines, including integrating SRE practices like automated testing, deployment, and monitoring into the pipeline.
+ Solid understanding of monitoring tools (Prometheus, Grafana, Datadog, ELK stack) and incident management platforms (PagerDuty, Opsgenie).
Second Level Engineering Job
This position is an individual contributor.
Travel: 5%
Job Will Remain Open Until Filled
The Company is one of North America’s leading sales and marketing agencies specializing in outsourced sales, merchandising, category management, and marketing services to manufacturers, suppliers, and producers of food products and consumer packaged goods. The Company services a variety of trade channels including grocery, mass merchandise, specialty, convenience, drug, dollar, club, hardware, consumer electronics, and home centers. We bridge the gap between manufacturers and retailers, providing consumers access to the best products available in the marketplace today.
**Responsibilities**
+ Oversee the entire lifecycle of small-to-medium-sized projects, including design, development, testing, production, and subsequent improvements.
+ Provide on-call support for features that you or your team are responsible for.
+ Document designs and write clear, concise, and tested code that is easily understood by others, including designing abstract interfaces and constructing modular libraries.
+ Refactor code regularly to improve error handling, testability, and maintainability.
+ Track and respond to issues raised by external contributors or partners related to your code.
+ Enhance the development experience for your team by improving development tools, test coverage, and code structure. Use systematic tools to debug and diagnose issues in a CI/CD pipeline.
+ Contribute to code specifications and participate in small-scale code reviews.
+ Have a deep understanding of key features and architecture for one product and a high-level understanding of several other products, integrations, and capabilities.
+ Advocate for and contribute to engineering standards and development best practices.
+ Understand non-functional requirements and regularly refactor code to improve error handling, security, and maintainability.
+ Stay up to date on industry trends and development best practices, and feel comfortable writing code in an open-source environment.
+ Identify conflicting requirements across the company and flag them to management. Identify risks in code, features, and design, and communicate these to the team to find collaborative solutions.
**Supervisory Responsibilities**
Direct Reports
This position does not have supervisory responsibilities for direct reports.
Indirect Reports
May delegate work to others and provide guidance, direction, and mentoring to indirect reports.
**Travel**
Some travel will be required, estimated up to 5%.
**Minimum Qualifications**
Education Level: Bachelor’s degree in Computer Science, Software Engineering, or related field.
Experience Requirements: 3 to 6 years of experience as a software engineer.
**Knowledge, Skills, and Abilities**
+ Strong foundations in engineering, programming, and software development.
+ Solid understanding of data structures, algorithms, operating systems, networks, and programming languages.
+ Proficiency in concurrent and event-based development.
+ Experience with development and test frameworks.
+ Mastery of debugging and diagnosing issues in a CI/CD pipeline.
+ Strong verbal and written communication skills.
+ Ability to work effectively in an open-source environment and contribute to industry best practices.
**Additional Information Regarding Job Duties and Job Descriptions**
Job duties include additional responsibilities as assigned by one’s supervisor or other manager related to the position/department. This job description is meant to describe the general nature and level of work being performed; it is not intended to be construed as an exhaustive list of all responsibilities, duties, and skills required for the position. The Company reserves the right at any time with or without notice to alter or change job responsibilities, reassign or transfer job positions, or assign additional job responsibilities, subject to applicable law. The Company shall provide reasonable accommodations of known disabilities to enable a qualified applicant or employee to apply for employment, perform the essential functions of the job, or enjoy the benefits and privileges of employment as required by the law.
**Important Information**
The above statements are intended to describe the general nature and level of work being performed by people assigned to this position. They are not intended to be an exhaustive list of all responsibilities, duties and skills required of associates so classified.
The Company is committed to providing equal opportunity in all employment practices without regard to age, race, color, national origin, sex, sexual orientation, religion, physical or mental disability, or any other category protected by law. As part of this commitment, the Company shall provide reasonable accommodations of known disabilities to enable an applicant or employee to apply for employment, perform the essential functions of the job, or enjoy the benefits and privileges of employment as required by the law.
**Job Locations** _US-MO-Saint Louis_
**Primary Posting Location : City** _Saint Louis_
**_Primary Posting Location : State/Province_** _MO_
**_Primary Posting Location : Postal Code_** _63101_
**_Primary Posting Location : Country_** _US_
**Requisition ID** _2024-434298_
**Position Type** _Full Time_
**Category** _Professional: (IT, Finance, Legal, HR, Talent Acquisition, Administrative, Customer Service)_
**Minimum** _USD $93,100.00/Yr._
**Maximum** _USD $121,000.00/Yr._