Winston-Salem, NC, USA
93 days ago
Senior Site Reliability Engineer

As a Site Reliability Engineer (SRE), you'll play a pivotal role in ensuring the health, reliability, performance, and scalability of our applications. You'll bridge the gap between development and operations, leveraging your technical expertise and problem-solving skills to triage production issues, automate operations, optimize processes, and maintain high availability.

Key Responsibilities:

•     Steward of Application Health: Work closely with application developers to design resilient, scalable, and maintainable applications, ensuring they meet operational requirements and minimize downtime.

•     Collaboration: Participate in code review; mentor and train peers; advocate for DevOps principles among application developers.

•     Infrastructure Automation: Develop and maintain automation and tools to streamline deployments, configuration management, and infrastructure provisioning.

•     Monitoring and Alerting: Implement robust monitoring systems to proactively identify and address performance bottlenecks, anomalies, and security threats.

•     Capacity Planning: Forecast resource needs and optimize infrastructure utilization to ensure high availability and performance.

•     Change Management: Collaborate with development teams to ensure smooth deployment of new features and updates, minimizing disruptions.

•     Security and Compliance: Adhere to security best practices and implement measures to protect systems from vulnerabilities and threats.

•     Guardian of SLAs: Actively monitor and maintain the health and performance of applications, ensuring they meet Service Level Agreements. Respond to, triage, and mitigate emergent problems in production.

•     On-Call Support: Participate in on-call rotation to provide timely support and resolution for critical issues.

Required Skills and Experience:

•     Strong programming skills in languages such as Python, Ruby, Bash, Rust, or Go

•     Experience with cloud platforms (AWS, GCP, Azure) and infrastructure-as-code tools (CloudFormation, Terraform, Ansible, Chef)

•     Deep understanding of containerization technologies (Docker, Kubernetes)

•     Proficiency with Linux (Ubuntu)

•     Proficiency in monitoring and alerting tools (CloudWatch, Prometheus, Grafana)

•     Knowledge of DevOps practices and methodologies (CI/CD, Agile)

•     Excellent problem-solving and troubleshooting skills

•     Strong communication and collaboration abilities

Preferred Skills and Experience:

•     Knowledge of asynchronous processing (Kafka, Celery)

•     SQL database administration and query optimization (PostgreSQL, MySQL)

•     Experience with GitHub Actions

 

Why Join Us:

•     Be part of a dynamic and innovative team

•     Work on cutting-edge technologies and projects

•     Opportunity for professional growth and development

 

If you are passionate about building reliable and scalable systems and have a strong foundation in DevOps, we encourage you to apply.

We are an Equal Opportunity Employer, including disability/vets.
