The Platform Reliability Engineering Architect will lead a dynamic team responsible for ensuring our critical systems' reliability, performance, and efficiency. This role involves a strategic blend of engineering and operations and requires a strong background in software development, systems engineering, and leadership. This is a pivotal role in our operations, demanding a dedicated individual who excels in a fast-paced and collaborative environment. We invite you to apply if you are driven by system reliability and ready to lead a high-performing team.
Job Responsibilities
• Lead and mentor a team of Platform Engineers, fostering a culture of continuous improvement and innovation.
• Collaborate with product and engineering teams to design and implement scalable solutions.
• Develop and maintain a reliable monitoring and alerting system to detect and mitigate issues proactively.
• Handle incidents to consistently reduce TTM and TTR.
• Participate in and lead post-mortem analyses to prevent future outages.
• Manage priorities, projects, and the overall workflow of the SRE team.
• Ensure compliance with security best practices and company policies.
• Stay ahead of industry trends and emerging technologies to continuously improve system reliability and performance.
• Exceptional problem-solving skills and the ability to work under pressure.
• Excellent communication and team-building skills.
• 12 years of experience in Software Development, Platform Engineering, DevOps, or similar roles, with at least 5 years in a lead and/or architect position.
• Experience mentoring geographically dispersed teams.
• Recommend the appropriate technological approach, team structures, and skill sets.
• Proficiency in programming languages such as Python, Go, or Java.
• Extensive experience with cloud services (AWS, GCP, Azure) and container orchestration tools (Kubernetes, Docker).
• Experience designing and implementing CI/CD pipelines and Configuration Management (Jenkins, Ansible, Terraform).
• Deliver architectural initiatives that drive and improve efficiency in line with business strategy.
• Familiarity with distributed systems design patterns using tools such as Kubernetes.
• Exceptional knowledge of observability tools and setting up architecture for proactive monitoring of the product.
• Experience setting up SLOs and SLIs.
• Proven track record of designing and implementing scalable, high-availability systems.
Compensation:
The target salary range for this position is 192,330 - 270,380 USD. The salary offered will be determined by the candidate's location, qualifications, experience, and education and may be outside of this range. Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package. This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.