About the Team:
Senior Site Reliability Engineers at UKG are team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.
About the Role:
Senior Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an “automate everything” mindset, helping us bring value to our customers by deploying services with incredible speed, consistency and availability.
• Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning
• Define and implement standards and best practices for system architecture, service delivery, metrics, and the automation of operational tasks
• Support service, product, and engineering teams by providing common tooling and frameworks that increase availability and improve incident response
• Improve system performance, application delivery and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
• Collaborate closely with engineering professionals within the organization to deliver reliable services
• Identify and eliminate operational toil by treating operational challenges as software engineering problems
• Actively participate in incident response, including required on-call rotations
About You:
Basic Qualifications:
• 3-5+ years of hands-on experience working in engineering or cloud roles
• 3-5+ years of experience with public cloud platforms (e.g., GCP, AWS, Azure)
• Degree in engineering or a related technical discipline, or equivalent work experience
• Experience coding in high-level languages (e.g., Python, JavaScript, C++, or Java)
• Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing
• Demonstrable fundamentals in two of the following: computer science, cloud architecture, security, or network design
• Working experience with industry-standard tools such as Terraform, Ansible, Kubernetes, and Datadog
• Experience automating operational tasks
Preferred Qualifications:
• Experience with distributed system design and architecture
• Experience with containerization technologies
• Experience configuring and maintaining applications and/or systems infrastructure for a large-scale, customer-facing company