Toronto, Canada
48 days ago
Site Reliability Engineer III
Develop software and software fixes to integrate internal systems. Ensure code quality, test and distribute code updates, and monitor the health and stability of the servers.

What you'll do:
Meet and beat Key Performance Indicators and SLAs; maintain an error budget and adhere to it.
Identify, evaluate, and execute preventative measures to minimize and avoid impact to the customer experience.
Employ deep troubleshooting skills to improve availability, performance, and security for CR and Emburse; ensure services are designed with 24/7 availability, operational readiness, and rigor.
Code and automate applications on cloud platforms.
Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams.
Work with Cloud Platform and Operations leaders to develop narratives, backlog grooming, epic planning, and overall sprint planning processes.
Ensure the platform holds a high degree of reliability, at least four 9s.
Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
Own technically intricate issues that cross DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.
Work closely with different product stakeholders to align operational priorities and planning with the product and engineering roadmap.
Prepare and present engineering-related documents to key stakeholders.
Provide recommendations and feedback in review sessions and design reviews.
Mentor SRE I and SRE II engineers.
Assist in guiding more junior engineers in best practices.
Conduct and assist with investigation, test, and deployment activities; identify and mitigate risks in development activities.

What we're looking for:
Bachelor's degree in Computer Science or a STEM field required.
Minimum of 7 years' experience in an engineering role required.
Deep understanding of infrastructure as code, scripting, self-healing, containers, DevOps tooling, and distributed systems highly desired.
Experience working with Ansible and Terraform highly desirable.
Excellent written and verbal communication skills in English.
Experience with the full lifecycle of SaaS implementations as well as infrastructure as code.
Excellent follow-up and project management skills.
Proven ability to create and maintain new tools.
Excellent troubleshooting skills.
Excellent technical skills. Up to 70% of the job is hands-on in a distributed Linux environment.
Strong scripting skills. OOP is a plus.
Ability to liaise with other teams to help set and align priorities.
Experience working with an offshore team.