M365's COSMIC team designs, builds, and operates a global-scale managed runtime environment based on Azure Kubernetes Service for Microsoft Substrate services and their developers. COSMIC can be compared to a ‘Kubernetes PaaS’. Our charter is to build and maintain solutions that let Substrate service teams onboarding to the COSMIC Linux platform focus on their own scenarios and business requirements rather than on common infrastructure concerns such as deployment, upgrades, security, observability, and debuggability.
We are looking for a Senior Site Reliability Engineer to maintain the health of the COSMIC platform by ensuring that all agents are up to date and that upgrades happen on schedule, and by debugging any issues that arise from them. As an SRE, you will identify patterns in service alerts, add automation that enriches incidents with metadata, and build auto-remediation solutions wherever possible.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.