Pune, Maharashtra
2 days ago
Senior Site Reliability Engineer (SRE) Lead

In the ever-changing SaaS landscape, a few persistent practices contribute to developing quality solutions with speed: ensuring operational activities are treated as software development enhancements, remediating manual tasks through automation, reducing risk through compartmentalization of services and code, and consuming readily available provider services. Product and development teams require an accountable partner to advance on these topics. The SRE (Site Reliability Engineering) team will be this partner.

The SRE team will support the Siemens Xcelerator platform and will be responsible for identifying, managing, improving, and reporting on availability, resiliency, reliability, and stability efficiencies. This includes providing technical guidance and leadership to drive solutions, and creating and enhancing processes that deliver excellence. A strong relationship with the various product teams of the Xcelerator platform is necessary to support core objectives. This role's success will be defined by product teams meeting their SLOs with healthy product adoption and operational excellence.

This position will be responsible for supporting technology and culture through an enterprise ecosystem to ensure developers and products exceed product SLOs (Service Level Objectives) and clearly, without dispute, benefit from every interaction with the SRE team.

Responsibilities


- Incident management and Game Day coordination
- Create and drive metric/observability solutions and reviews
- Support production readiness reviews
- Act as a cross-division role model to advance the SRE practice in Siemens
- Complete technological control over methods of automation, codifying optional activities, microservice architecture, and platform engineering to ensure changes, updates, or technical advancements are in place for a product
- Ensure the team can provide the design, deployment, automation, and scripting solutions to drive new capabilities, visibility, and efficiency
- Simplify highly complex ideas, architectures, and concepts to encourage achievable adoption
- Collaborate with other technical platforms and partners to engineer automated and integrated solutions between tools, services, and teams that increase availability, reliability, and performance
- Own and ensure the internal and external SLAs meet and exceed expectations
- Be part of maintaining a 24x7, global, highly available SaaS environment
- Participate in an on-call rotation that supports our production infrastructure
- Troubleshoot production availability incidents that often span multiple teams and services
- Ensure the SRE team can coordinate production incident post-mortems and contribute to solutions to prevent problem recurrence, with the goal of automated response to all non-exceptional service conditions
- Communicate to business and technical partners on incidents as they occur when they impact system performance or availability at a critical level

Required Knowledge/Skills, Education, and Experience

- Bachelor’s Degree or equivalent experience
- Proven experience as a Site Reliability Engineer or equivalent role
- Experience working in a large organization through an SRE transformation where existing applications were adapted to contemporary targets
- Proven experience with automation via scripting and API development
- Experience with software development in the cloud
- Experience with monitoring tools (Datadog, CloudWatch, CloudTrail, Cloudability, or equivalent tools)
- Proven experience with containerization, specifically Kubernetes
- Experience with Amazon Web Services (AWS) services and Terraform, CloudFormation, Ansible, or equivalent tools

Preferred Knowledge/Skills, Education, and Experience

- Desired certifications include: Datadog, Kubernetes, Security, AWS certification
- Understanding of ITIL
- Deep understanding of SRE and incident management strategies
- Experience with issue/incident tracking tools (ServiceNow, ServiceDesk, Jira, or equivalent tools) and open source tools (Linux, Python, Git, Ansible)
- Experience in an enterprise IT environment with distributed environments
- Networking concepts, including firewalls, VPN, routing, load balancers, security, and DNS
- Senior-level system administration experience, including troubleshooting, support, mentorship/training, and oversight

Why us?

Working at Siemens Software means flexibility: choosing between working at home and at the office is the norm here. We offer great benefits and rewards, as you'd expect from a world leader in industrial software.

We are a collection of over 377,000 minds building the future, one day at a time, in over 200 countries. We're dedicated to equality, and we welcome applications that reflect the diversity of the communities we work in. All employment decisions at Siemens are based on qualifications, merit, and business need. Bring your curiosity and creativity and help us shape tomorrow!

Siemens Software. Transform the Everyday


#LI-PLM 

#LI-HYBRID




