SRE II - Observability & Reliability
Movius Interactive
Job Summary

We are seeking a Senior Software Engineer to join our Site Reliability Engineering team, with a focus on Observability and Reliability. As a key member of our SRE team, you will play a critical role in ensuring the performance, stability, and availability of our applications and systems, with a focused approach to Application Performance Management, Observability, and Reliability of the platform.

The Senior Software Engineer will be responsible for the design, implementation, and maintenance of our observability and reliability infrastructure, with a primary focus on the ELK stack (Elasticsearch, Logstash, and Kibana). The role involves configuring, fine-tuning, and automating alerts, integrating Elastic solutions with other tools and applications, generating reports, and optimizing the observability and monitoring systems.

Key Duties & Responsibilities

1. Collaborate with cross-functional teams to define and implement observability and reliability standards and best practices.
2. Design, deploy, and maintain the ELK stack for log aggregation, monitoring, and analysis.
3. Develop and maintain alerts and monitoring systems, ensuring early detection of issues and rapid incident response.
4. Create, customize, and maintain dashboards in Kibana for different stakeholders.
5. Collaborate with software development teams to identify performance bottlenecks and recommend solutions.
6. Automate manual tasks and workflows to streamline observability and reliability processes.
7. Conduct regular system and application performance analysis and optimization, effective automation and tooling, capacity planning and optimization, security practices and compliance adherence, documentation and knowledge sharing, and disaster recovery and backup.
8. Generate and deliver detailed reports on system performance and reliability metrics.
9. Stay up to date with industry trends and best practices in observability and reliability engineering.
Qualifications/Skills/Abilities

Minimum Requirements

Formal Education
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).

Experience (type & duration)
5+ years of experience in Site Reliability Engineering, Observability & Reliability, DevOps.

Skills
Proficiency in configuring and maintaining the ELK stack (Elasticsearch, Logstash, Kibana) is mandatory.
Strong scripting and automation skills, with expertise in Python, Bash, or similar languages.
Experience with data structures using Elasticsearch indices.
Experience in writing data ingestion pipelines using Logstash.
Experience with infrastructure as code (IaC) and configuration management tools (e.g., Ansible, Terraform).
Hands-on experience with cloud platforms (AWS preferred) and containerization technologies (e.g., Docker, Kubernetes).
Telecom domain expertise is good to have but not mandatory.
Strong problem-solving skills and the ability to troubleshoot complex issues in a production environment.
Excellent communication and collaboration skills.

Accreditation/certifications/licenses
Relevant certifications (e.g., Elastic Certified Engineer) are a plus.