Hyderabad, Telangana, India
1 day ago
SRE III

There’s nothing more exciting than being at the center of a rapidly growing field in technology and applying your skillsets to drive innovation and modernize the world's most complex and mission-critical systems.

As a Site Reliability Engineer III at JPMorgan Chase within the Infrastructure Platforms team, you will solve complex and broad business problems with simple and straightforward solutions. Through code and cloud infrastructure, you will configure, maintain, monitor, and optimize applications and their associated infrastructure to independently decompose and iteratively improve on existing solutions. You are a significant contributor to your team by sharing your knowledge of end-to-end operations, availability, reliability, and scalability of your application or platform.

Job responsibilities

 

Design and implement solutions to enhance the reliability and scalability of platforms and applications to accommodate rapidly growing demands.
Analyze defects, propose improvements, and drive efficiencies in systems and processes.
Optimize the performance and utilization of AI/ML platforms and infrastructure.
Develop observability, security, and FinOps tools and orchestration.
Author and improve the quality of technical engineering documentation.
Debug and solve issues in a production environment.
Participate in on-call rotations and escalation workflows.
Guide and assist others in building appropriate-level designs and gaining consensus from peers where appropriate.
Collaborate with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines.
Collaborate with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications.
Implement infrastructure, configuration, and network as code for the applications and platforms in your remi

Required qualifications, capabilities, and skills

 

Formal training or certification on Site Reliability Engineering concepts and 3+ years of applied experience.
Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform.
Expertise in programming with Python and cutting-edge software engineering practices.
Coding skills in one or more languages such as Python, Java, PHP, shell scripting, or PowerShell.
Experience in designing and implementing large-scale distributed systems and cloud-native architecture.
Experience with developing on cloud platforms, especially AWS, and knowledge of Infrastructure as Code tools such as Terraform.
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team.
Ability to initiate and implement ideas to solve business problems.

Preferred qualifications, capabilities, and skills

Prior experience working in AI, ML, or data engineering.
Systematic problem-solving and troubleshooting skills in a complex system.
Excellent communication skills working with stakeholders and domain experts across the company to design solutions to user problems.
Self-disciplined, self-managed, self-motivated with a strong sense of ownership, urgency, and