We have an opportunity to impact your career and provide an adventure where you can push the limits of what's possible.
As a Lead Software Engineer at JPMorgan Chase within the Cybersecurity, Technology, and Controls line of business, you are an integral part of an agile team that works to enhance, build, and deliver trusted market-leading technology products in a secure, stable, and scalable way. As a core technical contributor, you are responsible for delivering critical technology solutions across multiple technical areas within various business functions in support of the firm’s business objectives.
We are seeking a highly skilled ML Ops Engineer with expertise in deploying, monitoring, and managing machine learning models in production environments. This role involves working with cutting-edge technologies to ensure scalable, reliable, and efficient AI solutions. The ideal candidate will be adept at building robust infrastructure and processes to support the seamless operation of machine learning models. In this role, you will be responsible for automating model deployment, optimizing infrastructure, and ensuring the continuous performance of AI systems. Your ability to collaborate with cross-functional teams and address operational challenges will be crucial to driving innovation and delivering impactful AI solutions.
Job responsibilities
- Executes creative software solutions, design, development, and technical troubleshooting, with the ability to think beyond routine or conventional approaches to build solutions or break down technical problems
- Develops secure, high-quality production code, and reviews and debugs code written by others
- Identifies opportunities to eliminate or automate remediation of recurring issues to improve the overall operational stability of software applications and systems
- Leads communities of practice across Software Engineering to drive awareness and use of new and leading-edge technologies
- Adds to team culture of diversity, equity, inclusion, and respect
- Collaborates with cross-functional teams, including data scientists and software engineers, to understand model requirements and integrate them into applications
- Develops and implements strategies for deploying machine learning models into production, ensuring scalability, reliability, and efficiency
- Designs and maintains continuous integration and continuous deployment (CI/CD) pipelines to automate the testing, deployment, and updating of machine learning models
- Manages and optimizes the infrastructure required for running machine learning models, including cloud services, containerization (e.g., Docker), and orchestration tools (e.g., Kubernetes)
- Implements monitoring and logging solutions to track model performance, detect anomalies, and ensure models are operating as expected in production
- Responds to incidents and troubleshoots issues related to model performance, data quality, and infrastructure
Required qualifications, capabilities, and skills
- Formal training or certification on security engineering concepts and 5+ years applied experience
- Hands-on practical experience delivering system design, application development, testing, and operational stability
- Advanced in one or more programming languages
- Proficient in all aspects of the Software Development Life Cycle
- Demonstrated proficiency in software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, machine learning, mobile, etc.)
- Practical cloud native experience
- Strong expertise in deploying and managing machine learning models in production environments
- Proficiency in building and maintaining CI/CD pipelines for machine learning workflows
- Expertise in cloud platforms (e.g., AWS, Google Cloud, Azure) and containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
- Advanced Python programming skills, including pandas, NumPy, and scikit-learn
- Strong SQL skills a plus
Preferred qualifications, capabilities, and skills
- Proven experience in deploying and managing large-scale machine learning models in production environments
- Strong ability to monitor ML models in production, addressing model performance and data quality issues effectively
- Working knowledge of security best practices and compliance standards for machine learning systems
- Experience with infrastructure optimization techniques to enhance performance and efficiency
- Development of REST APIs using frameworks such as Flask or FastAPI for seamless integration into business solutions
- Familiarity with creating and utilizing synthetic datasets to improve model training and evaluation
- Bachelor's degree in Computer Science, Engineering, or a related field, with relevant experience in ML Ops or related roles