Palo Alto, CA, USA
35 days ago
Applied AI ML Director, Principal Machine Learning Platform Engineer

Job Description

We are seeking a highly skilled and innovative AI ML Director, Principal Machine Learning Platforms, to join our team within the Corporate AI ML Technology Group. The ideal candidate will have extensive experience in traditional AI, ML infrastructure, ML Platform tools, GenAI, and Machine Learning Platforms.

As an Executive Director, Applied AI/ML, Machine Learning Platforms, you will play a pivotal role in collaborating with various teams to develop and implement common ML components for AIT. You will contribute to our innovative projects and drive the future of machine learning at AIT.

In this role, you will:

Achieve state-of-the-art throughput for critical models using advanced techniques like model parallelism and distributed training.
Reduce inference time for new model architectures using optimizations like quantization and pruning.
Collaborate closely with Applied AI engineering to optimize the internal inference stack, leveraging technologies like TensorRT, ONNX, etc.
Recruit and mentor top-tier AI systems engineers, fostering a culture of continuous learning and innovation.
Coordinate the inference needs of JPMC's research teams, ensuring alignment with business goals.

Job responsibilities:

Architect and implement distributed ML infrastructure, including inference, training, scheduling, orchestration, and storage.
Develop advanced monitoring and management tools for high reliability and scalability.
Optimize system performance by identifying and resolving inefficiencies and bottlenecks.
Collaborate with product teams to deliver tailored, technology-driven solutions.
Drive the adoption and execution of ML Platform tools across various teams.
Integrate Generative AI within the ML Platform using state-of-the-art techniques.

Required Qualifications, Capabilities, and Skills:

10+ years of experience in engineering management with a strong technical background.
Extensive hands-on experience with ML frameworks (TensorFlow, PyTorch, JAX, scikit-learn).
Deep expertise in AWS / GCP and Kubernetes ecosystem, including EKS, Helm, and custom operators.
Strong coding skills and experience in developing large-scale ML systems.
Background in High Performance Computing, ML Hardware Acceleration (e.g., GPU, TPU, RDMA), or ML for Systems.
Proven track record in contributing to and optimizing open-source ML frameworks.
Strategic thinker with the ability to craft and drive a technical vision for maximum business impact.
Demonstrated leadership in working effectively with engineers, data scientists, and ML practitioners.
Proven ability to identify trade-offs, clarify project ambiguities, and drive decision-making.