USA
8 days ago
Senior AI/HPC Software Engineer

Azure is building modernistic accelerated supercomputers at unforeseen scales to facilitate the massive computational demands of the world’s leading generative Articial Intelligence (AI). Microsoft’s Eagle cluster, a Graphics Processing Unit  (GPU)-accelerated supercomputer, is a noteworthy example achieving the coveted #3 and #2 ranks in Top500 and MLPerf benchmarks respectively. The Azure AI high performance computing team is looking for a Senior AI/ HPC Software Engineer to benchmark, profile, debug and tune the generative AI applications running in the production infrastructure. Sophisticated tools and techniques are needed to maintain the reliability, runtime performance, and health of the hundreds of nodes in a supercomputer consisting of thousands of GPUs. The candidate will work closely with customers, who are building the world’s leading generative AI, to understand the characteristics of their workloads, profile them to find performance bottlenecks, and instrument best known state-of-the-art and novel tools and techniques to achieve the smooth operation of the AI jobs. As a contributing member of the core group of engineers in Azure, the candidate would also bring to the table best practices driving architectural changes and influence roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and facilitate the next wave of growth and innovation in AI, and HPC in the cloud in general.

 

We are looking for a Senior AI/ HPC Software Engineer engineer who is  about quality, wants the customer to succeed and get things done. You will join a phenomenal team of engineers and researchers with deep experience in high performance computing, machine learning, deep learning, middleware, and software engineering. The following values drive us:

Drive for Results: We’re here to build great products. We take on whatever work is right for the product and strive for the best possible results.Modesty and Adaptability: The right answer is more important than being right. We search for solutions as a team, adapt quickly and value transparent and open feedback.

Your mission will be to help ensure the Azure platform is consistent on performance, can scale on-demand, and engineered to withstand the unparalleled computing demand from the customer workloads. You will help build a test-driven engineering culture to reduce regressions and bugs in production and will set a higher bar for infrastructure quality.

 

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Confirm your E-mail: Send Email
All Jobs from Microsoft