Seattle, WA, US
2 days ago
Software Development Engineer, GenAI/ML GPU Orchestration Service, Alloy
GenAI is revolutionizing every industry in the world, yet we are still at the very beginning. As the appetite for GenAI continues to grow exponentially, the demand for GPU instances grows with it, resulting in one of the biggest problems of our generation: how do we get more GPU capacity? To solve it as an industry, we need to both (1) scale up the production of GPUs and (2) optimize usage of existing scarce GPU resources to do more with less.

The second bucket is where our team comes into the picture. Large cloud providers are already spending billions on GPU resources, and those resources are distributed to teams in a siloed fashion, where teams are unable to use them to their fullest extent, whether because of peak/off-peak seasonality, workloads completing sooner than expected, or lulls during vacations. When we looked at the data, we saw 15-30% idle capacity across GPU allocations, which presented a huge opportunity for us to tackle.

The Alloy Greenland team is a new team, started at the beginning of the year, operating startup-style with true Day 1 spirit. Our mission is to accelerate AI/ML innovation for all teams across Amazon and, through our partnership with AWS SageMaker, for the rest of the world.

If you love working backwards from customers, building 0-1, gaining visibility with senior leadership, and ultimately making a dent in the world, this is the right place for you!

The Alloy Greenland team is part of the Alloy organization, the central efficiency org that drives cost savings for all service teams within Amazon through efficient use of AWS resources as they build and operate their services. This team is special in three ways: (1) business impact: we have a proven record of cost savings in the hundred-million-dollar range annually, and we have earned the trust of service teams, partner teams (business and technical), and senior leadership; (2) technical complexity: our system is not a single product but the whole of Amazon, and we create central efficiency solutions that save costs for thousands of internal services with minimal or zero effort from their engineers; (3) professional network: we work closely with a group of Principal Engineers and Distinguished Engineers, and working with brilliant people helps you grow your career.


In this role, you will:

* Build 0-1 products and services that delight our customers.
* Work closely with AI/ML customers across Amazon and innovate on their behalf.
* Solve performance and efficiency problems that manifest at scale.
* Design metrics and measure performance and cost efficiency of services in Amazon's ecosystem.
* Collaborate with service teams to identify inefficiencies, and design and implement solutions.
* Design and develop highly available components and profiling tools.
* Lead and mentor a team of engineers.
