Senior DevOps Engineer
Nvidia
We are seeking a Senior DevOps Engineer to join our Farm team to improve its growing services infrastructure. You will be working with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage our infrastructure. Our team is a mix of varying levels of experience. We need a motivated, hardworking and focused individual who has a real passion for operational excellence, data systems, and automation.
What you'll be doing:
Own the services you build, working with cross-functional teams
Be comfortable with frequent code testing and deployment
Continuously improve infrastructure provisioning and management using automation
Identify areas to improve service resiliency through industry-standard practices
Support a globally distributed, on-prem environment (LSF)
Determine the root cause of production-level incidents and write corresponding high-quality RCA reports
Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence
Participate in the team's on-call rotation
What we need to see:
B.S. degree in Computer Science or a related technical field, or equivalent experience
8+ years of coding/scripting in at least two high-level programming languages: Python, Perl, Go, Ruby, Groovy, etc.
Experience building and maintaining scalable web applications using modern front-end frameworks, back-end technologies, databases, APIs, and cloud platforms
Good knowledge of operating services, including web servers, load balancers, relational/non-relational databases, messaging systems, and storage solutions
Deep understanding of the Linux operating system and TCP/IP fundamentals
Knowledge of high-performance computing environments, including job schedulers (e.g., Slurm, PBS, or Grid Engine), parallel computing, and performance tuning
Expertise with at least one major cloud service provider: AWS, GCP, or Azure
Proficiency in implementing and managing monitoring tools like Grafana and Prometheus, ensuring system performance, reliability, and real-time data visualization
Proficiency in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)
Detail-oriented, with great communication and documentation skills
Ways to stand out from the crowd:
Experience developing, fine-tuning, and deploying advanced LLM-based solutions (e.g., NLP, chatbots, content generation, or data analysis)
Linux certification from a well-known vendor: Red Hat, Oracle, etc.
Prior experience managing large-scale Kubernetes deployments in production
Strong skills in modern container networking and storage architecture