Troy, MI, USA
86 days ago
HPC System Administrator I
At Roush, we fuse technology and engineering to provide product development solutions to customers in a diverse range of industries. Widely recognized for providing engineering, testing, prototype, and manufacturing services to the transportation industry, Roush also provides significant support to the aerospace, defense, and theme park industries. With over 2,400 employees in facilities throughout the United States, Europe, Asia, and South America, our unique combination of creativity and tenacity activates big ideas on a global stage. We want motivated, ambitious people who put the needs of our customers first, bring creativity to their work, and do whatever it takes to achieve success. If you share our passion for providing innovative solutions to complex challenges, we want you on our team.

At Roush, we work alongside the best and brightest to do incredibly cool things you wouldn’t believe. At Roush, you are part of building the future.

The HPC System Administrator I will be responsible for day-to-day operational support of the Roush CAE HPC and VDI hardware and software infrastructure. Day-to-day operations include supporting end users with issues, driving root cause analysis, and designing task automation. This role will work cross-functionally on various project teams and operations under the direction of the HPC System Lead Engineer. This role will also be involved in developing tools and scripts for simulation testing and simulation job optimization, and in documenting all changes. This position is located in Troy, MI.

Responsibilities:
- Provide day-to-day operational support for the Roush CAE HPC clusters, VDI, and backup servers, managing and resolving any hardware and software issues that arise (systems administration).
- Assist in hardware and software upgrade programs to implement new technologies. These programs include developing cluster tools or solutions, deployment automation, HPC job optimization, pre/post-processing workflows, alerts, and usage and performance metrics.
- Write help documents for users, and develop functional and technical designs for automated tools that help users optimize HPC jobs, following the Roush CAE HPC change management guidelines.
- Identify bottlenecks and help maximize the performance of our HPC applications.
- Provide advice and support to Roush HPC users.
- Interact confidently and professionally with audiences and stakeholders at all levels.
- Keep abreast of the latest HPC and industry developments and investigate the suitability of newly available technologies, including but not limited to: new CPU/GPU technologies, HBM and other memory technologies, high-speed interconnects, web-based software technologies, and tuning and optimization of parallel high-performance computing applications.

Minimum Qualifications:
- Bachelor's degree in engineering, computer science, or a related field.
- Experience with Red Hat Enterprise Linux or similar Linux distributions (Fedora, CentOS Stream, AlmaLinux, and/or Rocky Linux).
- Experience with Bash, Python, and/or similar scripting languages.
- Experience with Microsoft Office products (Excel, PowerPoint, SharePoint, Teams, etc.).
- U.S. citizenship, to allow for International Traffic in Arms Regulations (ITAR) compliance.
- Self-starter, able to identify requirements independently and propose solutions, with the flexibility to handle changing priorities and work on several projects simultaneously.
- Excellent documentation skills and the ability to communicate well with people of diverse backgrounds and levels of computer knowledge.
- High level of personal commitment; occasional availability on weekends and outside normal working hours will be required to ensure system uptime and support system maintenance schedules.
- Aptitude for learning from others, sharing knowledge with others, and promoting continuous improvement of our processes.
- Ability to work with engineering staff and users to show them how to use HPC resources optimally.

Preferred Qualifications:
- Minimum of 2 years' experience in HPC system administration and supporting CAE users.
- Experience in the installation, configuration, administration, and use of CAE software (LS-DYNA, Nastran, STAR-CCM+, Abaqus, Fluent, etc.).
- Experience in the installation, configuration, and administration of queue systems such as SLURM, LSF, or PBS.
- Experience in the installation, configuration, and administration of Virtual Desktop Infrastructure (VDI) applications.
- Willingness to try new tools and technologies and to improve process and cost effectiveness.
- Knowledge of HPC interconnect technologies (InfiniBand, Omni-Path, etc.) and MPI.
- Knowledge and understanding of network technologies such as TCP/IP and networked file systems such as NFS, GFS, Lustre, and GPFS.
- Basic project management skills.

Our full-time benefits include medical, dental, vision, life insurance, earned sick time, short-term disability (STD), long-term disability (LTD), 401(k), tuition reimbursement, paid vacation, and paid holidays.
To apply for this position and view all of our other career opportunities at Roush, visit: https://jobs.roush.com/us/en/
Visit our website: www.roush.com
Like us on Facebook: www.facebook.com/RoushCareers
Roush is an EO employer (Veterans/Disabled and other protected categories). If you need reasonable accommodation for our employment application process due to disability, please contact Roush Talent Acquisition at (734) 779-7087.
 