Minimum 5 years of experience in the skills below:
HPC systems
Clusters
Linux systems
HPC HW knowledge, especially in the server, GPU, networking, storage, BIOS & BMC arenas
TCP/IP fundamentals
BE/BTech or MS degree plus 6 to 10 years of validated experience in Computer Engineering, Electrical Engineering, or related fields
Design, implementation & support of high-performance compute clusters
Solid knowledge of HPC systems, including CPU/GPU architecture, scalable and robust storage, high-bandwidth interconnects, and cloud-based computing architectures
Apply attention to detail to generate HW BOMs for the HPC clusters, manage vendors, and oversee HW release activities.
Required Qualifications:
Validated in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu)
Experience designing and maintaining robust storage
Strong HPC HW knowledge especially in the server, GPU, networking, Storage, BIOS & BMC arenas.
Experience with systemd, network boot/PXE, and Linux HA.
Strong understanding of TCP/IP fundamentals and knowledge of protocols such as DNS, DHCP, HTTP, LDAP, and SMTP.
Ability to code and develop Shell and Python scripts.
Experience with one or more configuration management tools (Salt, Chef, Puppet, etc.).
Preferred Qualifications:
Strong DevOps focus: knowledge of setting up a continuous development pipeline (Jenkins), Git-based repository software, and Singularity and Docker containers.
Kubernetes, Prometheus & Grafana experience
Knowledge of Apache/Nginx, setting up proxies and reverse proxies, application server routing, and load balancing (HAProxy)
Skills and Abilities:
Team Orientation & Interpersonal – Highly motivated teammate with the ability to develop and maintain collaborative relationships at all levels within and outside the organization.
Organization & Time Management – Able to plan, schedule, organize, and follow up on tasks related to the job to achieve goals within or ahead of established time frames.