High Performance Computer Software Engineer II

Richardson, Texas


Employer: University of Texas at Dallas
Industry: High Performance Computing
Salary: Up to $105,000 per year
Job type: Full-Time

Posting Details

Posting Number
S05892P

Position Title
High Performance Computer Software Engineer II

Functional Title
High Performance Computing Operations Software Engineer

Department
High Performance Computing

Salary Range
Up to $105,000 Annually Dependent on Qualifications

Pay Basis
Monthly

Position Status
Regular full-time

Location
Richardson

Position End Date (if temporary)

Posting Open Date
08/27/2024

Posting Close Date
09/03/2024

Open Until Filled
No

Desired Start Date
09/23/2024

Job Summary
Reporting to the Director of HPC Operations, this is a mid-level HPC software engineer position that requires a background in a High Performance Computing environment. To collaborate with and support our customers, this engineer must have demonstrated a consultative customer service attitude in prior roles in similar organizations. Primary responsibilities include: installing and managing the extensive list of software required to support researcher needs; keeping software secure by implementing a software lifecycle process that keeps it patched and supported; and leveraging automation to simplify management and administration of our HPC environment, including HPC systems software. The engineer will ensure software has a planned lifecycle in line with University policies, in collaboration with the core HPC engineering team and users of the software across HPC clusters. The applicant must have broad industry knowledge of the hardware and software services involved in building and operating HPC environments.

Minimum Education and Experience
High school diploma with eight (8) years recent applicable experience.
Associate Degree with six (6) years recent applicable experience.
Bachelor's Degree with four (4) years recent applicable experience.

Preferred Education and Experience
  • Master's degree in Computer Science or equivalent with two years of experience in corresponding research services, support efforts, products and technologies.
  • Current knowledge of HPC best practices and of systems deployment and maintenance.
  • Troubleshooting methodology and awareness of industry standards.
  • Excellent interpersonal, written, and verbal communication skills are a must.
  • Good technical documentation, architecture diagramming, and organizational skills.
  • Ability to manage support tickets and prioritize considering varied scope, scale, and technical requirements.
  • Familiarity with data center operations fundamentals in networking and power.
  • Experience with the Linux operating system for installing and managing software.
  • Ability to package scientific software into RPMs and containers and integrate it with Lmod so users can run `module load`.
  • Familiarity with Apptainer/Singularity for HPC, Docker, and Kubernetes, and with supporting computational services using containers.

Essential Duties and Responsibilities
  • Serve as the primary software support engineer for the University HPC cluster, with support from the core HPC team for complex scenarios. This is Level 2 support on a scale of 1 to 3, with 3 being a senior specialist.
  • Act as a role model in demonstrating integrity and ethical behavior in working with confidential and university information.
  • Assist in the development and implementation of internal policies, rules, and operating procedures for Research Computing and Cyberinfrastructure to meet assurance models such as NIST 800-53 and NIST 800-171, under which assured research is conducted.
  • Maintain documentation of HPC services and software packages available on University HPC platforms, and train end users on the use of HPC services.
  • Perform annual updates and expert-level software coding in at least two languages (Python and Linux shell preferred).
  • Perform installation, configuration, updating, networking, performance monitoring and troubleshooting of software on HPC Systems.
  • Develop, troubleshoot, modify, catalog, document, and update scripts and utilities that automate common processes for the administration and management of HPC clusters.
  • Package scientific software into RPMs and integrate it with Lmod so users can run `module load`.
  • Compile, test, and install open-source scientific software packages requested by research faculty, staff, and students, both as standalone packages and as containerized packages.


Physical Demands and Working Conditions

Physical Activities

Working Conditions

Additional Information
KNOWLEDGE, SKILLS & ABILITY
  • Familiarity with at least two high performance cluster operating systems such as OpenHPC, ROCKS, or Bright/NVIDIA Cluster Manager.
  • Familiarity with large-scale, high performance parallel file storage systems such as WEKA, VAST, GPFS, BeeGFS, and Ceph.
  • Experience installing and supporting open-source and commercial research software, including Python, R, MATLAB/MathWorks, Julia, Ansys, and the Intel, NVIDIA CUDA, and GCC compilers.
  • Experience with related DevOps tools such as GitHub, GitLab, and Ansible, with package management tools such as Spack or EasyBuild, and with RPM and/or deb package building.
  • Experience with the Slurm job scheduler.


Additional Information:
  • Candidates will be subject to a criminal background check.
  • On-call availability for quickly responding to and resolving system emergencies, both during regular hours and off-hours.
  • Emergency on-call rotation availability for 24×7×365 coverage.
  • Hybrid remote work is available for Texas residents, subject to further discussion and agreement. A remote work agreement must be completed within 14 days of employment.
  • Sitting for extended periods of time. Dexterity of hands and fingers to operate a computer keyboard, mouse, power tools, and to handle other computer components. Lifting and transporting of moderately heavy objects, such as servers, switches, computers, and peripherals.
  • Visa sponsorship is not available.

Special Instructions Summary

Important Message
1) All employees serve as a representative of the University and are expected to display respect, civility, professional courtesy, consideration of others and discretion in all interactions with members of the UT Dallas community and the general public.

2) The University of Texas at Dallas is committed to providing an educational, living, and working environment that is welcoming, respectful, and inclusive of all members of the university community. UT Dallas does not discriminate on the basis of race, color, religion, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, national origin, disability, genetic information, or veteran status in its services, programs, activities, employment, and education, including in admission and enrollment. EOE, including disability/veterans. The University is committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities. To request reasonable accommodation in the employment application and interview process, contact the ADA Coordinator. For inquiries regarding nondiscrimination policies, contact the Title IX Coordinator.

Created: 2024-08-31
Reference: S05892P
Country: United States
State: Texas
City: Richardson
ZIP: 75081

