HPC Systems Site Lead - Los Alamos, NM - DOE Q Clearance Needed

Los Alamos, New Mexico


Employer: Day & Zimmermann Group
Industry: Information Technology
Salary: Competitive
Job type: Full-Time

HPC Systems Site Lead needed for a Direct Hire opportunity with SOC's client to work onsite in Los Alamos, NM.

*Candidates must have an active DOE Q Clearance, or have held one within the past 3 years, to be considered for this role.

Job Description
Contributions regularly and sustainably impact technical components of Client products, solutions, or services. Applies advanced subject matter knowledge to solve complex business issues and is regarded as a subject matter expert. Provides expertise and partnership to functional and technical project teams and may participate in cross-functional initiatives. Exercises significant independent judgment to determine the best method for achieving objectives. May provide team leadership and mentoring to others.

Responsibilities:
Service delivery
  • Maintain the HPC systems availability to the customer
  • Lead technical output of on-site client HW technicians, system admins, and system analysts
  • Serve as the primary customer focal point for system support and on-site activities
  • Maintain a full-time (100%) presence on the customer site during standard business hours
  • Routine face-to-face and group interaction with site team to organize tasks, follow up, and assist with challenges they encounter
  • Track system health and Cases, review regularly (weekly) with customers and HPC leadership
  • Maintain availability reports for tracking SLAs
  • Pre-plan system upgrades; review plans with team and customers, and arrange for staffing and equipment, including pre-arranging open lines of communication in case of issues
  • Escalate Cases, assist team members in escalating Cases to next-tier support, and follow up to drive closure via escalation processes
  • Manage on-site parts inventory using business tools
  • Manage site tools and equipment
  • Maintain the on-call schedule to support our 24x7x365 contracts
  • Assist with hardware and system installation activities for new systems
Team support
  • Build strong working relationships with teammates, leadership, and customers
  • Maintain awareness of upcoming training and prompt team members to complete trainings
  • Maintain a team calendar of planned leave, including the on-call schedule for operational issues
  • Provide performance review input to the District Service Manager (DSM), along with suggestions for team member performance and development
  • Escalate to the DSM any personnel issues, risks of missing an SLA, or customer satisfaction concerns
  • Maintain a clean and safe working environment
  • Support the DSM in onboarding new team members by providing site-specific details (e.g., customer network accounts, badge, parking)

Required Qualifications & Experience:
  • 8+ years of professional experience and a Bachelor of Arts/Science or equivalent degree in computer science or related area of study; without a degree, three additional years of relevant professional experience (11+ years in total).
  • In-depth knowledge of high-performance computing (HPC) systems.
  • Proficiency in managing and optimizing HPC environments, including system configuration, performance tuning, and troubleshooting.
  • Strong understanding of parallel computing, cluster management, and distributed computing technologies.
  • Experience with HPC workload managers and schedulers such as SLURM, PBS, or similar.
  • Advanced knowledge of Linux operating systems.
  • Familiarity with software development tools and environments commonly used in HPC, including compilers, debuggers, and performance analysis tools.
  • Experience with various scripting languages such as Python or Bash.
  • Proven experience in system administration, including hardware and software installation, maintenance, and upgrades.
  • Knowledge of network architecture, storage solutions, and data management within HPC environments.
  • Ability to implement and manage security protocols and best practices in a high-performance computing context to maintain customer security posture.
  • Strong project management skills, including planning, execution, and monitoring of HPC projects.
  • Ability to lead and coordinate a team of technical professionals, ensuring timely and successful project delivery.
  • Experience in resource allocation, budgeting, and performance metrics tracking for HPC projects.
  • Excellent problem-solving abilities, with a focus on identifying root causes and implementing effective solutions.
  • Strong analytical skills to assess system performance and make data-driven decisions for optimization.
  • Ability to troubleshoot complex technical issues in a high-stakes HPC environment.
  • Exceptional communication skills, both written and verbal, to effectively interact with team members, stakeholders, and clients.
  • Ability to convey complex technical information in a clear and concise manner to non-technical audiences.
  • Strong collaboration skills to work effectively within a multidisciplinary team and across organizational boundaries.
  • Extensive experience in HPC system management and administration, with a track record of successful project and team leadership.
  • Willingness to participate in ongoing professional development and training opportunities, which may require travel.

Preferred Qualifications:
  • CompTIA A+ or Server+ Certification
  • Security+ Certification
  • Linux+ Certification
  • PMP or Project+
  • Vendor Certifications
  • Experience with ticket-tracking software (e.g., Salesforce or Smartsheet; experience with any ticket-tracking tool is acceptable)

Employment Prerequisites
The following requirements must be met to be eligible for this position: successful completion of a background investigation and drug urinalysis.

SOC, a Day & Zimmermann company, is an Equal Opportunity Employer, EOE AA M/F/Vet/Disability.

Created: 2024-08-22
Reference: 232703
Country: United States
State: New Mexico
City: Los Alamos

