Software Engineer, Systems ML - HPC Specialist
Bellevue, Washington
Employer: Meta
Salary: Competitive
Job type: Full-Time
Meta is seeking an AI Software Engineer to join our Research & Development teams. The ideal candidate will have industry experience working on AI infrastructure topics. The position involves applying these skills to solve some of the most crucial and exciting problems that exist on the web. As an HPC specialist, some aspects of this role may include authoring components such as cuBLAS, cuDNN, AITemplate, and FlashAttention, and developing runtimes such as an LLM disaggregated runtime. HPC specialists spend time optimizing programs to reduce accelerator idle time. They also develop tools for debugging (such as cuda-gdb) and profiling that make use of the accelerated computing hardware (such as the PEs, SFUs, etc. in MTIA or the Transformer Engine in H100). They are systems experts who can design, debug, and accelerate AI workloads from single-node scale-up to multi-node scale-out distributed systems. They are also able to influence the next generation of silicon architectures (such as the Tensor Core in V100 or the Transformer Engine in H100) based on evolving AI workload needs. We are hiring in multiple locations.
Software Engineer, Systems ML - HPC Specialist Responsibilities
- Apply relevant AI and machine learning techniques to build and optimize our intelligent systems that improve Meta's products and experiences
- Develop custom/novel architectures, define use cases, and develop methodology & benchmarks to evaluate different approaches
- Apply in-depth knowledge of how the machine learning system interacts with the other systems around it
- Assist in goal setting related to project impact, AI system design, and ML excellence
Minimum Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience.
- 2+ years of experience in HPC and parallel computing.
- Proficiency in GPU programming using CUDA and familiarity with CUDA libraries (cuBLAS, cuDNN, etc.).
- Proven track record of leading successful HPC projects.
- Proven technical expertise in HPC architectures and technologies.
Preferred Qualifications
- PhD in Computer Science, Computer Engineering, or relevant technical field.
- Experience developing AI algorithms or AI-System infrastructure in C/C++ or Python.
- Experience developing AI compilers (such as TorchInductor in PyTorch 2.0).
Start preparing
Learn about how to prepare for your interview with our interview guide, tips, and interactive experiences.
Visit interview prep
Created: 2024-05-04
Reference: 1517576482367228
Country: United States
State: Washington
City: Bellevue
ZIP: 98004
Similar jobs:
- Sr. Systems Development Engineer (AWS Generative AI & ML Servers), AWS HW Engineering
  Amazon in Seattle, Washington 💸 $121000 per year
- Research Scientist, Systems ML - SW/HW Co-design (University Grad)
  Meta in Bellevue, Washington
- Software Engineer, Systems ML - Frameworks / Compilers / Kernels
  Meta in Bellevue, Washington
- Software Engineer, Systems ML - Frameworks / Compilers / Kernels
  Meta in Bellevue, Washington
- Software Engineer, Systems ML - SW/HW Co-design
  Meta in Bellevue, Washington
- AIML - Software Development Engineer in Test, ML Systems Evaluation
  Apple in Seattle, Washington
- Software Engineer, Systems ML - Frameworks / Compilers / Kernels
  Meta in Bellevue, Washington
- Software Engineer, Systems ML - HPC Specialist
  Meta in Bellevue, Washington
- AIML - ML Engineer, Siri Multi-modal Systems
  Apple in Seattle, Washington