Research Engineer, Large Model System

Seattle, Washington


Employer: TikTok
Industry: R&D
Salary: Competitive
Job type: Full-Time

Responsibilities

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

Why Join Us
At TikTok, our people are humble, intelligent, compassionate and creative. We create to inspire - for you, for us, and for more than 1 billion users on our platform. We lead with curiosity and aim for the highest, never shying away from taking calculated risks and embracing ambiguity as it comes. Here, the opportunities are limitless for those who dare to pursue bold ideas that exist just beyond the boundary of possibility. Join us and make impact happen with a career at TikTok.

The Applied Machine Learning - Machine Learning Systems team provides an end-to-end (E2E) machine learning experience and machine learning resources for the company. The team builds heterogeneous ML training and inference systems based on GPUs and advanced chip technology, and advances the state of the art in ML systems technology to accelerate models such as Stable Diffusion and language models. The team is also responsible for the research and development of hardware acceleration technologies for cloud computing, drawing on distributed systems, compilers, HPC, and RDMA networking. The team is reinventing ML infrastructure for large-scale language models.

Job Description
You will be responsible for developing the machine learning systems behind the company's large-scale models, and for researching new applications and solutions of related technologies in areas such as search, recommendation, advertising, content creation, conversation, and customer service. The goal is to meet users' growing demand for intelligent interaction and to comprehensively improve how users live and communicate in the future.
The main work directions include:
1. Designing and developing the architecture of large-scale machine learning systems, and solving technical challenges such as high concurrency, high reliability, and high scalability.
2. Covering the various sub-directions of machine learning systems, including resource scheduling, model training, model inference, data management, and workflow orchestration.
3. Researching and introducing advanced technologies in machine learning systems, such as the latest hardware architectures, heterogeneous computing systems, and compiler-based optimization technologies.
4. Working closely with the algorithm teams to jointly optimize algorithms and systems.

Qualifications

1. Excellent coding ability, a solid foundation in data structures and basic algorithms, and proficiency in C/C++ or Python. Winners of ACM/ICPC, NOI/IOI, and other competitions are preferred.
2. Familiarity with at least one mainstream machine learning framework (TensorFlow/PyTorch).
3. Mastery of the principles of distributed systems, and experience in the design, development, and maintenance of large-scale distributed systems.
4. Strong sense of responsibility, good learning ability, communication ability, and self-motivation.
5. Good communication and collaboration skills, able to explore new technologies with the team and promote technological progress.
6. The following experiences will be a big plus:
- Prior experience with large-scale projects, or highly influential papers, in the field of large models.
- Familiarity with NLP- and CV-related algorithms and technologies, and experience in large model training and RL algorithms.
- Experience in one of the following fields: CUDA, RDMA, AI Infrastructure, HW/SW Co-Design, High-Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking), ML for Systems, and Distributed Storage.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at joy.miao@tiktok.com.

Created: 2024-06-29
Reference: A134575
Country: United States
State: Washington
City: Seattle
ZIP: 98109