Tech lead Site Reliability Engineer, Edge - USDS

Mountain View, California


Employer: TikTok
Industry: R&D
Salary: Competitive
Job type: Full-Time

Responsibilities

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained. The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity; to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.

Site Reliability Engineering combines software and system engineering with system operations to build and run large-scale, massively distributed infrastructure. Our Edge SREs ensure infrastructure services are reliable, fault-tolerant, efficiently scalable and cost-effective. We dive deep into the stack, including network, hardware, OS, and applications, to quickly resolve complex functional and performance issues.

As an Edge Site Reliability Engineer, you will have the opportunity to manage a variety of complex systems at scale, including systems that administer hyperscale datacenters and public cloud, global content distribution networks (CDNs) and load balancers that handle Tbps of traffic. You will also have the opportunity to collaborate with various teams to translate business needs into concrete action items, and improvements in system design or procedures.

In order to enhance collaboration and cross-functional partnerships, among other things, at this time, our organization follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
- Build data pipelines, tools, automations, visualizations and monitors to facilitate the operation and optimization of edge services.
- Data monitoring and alerting, data quality assurance and anomaly detection.
- Document team processes and policies, including methods of engagement and SLOs.
- Analyze, design and implement solutions at the system level to remove bottlenecks and improve edge service performance.
- Implement monitoring and alerting to improve issue detection and response.
- Work in a fast-paced environment. Participate in technical operations and rotations in response to performance and reliability issues.

Qualifications

Qualifications:
- Master's degree (or Bachelor's degree with 2+) years of experience in Computer Engineering, Electrical Engineering, Computer Science or related major.
- 3+ years experience working with Unix Linux systems from kernel to shell and beyond with experience working with system libraries, file systems, and client-server protocols.
- 2+ years experience in one or more programming languages such as Java, C++, Go, or scripting experience in Shell and Python.
- Strong analytical skills and the ability to solve real world problems in a fast moving environment.
- Experience in designing, analyzing and building automation and tools for large scale systems.

Preferred qualifications:
- Experience with the Hadoop ecosystem - HDFS, Yarn, Spark, etc.
- Self-driven and capable of working with ambiguity and moving projects from concept to delivery.
- Experience in building solutions with AWS, Google, Azures and other cloud services.
- Experience in networking technologies such TCP/IP, BGP, DNS, etc. in a carrier-grade environment.
- Experience in developing and operating one or more of following systems: OpenStack, Kubernetes, Nginx, ipvs, ELK stack, Hadoop, etc.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://shorturl.at/ktJP6

This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.

Created: 2024-06-21
Reference: A104498
Country: United States
State: California
City: Mountain View


Similar jobs: