Sr Cloud Site Reliability Engineer, IS&T Ai & Data Platforms
Sunnyvale, California
Summary
Apple's Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest open source technology and, as committers on some of these projects, we look to push the envelope!
Working with multiple lines of business, we handle many streams of Apple-scale data. We bring it all together and unleash business value. We do all this with an outstanding group of software engineers, data scientists, SRE/MLOps engineers and managers. We are looking for talented and dedicated engineers to join our team and bring a passion for infrastructure and distributed systems to building world-class platforms and products at a very large scale across cloud environments.
Key Qualifications
8+ years of experience in SRE/MLOps.
Experience operating and maintaining production systems on Linux and on public cloud infrastructure providers such as AWS (EC2, EBS, S3, Elastic IP, Route 53, IAM).
Experience with cloud native orchestration systems such as Kubernetes, and with enabling autoscaling for both VM and containerized workloads.
Strong proficiency with Helm and Kustomize for managing Kubernetes applications and configurations.
Good working knowledge of load balancers, firewalls, TCP/IP networking architecture and core technologies (HTTP, DNS, routing, etc.).
Experience with configuration management tools: Ansible/Puppet/Chef/Saltstack.
Experience with GitOps or CI/CD tools: Spinnaker/Jenkins/Flux/ArgoCD.
Strong programming skills in Unix & Python/Java.
Experience with capacity planning, utilization reviews and performance tuning.
Critical thinking and strong debugging and problem-solving skills.
Experience in implementing, managing and refining business continuity solutions.
Description
Join Apple's Applied Machine Learning team as a Senior Software Engineer to build and support innovative software applications. Candidates should have a strong background in setting up and supporting the infrastructure for large-scale big data applications in a public cloud such as AWS.
RESPONSIBILITIES:
- Focus on automation and on providing insight into infrastructure service reliability and availability through extensible services & platforms.
- Design, implement and maintain software & tools for large-scale distributed systems, especially the Big Data stack of technologies such as Iceberg, S3, HDFS, Hive and Ranger.
- Operate and deploy container orchestration systems such as Kubernetes and/or YARN.
- Utilize core computer science data structures, algorithms, and software tools in one of the following languages: Python, Golang, Java or other JVM languages.
- Manage data pipelines using Kafka, Flink, Spark, Airflow & Jupyter.
- Work with platform tools and automation systems, including deployment automation practices, especially across multi-AZ or multi-DC infrastructure, using CM tools such as Saltstack, Ansible, Terraform, etc.
- Plan, design & implement business continuity, capacity management & observability across all services & levels of the stack.
- Build & support CI/CD tools to port & manage applications on AWS & Kubernetes.
- Build automation to enable self-healing systems.
- Trace SLIs to meet the agreed-upon SLAs.
- Ensure compliance with appropriate security standards.
- Deploy and debug systems built for horizontally scalable multi-tenant deployments.
- Solve issues, or find workarounds for them, in customer-impacting production systems.
- The candidate is expected to be a self-motivated, proactive and solution-oriented individual.
Education & Experience
BS in computer science with 7-10 years of experience, or MS with 5-7 years of experience, or related experience.
Additional Requirements
- Work closely with multiple cross-functional teams to effectively coordinate and manage business user expectations.
- Leadership, critical thinking and excellent verbal and written communication skills.
- Create new utilities for operational efficiency.
Created: 2024-06-29
Reference: 200515360
Country: United States
State: California
City: Sunnyvale
ZIP: 95002
About Apple
Founded in: 1976
Number of Employees: 154000
Website: https://www.apple.com/
Career site: https://www.apple.com/careers/us/
Wikipedia: https://en.wikipedia.org/wiki/Apple_Inc.
Instagram: https://www.instagram.com/apple/
LinkedIn: https://www.linkedin.com/company/apple
Similar jobs:
- Data Scientist - Ecosystem and Learning Platform
  Roblox in San Mateo, California
- Tech Lead, Machine Learning Engineer-TikTok Multimedia, Data Platform
  TikTok in San Jose, California
- Sr Engineering Program Manager, Ai & Data Platforms
  Apple in Sunnyvale, California
- Staff, Data Scientist - Spark Driver Platform | Su...
  Walmart in Sunnyvale, California ($143000.00 per year)
- GenAI Applications & Tools SW Developer, IS&T Ai & Data Platforms
  Apple in Sunnyvale, California
- AIML - Senior Data Infrastructure Software Engineer, Machine Learning Platform and Technology
  Apple in Santa Clara Valley (Cupertino), California
- Analytic Product Operation, Data Platform
  TikTok in San Jose, California
- Senior Software Engineer - Data Development Platform
  TikTok in San Jose, California
- DevOps Engineer (Big Data Platform), Applied Machine Learning
  Apple in Sunnyvale, California
- Cloud Data Platform Engineer
  Apple in Santa Clara Valley (Cupertino), California
- Software Engineer, IS&T Ai & Data Platforms
  Apple in Sunnyvale, California
- Tech Lead, TikTok MultiMedia Data Platform
  TikTok in San Jose, California
- Senior Data Engineer Cloud Platform
  West Advanced Technologies (WATI) in Downey, California
- Backend Software Engineer, TikTok MultiMedia Data Platform
  TikTok in San Jose, California
- Staff Software Engineer, Big Data Query Platform
  Pinterest in San Francisco, California
- Sr Software Engineer, IS&T Ai & Data Platforms
  Apple in Sunnyvale, California
- Senior Manager, Data Site Reliability Engineering, Ad Platforms
  Apple in Cupertino, California
- Senior Engineering Manager, Apple Data Platform
  Apple in Cupertino, California
- Sr. Distributed Database Engineer - Data Platform (ASE)
  Apple in Cupertino, California
- Data Analyst, Video Platform - USDS
  TikTok in Los Angeles, California