Site Reliability Engineer
Cupertino, California
Summary
The Apple Information Apps Engineering teams power some of the most widely used Apple applications, such as Apple News, Stocks, Weather, and Books. We do this at a massive, global scale. We meet our high expectations through dedication to best practices, which enables us to deliver a vast array of information that people in over 150 countries use daily.
We are seeking an experienced and dynamic Site Reliability Engineer (SRE) Operator to join our team in maintaining the reliability, availability, and performance of our systems. The ideal candidate will have a strong background in production monitoring, a deep understanding of development and operations, and a proven track record in managing large-scale production environments.
Key Qualifications
At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
Linux expertise.
Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
Proficiency in implementing and coordinating telemetry using monitoring and observability tools like Splunk, Grafana, and Prometheus, or similar.
Experience troubleshooting and resolving issues in Kubernetes from both an operating system and application perspective.
Hands-on scripting with Python.
Building and operating container orchestration systems like Kubernetes or EKS.
Designing, building and maintaining infrastructure with a cloud provider such as AWS.
Automation advocate - prior history of removing operational toil via software.
A strong sense of ownership and team camaraderie, with clear and transparent communication.
Self-motivated, inquisitive, and always looking to learn more.
Nice to have:
Networking: TCP/IP network fundamentals and basic troubleshooting.
Disaster recovery and capacity planning.
Deployment automation via Terraform or CloudFormation.
Systems built upon open-source storage and search technologies, including Cassandra, Kafka, Solr, Postgres, and Redis.
Description
Our team is highly collaborative, working closely with partner teams to deliver the best results for Apple. We strive to find the best solution while also considering the need to get things done efficiently for each engineering challenge we face. Good ideas are valued and rewarded.
As an SRE at Apple, you will:
Operate, monitor, and triage all aspects of our production and non-production environments.
Pioneer and implement the next-generation telemetry system for Apple News, Stocks, Weather, and Books.
Prepare alert-handling procedures and runbooks, and collaborate with our offshore SRE team.
Automate deployment and orchestration of services into the cloud environment as well as other routine processes.
Actively participate in capacity planning and disaster recovery exercises.
Interact with and support partner teams including engineering, SRE, QA, and project management. Create self-service solutions for them.
Cultivate and maintain relationships with internal and external third-party vendors.
Education & Experience
Bachelor of Science in Computer Science or a related discipline.
Created: 2024-05-06
Reference: 200545395
Country: United States
State: California
City: Cupertino
About Apple
Founded in: 1976
Number of Employees: 154000
Website: https://www.apple.com/
Career site: https://www.apple.com/careers/us/
Wikipedia: https://en.wikipedia.org/wiki/Apple_Inc.
Instagram: https://www.instagram.com/apple/
LinkedIn: https://www.linkedin.com/company/apple
Similar jobs:
-
Software Development Engineer in Test -II, WWGST Quality Reliability Engineering
Amazon in Irvine, California 💸 $115000 per year -
Service Reliability Engineer - iCloud
Apple in Santa Clara Valley (Cupertino), California -
Senior Site Reliability Engineer (SRE) - ASE / iCloud
Apple in Cupertino, California -
Hardware Reliability Engineer, Product Integrity
Google in Mountain View, California -
Site Reliability Engineer
AEG in San Francisco, California -
Site Reliability Engineer - AML Global Recommendation - USDS
TikTok in Mountain View, California -
Senior Site Reliability Engineer, TikTok Server Architecture
TikTok in San Jose, California -
Sr Reliability Engineer
Valero Energy in Benicia, California 💸 $123520 - $169840 per year -
Sr Manager, Quality Engineering (Reliability Engineering)
Chipotle in Newport Beach, California -
Site Reliability Engineer, Recommendation Infrastructure - USDS
TikTok in Los Angeles, California -
AIML - Sr Engineering Manager, Siri Performance and Reliability
Apple in Cupertino, California -
Senior Software Engineering Manager, Reliability Engineering
Roblox in San Mateo, California -
Cloud DevOps / Site Reliability Engineer, Applied Machine Learning
Apple in Sunnyvale, California -
AI Ops Site Reliability Engineer - Data Infrastructure
TikTok in San Jose, California -
Plant Reliability Engineer
Gables Search Group in San Francisco, California -
Site Reliability Engineer - Video Platform - USDS (LA)
TikTok in Los Angeles, California -
Senior Site Reliability Engineer, Datastores - ThousandEyes
Cisco Systems Inc in San Francisco, California -
Principal Site Reliability Operations Engineer
Roblox in San Mateo, California -
Senior Site Reliability Engineer, Insight BPR
Apple in Elk Grove, California -
Site Reliability Engineer, TikTok Server Architecture
TikTok in San Jose, California