Distributed Database Engineer

Cupertino, California


Employer: Apple
Industry: Software and Services
Salary: Competitive
Job type: Full-Time

Summary
We're in search of exceptional software engineers who possess extensive expertise in building large-scale distributed databases, tackling challenges such as SQL optimization, vectorized engine execution, SIMD vectorization, and distributed query execution for Apache Spark. In this role, you will be at the forefront of pushing the boundaries in building a native SQL engine accelerator for Spark, operating at Apple's scale for big data analytics, and optimizing every aspect of performance.

We are particularly interested in engineers who have a deep understanding of systems like Spark, Trino, Presto, Arrow, or other relevant frameworks. We are also seeking individuals who are enthusiastic about the opportunity to collaborate closely with internal Apple teams as well as external communities.



Description
Our teams are tasked with providing a Data Platform as a service, utilizing technologies such as Spark, Trino, Iceberg, Druid, and Flink. This platform is made available to multiple internal departments within Apple and serves as the fundamental infrastructure for a wide range of data engineering and data science applications.

Improving the efficiency of Apache Spark would enable faster delivery of data insights, leading to better business outcomes. Given the scale of Spark usage at Apple, it would also substantially reduce compute costs.

Our team includes several project management committee (PMC) members and committers for Spark, Flink, Arrow, Parquet, Iceberg, and other open-source projects that we actively use. We are dedicated to improving these projects through upstream contributions. Notably, several team members were recognized with SIGMOD Systems Awards for their contributions to Spark in 2022 and Flink in 2023.

In order to enhance Spark's performance, we are developing a native engine for Apache Spark using Apache Arrow and Arrow DataFusion as the foundation. This engine is designed to replace Apache Spark's current execution engine, which is based on the JVM, with a new, vectorized engine written in a low-level system language. This implementation is already in use at Apple and has demonstrated significant enhancements in query performance and more efficient resource utilization.

You'll get the chance to collaborate with these team members to improve the native SQL engine and related technologies. This will involve working closely with both open-source communities and internal teams.

Here you'll find that we:

• Have a peaceful work environment where you can choose when to socialize and when to focus.
• Provide technical guidance and mentorship as needed.
• Have an encouraging and growth-focused workplace culture.
• Encourage you to use accrued vacation time and maintain a healthy work-life balance.
• Offer excellent benefits!




Created: 2024-06-22
Reference: 200555667
Country: United States
State: California
City: Cupertino

About Apple

Founded in: 1976
Number of Employees: 154000

