Principal Site Reliability Engineer

Redmond, Washington


Employer: Microsoft
Industry: Software Engineering
Salary: $137,600 - $267,000 per year
Job type: Full-Time

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Azure PostgreSQL, Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the databases team builds and maintains Microsoft's operational database systems. We store and manage data in a structured way to enable a multitude of applications across various industries. We are on a journey to enable developer-friendly, mission-critical, AI-enabled operational databases across relational, non-relational, and OSS offerings.

Running software as a service means more than just developing and releasing features. Ensuring reliability and serviceability is a critical part of the software lifecycle. This is where you come into the picture. As a Principal Site Reliability Engineer, you will ensure that the Azure PostgreSQL service runs smoothly with the required reliability and availability. You will design and implement software to automatically detect and resolve live-site production issues. You will work closely with feature teams to design, implement, and release features that are reliable and serviceable.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities:

  • Demonstrates end-to-end expertise in distributed systems design, interactions between cloud technology layers and components, functions of physical network devices, and complex dependencies at scale. Defines recommended configurations for cloud technology solutions and develops or modifies the code base for infrastructures to optimize reliability, operability, and cost of supported products across organizations. Leverages technical expertise to define solutions that improve the impact of product engineering and Site Reliability Engineering (SRE) teams across organizations.
  • Develops, reviews, and provides feedback on code, scripts, systems, and/or platforms that automate complex but repetitive operations processes (e.g., monitoring, alerting, deployment, debugging, remediation, resource management) that can be applied across products at scale; identifies new opportunities to implement automation solutions and advocates for their adoption across organizations to improve reliability and efficiency in meeting customer/business requirements.
  • Drives efforts to optimally leverage tooling and automation to reliably and safely implement changes in production, increasing the velocity at which teams across organizations are able to release current and future products or changes at scale. Ensures that platform or system dependencies within and across products are accounted for, and advises engineering teams across organizations on change management strategies that utilize tooling and automation.
  • Analyzes data from telemetry pipelines and monitoring tools that detail operations metrics (e.g., availability, reliability, performance, efficiency) of systems, platforms, or products operating at scale within an organization. Advises Site Reliability Engineering (SRE) teams on optimal analysis strategies that utilize updated tooling and/or predictive models to identify and test proposed improvements in product development and operations, and ensures that impacts on operations metrics (e.g., Time-to-X) and Key Performance Indicators (KPIs) are monitored across organizations.
  • Shares and defines insights, best practices, and standards that can be applied to improve system, platform, and/or product development and operations across organizations. Continues to develop thought leadership, insights, and best practices through interactions with industry, cross-functional teams, and business leaders. Mentors and coaches engineers to help them identify and propose solutions that will have a positive customer/business impact.
  • Interacts with customers at the executive level as a result of escalations from support or service outages.
  • Participates periodically in 12x7 live-site on-call rotations.
  • Embodies our culture and values.

Qualifications:

Required/Minimum Qualifications
  • 8+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
Other Requirements

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

Microsoft Cloud Background Check:
  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred/Additional Qualifications
  • Bachelor's degree in Computer Science or Engineering or Mathematics or Physics or IT technical discipline
  • 10+ years of programming experience in C#, C, or C++, or in object-oriented programming.
  • 4+ years of experience in developing and shipping features in an enterprise production backend server system.
  • Experience building distributed systems with guarantees.
  • Experience with the internals of PostgreSQL or a similar enterprise database management system.
  • Understanding of data structures, algorithms, and distributed systems.
  • Solve problems by always leading with deep passion and empathy for customers.
  • Have a deep desire to work collaboratively, solve problems with groups, find win/win solutions and celebrate successes.
Site Reliability Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft will accept applications for the role until August 29, 2024.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations, and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request via the Accommodation request form.

Benefits/perks may vary depending on the nature of your employment with Microsoft and the country where you work.

Created: 2024-08-22
Reference: 1754113
Country: United States
State: Washington
City: Redmond