Site Reliability Engineer

Austin, Texas


Employer: Compunnel
Industry: 
Salary: Competitive
Job type: Part-Time

Description:

The Client Site Reliability team is responsible for the operations and infrastructure of all consumer-facing production systems and developer-facing systems at Client Games, including NBA Client game services, customer-facing account services, and websites. This team handles systems and services spanning multiple datacenters, both terrestrial and cloud-based.

What We Need: 

We are looking for an expert engineer who is passionate about building multi-datacenter infrastructure and services. Robust systems and problem-solving skills are required as we develop solutions for game studios and support data centers around the world alongside a group of outstanding engineers. In this role, you will collaborate with network engineers, systems architects, and development staff to support our gamers and the needs of the business. 

What We Do 

Build and operate highly resilient systems in a global multi-datacenter and cloud environment serving game and consumer services

Develop tools for the management and automation of the systems and service infrastructure 

Define and implement standards that will impact systems, services, and multiple software environments 

Diagnose and resolve technical issues from both internal and external customers and drive improvements to prevent them from recurring 

Participate in Site Reliability Engineering’s on-call rotation 

Who We Believe Will Be an Outstanding Fit 

You are eager to work in a fast-paced environment with other highly skilled engineers who are passionate about service availability and health!  

The idea of building data center infrastructure services from greenfield to implementation moves you!

Required Qualifications 

6+ years of demonstrated influence across one or more teams on large-scale projects that drive impact and improvement across the organization

6+ years of experience in an SRE role for online services in a multi-region, multi-cloud environment with specific experience in reliability and resiliency 

6+ years of developing tools for automating processes or augmenting off-the-shelf tool functionality

6+ years of AWS and/or GCP cloud experience running highly elastic, mission-critical workloads

6+ years of coding experience in at least one of Python, Ruby, Java, or Go and a good understanding of code management

6+ years of experience using Infrastructure as Code tools like Terraform, Pulumi, or others 

Extensive knowledge of software build, test, and deploy processes using Git, Jenkins, Puppet, Ansible, Docker/containers, and Kubernetes 

Experience with system analysis and troubleshooting 

Ability to serve as a mentor to junior engineers and provide technical leadership to the organization

Bonus Points 

Prior hands-on experience running large-scale multiplayer video games

Experience designing and crafting software for systems and network automation 

Debugging, code optimization, and routine task automation skills 

Demonstrated ability to decompose sophisticated problems. Ability to engage in lateral investigations. 

Must Haves:

3 to 5 years of experience with Kubernetes, Datadog, cloud services, and large-scale systems on AWS and GCP, with minor Azure exposure

Kubernetes platforms: GKE, EKS, homegrown on-prem clusters, and AKS (very small footprint)

Experience performing consistent upgrades across all clusters and clouds

Education: Bachelor's degree

Additional client information:

Created: 2024-04-23
Reference: PATDC4869938
Country: United States
State: Texas
City: Austin
ZIP: 78749

