Manager, Site Reliability Engineering

New York, New York


Employer: Insight Global
Industry: Computer Engineering
Salary: Competitive
Job type: Full-Time

Insight Global is searching for a Manager of SRE. In this position, you will lead a team of advanced Site Reliability Engineers responsible for the design, deployment, and maintenance of our production systems. You will play an important role in ensuring the reliability, scalability, and performance of our infrastructure, as well as driving continuous improvement initiatives. You will monitor system performance, troubleshoot issues, and ensure timely incident response, root cause analysis, and problem resolution. Your expertise in SRE practices and experience with the listed technologies will enable you to effectively guide the team towards achieving operational excellence. The Site Reliability Engineering team engages directly with our other engineering teams to onboard them onto our platform systems, review and recommend design and architectural decisions, and guide them in implementing the tooling provided by the larger Platform organization, so that systems can scale and react to changing conditions through continuous improvement loops.

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com .

To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/ .

Required Skills & Experience

10+ years of SRE experience, with 3 years of experience leading and managing production teams and systems.
Expertise in Ansible, Concourse CI, Jenkins, GitHub Actions, EKS (Kubernetes), Linux administration, and Terraform.
Understanding of SRE principles, including reliability, scalability, availability, and performance.
Proficiency in scripting and automation (e.g., Python, Bash, and Go).
Experience with infrastructure-as-code (IaC) tools, configuration management, and CI/CD pipelines.
Knowledge of cloud platforms (e.g., AWS, Azure, or Google Cloud) and containerization technologies (e.g., Docker).
Excellent problem-solving skills and the ability to thrive in a fast-paced, dynamic environment.
Strong communication and leadership skills, with the ability to collaborate effectively with both technical and non-technical stakeholders.

Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.

Created: 2024-10-14
Reference: 386063
Country: United States
State: New York
City: New York
ZIP: 10036

