Senior Site Reliability Engineer II

Alpharetta, Georgia


Employer: LexisNexis Risk Solutions
Industry: Technology
Salary: Competitive
Job type: Full-Time

Location: The position is open to applicants based in Alpharetta, GA, United States, or home-based/fully remote in the United States.

LexisNexis Risk Solutions is seeking a Senior Site Reliability Engineer II with proven industry experience to join our global engineering team.

Our teams are collaborative and forward-thinking; the successful candidate will help shape the operations and support for critical applications, customers, and projects, working closely with Development, QA, IT Operations and Customer Operations teams. You will be required to communicate and solve problems effectively, whilst handling a fast-paced working environment.

The role will be combined with a transition to DevOps practices, agile support, and deployment processes. You will be a leading member of our team working with a diverse range of technologies. You will enjoy working in a friendly environment and benefit from our investment in staff. The role also requires participation in an on-call rotation during off-peak hours to maintain 24/7/365 system availability.

Role Definition

This is an advanced professional level role for an SRE. Individuals may be responsible for one or more complex reliability and toil reduction projects. At this level, SREs operate as subject matter experts in the discipline and provide guidance to others, including product and development teams, to define and improve reliability within a product group. The Senior SRE II role requires a deep understanding of system and application code, and individuals in it make data-driven recommendations that balance customer, development, and operational needs. They are champions for shared services, platforms, and architectural standards. Individuals in this role train and/or mentor junior staff.

Scope and Key Responsibilities
  • Recommends service level objectives in partnership with Product and Dev teams
  • Masters observability tools and techniques
  • Acts as an escalation point during incidents
  • Collaborates with dev to troubleshoot systems and app performance issues
  • Improves the SRE framework
  • Champions shared services and platforms to drive reliability
  • Can create disaster recovery plans including advanced fault injection
  • Advises on SRE training curriculum and content
  • Delivery of resilient application stacks via "Infrastructure as Code" and other DevOps practices
  • Monitoring and on-going support of critical, high revenue business applications
  • Diagnosis and resolution of complex system and application issues
  • Working with diverse technical and non-technical teams, including Development, QA, IT Operations, Customer Operations and Project Management teams
  • Writing and maintaining system and application documentation for technical and non-technical audiences
  • Migration of existing applications to Cloud environments

Essential Skills and Attributes
  • Professional experience of working within the public cloud - AWS, Azure
  • Use of orchestration tools such as Terraform, CloudFormation
  • Continuous Integration/Delivery tools such as GitLab, GitHub, Jenkins
  • Coding and scripting experience such as PowerShell, Bash, Python or equivalent
  • Configuration management tools such as Ansible, Puppet, Chef, or equivalents
  • Hands-on experience of Windows and Linux servers, including support and troubleshooting.
  • Previous analytic and troubleshooting experience is required
  • Cloud architecture and system design to solve key business problems and facilitate team goals.
  • Experience migrating applications from on-premises to public cloud.

Desirable Skills
  • Experience working with containerized workloads such as Docker and Kubernetes.
  • System and application monitoring such as Prometheus, Grafana, CloudWatch
  • Familiarity with log management tools such as Elastic Stack, Graylog or Splunk
  • Experience working with relational databases such as MySQL, MS SQL Server or similar
  • Use of secret management services such as HashiCorp Vault
  • Knowledge of change control and associated procedures.
  • Hands-on experience performing application static/dynamic security and penetration assessments with tools such as SonarQube, Checkmarx, AppScan, Burp Suite, OWASP ZAP Proxy, WebInspect, Fortify, Veracode, Nessus, etc.
  • Familiarity with different types of security vulnerabilities and the tools used to counter them
  • Experience with any high-level programming language.


Technical Skills
  • Observability:
    • Has a deep technical understanding of observability techniques across the full stack and can bring clarity to complex incidents or performance issues.
    • Able to create templated observability dashboards and configuration using code so that others can implement quickly for their products.
    • Can influence the setting of appropriate SLOs and Error Budgets.
  • Incident Management:
    • Can prepare and assess other SREs for on-call readiness. Can act as a mentor and identify what training is required.
    • Can conduct a post-mortem so that participants feel safe to contribute. Encourages a culture of learning from failure and shares post-mortem learning with a wider audience.
    • Can act as a senior on-call escalation point for SREs and engineers and provide guidance on restoration.
    • Can work with SRE and Development leads in identifying and rectifying the cause of excessive alerting and pager load caused by bugs, alerting or human processes.
  • Design for Reliability:
    • Has an advanced understanding of systems design including high availability, software deployment and recovery techniques. Advanced understanding of failure modes and systems behaviors.
    • Specializes in one or more technical areas like denial-of-service protection or containerization. Can provide consulting to others on some specialist topics.
    • Can make decisions on the resilience and recoverability of a system using load testing data and results from fault injection experiments.
    • Can document good SRE design practices and contributes to the SRE Framework.
  • Disaster Recovery:
    • Able to contribute to DR plans, including setting recovery priorities and procedures and helping others to carry out specific runbooks.
    • Able to create and lead DR practice scenarios and incident response procedures. Skilled in assessing gaps in knowledge and recommending improvements.
    • Able to plan fault injection (chaos engineering) scenarios to simulate faults on components of a system (memory, network, IO) or on a distributed system.
  • Platforms and Automation:
    • Seeks engineering and architecture consensus for new standard components and services for inclusion into the Paved Road, including Platforms, CI/CD.
    • Promotes adoption, standardization, and broader contributions via inner sourcing. Has a good understanding of how inner sourcing benefits the SDLC.
    • Champions the use of Paved Road and drives efficiency by removing rework and silos across technology.
  • Reliability Culture:
    • Creates and shares SRE good practices and training material within the team.
    • Can coach other SREs on specific practices. Able to identify gaps in knowledge and provide guidance for personal development.
    • Can make recommendations to eliminate larger toil projects.


Behavioral Competencies
  • Collaboration and Teamwork
  • Customer & External Focus
  • Solves Problems and Analyses Issues
  • Learning Agility
  • Builds Relationships
  • Develops Others


What is it like to work here?

Outstanding - you have probably already got a feel for what we do and the technology we are involved with, but what really makes us stand out from the crowd is our culture. We are an agile, dynamic, and forward-thinking organization that understands the importance of looking after our staff. We pride ourselves on delivering high-quality products and providing our employees with interesting challenges for their personal and career development, whilst also striking the right balance between work and family life.

Why Work for LexisNexis Risk Solutions (RSG)

Explore our passion for discovery.

Global companies and governmental entities rely on us to solve their most complex data challenges. Our employees collaborate to reduce risks and create opportunities for customers in more than 100 countries. We are adaptable, curious, and ambitious. That is why here, you will have the freedom to drive change, the trust to find your own path, and the space to explore more.

Women in technology:

LexisNexis Risk Solutions Group (RSG) is very supportive of women in technology and is a founding signatory of the Tech Talent Charter.

Currently, 27% of our Technology workforce are women, which is much higher than the UK average of 17%. We have the following initiatives in place to support women in technology:
  • Mentoring scheme for women in technology
  • Women's network forum
  • Regular events for schoolgirls about careers in technology, to inspire the next generation of girls in tech


About LexisNexis® Risk Solutions Group

LexisNexis® Risk Solutions Group is a portfolio of brands that span multiple industries providing customers with innovative technologies, information-based analytics and decision tools and data services that help businesses and governmental entities reduce risk and improve decisions to benefit people around the globe. Headquartered in metro Atlanta, Georgia, we have offices throughout the world and are part of RELX (LSE: REL/NYSE: RELX), a global provider of information and analytics for professional and business customers across industries.

At LexisNexis Risk Solutions Group, having diverse employees with different perspectives is key to creating innovative new products for our global customers. We have 35 diversity employee networks globally and prioritize ensuring inclusive leadership is part of our culture. Our aim is for every employee to be the best version of themselves. We actively welcome applications from candidates of diverse backgrounds and underrepresented groups.

We encourage applicants and employees to tell us about any health issues they may have so that we can put in place reasonable adjustments to support applicants during the application process and employees in succeeding in their roles.

Please read our Candidate Privacy Policy

Created: 2024-07-05
Reference: R79489
Country: United States
State: Georgia
City: Alpharetta

