Site Reliability Engineer - Lead

Atlanta, Georgia


Employer: Saxon Global
Salary: Competitive
Job type: Part-Time

Job Description:

Synopsis of the role:

Seeking creative, high-energy, diverse and driven software engineers with hands-on development skills to work on a variety of meaningful projects. Our software engineering positions provide you the opportunity to join a team of talented engineers working with leading-edge technology. You are ideal for this position if you are a forward-thinking, committed, and enthusiastic software engineer who is passionate about technology.

What you'll do:

• Work with teams across the organization to ensure the reliability of core services and keep an eye on capacity and performance.

• Lead blameless postmortems and proactively identify potential outages, feeding both into iterative improvement.

• Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics.

• Configure and administer SCM (Git, SVN), build tools (CMake, Makefiles, Maven), Nexus, CI (Jenkins), and CD automation tools.

• Establish end-to-end monitoring and alerting on all critical aspects of every system to meet SLAs and receive proactive notification of possible issues.

• Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot.

• Participate in 24x7x365 on-call support for multiple core platforms globally. We use a "Follow the Sun" model, and working patterns are expected to include on-call duty as well as weekend and holiday-season cover.

• Participate in release cycles of our offerings, deploying code to integration, staging and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tools, monitoring, and change management

• Build automation: work with Agile development teams to ensure smooth promotion of code, configuration, and Docker images to production

• Oversee and adapt monitoring and alerting systems. Interact with automated monitoring and healing infrastructure to ensure healthy environments

• Develop automation to auto-correct or completely prevent issues in our solutions

• Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats

• Identify single points of failure and other high-risk architecture issues; propose and implement more resilient resolutions

• Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment

• Identify potential process improvements across the entire engineering organization

• Define and drive architectural enhancements into the system to mitigate potential failure points

• Provide impact assessment and mitigation plan for changes going into the production environment

• Investigate the root cause of severe and systemic outages and identify corrective actions

• As we transition to the public cloud (Google or AWS), create new build and deployment patterns.

What experience you need:

• A minimum of 10 years of experience as a Developer/Lead/Architect.

• Bachelor's degree in Computer Science, Information Management, or another STEM major

• Experience with configuring, customizing, and extending monitoring tools (AppDynamics, Apica, Sensu, Grafana, Prometheus, Graphite, Splunk, Zabbix, Nagios, etc.)

• 10+ years' experience with all stages of an agile software development lifecycle (CI/CD) supporting Java/JavaScript UI applications (e.g., AngularJS) and SaaS applications.

• 5 years of experience building Java EE applications using tools like Maven/Ant, Subversion, Jira, Jenkins, Bitbucket, and Chef

• 8+ years' experience with continuous integration tools (Jenkins, SonarQube, Jira, Nexus, Confluence, Git/Bitbucket, Maven, Gradle); Rundeck is a plus

• 3+ years' experience with configuration management and automation (Ansible, Puppet, Chef, Salt)

• 3+ years' experience deploying and managing infrastructure on public clouds (AWS, GCP, Azure, or Pivotal)

• 3+ years' experience working with Kubernetes and related applications.

• Experience working with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper.

• 3+ years' experience in Linux environments (CentOS).

• Knowledge of TCP/IP networking, load balancers, high-availability architecture, and zero-downtime production deployments. Comfortable with network troubleshooting (tcpdump, routing, proxies, firewalls, load balancers, etc.)

• Demonstrated ability to script around repeatable tasks (Go, Ruby, Python, Bash)

• Experience with large scale cluster management systems (Mesos, Kubernetes)

• Experience with Docker-based containers is a plus

• Able to dive into any level of a modern internet service (schedulers, containers, Linux kernel, caching, object storage, distributed filesystems, RDBMS, NoSQL, etc.)

Required Skills: GCP, Kubernetes, Jenkins, cloud.
Basic Qualification: Looking for a Lead SRE.
Additional Skills: Looking for a Lead SRE.
Background Check: Yes
Drug Screen: Yes
Candidate must be your W2 Employee: Yes
Exclusive to Apex: No
Face to face interview required: No
Candidate must be local: Yes
Candidate must be authorized to work without sponsorship: No
Interview times set: No
Type of project: Development/Engineering
Master Job Title: Eng: Other
Branch Code: St. Louis

Created: 2024-04-30
Reference: SG - 78368
Country: United States
State: Georgia
City: Atlanta