Subject Matter Expert/Senior Site Reliability Engineer

Reston, Virginia


Employer: Compunnel
Industry: 
Salary: Competitive
Job type: Part-Time

Top skills:

Expertise in SRE concepts.

Be able to communicate effectively with enterprise architects, portfolio managers, and business leaders.

Deep knowledge of operational excellence, observability, interoperability, Dynatrace, Splunk, and OpenTelemetry.

Be able to run training for large audiences.

 

Level 2 of SRE

Drive chaos engineering (previously with Gremlin, now with AWS FIS, the Fault Injection Simulator).

Resiliency.

AWS Resilience Hub.

 

Education/Experience:

AWS technologies

ECS, EC2, Lambda, Step Functions, EMR, Glue, S3, RDS, DynamoDB

Configuring Health Checks and Implementing Alarms in AWS

Identify CloudWatch metrics for AWS services: EC2, ECS, EMR, Glue ETL, S3, RDS, Lambda, DynamoDB, Redshift.

Be able to classify critical metrics and create CloudWatch alarms using both the AWS Console and Terraform templates.

Splunk, Dynatrace 

Create and manipulate dashboards / metrics.

Disaster Recovery / Failover Scenarios.

Understand how to make an environment resilient and highly available.

Influencing:

Candidate should be able to convince internal customers (AppDev teams) of the benefits of shifting left and leveraging the SRE model for performance, availability, resiliency, etc.

Ability to explain these benefits clearly.

Confidence and communication are must-have skills.

Understand how to implement KPIs:

MTTR (Mean Time to Resolve) and MTTD (Mean Time to Detect), SLO, SLI, Error Budgets.

Highly desired skills:

Disaster Recovery / Failover Experience – Load Balancing, Resiliency, High Availability.

Education: Bachelor's Degree

Additional client information:

Created: 2024-05-14
Reference: PANDC4905646
Country: United States
State: Virginia
City: Reston

