Job Description
Job Title: Site Reliability Engineer
Location: Wilmington, DE (Onsite)
Long-Term Contract
Interview: Phone, video, and/or in-person
Key Responsibilities:
- Lead and conduct detailed Root Cause Analysis (RCA) for incidents, identifying underlying issues and recommending corrective actions.
- Document and communicate findings from RCA processes, ensuring transparency and knowledge sharing across the organization.
- Develop and maintain incident postmortem reports, providing insights and actionable recommendations to stakeholders.
- Monitor system performance and reliability metrics, proactively identifying potential issues before they escalate.
- Contribute to the design and implementation of automated monitoring and alerting systems to improve incident detection and response times.
- Continuously improve the incident management process, incorporating feedback and lessons learned from RCA activities.
- Participate in incident response activities.
Qualifications:
- Bachelor's degree or equivalent experience in a software engineering discipline.
- 6+ years of software engineering experience.
- Excellent communication skills, with the ability to convey technical findings to both technical and non-technical audiences.
- Excellent debugging and troubleshooting skills.
- Experience in Site Reliability Engineering, DevOps, or a similar role, with a focus on incident management and RCA.
- Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Dynatrace).
- Familiarity with containerization technologies (e.g., Docker, Kubernetes).
Sandip Kumar
Sr. Tech Recruiter
Email: sandip@stellentit.com
Address: 505 Knolle Court, Saint Augustine, FL 32092
Telephone: +1 321-641-0093