Key Responsibilities:
Incident Management:
● Lead the management of incidents from detection to resolution, ensuring timely communication and minimal impact on customers.
● Coordinate with on-call teams (both DevOps and R&D) to address critical issues and provide rapid resolutions.
● Act as the incident manager during major events, providing updates and ensuring adherence to established incident protocols.
Root Cause Analysis:
● Perform post-incident root cause analysis (RCA) and create reports with actionable recommendations for improvement.
● Lead post-mortem discussions to identify system weaknesses and areas for enhancement.
Monitoring and Alerting:
● Improve and maintain real-time monitoring systems and alerts to ensure early detection of issues across our platforms.
● Work closely with development and DevOps teams to enhance observability and increase system visibility.
Automation:
● Identify repetitive tasks in the incident management process and automate them to reduce manual intervention and response times.
● Implement tools and processes that improve system resilience and reduce the frequency of incidents.
Qualifications:
● Proven experience in Site Reliability Engineering or a similar role with a strong focus on incident management.
● Strong understanding of incident response protocols, root cause analysis, and post-mortem processes.
● Experience with monitoring and alerting tools such as Prometheus, Grafana, Coralogix, or equivalent.
● Proficiency in cloud management (AWS) and a deep understanding of scaling and reliability practices.
● Familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, Terraform, GitHub Actions).
● Excellent problem-solving skills with the ability to manage high-pressure situations.
● Strong communication skills, with the ability to clearly articulate technical issues to both technical and non-technical stakeholders.
● Experience working in an on-call rotation and leading incident response efforts.
You can contact us at info@apitree.cz or at +420 602 609 112.
Whether you are looking for a supplier for your new software or want to be part of the team, leave us a message and we will get back to you as soon as possible.