Using Generic Mitigations and Playbooks as Code to Improve Reliability

Businesses today dread outages and production incidents. An outage means engineers being woken in the middle of the night to resolve it and, more importantly, unhappy customers who can’t access the affected services. It’s more vital than ever to mitigate the customer impact of outages as fast as possible.

In this InfoQ-moderated, on-demand webinar, StackPulse CTO Leonid Belkind discusses the concept of generic mitigations and how this approach leads to a saner incident experience for on-call teams, as well as an improved customer experience during outages. He also shares tips on how to get started with generic mitigations by turning incident response playbooks from documents into executable code, and touches on the benefits this approach brings.

The chapters of this webinar are:

  1. Intros
  2. Agenda
  3. Challenge of SRE
  4. Mitigations – How they Help
  5. Sample Mitigation Strategies
  6. Playbooks-as-Code: Where they Fit
  7. Bringing it all Together
  8. Summary
  9. Q & A