Abstract: Web-based, cloud-hosted systems are now expected to be highly reliable while continuously delivering new features. The complexity of meeting both expectations has led to the evolution of a new field, Service Reliability Engineering, tailored to such systems. Google, a pioneer in this field, calls it Site Reliability Engineering (SRE). While Service Reliability Engineering would be the more apt name, Google's name has caught on. The seminal work documenting Google's approach and practices is its book of the same name (commonly referred to as the ‘SRE book’), which has become the de facto standard on how to adopt SRE in an organization. This session will cover adopting SRE as a practice in large enterprises.
Learning Outcomes:
- Understand SRE
- How does SRE align with Agile
- How does SRE align with DevOps
- Adopting SRE in an enterprise
- Use examples from the Apollo 13 incident as analogies to SRE in an Agile/DevOps environment