Who should attend
Software engineers, system administrators, IT support, server engineers, datacentre engineers, IT service management professionals and any other IT professionals who want to move into a high-performance IT infrastructure and operations role.
- Digital/IT professionals with at least 2 years of working experience
About the course
How do the tech giants manage their infrastructure? What can other organisations learn from them?
Mike Krieger, the co-founder of Instagram, announced in 2012 that “2 backend engineers can scale a system to 30+ million users.” After the acquisition by Facebook, this team grew to 5.
Pinterest, another social media app, was able to handle 18 million users and 410 terabytes of data with a company size of 12 people. How do they do it?
Cloud technologies are certainly fundamental to achieving such massive scale, but who implements, operates and maintains the services?
Site Reliability Engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. Originally pioneered as an engineering practice at Google, this principle-based approach radically changes traditional IT service management processes to deliver, scale and recover faster, with minimal reliance on human intervention.
SRE is growing more important because organisations increasingly run their business on heavily used modern IT services that change constantly to meet new market needs, and these services must be highly reliable. SRE and its processes ensure high reliability, but NOT in a risk-averse manner that would prevent agile change and innovation from taking place.
Many organisations have been adopting this discipline. In Singapore, organisations that have adopted SRE include major banks, government organisations, systems integrators, digital food delivery providers and MNCs.
This course focuses on the practical application of the core SRE Principles and Practices and describes how an organisation can shift from traditional IT system administration towards high scalability with Site Reliability Engineering.
At the end of the course, the participants will be able to:
- Explain the differences between traditional operations, DevOps and Site Reliability Engineering
- Understand the importance of the SRE Principles and apply them in preventing and resolving problems to increase reliability
- Select an appropriate organisation topology to enable SRE and high-performance IT
- Design a policy to ensure SRE practices are carried out
- Understand and apply the key components of the CI/CD pipeline, including canarying releases and how to design a release tool chain
- Design and implement SLOs, SLIs and Error Budgets
- Create a business case for shifting from incident escalation to swarming for problem resolution
- Conduct blameless post-mortems to determine root causes via deep analysis
- Design chaos experiments
- Reduce manual toil using automation tools – with runbooks and helpdesk chatbots as examples
What will be covered
This course will cover:
- What is Site Reliability Engineering?
- Organising for SRE
- Use of Error Budgets for Agility
- Release Engineering
- Performance Engineering
- Chaos Engineering
- Managing On-Call
- Reducing Toil through Automation
Jamie, a dual citizen of the UK and NZ, has spent over 20 years improving the business value of IT Services for public and private organisations in the UK, Australia and South East Asia. His experience spans the full spectrum of IT functions including: IT Strategy and Governance Business Devel...
Aaron has more than 18 years of IT technical, scrum and agile experience. Prior to joining ISS, he was the Agile Coach with Singapore Pools. He coached projects using agile methodologies and helped to cultivate an agile culture in the company. Before joining Singapore Pools, he was the Scrum Master w...