NICF- Site Reliability Engineering - Processes and Management

NUS Institute of Systems Science

How long?

  • 3 days
  • in person


Coursalytics is an independent platform to find, compare, and book executive courses. Coursalytics is not endorsed by, sponsored by, or otherwise affiliated with any business school or university.




Who should attend

Software engineers, system administrators, IT support staff, server engineers, datacentre engineers, IT service management professionals and any other IT professionals who want to move into a high-performance IT infrastructure and operations role.


  • Digital/IT professionals with at least 2 years of working experience

About the course

How do the tech giants manage their infrastructure? What can other organisations learn from them?

Mike Krieger, co-founder of Instagram, announced in 2012 that “2 backend engineers can scale a system to 30+ million users.” After Facebook acquired Instagram, this team grew to just five.

Pinterest, another social media app, was able to handle 18 million users and 410 terabytes of data with a company of 12 people. How do they do it?

Cloud technologies are certainly fundamental to achieving such massive scale, but who implements, operates and maintains the services?

Site Reliability Engineering (SRE) is a discipline that takes aspects of software engineering and applies them to infrastructure and operations problems. Originally pioneered as an engineering practice at Google, this principle-based approach radically reworks traditional IT service management processes so that services can be delivered, scaled and recovered faster, with minimal reliance on human intervention.

SRE is growing in importance because organisations increasingly run their business on heavily used modern IT services that change constantly to meet new market needs, and those services must be highly reliable. SRE processes are designed to ensure that reliability, but NOT in a risk-averse manner that would prevent agile change and innovation.

Many organisations have been adopting this discipline. In Singapore, adopters of SRE include major banks, government organisations, systems integrators, digital food delivery providers and MNCs.

This course focuses on the practical application of the core SRE Principles and Practices, and describes how an organisation can shift from traditional IT system administration to highly scalable operations with Site Reliability Engineering.

Key Takeaways

At the end of the course, the participants will be able to:

  • Explain the differences between traditional operations, DevOps and Site Reliability Engineering
  • Understand the importance of the SRE Principles and apply them in preventing and resolving problems to increase reliability
  • Select an appropriate organisation topology to enable SRE and high-performance IT
  • Design a policy to ensure SRE practices are carried out
  • Understand and apply the key components of the CI/CD pipeline, including canarying releases and how to design a release tool chain
  • Design and implement SLOs, SLIs and Error Budgets
  • Create a business case for shifting from incident escalation to swarming for problem resolution
  • Conduct blameless post-mortems to determine root causes via deep analysis
  • Design chaos experiments
  • Reduce manual toil using automation tools – with runbooks and helpdesk chatbots as examples

What Will Be Covered

This course will cover:

  • What is Site Reliability Engineering?
  • Organising for SRE
  • Use of Error Budgets for Agility
  • Release Engineering
  • Performance Engineering
  • Chaos Engineering
  • Managing On-Call
  • Reducing Toil through Automation


Jamie Donoghue

Jamie, a dual citizen of the UK and NZ, has spent over 20 years improving the business value of IT Services for public and private organisations in the UK, Australia and South East Asia. His experience spans the full spectrum of IT functions including: IT Strategy and Governance Business Devel...

Aaron Chua

Aaron has more than 18 years of IT technical, Scrum and agile experience. Prior to joining ISS, he was the Agile Coach with Singapore Pools, where he coached projects using agile methodologies and helped to cultivate an agile culture in the company. Before joining Singapore Pools, he was the Scrum Master w...

NICF- Site Reliability Engineering - Processes and Management at NUS Institute of Systems Science

From SGD$2,187 (list price SGD$2,889)



