Operational excellence pillar

The operational excellence pillar focuses on running and monitoring systems, and continually improving processes and procedures. Key topics include automating changes, responding to events, and defining standards to manage daily operations.

Definition

Amazon defines operational excellence as a commitment to build software correctly while consistently delivering a great customer experience. The pillar contains best practices for organizing your team, designing your workload, operating it at scale, and evolving it over time.

Goal

The goal of operational excellence is to get new features and bug fixes into customers’ hands quickly and reliably. Organizations that invest in operational excellence consistently delight customers while building new features, making changes, and dealing with failures. Along the way, operational excellence drives towards continuous integration and continuous delivery (CI/CD).

Design principles

  • Organize teams around business outcomes
  • Implement observability for actionable insights
  • Safely automate where possible
  • Make frequent, small, reversible changes
  • Refine operations procedures frequently
  • Anticipate failure
  • Learn from all operational events and metrics
  • Use managed services
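
Two of these principles, making small reversible changes and anticipating failure, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `apply_reversible_change`, `health_check`, and the configuration keys are hypothetical names invented for this example and are not part of any AWS API. The idea is to apply one small change, verify it, and fall back to the last known-good state if the check fails.

```python
from typing import Callable, Dict

Config = Dict[str, str]


def apply_reversible_change(
    current: Config,
    change: Config,
    is_healthy: Callable[[Config], bool],
) -> Config:
    """Apply a small change and keep it only if the health check passes."""
    # Build a new dict instead of mutating `current`, so rolling back
    # simply means returning the untouched original.
    candidate = {**current, **change}
    if is_healthy(candidate):
        return candidate
    return current  # roll back to the previous known-good configuration


def health_check(cfg: Config) -> bool:
    # Hypothetical rule: a timeout below 1 second is treated as unhealthy.
    return int(cfg["timeout_seconds"]) >= 1


live = {"timeout_seconds": "30", "log_level": "info"}

# A small, safe change is accepted.
live = apply_reversible_change(live, {"log_level": "debug"}, health_check)

# A change that fails the check is rejected and the old state is kept.
after_bad = apply_reversible_change(live, {"timeout_seconds": "0"}, health_check)
```

Keeping each change small limits what the health check has to cover, and keeping the previous state intact makes the rollback path as simple as the forward path.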

Memorable keywords for the design principles of the operational excellence pillar:

  • Group work
  • CI/CD
  • Continuous improvement
  • Good experience
  • Predictions
  • Services
  • Results
  • Construction commitments

There are four best practice areas for operational excellence in the cloud:

  • Organization
  • Prepare
  • Operate
  • Evolve

You can read more in the official AWS documentation.