
Software Reliability Model - reliability-model

This Software Reliability Model (SRM) provides a flexible and explainable model of software reliability in terms of technical foundations, socio-technical constraints, and human factors. It is designed to help explore and explain software reliability to people of various roles who are involved in building and running software systems (especially when on-boarding new team members), and track progress in improving reliability. The SRM is also designed to make it easy to generate and update hierarchical metrics for product health scores across multiple teams.

The software reliability model is designed to be relevant to several different kinds of software systems:

  • internet and cloud-based software
  • desktop software
  • IoT and embedded software
  • (combinations of the above)

The team measures, contexts, and genres of influence are linked to create a graph that helps to explain the dynamics around software reliability. The graph is then visualised in Kumu.

Screenshot of the reliability model visualised in Kumu

Who created the Software Reliability Model?

This software reliability model was co-created by people from TELUS Digital (@telus) and Conflux (@ConfluxHQ), with significant contributions from:


The SRM is aimed at these kinds of people:

  • Product Owner / Product Manager
  • SRE Manager / SRE Lead
  • Software Architect / Systems Architect / Test Architect
  • Software Developer
  • Software Tester

The SRM helps these people to explore and discuss different aspects of software reliability to help make targeted improvements.

What's in the SRM?

The SRM is composed of 2 main parts:

  1. definitions of reliability factors in CSV format suitable for import to Kumu 📄
  2. graphs in Kumu generated by importing the CSV definitions 📊

The CSV files (and visualisation settings) are imported into Kumu to generate explorable graphs.

How to use the reliability model

There are several ways to use the SRM. Here are some suggestions:

  1. Freeform exploration: use the Kumu graphs to investigate different aspects of reliability in a free-form way.
  2. Guided workshops: use the context groupings to do a deep dive into specific aspects of reliability. For example, run a 90-minute workshop on Decoupling and isolation or Speed of remediation. Use the workshop to build awareness within the team of the team-level practices and measures that sit under that context parent node. Then repeat the workshop with a new context.
  3. Metrics roll-up: use the SRM to score teams on their current reliability practices and status. The Metric and Measure details for each leaf node provide details of what to measure and the type of measurement. Aggregate the measures into the parent nodes until you have a single score for Reliability for that team.
  4. All 3 of the above: combine the three approaches above for maximum benefit, helping team members to understand how they can improve reliability on a daily basis.
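
The metrics roll-up in suggestion 3 can be sketched in Python as follows. This is only an illustration, not part of the SRM itself: the model does not prescribe an aggregation function, so the unweighted mean used here is an assumption, as are the `Node` class, the `roll_up` function, the 0-to-1 score scale, and the placeholder measure names. The two context names are taken from the workshop example above.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Node:
    """A factor in the graph: leaves carry a score, parents carry children."""
    name: str
    score: Optional[float] = None  # set on leaf (team measure) nodes only
    children: List["Node"] = field(default_factory=list)


def roll_up(node: Node) -> float:
    """Return a node's score, averaging child scores for parent nodes.

    Assumption: an unweighted mean; a team may prefer weights or a minimum.
    """
    if not node.children:
        if node.score is None:
            raise ValueError(f"leaf node {node.name!r} has no score")
        return node.score
    return mean(roll_up(child) for child in node.children)


# Hypothetical scores on a 0..1 scale for one team.
reliability = Node("Reliability", children=[
    Node("Decoupling and isolation", children=[
        Node("measure A (placeholder)", score=0.5),
        Node("measure B (placeholder)", score=1.0),
    ]),
    Node("Speed of remediation", children=[
        Node("measure C (placeholder)", score=0.25),
    ]),
])

team_score = roll_up(reliability)
print(f"Reliability score: {team_score}")
```

Because each parent is scored only from its direct children, the same function yields a score for any context or genre node, which can help when comparing teams at a level below the single Reliability score.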

Explore the latest version of the model on Kumu

Visit the latest stable version of the reliability model on Kumu:

Screenshot of SRM graph visualization on Kumu

See all versions of the model:

Types of factors in the model

There are several types of factors in the SRM - each factor type is shown differently in the Kumu graph:

  • team measure - team-level measures that influence reliability
  • context - the context in which measures are taken
  • genre - the high-level grouping of measures
  • reliability - the ultimate goal of all these factors

Tags to help explore the model

Tags are used to explore different dimensions of the model:

  • 4 Key Metrics - from the book Accelerate:
    • Lead time
    • Deployment frequency
    • Mean Time To Restore (MTTR)
    • Change failure rate
  • CodeScene - measures from the tool CodeScene (see
  • Continuous Delivery - measures from the Continuous Delivery dimension of MSDA
  • Deployment - measures from the Deployment dimension of MSDA
  • Deployment technique - techniques for reliability focused on deployment aspects
  • Flow - measures from the Flow dimension of MSDA
  • Human technique - techniques for reliability focused on human aspects
  • MSDA - measures from Multi-team Software Delivery Assessment (MSDA)
  • On-call - measures from the On-call dimension of MSDA
  • Operability - measures from the Operability dimension of MSDA
  • RTCE - measures from the Reliability Through Customer Eyes (RTCE) principles devised by TELUS and Conflux.
  • Reliability and SRE - measures from the Reliability dimension of MSDA
  • Runtime technique - techniques for reliability focused on runtime aspects
  • Team Health - measures from the Team Health dimension of MSDA
  • Team Topologies - measures derived from the book Team Topologies
  • Team API - measures relating to the 'Team API' concept in Team Topologies
  • Team Autonomy - measures relating to team autonomy as discussed in Team Topologies
  • Team Cognitive Load - measures relating to the 'Team Cognitive Load' concept in Team Topologies
  • Testability - measures from the Testability dimension of MSDA
  • UX - measures relating to end-user experience
  • Version Control Hygiene - measures relating to good version control practices

Notes on the model and visualization

  1. Names of Kumu element types, connection types, and tags are each a single word, so that they are "selector friendly" in the Kumu advanced editor.
  2. The graph layout visualization is controlled by the settings in the *.css view files (imported into the Advanced Editor settings in Kumu).
  3. The CSV data import in Kumu needs some attention to detail. Be sure to follow the CSV import details.

Books that influenced the model

These books influenced the reliability model significantly:

Possible improvements

  • Use CI to test Pull Requests against Kumu import:
    • Duplicate nodes?
    • Dangling connectors?
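
A check along these lines might look like the following Python sketch, which a CI job could run before any import into Kumu. It is not an existing part of this repository. The column names `Label`, `From`, and `To` are assumptions about the CSV layout and should be adjusted to match the actual files; the function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple


def find_duplicate_labels(elements_csv: str) -> List[str]:
    """Return element labels that appear more than once, sorted."""
    rows = csv.DictReader(io.StringIO(elements_csv))
    counts = Counter(row["Label"].strip() for row in rows)
    return sorted(label for label, n in counts.items() if n > 1)


def find_dangling_connections(
    elements_csv: str, connections_csv: str
) -> List[Tuple[str, str]]:
    """Return (From, To) pairs where either end is not a known element."""
    labels = {
        row["Label"].strip()
        for row in csv.DictReader(io.StringIO(elements_csv))
    }
    dangling = []
    for row in csv.DictReader(io.StringIO(connections_csv)):
        ends = (row["From"].strip(), row["To"].strip())
        if any(end not in labels for end in ends):
            dangling.append(ends)
    return dangling


# Hypothetical sample data: one duplicated element, one dangling connection.
elements = "Label,Type\nReliability,reliability\nMTTR,measure\nMTTR,measure\n"
connections = "From,To\nMTTR,Reliability\nLead time,Reliability\n"

duplicates = find_duplicate_labels(elements)
dangling = find_dangling_connections(elements, connections)
print("Duplicate nodes:", duplicates)
print("Dangling connectors:", dangling)
```

In a CI job, the script would read the CSV files from the Pull Request branch and exit with a non-zero status if either list is non-empty.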

