Testing continuity with TMAP

You probably know the feeling. Friday afternoon, just before the weekend. Everything seems to be running smoothly, until suddenly the production system crashes. That's when you realise how vulnerable continuity actually is. Not because the system fails, but because no one knows exactly what will happen if things really go wrong. Continuity is not something you only test when everything crashes. It is a quality attribute that you consciously include in your test strategy. And within TMAP, it plays an important role: it allows you to demonstrate how fast, stable and recoverable your system is.

13 Oct 2025
TMap® Quality for cross-functional teams

Why software continuity is important (and often forgotten)

In software teams, everything revolves around speed. New features, faster releases, shorter sprints, in short: everything to create more value as quickly as possible. But speed without continuity is like a race car without brakes: spectacular, until it goes wrong.

Without attention to continuity as a TMAP quality attribute:

  • Recovery in the event of a failure becomes pure panic.
  • Downtime lasts longer than necessary.
  • Backups turn out not to work.
  • The dependence on “that one DevOps engineer” becomes enormous.

And yet continuity is not a luxury. It determines whether your system continues to run during and after a failure. Whether your organization survives when something unexpected happens. And whether users still have confidence when everything comes to a standstill.

What does continuity mean within TMAP?

Within TMAP, continuity revolves around one central question: how well does your system remain available (even when things go wrong)? It's about more than just uptime. It's about recoverability, robustness, and smart risk management.

TMAP divides continuity into five components, which together determine how reliable your system really is. Let's briefly go through each of them.

1. Operational reliability – does everything continue to run as intended?

Operational reliability is about stability in daily operations. How reliable is the processing of data, transactions, or messages? Does everything continue to run when the load increases or a component slows down?

A system with high operational reliability has built-in error handling, clear logging, and predictable behavior under pressure. Think of automatic retries, alerts in case of delays, or fallback mechanisms for critical processes.

Within TMAP, you test this by looking at the “happy flow” and the exceptions. What happens if an input is incomplete, a service temporarily does not respond, or an API call is delayed? Operational reliability is not about whether something works, but whether it continues to work when circumstances change.
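A test like this can be sketched in a few lines of Python. This is only an illustration: `FlakyService` and `call_with_retry` are hypothetical names invented for the example and do not come from TMAP. The sketch checks the happy flow first and then the exception where a service temporarily does not respond.

```python
import time


class FlakyService:
    """Hypothetical stand-in for a dependency that times out a few times first."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError("service did not respond")
        return "ok"


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Call `operation`, retrying on TimeoutError; `attempts` must be at least 1."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except TimeoutError as error:
            last_error = error
            time.sleep(delay_seconds)
    # Retry budget exhausted: surface the last timeout to the caller.
    raise last_error


# Happy flow: the service answers on the first call.
healthy = FlakyService(failures_before_success=0)
happy_result = call_with_retry(healthy.fetch)

# Exception flow: two timeouts, then recovery within the retry budget.
flaky = FlakyService(failures_before_success=2)
flaky_result = call_with_retry(flaky.fetch)
```

In a real project, the delay would typically grow between attempts and the exhausted case would trigger an alert; both are left out here to keep the sketch short.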

The goal: a system that not only runs well during the demo, but also at 8:30 a.m. on Monday morning, when the entire organization logs in at the same time.

2. Robustness – how well does your system absorb a blow?

Robustness is about resilience: how well can your system cope with disruptions, errors, or temporary malfunctions without the entire platform crashing?

In practice, this means catching errors without crashing, resuming processes without data loss, and scaling down in a controlled manner in the event of malfunctions instead of coming to a complete standstill. A robust system knows what to do when something goes wrong and communicates this clearly via error messages or alerts (which actually arrive).

TMAP helps teams test robustness by incorporating so-called “negative testing” and stress tests into the testing approach. For example, simulate a sudden network interruption, a full database, or an unreachable external API. The question then is not whether the system continues to work, but how it responds.
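As a rough illustration of such a negative test, the Python sketch below simulates an unreachable external API and checks how the calling code responds. The names `RateResult`, `get_exchange_rate`, and `unreachable_api` are assumptions made up for this example, not part of TMAP or of any real library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RateResult:
    rate: Optional[float]
    error: Optional[str]


def get_exchange_rate(fetch: Callable[[], float]) -> RateResult:
    """Wrap a call to an external API so that an outage gives a controlled result."""
    try:
        return RateResult(rate=fetch(), error=None)
    except ConnectionError as exc:
        # Predictable response to imperfection: no crash, and a clear message.
        return RateResult(rate=None, error=f"exchange rate API unreachable: {exc}")


def unreachable_api() -> float:
    """Simulates the network interruption for the negative test."""
    raise ConnectionError("network is down")


# Negative test: the dependency is unreachable, the caller must not crash.
result = get_exchange_rate(unreachable_api)
```

The assertion in such a test is not that a rate comes back, but that `result.error` is filled in and no exception escapes.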

A robust system does not have to be perfect, but it must respond predictably to imperfection. And that is exactly where TMAP adds value: it makes robustness measurable and lets you test it repeatably.

3. Recoverability – how quickly can you get back up and running after a crash?

Recoverability is the extent to which you can restore a system to a stable, working state after a failure or incident. It's about speed, precision, and control: how long does it take to restore data, restart processes, and give users access again?

Many teams think their backup strategy is in order. Until they need it.

Within TMAP, recoverability is therefore not just about having backups, but about systematically testing recovery procedures. This includes:

  • Performing controlled restore tests;
  • Validating backup data for completeness;
  • Testing rollback scripts and migration scenarios.
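A controlled restore test with a completeness check could look roughly like the sketch below. It uses Python's built-in `sqlite3` module and in-memory databases purely for illustration; the `orders` table and the `row_fingerprint` helper are assumptions, and a real restore test would run against your own backup tooling.

```python
import sqlite3


def row_fingerprint(conn):
    """Return (row count, sum of amounts) as a simple completeness check."""
    query = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"
    return conn.execute(query).fetchone()


# Hypothetical "production" database, kept in memory for the sketch.
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
production.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (25,), (40,)])
production.commit()

# Step 1: take a backup.
backup = sqlite3.connect(":memory:")
production.backup(backup)

# Step 2: restore the backup into a fresh, empty database.
restored = sqlite3.connect(":memory:")
backup.backup(restored)

# Step 3: validate that the restored data is complete.
expected = row_fingerprint(production)
actual = row_fingerprint(restored)
restore_ok = expected == actual
```

A count plus a sum is a deliberately simple check; per-table checksums or sampled row comparisons give stronger evidence.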

A system with high recoverability not only has a plan B, but its team also knows that the plan works. The difference between a disaster and a routine recovery is often not in technology, but in preparation.

4. Degradation capability – continuing to operate despite failure

Sometimes it is better for a system to continue doing something than to do nothing at all. That is what degradation capability is all about: can the core of your application continue to function if part of the system fails? 

Suppose a web shop temporarily loses its payment module. In an ideal world, everything should continue to work. In the real world, customers should at least still be able to place orders, with payments being processed later.

Testing for degradation therefore means investigating how the system responds when parts are temporarily unavailable (and whether that “emergency mode” is controlled rather than chaotic).

Within TMAP, you test this with scenarios in which components are deliberately disabled. Think of load balancers that fail, queue services that temporarily do not process messages, or APIs that time out. The goal is not to prevent errors, but to learn whether your system can withstand them.
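The web shop scenario can be sketched as a small degradation test in Python. Everything named here (`WebShop`, `FakePaymentModule`, `PaymentUnavailable`) is hypothetical and exists only for this example: the payment module is deliberately disabled, and the test checks that orders are still accepted and paid later.

```python
from collections import deque


class PaymentUnavailable(Exception):
    """Raised by the (hypothetical) payment module when it is down."""


class FakePaymentModule:
    """Test double whose availability the test can switch off and on."""

    def __init__(self):
        self.available = True
        self.charged = []

    def charge(self, order_id, amount):
        if not self.available:
            raise PaymentUnavailable("payment module disabled for the test")
        self.charged.append((order_id, amount))


class WebShop:
    def __init__(self, payments):
        self.payments = payments
        self.pending_payments = deque()  # orders whose payment is deferred

    def place_order(self, order_id, amount):
        """Accept the order; defer the payment if the payment module is down."""
        try:
            self.payments.charge(order_id, amount)
            return "paid"
        except PaymentUnavailable:
            self.pending_payments.append((order_id, amount))
            return "accepted_payment_pending"

    def retry_pending(self):
        """Process deferred payments in order; stop at the first failure."""
        processed = 0
        while self.pending_payments:
            order_id, amount = self.pending_payments[0]
            try:
                self.payments.charge(order_id, amount)
            except PaymentUnavailable:
                break
            self.pending_payments.popleft()
            processed += 1
        return processed


payments = FakePaymentModule()
shop = WebShop(payments)

payments.available = False            # deliberately disable the component
status_during_outage = shop.place_order("A-1", 50)

payments.available = True             # the component comes back
recovered = shop.retry_pending()
```

The point of the sketch is the controlled "emergency mode": the order is accepted with an explicit pending status rather than failing or silently losing the payment.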

5. Fallback option – always have a plan B

When everything goes down, you want to be able to switch over. The fallback option is the extent to which a system can continue to run elsewhere (e.g., in another region, cloud environment, or data center).

Failover sounds obvious, but in practice it often goes wrong. Why? Because the fallback environment is rarely tested properly. The backup database is never accessed, scripts are outdated, and no one knows whether the DNS switch will work properly.

With TMAP, you can include these kinds of scenarios in your test strategy. Consider:

  • Simulating a failover to a secondary environment.
  • Checking whether all dependencies (APIs, logging, monitoring) continue to work after the switch.
  • Testing whether user sessions are retained during the switchover.

It's technical precision, but that's exactly what distinguishes teams that put out fires from teams that are prepared.
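As a simplified illustration of the first and last checks, the Python sketch below simulates a failover and verifies that a user session survives the switch. `Environment` and `route` are hypothetical names, and the shared session dictionary is an assumption standing in for whatever replicated session store a real setup would use.

```python
class Environment:
    """Hypothetical environment with a health flag and access to a session store."""

    def __init__(self, name, sessions):
        self.name = name
        self.healthy = True
        self.sessions = sessions  # shared store, so sessions survive a switch

    def handle(self, session_id):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return {"served_by": self.name, "user": self.sessions.get(session_id)}


def route(environments, session_id):
    """Send the request to the first environment that responds."""
    for env in environments:
        try:
            return env.handle(session_id)
        except ConnectionError:
            continue
    raise RuntimeError("no environment available")


sessions = {"s-123": "alice"}
primary = Environment("primary", sessions)
secondary = Environment("secondary", sessions)

before = route([primary, secondary], "s-123")

primary.healthy = False  # simulate the outage that triggers the failover
after = route([primary, secondary], "s-123")
```

If the secondary environment had its own, unsynchronized session store, `after["user"]` would come back as `None`, which is exactly the kind of gap this test is meant to expose.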

Recognizable practical examples

You only realize the true value of continuity testing when things go wrong. And that happens more often than teams are willing to admit.

“The backup that turned out not to be a restore”

A neat backup file every night. Until one was needed, and it turned out that the restore procedure had never been tested. The result: lost data, panic, sleepless nights.

“The staging that no one dared to use”

There was a recovery environment. But no one knew if it worked. The credentials were outdated, scripts broke halfway through, and no one had the rights to restore anything.

“The failover that crashed”

During a malfunction, a switch was made to a second environment. Only... it turned out not to be synchronized with production. The data was three days old and the log files were missing.

TMAP helps to prevent such situations on a structural basis. Not by doing more testing, but by testing smarter: for resilience instead of perfection.

Testing continuity with TMAP

Within TMAP, you don't test continuity “on the side,” but as part of your quality strategy.

Some test questions that help:

  • What happens if an essential component fails?
  • Are clear recovery procedures documented?
  • Can you fall back on a previous release without data loss?
  • Has monitoring been tested to ensure it responds correctly in the event of incidents?

TMAP provides tools to translate these questions into concrete test cases and risk analyses. This allows you to build not only reliable software, but also trust among your team and stakeholders.
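To give an idea of what one of these questions looks like as a concrete test case, the Python sketch below takes the monitoring question. The `Monitor` class, its threshold of three failed checks, and the alert text are all assumptions made for this example; the test feeds the monitor a simulated incident and checks that exactly one alert is sent.

```python
class Monitor:
    """Hypothetical monitor that alerts after consecutive failed health checks."""

    def __init__(self, probe, notify, threshold=3):
        self.probe = probe          # callable returning True when healthy
        self.notify = notify        # callable that delivers the alert
        self.threshold = threshold
        self.consecutive_failures = 0

    def tick(self):
        """Run one health check and alert once when the threshold is reached."""
        if self.probe():
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures == self.threshold:
            self.notify(f"health check failed {self.threshold} times in a row")


alerts = []
# Simulated incident: one healthy check, then four failed checks.
probe_results = iter([True, False, False, False, False])
monitor = Monitor(probe=lambda: next(probe_results), notify=alerts.append)

for _ in range(5):
    monitor.tick()
```

Alerting only when the counter equals the threshold, rather than on every failure after it, keeps a long outage from flooding the on-call channel.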

How does Testlearning help you with this?

Testlearning is all about structurally ensuring quality in modern development environments. In the TMAP: Quality for Cross-Functional Teams e-learning course, you will learn how to incorporate continuity testing into your work processes, whether you are a tester, developer, or product owner. You will learn:

  • how to set up test scenarios for robustness and recoverability;
  • how to identify risks early on in Agile and DevOps teams;
  • how to make continuity visible and measurable with TMAP.

Follow the training wherever and whenever it suits you: on your laptop, tablet, or mobile phone. Practical, concrete, and directly applicable in your project environment.