The Future of Load Testing in a Cloud-Native World

Eric

TL;DR

Modern distributed systems rarely fail at the API boundary. An endpoint can return 200 OK while a queue stalls, an event is never published, or a downstream service silently fails. QALIPSIS was built to address this reality through observability-driven testing and deep validation of distributed workflows.

As part of the FFG-funded research project Traceon, QALIPSIS is exploring how operational observability data, such as distributed traces and OpenTelemetry telemetry, can help derive realistic end-to-end microservice test scenarios from actual system behavior. The project aims to reduce manual workflow mapping, reduce reliance on fragile test scripts, improve reality-based coverage, and investigate opportunities for automated test case synthesis from tracing logs.

Expected project outcomes include up to 70% test effort reduction and 50–80% more reality-based coverage, subject to research validation and practical implementation results.

Your API Returned 200 OK. So Why Did the System Still Fail?

The deployment succeeded.

The API responded correctly.

The monitoring dashboard stayed green.

The automated test passed.

A few minutes later, customers started calling.

The order was accepted, but the delivery workflow never completed. The customer account was created, but the welcome email was never scheduled. The inventory service never received the update. An asynchronous worker failed after the API had already returned a successful response.

This is one of the defining challenges of modern software engineering.

In distributed architectures, correctness is no longer determined by a single response. Business processes span APIs, databases, message queues, event streams, background workers, caches, and third-party services. Failures often occur in places that traditional testing never examines.

This challenge is exactly why QALIPSIS exists.

QALIPSIS was designed for testing complex distributed and asynchronous systems. Rather than focusing exclusively on external responses, it enables teams to validate the behavior of entire workflows across multiple services, dependencies, and execution paths.

The FFG-funded Traceon research project extends this vision further by investigating how production observability data can help create more realistic and maintainable testing approaches for modern microservice architectures.

Why Traditional E2E Testing Breaks in Distributed Systems

Traditional end-to-end testing assumes that engineers understand the most important business flows, manually document them, and maintain them over time.

That approach becomes increasingly difficult as systems grow.

A single customer action may trigger:

  • API requests
  • Service-to-service communication
  • Database transactions
  • Event publication
  • Queue processing
  • Background jobs
  • Cache updates
  • Retry mechanisms
  • Third-party integrations

To test these workflows, teams often rely on manually created scenarios.

The result is familiar:

  • Fragile test scripts that break whenever implementations change.
  • Excessive manual mapping of technical and business processes.
  • Growing test redundancy as multiple teams recreate similar
    scenarios.
  • Limited visibility into asynchronous execution paths.
  • High maintenance costs for end-to-end microservice testing.

Many testing strategies still operate primarily as black-box tests. They validate visible outputs but often miss the internal behavior that determines whether a business process actually succeeded.

As distributed systems become more dynamic, maintaining realistic test coverage becomes one of the most expensive parts of quality assurance.

Black-Box Testing vs. Internal System Validation

Traditional black-box testing asks a simple question:

What did the system return?

Internal system validation asks a more important one:

What actually happened?

Consider a typical order workflow.

A black-box test might verify:

  • HTTP status codes
  • API responses
  • Returned payloads

Internal validation investigates whether:

  • The database was updated.
  • Events were published correctly.
  • Queues received messages.
  • Background workers completed successfully.
  • Downstream systems processed expected actions.
  • State transitions occurred as intended.

This distinction becomes especially important when dealing with asynchronous side effects.

A successful API response does not guarantee a successful business process.

The payment processor may fail.

A message queue may stall.

A retry mechanism may never execute.

A background worker may terminate unexpectedly.

This is why end-to-end microservice testing increasingly requires visibility beyond the API boundary.
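To make the difference concrete, here is a minimal Kotlin sketch, assuming a toy in-memory system rather than real services. The names `FakeOrderSystem`, `blackBoxCheck`, and `internalCheck` are hypothetical and are not part of QALIPSIS; they only show how an API can return 200 while the asynchronous side effect never happens.

```kotlin
// Hypothetical in-memory stand-ins for an order API, a queue, and an event store.
// Illustration only; this is not QALIPSIS code.
data class Event(val orderId: String, val type: String)

class FakeOrderSystem(private val workerFails: Boolean) {
    val queue = ArrayDeque<String>()
    val events = mutableListOf<Event>()

    // The API accepts the order and answers immediately; delivery is scheduled later.
    fun placeOrder(orderId: String): Int {
        queue.addLast(orderId)
        events.add(Event(orderId, "OrderAccepted"))
        return 200
    }

    // Simulates the background worker draining the queue after the response was sent.
    fun runWorker() {
        while (queue.isNotEmpty()) {
            val orderId = queue.removeFirst()
            // When the worker "fails", the message is consumed but no event is published.
            if (!workerFails) events.add(Event(orderId, "DeliveryScheduled"))
        }
    }
}

// Black-box view: only the visible response matters.
fun blackBoxCheck(status: Int): Boolean = status == 200

// Internal view: the queue was drained and the expected side effect actually happened.
fun internalCheck(system: FakeOrderSystem, orderId: String): Boolean =
    system.queue.isEmpty() &&
        system.events.any { it.orderId == orderId && it.type == "DeliveryScheduled" }
```

With `FakeOrderSystem(workerFails = true)`, `placeOrder` still returns 200 and passes `blackBoxCheck`, but after `runWorker()` the `internalCheck` returns false because no `DeliveryScheduled` event was ever published. Against a real system, the internal check would query a database, a topic, or a trace backend, typically with polling and a timeout, since the side effect happens after the response.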

QALIPSIS was built to support this deeper validation model.

What Is QALIPSIS?

QALIPSIS is an enterprise testing platform designed specifically for distributed and asynchronous systems.

Unlike traditional testing tools that focus primarily on isolated API interactions, QALIPSIS supports:

  • End-to-end microservice testing, including large-scale distributed environments
  • Performance and load testing
  • Validation of distributed workflows
  • Verification of asynchronous side effects
  • Kotlin-based test definition through a powerful DSL
  • Testing at enterprise scale

The platform was created to help engineering teams validate how systems behave under realistic conditions rather than idealized assumptions.

For CTOs, QA leaders, developers, DevOps engineers, and Site Reliability Engineers, the core objective remains simple:

Turn complex system behavior into reliable, executable, and
maintainable tests.

Traceon: Researching the Future of Observability-Driven Testing

Traceon is an FFG-funded research initiative coordinated by AERIS, the company behind QALIPSIS, in collaboration with SCCH (Software Competence Center Hagenberg).

The project investigates how operational observability data can support a new generation of testing methodologies.

Instead of relying exclusively on manually designed scenarios, Traceon explores whether production and staging telemetry can reveal realistic user journeys, service dependencies, execution patterns, and system interactions that may be valuable for testing purposes.

This research direction combines several important concepts:

  • Trace-based test generation
  • Operational data testing
  • Observability-driven testing
  • E2E synthesis
  • Pattern discovery from distributed traces
  • OpenTelemetry-to-E2E test workflow concepts

The goal is not to replace engineering expertise.

Rather, the project investigates how observability data can provide additional insight into how systems actually behave and how those insights could support the creation of more realistic testing assets.

As Eric Jesse, CEO of AERIS and lead architect of QALIPSIS, explains:

“Modern microservice systems generate an enormous amount of operational knowledge every day. Traceon explores how that knowledge can help teams create more realistic tests based on actual system behavior instead of assumptions. Our objective is to help engineering teams spend less time maintaining scripts and more time validating business-critical workflows.”

Why FFG Funding Matters

Innovation claims are easy to make.

Independent validation is harder.

The Austrian Research Promotion Agency (FFG) evaluates projects through a structured assessment process that considers innovation potential, technical feasibility, research methodology, and expected impact.

For QALIPSIS, Traceon represents an externally reviewed research initiative focused on advancing testing methodologies for distributed software systems.

FFG funding should not be interpreted as a guarantee of success or product maturity.

It does, however, serve as an external signal that the project’s research direction demonstrates meaningful innovation potential.

From Observability Data to Potential Test Scenarios

One of the most promising ideas behind Traceon is the possibility of learning from real system behavior.

Modern observability platforms already collect extensive information through:

  • Distributed tracing
  • OpenTelemetry logs
  • Metrics and monitoring data
  • Service interaction histories
  • Operational telemetry

These data sources contain valuable information about:

  • User journeys
  • Service dependencies
  • Execution sequences
  • Timing relationships
  • Common and uncommon workflows

Distributed traces in particular also expose payload dependencies, service interactions, state transitions, and execution paths that are often difficult to identify through manual analysis alone.

Traceon investigates how this information could contribute to an OpenTelemetry-to-E2E test workflow.

Conceptually, the research explores activities such as:

  1. Collecting observability and tracing information.
  2. Identifying recurring interaction patterns.
  3. Studying dependencies across distributed services.
  4. Filtering noise and irrelevant activity.
  5. Supporting expert validation of discovered workflows.
  6. Deriving structured QALIPSIS-compatible testing scenarios.
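Steps 1 to 4 can be illustrated with a deliberately simplified Kotlin sketch. It assumes spans have already been exported into a hypothetical `Span` record; real OpenTelemetry spans carry comparable identifiers (trace ID, span ID, parent span ID, name, start time) plus many more attributes. The noise rule, the `service:operation` path format, and the frequency threshold are illustrative assumptions, not Traceon results, and steps 5 and 6 (expert review and scenario generation) are left out.

```kotlin
// Hypothetical, simplified span record (not the OpenTelemetry SDK type).
data class Span(
    val traceId: String,
    val spanId: String,
    val parentSpanId: String?, // null for the root span of a trace
    val service: String,
    val operation: String,
    val startNanos: Long,
)

// Noise filtering (assumed rule): treat health checks and metric scrapes as irrelevant.
fun isNoise(span: Span): Boolean =
    span.operation.startsWith("GET /health") || span.operation == "metrics.scrape"

// Turn the spans of one trace into an ordered "service:operation" path.
fun toPath(traceSpans: List<Span>): List<String> =
    traceSpans.sortedBy { it.startNanos }.map { "${it.service}:${it.operation}" }

// Pattern identification: count identical paths across traces and keep those seen
// at least minCount times, most frequent first. The output is a shortlist for expert
// review, not a finished test.
fun recurringPaths(spans: List<Span>, minCount: Int): List<Pair<List<String>, Int>> =
    spans.filterNot { isNoise(it) }
        .groupBy { it.traceId }
        .values
        .map { toPath(it) }
        .groupingBy { it }
        .eachCount()
        .filter { it.value >= minCount }
        .toList()
        .sortedByDescending { it.second }

// Dependency study: derive caller -> callee service edges from parent/child span links.
fun serviceDependencies(spans: List<Span>): Set<Pair<String, String>> {
    val byId = spans.associateBy { it.traceId to it.spanId }
    return spans.mapNotNull { child ->
        child.parentSpanId
            ?.let { parentId -> byId[child.traceId to parentId] }
            ?.takeIf { parent -> parent.service != child.service }
            ?.let { parent -> parent.service to child.service }
    }.toSet()
}
```

Ordering spans by start time flattens concurrent calls into a single sequence, which is a simplification; a fuller approach would walk the parent/child tree, as `serviceDependencies` does for caller-to-callee edges.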

Importantly, these activities represent research objectives rather than production-ready capabilities.

The project explores how operational data testing could help teams reduce manual effort while improving the realism of testing assets.

How Do I Turn Jaeger, Zipkin, or OpenTelemetry Traces into Executable Performance Tests?

Tracing backends such as Jaeger and Zipkin, together with OpenTelemetry instrumentation, already capture valuable information about user journeys, service interactions, dependencies, timing relationships, and execution paths.

Traceon investigates how these data sources could contribute to a future workflow that includes collecting traces, filtering noise, identifying usage paths, studying dependencies, validating findings with domain experts, and deriving QALIPSIS-compatible testing scenarios.

These concepts remain part of the project’s research direction and should not be interpreted as production-ready functionality.

If successful, such approaches could help bridge the gap between observability data and more realistic end-to-end microservice testing and performance validation.

How Can Production Traces Help Automate E2E Testing?

Production traces are essentially records of real-world system behavior.

They reveal:

  • Which services communicate
  • Which workflows occur most frequently
  • Which dependencies exist
  • Which payload dependencies matter
  • Which service interactions and execution paths are business-critical
  • Which state transitions indicate successful process completion

This makes them a valuable source of information for testing research.

Traceon investigates how trace mining, operational data analysis, and observability-driven testing techniques could help identify realistic scenarios that deserve validation.

By analyzing distributed tracing data, payload dependencies, service interactions, and execution sequences, the project studies whether realistic workflows can be derived from operational insights rather than from manual assumptions.

Instead of building every scenario manually, engineering teams may eventually be able to leverage insights derived from real operational activity.

This concept sits at the heart of trace-based test generation and automated test case synthesis from tracing logs.

The objective is not simply to generate more tests.

The objective is to generate more relevant tests.

Realistic Load Testing Through Operational Data Testing

One of the biggest challenges in performance engineering is realism.

Many load tests simulate traffic patterns that look convincing but bear little resemblance to actual production usage.

Operational data testing offers a different perspective.

By analyzing real traffic patterns, organizations can better understand how users actually interact with distributed systems and how services behave under production conditions.

Teams can gain insight into:

  • User behavior patterns
  • Request frequencies
  • Service dependencies
  • Realistic concurrency levels
  • Resource utilization trends

Traceon investigates whether workload models derived from traces can make microservices testing and performance validation more realistic.

The objective is to better reflect production-like user behavior rather than relying exclusively on synthetic assumptions.
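A minimal sketch of how observed traffic could be turned into workload parameters, assuming journeys have already been identified and counted over an observation window. `JourneyStats`, `WorkloadShare`, and `workloadProfile` are hypothetical names. The concurrency estimate uses Little's law (average number in flight = arrival rate × mean time in the system), which describes stable long-run averages and does not capture bursts.

```kotlin
// Hypothetical summary of one user journey observed in traces during a time window.
data class JourneyStats(val name: String, val count: Int, val meanDurationSeconds: Double)

// Hypothetical load parameters derived for one journey.
data class WorkloadShare(
    val name: String,
    val share: Double,             // fraction of all observed journeys
    val arrivalsPerSecond: Double, // observed arrival rate (lambda)
    val meanConcurrency: Double,   // Little's law: L = lambda * W
)

fun workloadProfile(observed: List<JourneyStats>, windowSeconds: Double): List<WorkloadShare> {
    require(windowSeconds > 0) { "windowSeconds must be positive" }
    val total = observed.sumOf { it.count }
    require(total > 0) { "no journeys observed" }
    return observed.map { j ->
        val rate = j.count / windowSeconds
        WorkloadShare(
            name = j.name,
            share = j.count.toDouble() / total,
            arrivalsPerSecond = rate,
            meanConcurrency = rate * j.meanDurationSeconds,
        )
    }
}
```

For example, 600 checkout journeys and 2,400 browse journeys observed over 600 seconds yield a 20/80 mix with arrival rates of 1 and 4 journeys per second; with mean durations of 0.5 s and 0.2 s, about 0.5 and 0.8 journeys of each kind would be in flight on average. A load generator could then be configured from these numbers instead of from guessed ratios.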

For organizations running large-scale distributed systems, realistic workload generation can be just as important as functional correctness.

Can AI and Machine Learning Help Generate Better Tests?

Potentially, yes.

Modern analysis techniques can identify patterns that are difficult to detect manually.

Research areas relevant to Traceon include:

  • Clustering of similar workflows
  • Sequence analysis
  • Pattern recognition
  • Trace mining
  • Statistical dependency analysis

These techniques may help reveal recurring usage paths and meaningful behavioral patterns hidden inside large observability datasets.
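As one simple example of clustering similar workflows, the sketch below groups call paths by the Jaccard similarity of the steps they contain. This is an assumed, illustrative technique, not the method used in Traceon: it ignores step order and repetition, which sequence-analysis methods would take into account, and its greedy single pass makes the result depend on input order. `jaccard` and `clusterPaths` are hypothetical names.

```kotlin
// Jaccard similarity between the sets of steps in two call paths (order is ignored).
fun jaccard(a: List<String>, b: List<String>): Double {
    val sa = a.toSet()
    val sb = b.toSet()
    if (sa.isEmpty() && sb.isEmpty()) return 1.0
    return (sa intersect sb).size.toDouble() / (sa union sb).size
}

// Greedy single-pass clustering: each path joins the first cluster whose representative
// (its first member) is at least `threshold` similar; otherwise it starts a new cluster.
fun clusterPaths(paths: List<List<String>>, threshold: Double): List<List<List<String>>> {
    val clusters = mutableListOf<MutableList<List<String>>>()
    for (path in paths) {
        val home = clusters.firstOrNull { jaccard(it.first(), path) >= threshold }
        if (home != null) home.add(path) else clusters.add(mutableListOf(path))
    }
    return clusters
}
```

For instance, an order path and the same path with an extra welcome-email step share 3 of 4 distinct steps (similarity 0.75), so with a threshold of 0.6 they fall into the same cluster, while a product-browsing path starts its own. Each resulting cluster is a candidate workflow for an expert to name, review, and accept or reject.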

However, automation alone is not enough.

Any generated insights must remain understandable, explainable, and reviewable.

The Traceon approach is therefore designed so that its findings are validated by lead SREs, QA leads, developers, and domain experts.

Human expertise remains essential.

The objective is augmentation, not replacement.

As Dr. Stefan Fischer, Senior Researcher at SCCH, notes:

“The research challenge is not simply identifying patterns inside observability data. The real value comes from transforming those patterns into explainable knowledge that experts can validate and use with confidence.”

Why the QALIPSIS DSL Matters

Even the most sophisticated testing approach ultimately depends on maintainability.

QALIPSIS uses a Kotlin-based DSL because it allows testing scenarios to remain:

  • Readable
  • Structured
  • Version controlled
  • Executable
  • Extensible

Engineering teams can express complex workflows in a format that remains close to business intent while retaining the precision required for enterprise testing.
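To illustrate why a Kotlin DSL can stay close to business intent, here is a minimal type-safe builder. It is a hypothetical mini-DSL written for this example and is not the QALIPSIS DSL: the `scenario`, `step`, and `verify` names are invented, and `verify` only records descriptions instead of executing assertions. The technique it shows, lambdas with receivers plus `@DslMarker`, is standard Kotlin.

```kotlin
// Hypothetical mini-DSL illustrating Kotlin's type-safe builder pattern (not the QALIPSIS DSL).
@DslMarker
annotation class ScenarioDsl

data class Step(val name: String, val checks: List<String>)
data class Scenario(val name: String, val steps: List<Step>)

@ScenarioDsl
class StepBuilder(private val name: String) {
    private val checks = mutableListOf<String>()

    // Records an expectation in plain language; a real tool would attach executable assertions.
    fun verify(description: String) {
        checks += description
    }

    fun build(): Step = Step(name, checks.toList())
}

@ScenarioDsl
class ScenarioBuilder(private val name: String) {
    private val steps = mutableListOf<Step>()

    // The lambda runs with a StepBuilder as its receiver, so `verify` is callable directly.
    fun step(title: String, block: StepBuilder.() -> Unit) {
        steps += StepBuilder(title).apply(block).build()
    }

    fun build(): Scenario = Scenario(name, steps.toList())
}

// Entry point of the DSL.
fun scenario(name: String, block: ScenarioBuilder.() -> Unit): Scenario =
    ScenarioBuilder(name).apply(block).build()
```

A scenario then reads as `scenario("Place order") { step("Submit order") { verify("HTTP 200 is returned") } }`. Because it is ordinary Kotlin, such a definition can be version controlled, reviewed, and refactored like any other code, and `@DslMarker` stops a `step` block from accidentally calling methods of the enclosing scenario builder.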

This is particularly important for organizations that want to reduce test maintenance by building on real-user data.

If future trace-derived scenarios are incorporated into testing workflows, maintainability will be just as important as automation.

The Kotlin DSL provides the foundation for that balance.

Problem, Solution, and Expected Impact

The challenges of end-to-end microservice testing are well understood: manual mapping consumes valuable engineering time, fragile test scripts require constant maintenance, and traditional approaches often struggle to reflect how systems behave in production. The following comparison illustrates how the research direction explored by Traceon and the capabilities of QALIPSIS aim to address these challenges through observability-driven testing and operational data insights.

Challenge            | Traditional Approach          | QALIPSIS + Traceon Research Direction
---------------------|-------------------------------|-----------------------------------------------
Manual mapping       | Manual scenario design        | Investigate operational-data-derived scenarios
Fragile test scripts | Continuous script maintenance | Explore trace-informed scenario generation
Test redundancy      | Duplicate coverage efforts    | Identify recurring behavior patterns
Limited realism      | Assumed user behavior         | Analyze actual system behavior
API-only validation  | Surface-level checks          | Internal system validation
Load test realism    | Synthetic workloads           | Trace-informed workload modeling

Expected Project Impact

As a research initiative, Traceon focuses on investigating how production observability data can improve the efficiency, realism, and maintainability of testing workflows. The following metrics represent expected project outcomes and research objectives rather than guaranteed customer results.

Metric                          | Project Goal
--------------------------------|----------------------------------------------------
Test effort reduction           | Up to 70%
Reality-based coverage increase | 50–80%
Scenario relevance              | Improved through operational insights
Maintenance effort              | Reduced through better alignment with real behavior

What This Means for Engineering Teams

While the underlying technologies involve distributed tracing, operational data analysis, and observability-driven testing, the ultimate value lies in how these innovations support everyday engineering work. The table below summarizes the potential benefits for different stakeholders across modern software organizations.

Audience         | Primary Value
-----------------|---------------------------------------------------------------------------
CTOs             | Greater confidence in distributed-system reliability and release readiness
QA Leads         | Less manual mapping and more realistic coverage opportunities
Developers       | Maintainable Kotlin DSL-based testing workflows
DevOps Engineers | Stronger alignment between observability and testing
SREs             | Better visibility into distributed behavior and asynchronous failures

Conclusion

The complexity of modern software systems continues to grow.

Microservices, asynchronous workflows, distributed dependencies, and event-driven architectures demand a different approach to testing than the one many teams still rely on today.

QALIPSIS was created to address these challenges through enterprise-grade end-to-end microservice testing, performance validation, and deep workflow verification.

With the Traceon research project, QALIPSIS is now exploring how observability-driven testing, operational data testing, distributed tracing, and trace-based test generation can help engineering teams build more realistic and maintainable testing strategies.

The long-term vision is straightforward:

Help teams turn real system behavior into realistic, executable, and
maintainable tests.

That vision positions QALIPSIS as an innovative, trustworthy, and enterprise-ready platform for the future of distributed systems testing.
