The Future of Load Testing in a Cloud-Native World

TL;DR

Modern distributed systems rarely fail at the API boundary. An endpoint can return 200 OK while a queue stalls, an event is never published, or a downstream service silently fails. QALIPSIS was built to address this reality through observability-driven testing and deep validation of distributed workflows.

As part of the FFG-funded research project Traceon, QALIPSIS is exploring how operational observability data, distributed traces, and OpenTelemetry data can help derive realistic end-to-end microservice testing scenarios from actual system behavior. The project’s goal is to reduce manual mapping, limit fragile test scripts, improve reality-based coverage, and investigate opportunities for automated test case synthesis from tracing logs.

Expected project outcomes include up to 70% test effort reduction and 50–80% more reality-based coverage, subject to research validation and practical implementation results.

Your API Returned 200 OK. So Why Did the System Still Fail?

The deployment succeeded.

The API responded correctly.

The monitoring dashboard stayed green.

The automated test passed.

A few minutes later, customers started calling.

The order was accepted, but the delivery workflow never completed. The customer account was created, but the welcome email was never scheduled. The inventory service never received the update. An asynchronous worker failed after the API had already returned a successful response.

This is one of the defining challenges of modern software engineering.

In distributed architectures, correctness is no longer determined by a single response. Business processes span APIs, databases, message queues, event streams, background workers, caches, and third-party services. Failures often occur in places that traditional testing never examines.

This challenge is exactly why QALIPSIS exists.

QALIPSIS was designed for testing complex distributed and asynchronous systems. Rather than focusing exclusively on external responses, it enables teams to validate the behavior of entire workflows across multiple services, dependencies, and execution paths.

The FFG-funded Traceon research project extends this vision further by investigating how production observability data can help create more realistic and maintainable testing approaches for modern microservice architectures.

Why Traditional E2E Testing Breaks in Distributed Systems

Traditional end-to-end testing assumes that engineers understand the most important business flows, manually document them, and maintain them over time.

That approach becomes increasingly difficult as systems grow.

A single customer action may trigger:

API requests
Service-to-service communication
Database transactions
Event publication
Queue processing
Background jobs
Cache updates
Retry mechanisms
Third-party integrations

To test these workflows, teams often rely on manually created scenarios.

The result is familiar:

Fragile test scripts that break whenever implementations change.
Excessive manual mapping of technical and business processes.
Growing test redundancy as multiple teams recreate similar scenarios.
Limited visibility into asynchronous execution paths.
High maintenance costs for end-to-end microservice testing.

Many testing strategies still operate primarily as black-box tests. They validate visible outputs but often miss the internal behavior that determines whether a business process actually succeeded.

As distributed systems become more dynamic, maintaining realistic test coverage becomes one of the most expensive parts of quality assurance.

Black-Box Testing vs. Internal System Validation

Traditional black-box testing asks a simple question:

What did the system return?

Internal system validation asks a more important one:

What actually happened?

Consider a typical order workflow.

A black-box test might verify:

HTTP status codes
API responses
Returned payloads

Internal validation investigates whether:

The database was updated.
Events were published correctly.
Queues received messages.
Background workers completed successfully.
Downstream systems processed expected actions.
State transitions occurred as intended.

This distinction becomes especially important when dealing with asynchronous side-effects.

A successful API response does not guarantee a successful business process.

The payment processor may fail.

A message queue may stall.

A retry mechanism may never execute.

A background worker may terminate unexpectedly.

This is why end-to-end microservice testing increasingly requires visibility beyond the API boundary.
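The difference between the two questions can be made concrete with a small sketch. Everything in it is hypothetical: `FakeOrderSystem` is an in-memory stand-in invented for illustration, not a QALIPSIS API, and a real check would query an actual database and message broker.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOrderSystem:
    """Hypothetical in-memory stand-in for an order service and its dependencies."""
    broker_up: bool = True
    db: dict = field(default_factory=dict)
    published: list = field(default_factory=list)

    def post_order(self, order_id: str) -> int:
        # The API persists the order and always answers 200.
        self.db[order_id] = "ACCEPTED"
        # A broker failure is swallowed here, so the caller never sees it.
        if self.broker_up:
            self.published.append(("order.created", order_id))
        return 200


def black_box_check(status: int) -> bool:
    """Only looks at what the system returned."""
    return status == 200


def internal_check(system: FakeOrderSystem, order_id: str) -> list:
    """Looks at what actually happened and returns a list of problems."""
    problems = []
    if system.db.get(order_id) != "ACCEPTED":
        problems.append("order not persisted")
    if ("order.created", order_id) not in system.published:
        problems.append("order.created event not published")
    return problems


system = FakeOrderSystem(broker_up=False)
status = system.post_order("o-1")
print(black_box_check(status))        # True: the response looks fine
print(internal_check(system, "o-1"))  # the missing event is reported
```

The black-box check passes even though the event was never published. Only the check that inspects the side effects notices the failure.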

QALIPSIS was built to support this deeper validation model.

What Is QALIPSIS?

QALIPSIS is an enterprise testing platform designed specifically for distributed and asynchronous systems.

Unlike traditional testing tools that focus primarily on isolated API interactions, QALIPSIS supports:

End-to-end microservice testing, including validation of large-scale distributed environments
Performance and load testing
Validation of distributed workflows
Verification of asynchronous side-effects
Kotlin-based test definition through a powerful DSL
Testing at enterprise scale

The platform was created to help engineering teams validate how systems behave under realistic conditions rather than idealized assumptions.

For CTOs, QA leaders, developers, DevOps engineers, and Site Reliability Engineers, the core objective remains simple:

Turn complex system behavior into reliable, executable, and maintainable tests.

Traceon: Researching the Future of Observability-Driven Testing

Traceon is an FFG-funded research initiative coordinated by AERIS, the company behind QALIPSIS, in collaboration with SCCH (Software Competence Center Hagenberg).

The project investigates how operational observability data can support a new generation of testing methodologies.

Instead of relying exclusively on manually designed scenarios, Traceon explores whether production and staging telemetry can reveal realistic user journeys, service dependencies, execution patterns, and system interactions that may be valuable for testing purposes.

This research direction combines several important concepts:

Trace-based test generation
Operational data testing
Observability-driven testing
E2E synthesis
Pattern discovery from distributed traces
OpenTelemetry to E2E test workflow concepts

The goal is not to replace engineering expertise.

Rather, the project investigates how observability data can provide additional insight into how systems actually behave and how those insights could support the creation of more realistic testing assets.

As Eric Jesse, CEO of AERIS and lead architect of QALIPSIS, explains:

“Modern microservice systems generate an enormous amount of operational knowledge every day. Traceon explores how that knowledge can help teams create more realistic tests based on actual system behavior instead of assumptions. Our objective is to help engineering teams spend less time maintaining scripts and more time validating business-critical workflows.”

Why FFG Funding Matters

Innovation claims are easy to make.

Independent validation is harder.

The Austrian Research Promotion Agency (FFG) evaluates projects through a structured assessment process that considers innovation potential, technical feasibility, research methodology, and expected impact.

For QALIPSIS, Traceon represents an externally reviewed research initiative focused on advancing testing methodologies for distributed software systems.

FFG funding should not be interpreted as a guarantee of success or product maturity.

It does, however, serve as an external signal that the project’s research direction demonstrates meaningful innovation potential.

From Observability Data to Potential Test Scenarios

One of the most promising ideas behind Traceon is the possibility of learning from real system behavior.

Modern observability platforms already collect extensive information through:

Distributed tracing
OpenTelemetry logs
Metrics and monitoring data
Service interaction histories
Operational telemetry

These data sources contain valuable information about:

User journeys
Service dependencies
Execution sequences
Timing relationships
Common and uncommon workflows

These traces also expose payload dependencies, service interactions, state transitions, and execution paths that are often difficult to identify through manual analysis alone.

Traceon investigates how this information could contribute to an OpenTelemetry to E2E test workflow.

Conceptually, the research explores activities such as:

Collecting observability and tracing information.
Identifying recurring interaction patterns.
Studying dependencies across distributed services.
Filtering noise and irrelevant activity.
Supporting expert validation of discovered workflows.
Deriving structured QALIPSIS-compatible testing scenarios.

Importantly, these activities represent research objectives rather than production-ready capabilities.

The project explores how operational data testing could help teams reduce manual effort while improving the realism of testing assets.
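To give a feel for the first steps in the list above (collecting spans, filtering noise, identifying recurring interaction patterns), here is a minimal sketch. It is an illustration under stated assumptions, not Traceon's method: the span tuples, the `NOISE_OPERATIONS` set, and the `min_count` threshold are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified spans: (trace_id, start_time, service, operation).
SPANS = [
    ("t1", 1, "gateway", "POST /orders"),
    ("t1", 2, "orders", "INSERT order"),
    ("t1", 3, "broker", "publish order.created"),
    ("t2", 1, "gateway", "POST /orders"),
    ("t2", 2, "orders", "INSERT order"),
    ("t2", 3, "broker", "publish order.created"),
    ("t3", 1, "gateway", "GET /health"),
    ("t4", 1, "gateway", "GET /orders"),
    ("t4", 2, "orders", "SELECT order"),
]

NOISE_OPERATIONS = {"GET /health"}  # assumed noise list, e.g. health probes


def recurring_paths(spans, min_count=2):
    """Group spans by trace, order them by start time, and count identical paths."""
    by_trace = defaultdict(list)
    for trace_id, start, service, operation in spans:
        if operation in NOISE_OPERATIONS:
            continue  # drop noise before any pattern counting
        by_trace[trace_id].append((start, f"{service}:{operation}"))
    counts = Counter(
        tuple(step for _, step in sorted(steps)) for steps in by_trace.values()
    )
    return [(path, n) for path, n in counts.most_common() if n >= min_count]


for path, n in recurring_paths(SPANS):
    print(n, " -> ".join(path))
```

With this sample data, only the order-creation path occurs twice and survives the threshold. Whether such a path deserves a test is exactly the kind of decision the expert-validation step would leave to people.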

How Do I Turn Jaeger, Zipkin, or OpenTelemetry Traces into Executable Performance Tests?

Tracing tools and standards such as Jaeger, Zipkin, and OpenTelemetry already capture valuable information about user journeys, service interactions, dependencies, timing relationships, and execution paths.

Traceon investigates how these data sources could contribute to a future workflow that includes collecting traces, filtering noise, identifying usage paths, studying dependencies, validating findings with domain experts, and deriving QALIPSIS-compatible testing scenarios.

These concepts remain part of the project’s research direction and should not be interpreted as production-ready functionality.

If successful, such approaches could help bridge the gap between observability data and more realistic end-to-end microservice testing and performance validation.
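One building block of such a workflow is turning the flat list of spans in a trace back into an ordered sequence of steps. Tracing systems link spans through a span ID and a parent span ID, which makes this reconstruction possible. The sketch below assumes simplified field names (`span_id`, `parent_id`, `name`, `start`). Real Jaeger, Zipkin, and OpenTelemetry exports use their own schemas, and the output is a plain step list, not a QALIPSIS scenario.

```python
from collections import defaultdict

# Simplified spans of a single trace. Field names are assumptions; real
# Jaeger, Zipkin, and OpenTelemetry exports use their own schemas.
SPANS = [
    {"span_id": "c", "parent_id": "a", "name": "payments charge", "start": 30},
    {"span_id": "a", "parent_id": None, "name": "POST /orders", "start": 0},
    {"span_id": "b", "parent_id": "a", "name": "orders INSERT", "start": 10},
    {"span_id": "d", "parent_id": "c", "name": "broker publish payment.done", "start": 40},
]


def scenario_steps(spans):
    """Walk the span tree depth-first and return (depth, name) steps in call order."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start"])  # earlier calls first

    steps = []

    def visit(span, depth):
        steps.append((depth, span["name"]))
        for child in children[span["span_id"]]:
            visit(child, depth + 1)

    for root in children[None]:  # spans without a parent start the workflow
        visit(root, 0)
    return steps


for depth, name in scenario_steps(SPANS):
    print("  " * depth + name)
```

The indented output mirrors the call hierarchy: the API request at the top, followed by the database write, the payment call, and the event that the payment call triggers.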

How Can Production Traces Help Automate E2E Testing?

Production traces are essentially records of real-world system behavior.

They reveal:

Which services communicate
Which workflows occur most frequently
Which dependencies exist
Which payload dependencies matter
Which service interactions drive business-critical workflows
Which state transitions indicate successful process completion
Which execution paths are business critical

This makes them a valuable source of information for testing research.

Traceon investigates how trace mining, operational data analysis, and observability-driven testing techniques could help identify realistic scenarios that deserve validation.

By analyzing distributed tracing data, payload dependencies, service interactions, and execution sequences, the project investigates how realistic workflow generation can be supported through operational insights rather than manual assumptions.

Instead of building every scenario manually, engineering teams may eventually be able to leverage insights derived from real operational activity.

This concept sits at the heart of trace-based test generation and automated test case synthesis from tracing logs.

The objective is not simply to generate more tests.

The objective is to generate more relevant tests.
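Payload dependencies are a good example of what makes a derived test relevant: an ID returned by one call often has to be fed into the next. The sketch below shows one naive way to spot such links by matching values. It assumes that request and response fields were recorded with the trace, which many tracing setups do not do by default, and the `CALLS` data is invented for illustration.

```python
# Hypothetical ordered calls of one trace with flattened request/response fields.
CALLS = [
    {"name": "POST /orders", "request": {"sku": "A-7"},
     "response": {"order_id": "o-42"}},
    {"name": "POST /payments", "request": {"order": "o-42", "amount": "19.90"},
     "response": {"payment_id": "p-9"}},
    {"name": "POST /shipments", "request": {"order": "o-42", "payment": "p-9"},
     "response": {}},
]


def payload_dependencies(calls):
    """Find response values that reappear in the request of a later call."""
    produced = {}  # value -> (producing call, response field)
    dependencies = []
    for call in calls:
        for field, value in call["request"].items():
            if value in produced:
                source_call, source_field = produced[value]
                dependencies.append(
                    (f"{source_call}.{source_field}", f"{call['name']}.{field}")
                )
        for field, value in call["response"].items():
            # Keep the first producer of a value.
            produced.setdefault(value, (call["name"], field))
    return dependencies


for source, target in payload_dependencies(CALLS):
    print(source, "->", target)
```

Pure value matching will produce false positives for common values such as `1` or `true`, which is one more reason discovered dependencies would need expert review before they shape a test.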

Realistic Load Testing Through Operational Data Testing

One of the biggest challenges in performance engineering is realism.

Many load tests simulate traffic patterns that look convincing but bear little resemblance to actual production usage.

Operational data testing offers a different perspective.

By analyzing real traffic patterns, organizations can better understand how users actually interact with distributed systems and how services behave under production conditions.

Teams can gain insight into:

User behavior patterns
Request frequencies
Service dependencies
Realistic concurrency levels
Resource utilization trends

Traceon investigates whether trace data can support the creation of more realistic workloads for microservices testing and performance validation.

The objective is to better reflect production-like user behavior rather than relying exclusively on synthetic assumptions.

For organizations running large-scale distributed systems, realistic workload generation can be just as important as functional correctness.
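As a rough illustration of what trace-informed workload modeling could start from, the sketch below derives two numbers from the root spans of recorded traces: the share of each entry operation and the peak number of requests in flight. The `REQUESTS` data is hypothetical, and a real load profile would need far more than these two figures.

```python
from collections import Counter

# Hypothetical root spans: (entry operation, start second, end second).
REQUESTS = [
    ("POST /orders", 0.0, 0.4),
    ("GET /orders", 0.1, 0.2),
    ("POST /orders", 0.3, 0.9),
    ("GET /orders", 0.5, 0.6),
    ("GET /orders", 2.0, 2.1),
]


def operation_mix(requests):
    """Share of each entry operation, usable as weights in a load profile."""
    counts = Counter(op for op, _, _ in requests)
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def peak_concurrency(requests):
    """Maximum number of requests in flight at once (sweep over start/end events)."""
    events = []
    for _, start, end in requests:
        events.append((start, 1))
        events.append((end, -1))
    # Sorting tuples puts -1 before +1 at equal timestamps, so a request
    # ending exactly when another starts is not counted as overlapping.
    in_flight = peak = 0
    for _, delta in sorted(events):
        in_flight += delta
        peak = max(peak, in_flight)
    return peak


print(operation_mix(REQUESTS))
print(peak_concurrency(REQUESTS))
```

For the sample data, 40% of requests are order creations and 60% are reads, and at most two requests overlap. A synthetic test that assumed an even split or ten parallel users would be modeling a different system.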

Can AI and Machine Learning Help Generate Better Tests?

Potentially, yes.

Modern analysis techniques can identify patterns that are difficult to detect manually.

Research areas relevant to Traceon include:

Clustering of similar workflows
Sequence analysis
Pattern recognition
Trace mining
Statistical dependency analysis

These techniques may help reveal recurring usage paths and meaningful behavioral patterns hidden inside large observability datasets.
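Clustering of similar workflows does not have to start with heavy machinery. The sketch below groups traces by the overlap of their operation sets (Jaccard similarity) using a simple greedy pass. It is only one possible technique, chosen here for brevity; the `TRACES` data and the `0.6` threshold are assumptions, and the approach ignores the order of operations, which sequence analysis would take into account.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two operation sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster_traces(traces: dict, threshold: float = 0.6) -> list:
    """Greedy clustering: a trace joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []  # list of lists of trace ids
    for trace_id, operations in traces.items():
        for cluster in clusters:
            representative = traces[cluster[0]]
            if jaccard(set(operations), set(representative)) >= threshold:
                cluster.append(trace_id)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([trace_id])
    return clusters


# Hypothetical traces, each reduced to its list of operations.
TRACES = {
    "t1": ["POST /orders", "INSERT order", "publish order.created"],
    "t2": ["POST /orders", "INSERT order", "publish order.created", "cache update"],
    "t3": ["GET /orders", "SELECT order"],
    "t4": ["GET /orders", "SELECT order", "cache update"],
}

print(cluster_traces(TRACES))
```

The sample traces fall into two groups, one for order creation and one for order reads. Because each step is a plain set comparison, a reviewer can see why two traces were grouped together.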

However, automation alone is not enough.

Any generated insights must remain understandable, explainable, and reviewable.

The Traceon vision is therefore explicitly designed for validation by lead SREs, QA leads, developers, and domain experts.

Human expertise remains essential.

The objective is augmentation, not replacement.

As Dr. Stefan Fischer, Senior Researcher at SCCH, notes:

“The research challenge is not simply identifying patterns inside observability data. The real value comes from transforming those patterns into explainable knowledge that experts can validate and use with confidence.”

Why the QALIPSIS DSL Matters

Even the most sophisticated testing approach ultimately depends on maintainability.

QALIPSIS uses a Kotlin-based DSL because it allows testing scenarios to remain:

Readable
Structured
Version controlled
Executable
Extensible

Engineering teams can express complex workflows in a format that remains close to business intent while retaining the precision required for enterprise testing.

This is particularly important for organizations focused on reducing test maintenance with real-user data.

If future trace-derived scenarios are incorporated into testing workflows, maintainability will be just as important as automation.

The Kotlin DSL provides the foundation for that balance.

Problem, Solution, and Expected Impact

The challenges of end-to-end microservice testing are well understood: manual mapping consumes valuable engineering time, fragile test scripts require constant maintenance, and traditional approaches often struggle to reflect how systems behave in production. The following comparison illustrates how the research direction explored by Traceon and the capabilities of QALIPSIS aim to address these challenges through observability-driven testing and operational data insights.

Challenge	Traditional Approach	QALIPSIS + Traceon Research Direction
Manual mapping	Manual scenario design	Investigate operational-data-derived scenarios
Fragile test scripts	Continuous script maintenance	Explore trace-informed scenario generation
Test redundancy	Duplicate coverage efforts	Identify recurring behavior patterns
Limited realism	Assumed user behavior	Analyze actual system behavior
API-only validation	Surface-level checks	Internal system validation
Load test realism	Synthetic workloads	Trace-informed workload modeling

Expected Project Impact

As a research initiative, Traceon focuses on investigating how production observability data can improve the efficiency, realism, and maintainability of testing workflows. The following metrics represent expected project outcomes and research objectives rather than guaranteed customer results.

Metric	Project Goal
Test effort reduction	Up to 70%
Reality-based coverage increase	50–80%
Scenario relevance	Improved through operational insights
Maintenance effort	Reduced through better alignment with real behavior

These figures represent research targets and expected project outcomes rather than guaranteed customer results.

What This Means for Engineering Teams

While the underlying technologies involve distributed tracing, operational data analysis, and observability-driven testing, the ultimate value lies in how these innovations support everyday engineering work. The table below summarizes the potential benefits for different stakeholders across modern software organizations.

Audience	Primary Value
CTOs	Greater confidence in distributed-system reliability and release readiness
QA Leads	Less manual mapping and more realistic coverage opportunities
Developers	Maintainable Kotlin DSL-based testing workflows
DevOps Engineers	Stronger alignment between observability and testing
SREs	Better visibility into distributed behavior and asynchronous failures

Conclusion

The complexity of modern software systems continues to grow.

Microservices, asynchronous workflows, distributed dependencies, and event-driven architectures demand a different approach to testing than the one many teams still rely on today.

QALIPSIS was created to address these challenges through enterprise-grade end-to-end microservice testing, performance validation, and deep workflow verification.

With the Traceon research project, QALIPSIS is now exploring how observability-driven testing, operational data testing, distributed tracing, and trace-based test generation can help engineering teams build more realistic and maintainable testing strategies.

The long-term vision is straightforward:

Help teams turn real system behavior into realistic, executable, and maintainable tests.

That vision positions QALIPSIS as an innovative, trustworthy, and enterprise-ready platform for the future of distributed systems testing.
