Monitoring IoT platform performance in production

Can your IoT platform detect degradation before your customers do?

About the company

A leading provider of IoT and telematics solutions, specialising in fleet management, asset tracking, and connected devices. Their microservice-based platform processes vast amounts of data from IoT devices worldwide.

Industry

Industry 4.0, IoT

Key challenge

No visibility into live point-to-point performance across microservices; existing tools unable to simulate proprietary device protocols at scale

Stack under test

Proprietary TCP/UDP device connectors, Apache Kafka (event backbone), Elasticsearch (device state and analytics)

QALIPSIS deployment

Long-running campaigns in standalone mode for continuous production monitoring

Challenges

How to monitor production without real device protocols?

  • Existing tools could only target HTTP and could not simulate proprietary TCP/UDP protocols.
  • No visibility into bottlenecks across the full microservice pipeline.
  • Degradation was only detected after customers were already affected.
  • The team needed continuous monitoring with proactive alerting.

Solution: how QALIPSIS was used

How to simulate real device traffic with TCP and UDP?

  • TCP steps established persistent connections to proprietary device connectors.
  • Traffic mix closely mirrored the production device fleet’s communication patterns.
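
The persistent-connection pattern described above can be sketched in plain Python. This is not the QALIPSIS Kotlin DSL and not the customer's protocol: the 2-byte length prefix, the one-byte acknowledgement, and the names `StubConnector`, `simulate_device`, and `run_demo` are all assumptions made for illustration. A local stub stands in for the proprietary device connector.

```python
import socket
import socketserver
import struct
import threading
import time

ACK = b"\x06"  # hypothetical one-byte acknowledgement


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


class StubConnector(socketserver.BaseRequestHandler):
    """Stand-in for a device connector: acknowledges every frame it receives."""

    def handle(self) -> None:
        try:
            while True:
                (length,) = struct.unpack("!H", recv_exact(self.request, 2))
                recv_exact(self.request, length)  # payload is read and discarded
                self.request.sendall(ACK)
        except ConnectionError:
            pass  # the simulated device disconnected


def simulate_device(host: str, port: int, payloads: list[bytes]) -> list[float]:
    """Send all payloads over ONE connection; return per-frame ack latency in seconds."""
    latencies = []
    with socket.create_connection((host, port), timeout=5) as sock:
        for payload in payloads:
            start = time.perf_counter()
            # Assumed framing: big-endian 2-byte length, then the payload.
            sock.sendall(struct.pack("!H", len(payload)) + payload)
            if recv_exact(sock, 1) != ACK:
                raise ValueError("unexpected acknowledgement")
            latencies.append(time.perf_counter() - start)
    return latencies


def run_demo() -> list[float]:
    """Run one simulated device against the stub on an ephemeral local port."""
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), StubConnector) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        try:
            return simulate_device(host, port, [b"pos:48.85,2.35", b"fuel:73", b"ign:on"])
        finally:
            server.shutdown()
```

Calling `run_demo()` returns one latency per frame sent. Keeping the socket open across frames is the point of the sketch: it measures the connector's steady-state behaviour rather than connection setup.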

How to cross-verify the event pipeline through Kafka?

  • Kafka plugin consumed events from internal topics alongside device simulation.
  • Join operators matched each device payload against its corresponding processed event.
  • Bottleneck identified: event-enrichment service degraded above a load threshold.
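
The join between sent payloads and processed events can be illustrated with a minimal Python sketch. It assumes each payload carries a correlation key that survives the pipeline; the field `message_id`, the record types, and the latency threshold are hypothetical, and plain in-memory lists stand in for the Kafka consumer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sent:
    message_id: str   # hypothetical correlation key carried by the device payload
    sent_at: float    # send time in seconds


@dataclass(frozen=True)
class Processed:
    message_id: str
    processed_at: float  # time the processed event was observed on the topic


def join_by_id(sent, processed, max_latency_s):
    """Match each sent payload with its processed event.

    Returns (latencies by id, ids slower than max_latency_s, ids never seen).
    """
    by_id = {event.message_id: event for event in processed}
    latencies, slow, missing = {}, [], []
    for payload in sent:
        event = by_id.get(payload.message_id)
        if event is None:
            missing.append(payload.message_id)
            continue
        latency = event.processed_at - payload.sent_at
        latencies[payload.message_id] = latency
        if latency > max_latency_s:
            slow.append(payload.message_id)
    return latencies, slow, missing
```

For example, with payloads `a`, `b`, `c` sent at 0.0, 1.0, and 2.0 seconds, processed events for `a` at 0.25 and `b` at 3.0, and a threshold of 1.0 second, the join reports `b` as slow and `c` as missing. A growing share of slow or missing ids as load rises is the kind of signal that points at a degrading stage in the pipeline.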

How to validate data persistence in Elasticsearch?

  • Elasticsearch plugin verified every telemetry event was correctly stored and retrievable.
  • Issue found: indexing service intermittently rejected records under sustained peak load.
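
A persistence check of this kind boils down to asking the store for every id that was sent and reporting the ones that never appear. The Python sketch below assumes a `lookup` callable that returns a document or `None`; in a real setup it would wrap an Elasticsearch query, while here a dictionary stands in for the index. The function name, timeout, and polling interval are illustrative choices. Polling is included because newly indexed documents typically become searchable only after a short refresh delay.

```python
import time
from typing import Callable, Iterable, Optional


def find_missing(
    sent_ids: Iterable[str],
    lookup: Callable[[str], Optional[dict]],
    timeout_s: float = 2.0,
    poll_interval_s: float = 0.1,
) -> list[str]:
    """Return the ids that are still not retrievable once the timeout expires."""
    pending = list(sent_ids)
    deadline = time.monotonic() + timeout_s
    while pending:
        # Keep only the ids the store cannot return yet.
        pending = [doc_id for doc_id in pending if lookup(doc_id) is None]
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    return pending


# In-memory stand-in for the index: "evt-2" was rejected and never stored.
store = {"evt-1": {"speed": 42}, "evt-3": {"speed": 57}}
missing = find_missing(["evt-1", "evt-2", "evt-3"], store.get, timeout_s=0.2, poll_interval_s=0.05)
```

Here `missing` ends up as `["evt-2"]`. Intermittent rejections under sustained load would show up as ids that stay in this list even after the timeout.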

How to detect degradation before customer impact?

  • Long-running campaigns fed real-time statistics directly to the operations team.
  • Slack alerting triggered on failed and warning outcomes for immediate intervention.
  • Adaptable Kotlin DSL scenarios evolved as new device types were onboarded.
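
The alerting rule can be sketched as a small classification step followed by a message builder. The thresholds, the outcome labels, and the function names below are assumptions for illustration, not QALIPSIS configuration. The only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field; the sketch builds that body but does not send it.

```python
import json
from typing import Optional


def classify(
    failure_rate: float,
    p95_latency_ms: float,
    fail_rate: float = 0.01,         # hypothetical threshold: 1% failed requests
    warn_latency_ms: float = 500.0,  # hypothetical threshold: p95 latency
) -> str:
    """Map campaign statistics to an outcome label."""
    if failure_rate >= fail_rate:
        return "FAILED"
    if p95_latency_ms >= warn_latency_ms:
        return "WARNING"
    return "SUCCESSFUL"


def slack_payload(campaign: str, failure_rate: float, p95_latency_ms: float) -> Optional[str]:
    """Build the webhook JSON body, or return None when no alert is needed."""
    outcome = classify(failure_rate, p95_latency_ms)
    if outcome == "SUCCESSFUL":
        return None
    text = (
        f"[{outcome}] {campaign}: failure rate {failure_rate:.2%}, "
        f"p95 latency {p95_latency_ms:.0f} ms"
    )
    return json.dumps({"text": text})
```

Returning `None` for successful runs keeps the channel quiet until something needs attention, which matters for a campaign that runs continuously.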

Results

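  • Higher throughput: 80% more events processed per minute.
  • Full availability: zero downtime at peak.
  • Lower end-to-end latency: 35% reduction.
  • Proactive detection: critical bottlenecks detected and resolved before customer impact.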

Conclusion

Challenge

No visibility into live point-to-point microservice performance, with existing tools unable to simulate proprietary device protocols at scale.

Solution

QALIPSIS deployed as continuous production monitoring combining TCP/UDP simulation, Kafka pipeline verification, and Elasticsearch validation with Slack alerting.

Gains

80% more events processed per minute, 35% lower end-to-end latency, zero downtime at peak; critical bottlenecks detected and resolved before customer impact.

Want to experience similar results for your IoT platform?
Request a Demo of QALIPSIS Today