Monitoring IoT platform performance in production

Can your IoT platform detect degradation before your customers do?

About the company

A leading provider of IoT and telematics solutions, specialising in fleet management, asset tracking, and connected devices. Their microservice-based platform processes vast amounts of data from IoT devices worldwide.

Industry

Industry 4.0, IoT

Key challenge

No visibility into live point-to-point performance across microservices; existing tools unable to simulate proprietary device protocols at scale

Stack under test

Proprietary TCP/UDP device connectors, Apache Kafka (event backbone), Elasticsearch (device state and analytics)

QALIPSIS deployment

Long-running campaigns in standalone mode for continuous production monitoring

Challenges

How to monitor production without real device protocols?

Existing tools could only target HTTP and could not simulate proprietary TCP/UDP protocols.
No visibility into bottlenecks across the full microservice pipeline.
Degradation was only detected after customers were already affected.
The team needed continuous monitoring with proactive alerting.

Solution: how QALIPSIS was used

How to simulate real device traffic with TCP and UDP?

TCP steps established persistent connections to proprietary device connectors.
Traffic mix closely mirrored the production device fleet’s communication patterns.
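
As a rough illustration of the persistent-connection pattern described above, here is a minimal Python sketch. It is not the QALIPSIS Kotlin DSL and not the customer's proprietary protocol: the length-prefixed frame layout, the one-byte acknowledgement, and the names `encode_frame`, `simulate_device`, and `start_stub_connector` are all assumptions made for the example. A local stub stands in for a device connector so the sketch can run on its own.

```python
import socket
import struct
import threading

ACK = b"\x06"  # assumed one-byte acknowledgement, not part of any real protocol


def encode_frame(device_id: str, payload: bytes) -> bytes:
    """Hypothetical framing: 2-byte id length, id, 4-byte payload length, payload."""
    ident = device_id.encode("utf-8")
    return struct.pack("!H", len(ident)) + ident + struct.pack("!I", len(payload)) + payload


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since TCP may deliver data in smaller chunks."""
    buffer = b""
    while len(buffer) < size:
        chunk = conn.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buffer += chunk
    return buffer


def read_frame(conn: socket.socket):
    """Decode one frame produced by encode_frame."""
    (id_length,) = struct.unpack("!H", recv_exact(conn, 2))
    device_id = recv_exact(conn, id_length).decode("utf-8")
    (payload_length,) = struct.unpack("!I", recv_exact(conn, 4))
    return device_id, recv_exact(conn, payload_length)


def simulate_device(host: str, port: int, device_id: str, payloads) -> int:
    """Send every payload over ONE persistent connection; return the ack count."""
    acks = 0
    with socket.create_connection((host, port), timeout=5) as conn:
        for payload in payloads:
            conn.sendall(encode_frame(device_id, payload))
            if recv_exact(conn, 1) == ACK:
                acks += 1
    return acks


def start_stub_connector(expected_frames: int):
    """Local stand-in for a device connector: records each frame and acks it."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0 = pick a free port
    received = []

    def serve():
        conn, _ = server.accept()
        with conn:
            for _ in range(expected_frames):
                received.append(read_frame(conn))
                conn.sendall(ACK)
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return server.getsockname()[1], received, thread
```

In a real campaign, many such simulated devices would run concurrently, with the payload mix weighted to match the production fleet.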

How to cross-verify the event pipeline through Kafka?

Kafka plugin consumed events from internal topics alongside device simulation.
Join operators matched each device payload against its corresponding processed event.
Bottleneck identified: event-enrichment service degraded above a load threshold.
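
The join described above can be sketched in plain Python. This is only an illustration of the idea, not the QALIPSIS join operator: the `correlation_id` key, the millisecond timestamps, and the `Sent` and `Processed` record types are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sent:
    """A payload emitted by a simulated device (hypothetical shape)."""
    correlation_id: str
    sent_at_ms: int


@dataclass(frozen=True)
class Processed:
    """An event consumed from an internal topic (hypothetical shape)."""
    correlation_id: str
    processed_at_ms: int


def join_by_correlation_id(sent, processed):
    """Match each sent payload with its processed event.

    Returns (latencies, missing): per-id pipeline latency in milliseconds for
    matched pairs, and the ids of payloads with no processed event.
    """
    processed_by_id = {event.correlation_id: event for event in processed}
    latencies = {}
    missing = []
    for payload in sent:
        event = processed_by_id.get(payload.correlation_id)
        if event is None:
            missing.append(payload.correlation_id)
        else:
            latencies[payload.correlation_id] = event.processed_at_ms - payload.sent_at_ms
    return latencies, missing
```

Tracking these latencies per service hop as load increases is one way a threshold-dependent slowdown, such as the one in the event-enrichment service, would become visible.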

How to validate data persistence in Elasticsearch?

Elasticsearch plugin verified every telemetry event was correctly stored and retrievable.
Issue found: indexing service intermittently rejected records under sustained peak load.
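
A persistence check of this kind boils down to comparing what was sent with what can be read back. The sketch below shows that comparison only; it does not call the Elasticsearch client or the QALIPSIS plugin. The `fetch` callable is a hypothetical stand-in for a lookup by document id, and the three report buckets are an assumption made for the example.

```python
def verify_persistence(sent_events, fetch):
    """Classify each sent event as ok, missing, or mismatched.

    sent_events: dict mapping event id -> expected document (a dict).
    fetch: callable taking an event id and returning the stored document,
           or None when nothing is stored (hypothetical lookup function).
    """
    report = {"ok": [], "missing": [], "mismatched": []}
    for event_id, expected in sent_events.items():
        stored = fetch(event_id)
        if stored is None:
            report["missing"].append(event_id)      # e.g. a rejected index request
        elif stored != expected:
            report["mismatched"].append(event_id)   # stored, but with altered content
        else:
            report["ok"].append(event_id)
    return report
```

Records rejected intermittently under sustained peak load would show up in the `missing` bucket of such a report.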

How to detect degradation before customer impact?

Long-running campaigns fed real-time statistics directly to the operations team.
Slack alerting triggered on failed and warning outcomes for immediate intervention.
Adaptable Kotlin DSL scenarios evolved as new device types were onboarded.
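
To make the alerting rule concrete, here is a minimal Python sketch that turns a campaign outcome into a Slack message body. It is not QALIPSIS's own Slack integration: the function name, the outcome labels, and the message wording are assumptions. The only external fact relied on is that a Slack incoming webhook accepts a JSON body with a `text` field. The sketch builds the body and leaves the HTTP POST out.

```python
import json

# Outcomes that should notify the operations team, per the rule above.
ALERTING_OUTCOMES = {"failed", "warning"}


def build_slack_alert(campaign: str, outcome: str, summary: str):
    """Return a JSON body for a Slack incoming webhook, or None if no alert is due."""
    status = outcome.lower()
    if status not in ALERTING_OUTCOMES:
        return None  # successful outcomes stay silent
    prefix = ":red_circle:" if status == "failed" else ":warning:"
    message = f"{prefix} Campaign '{campaign}' reported {status}: {summary}"
    return json.dumps({"text": message})
```

Keeping the decision and the message construction in one small function makes the rule easy to adjust as new device types and outcomes are added.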

Results

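80% higher throughput (events processed per minute)

Full availability (zero downtime at peak)

35% lower end-to-end latency

Proactive detection (critical bottlenecks resolved before customer impact)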

Conclusion

Challenge

No visibility into live point-to-point microservice performance, with existing tools unable to simulate proprietary device protocols at scale.

Solution

QALIPSIS deployed as continuous production monitoring combining TCP/UDP simulation, Kafka pipeline verification, and Elasticsearch validation with Slack alerting.

Gains

80% more events processed per minute, 35% lower end-to-end latency, zero downtime at peak; critical bottlenecks detected and resolved before customer impact.
