Re: [hono-dev] Metrics for message processing

Argh, you were faster than me. Yes, I fully agree with you. It would be even better if we could also distinguish between a lack of credit from the messaging network and other problems. But I am not sure whether that is out of scope for metrics.

Best regards

 Daniel Maier

Cloud Services LWM2M (INST/ECS4) 
Bosch Software Innovations GmbH | Stuttgarter Straße 130 | 71332 Waiblingen | GERMANY | www.bosch-si.com
daniel.maier4@xxxxxxxxxxxx

Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB 148411 B
Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke; Managing Directors: Dr. Stefan Ferber, Michael Hahn




-----Original Message-----
From: hono-dev-bounces@xxxxxxxxxxx <hono-dev-bounces@xxxxxxxxxxx> On Behalf Of Frank Karsten (INST/ECS4)
Sent: Wednesday, 22 August 2018 09:54
To: hono developer discussions <hono-dev@xxxxxxxxxxx>
Subject: Re: [hono-dev] Metrics for message processing

Hi Daniel,

This is exactly what I was addressing in my response (with the proposal to add an "api" tag to the metrics), so I guess we fully agree on this.

Best regards

 Karsten Frank

(INST/ECS4)
Bosch Software Innovations GmbH | Ullsteinstr. 128 | 12109 Berlin | GERMANY | www.bosch-si.com





> -----Original Message-----
> From: hono-dev-bounces@xxxxxxxxxxx <hono-dev-bounces@xxxxxxxxxxx> On
> Behalf Of Maier Daniel (INST/ECS4)
> Sent: Wednesday, 22 August 2018 09:49
> To: hono developer discussions <hono-dev@xxxxxxxxxxx>
> Subject: Re: [hono-dev] Metrics for message processing
> 
> Hi Kai,
> 
> I like your idea a lot. I was also confused from time to time by the "old" messaging
> metrics. Do you see any way to distinguish between undeliverable messages that were
> dropped because of "internal" communication issues (e.g. no credit on the
> connection to the tenant manager) and undeliverable messages that were dropped
> because no consumer is connected to the messaging network? This would be very
> helpful for us to see whether there is a problem within our system or whether the
> user simply has not connected a consumer. Perhaps it would be possible to add the
> target system of the failure as a tag to the metric. With this we could at least tell
> whether the problem was related to the messaging network or, for example, to the
> tenant manager.
> 
> Best regards
> 
>  Daniel Maier
> 
> Cloud Services LWM2M (INST/ECS4)
> Bosch Software Innovations GmbH | Stuttgarter Straße 130 | 71332 Waiblingen |
> GERMANY | www.bosch-si.com daniel.maier4@xxxxxxxxxxxx
> 
> 
> 
> 
> -----Original Message-----
> From: hono-dev-bounces@xxxxxxxxxxx <hono-dev-bounces@xxxxxxxxxxx> On Behalf
> Of Hudalla Kai (INST/ECS4)
> Sent: Monday, 20 August 2018 17:11
> To: hono-dev@xxxxxxxxxxx
> Subject: [hono-dev] Metrics for message processing
> 
> Hi list,
> 
> I am currently thinking about the metrics that we maintain for the messages we
> process. In the original design, Hono Messaging was the central component that
> all protocol adapters were connected to and to which the adapters sent all
> downstream messages. It therefore felt like the right place to implement the
> messaging metrics, e.g. to count the number of messages that have been forwarded
> successfully vs. the messages that had to be discarded due to a lack of credit.
> 
> With the deprecation of Hono Messaging, we are now maintaining the metrics in
> the protocol adapters directly. IMHO this is a good opportunity to think a little about
> the metrics we are maintaining as well.
> 
> Currently, we record metrics for "processed", "discarded" and "undeliverable"
> messages. However, we have never clearly defined these terms. That was
> probably because the only place where it was relevant was Hono Messaging and
> the way it was implemented there served as the "definition".
> 
> In Hono Messaging, we currently use roughly the following definitions:
> 
> "processed": message from device complies with all requirements and has been
> successfully forwarded to the downstream consumer
> 
> "discarded": message has been sent pre-settled (by the adapter) and there is no
> credit available for the message to be forwarded. The message is then silently
> discarded, i.e. the sender is not informed about the failure to deliver the message.
> The device cannot distinguish this case from the "processed" case.
> 
> "undeliverable": message has been sent unsettled (by the adapter) and there is no
> credit available for the message to be forwarded. The message is then released
> and the adapter will signal the failure to deliver to the device (if the transport
> protocol allows it). The device may or may not be able to distinguish this
> case from the "processed" case.
> 
> The first metric is clearly of interest for seeing the current throughput of the
> system. The other two metrics, however, are harder to understand in the context of
> a particular protocol adapter because they require an understanding of how the
> transport protocol is mapped to AMQP 1.0. For example, a telemetry message that is
> published for a tenant using QoS 0 and for which no consumer is connected will
> end up in the "discarded" metric, whereas the same message published using QoS
> 1 would end up in the "undeliverable" metric, even though the reason for
> the failure to deliver is the same in both cases: no credit.
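The asymmetry can be made concrete with a small sketch. It assumes, as the definitions above imply, that the adapter forwards QoS 0 messages pre-settled and QoS 1 messages unsettled, and it ignores message validation so that only credit matters. `LegacyOutcome` and `LegacyMetrics` are hypothetical names, not Hono classes.

```java
/** Buckets of the old Hono Messaging metrics (hypothetical names). */
enum LegacyOutcome { PROCESSED, DISCARDED, UNDELIVERABLE }

final class LegacyMetrics {

    private LegacyMetrics() { }

    /** Assumed mapping: MQTT QoS 0 is forwarded pre-settled, QoS 1 unsettled. */
    static boolean isPreSettled(int mqttQos) {
        return mqttQos == 0;
    }

    /**
     * Mirrors the old definitions: with credit the message is processed;
     * without credit, only the settlement mode decides which bucket is used.
     */
    static LegacyOutcome classify(boolean preSettled, int availableCredit) {
        if (availableCredit > 0) {
            return LegacyOutcome.PROCESSED;
        }
        return preSettled ? LegacyOutcome.DISCARDED : LegacyOutcome.UNDELIVERABLE;
    }
}
```

With zero credit, `classify(isPreSettled(0), 0)` yields `DISCARDED` while `classify(isPreSettled(1), 0)` yields `UNDELIVERABLE`, although the cause is identical in both cases.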
> 
> After some discussion about this with our operations team, it became clear that
> from their perspective it is actually more interesting to get an indication of the
> reason for a problem in the metric itself. In particular, it is of interest to distinguish
> between cases where messages cannot be processed due to errors caused by the
> device, e.g. malformed headers, versus errors where a message cannot be
> processed due to problems in the back end infrastructure, e.g. a service not being
> available or the aforementioned lack of credit. In the former case we need to
> advise device developers how to fix the problem, in the latter case the ops team
> needs to get going themselves.
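For a transport like HTTP, where a failure is already reported to the device as a status code, this coarse split could be approximated from the status class alone: 4xx codes denote client (device) errors, 5xx codes server-side problems. The sketch below is hypothetical; which concrete 5xx code an adapter uses for, say, a lack of credit is an assumption to verify against the adapter itself.

```java
/** Who is responsible for a failed request (hypothetical classification). */
enum FailureSource { DEVICE, BACK_END, NONE }

final class FailureAttribution {

    private FailureAttribution() { }

    /**
     * Attributes a request outcome using only the HTTP status class:
     * 2xx success, 4xx client (device) error, 5xx server (back end) error.
     */
    static FailureSource fromHttpStatus(int status) {
        if (status >= 200 && status < 300) {
            return FailureSource.NONE;      // success, nothing to attribute
        }
        if (status >= 400 && status < 500) {
            return FailureSource.DEVICE;    // e.g. malformed request, not authorized
        }
        if (status >= 500 && status < 600) {
            return FailureSource.BACK_END;  // e.g. service unavailable, no credit
        }
        // 1xx and 3xx are not expected as final outcomes here.
        throw new IllegalArgumentException("unexpected status: " + status);
    }
}
```

Transports without a per-message status (e.g. MQTT QoS 0) would need the adapter to attribute the failure internally instead.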
> 
> In addition to this coarse distinction, it is still helpful to know the ratio of credit used
> vs. credit available because this may serve as an indicator for scaling the
> infrastructure up or down.
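One possible way to express that ratio, assuming the adapter knows how much credit the downstream consumer granted and how much of it remains, is the fraction of granted credit already consumed. The thresholds in `scalingHint` are illustrative placeholders rather than recommendations, and all names are hypothetical.

```java
final class CreditUtilization {

    private CreditUtilization() { }

    /**
     * Fraction of granted credit that has been consumed, in [0, 1].
     * "granted" is the credit issued by the consumer, "remaining" what is left.
     */
    static double ratio(long granted, long remaining) {
        if (granted <= 0) {
            return 1.0; // no credit granted at all: treat as fully exhausted
        }
        if (remaining < 0 || remaining > granted) {
            throw new IllegalArgumentException("remaining must be in [0, granted]");
        }
        return (double) (granted - remaining) / granted;
    }

    /** Hypothetical scaling hint; 0.9 and 0.2 are arbitrary example thresholds. */
    static String scalingHint(double ratio) {
        if (ratio > 0.9) {
            return "scale up";
        }
        if (ratio < 0.2) {
            return "scale down";
        }
        return "keep";
    }
}
```

For example, `ratio(100, 25)` is `0.75`, which falls between the two example thresholds and yields `"keep"`.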
> 
> I would therefore like to introduce additional (adapter specific) metrics that are
> better suited to cover these requirements. These metrics should be tagged with the
> protocol, host, tenant and message type (e.g. telemetry, event ...) where possible.
> A message might, for example, be unprocessable because it lacks tenant information.
> In such a case the problem could be recorded using the "UNKNOWN" tenant ...
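A minimal sketch of building such a tag set with the "UNKNOWN" fallback. The tag keys and the `MessageTags` helper are hypothetical; applying the same fallback to the message type (e.g. for a malformed topic) is an extra assumption beyond the proposal, which only mentions the tenant.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that builds the tag set proposed above. */
final class MessageTags {

    /** Placeholder for a tenant (or message type) that cannot be determined. */
    static final String UNKNOWN = "UNKNOWN";

    private MessageTags() { }

    static Map<String, String> of(String protocol, String host, String tenant, String type) {
        Map<String, String> tags = new TreeMap<>(); // sorted keys for stable output
        tags.put("protocol", protocol);
        tags.put("host", host);
        tags.put("tenant", orUnknown(tenant));
        tags.put("type", orUnknown(type)); // assumption: type may be missing, too
        return tags;
    }

    private static String orUnknown(String value) {
        return (value == null || value.isEmpty()) ? UNKNOWN : value;
    }
}
```

`MessageTags.of("mqtt", "adapter-0", null, "telemetry")` yields `{host=adapter-0, protocol=mqtt, tenant=UNKNOWN, type=telemetry}`.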
> 
> "meter.hono.messages.processed" - message has been successfully processed.
> 
> "meter.hono.messages.unprocessable" - message cannot be processed because
> the message does not contain all required information, e.g. malformed topic name,
> missing header, not authorized etc. This metric is used by an adapter to record a
> message that it either discards silently or rejects (signaling the problem to the
> device). In no case will the message be processed.
> 
> "meter.hono.messages.undeliverable" -  message cannot be processed because of
> a problem not caused by the sender of the message (the device), e.g. Tenant
> service is not available, no credit available, etc. This metric is used by an adapter
> regardless of whether the transport protocol allows for signaling back the problem
> to the device or not. For instance, an MQTT message published using QoS 0
> doesn't allow the failure to be signaled back, whereas HTTP allows sending back a
> status code in the HTTP response. In no case will the message be processed.
> 
> "counter|meter.hono.messages.capacity" - the number of credits remaining for
> sending messages. TODO determine if a counter or a meter is more reasonable to
> use.
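A minimal sketch of the proposal, using plain Java collections instead of a real metrics library so that it stays self-contained. The enum binds each outcome to one of the metric names listed above; `OutcomeCounters`, the tag key format and the method names are hypothetical. The capacity metric is left out because the counter-vs-meter question is still open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Proposed outcomes, each bound to a metric name from the list above. */
enum ProcessingOutcome {
    PROCESSED("meter.hono.messages.processed"),
    UNPROCESSABLE("meter.hono.messages.unprocessable"), // caused by the device
    UNDELIVERABLE("meter.hono.messages.undeliverable"); // not caused by the device

    final String metricName;

    ProcessingOutcome(String metricName) {
        this.metricName = metricName;
    }
}

/** Hypothetical in-memory stand-in for a metrics registry. */
final class OutcomeCounters {

    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    private static String key(ProcessingOutcome outcome, String protocol, String tenant) {
        return outcome.metricName + "{protocol=" + protocol + ",tenant=" + tenant + "}";
    }

    /** Records one message; the transport's QoS plays no role in choosing the metric. */
    void record(ProcessingOutcome outcome, String protocol, String tenant) {
        counters.computeIfAbsent(key(outcome, protocol, tenant), k -> new LongAdder()).increment();
    }

    long count(ProcessingOutcome outcome, String protocol, String tenant) {
        LongAdder adder = counters.get(key(outcome, protocol, tenant));
        return adder == null ? 0L : adder.sum();
    }
}
```

An adapter would call `record(ProcessingOutcome.UNDELIVERABLE, ...)` for a message dropped for lack of credit whether it arrived via QoS 0 or QoS 1, which removes the split between "discarded" and "undeliverable" that the old metrics had.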
> 
> 
> We could then deprecate the existing protocol adapter specific metric(s) and
> eventually remove them together with Hono Messaging.
> 
> 
> WDYT?
> 
> --
> Best regards
> 
> Kai Hudalla
> Chief Software Architect
> 
> Bosch Software Innovations GmbH
> Ullsteinstr. 128
> 12109 Berlin
> GERMANY
> www.bosch-si.com
> 
> _______________________________________________
> hono-dev mailing list
> hono-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this
> list, visit https://dev.eclipse.org/mailman/listinfo/hono-dev