Re: [hono-dev] Metrics for message processing

Hi Kai,

Thanks for the summary, which IMHO is a good one.

I think the separation of "DevOps related" and "client/devices related" metrics to record the overall operational state of Hono-based installations
makes a lot of sense.

Thinking about the different reasons for "no credit available", I propose to extend the tags of the metrics:

From "host, tenant, type, protocol" 
To "host, tenant, type, protocol, api".

Here, api should be set to the Hono API that caused the problem (e.g. Tenant-API, Credentials-API, Event-API, Registration-API, and so on).

Just as the tenant is set to "UNKNOWN" if a device provides no tenant, the api could be set to "UNKNOWN" if the reason a message cannot be delivered
is not related to any Hono API.

In case of "no credit available", we should always have one of Hono's APIs involved, so this tag could always be filled.

This way, a DevOps team could better judge which part of the system might need to be scaled up.
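
To make the proposal more concrete, here is a minimal sketch of how an adapter might assemble the extended tag set. The class MetricTags and its methods are hypothetical names for illustration only and are not existing Hono classes. The sketch assumes that a missing or blank value is reported as "UNKNOWN", as described above for tenant and api.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hono): builds the tag set for a message metric. */
final class MetricTags {

    /** Placeholder used when a tag value is not known. */
    static final String UNKNOWN = "UNKNOWN";

    private MetricTags() {
    }

    /** Returns the given value, or "UNKNOWN" if it is null or blank. */
    static String orUnknown(final String value) {
        return (value == null || value.trim().isEmpty()) ? UNKNOWN : value;
    }

    /**
     * Builds the tags "host, tenant, type, protocol, api" in that order.
     * The api parameter names the Hono API that caused the problem,
     * e.g. "Tenant-API"; pass null if no Hono API is involved.
     */
    static Map<String, String> of(final String host, final String tenant,
            final String type, final String protocol, final String api) {
        final Map<String, String> tags = new LinkedHashMap<>();
        tags.put("host", orUnknown(host));
        tags.put("tenant", orUnknown(tenant));
        tags.put("type", orUnknown(type));
        tags.put("protocol", orUnknown(protocol));
        tags.put("api", orUnknown(api));
        return tags;
    }
}
```

For example, MetricTags.of("adapter-1", null, "telemetry", "mqtt", "Tenant-API") would yield tenant=UNKNOWN and api=Tenant-API. The host name "adapter-1" is just an example value.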
WDYT?

All other points: +1 from me.

Mit freundlichen Grüßen / Best regards

 Karsten Frank

(INST/ECS4) 
Bosch Software Innovations GmbH | Ullsteinstr. 128 | 12109 Berlin | GERMANY | www.bosch-si.com

Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB 148411 B
Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke; Managing Directors: Dr. Stefan Ferber, Michael Hahn



> -----Original Message-----
> From: hono-dev-bounces@xxxxxxxxxxx <hono-dev-bounces@xxxxxxxxxxx> On
> Behalf Of Hudalla Kai (INST/ECS4)
> Sent: Monday, 20 August 2018 17:11
> To: hono-dev@xxxxxxxxxxx
> Subject: [hono-dev] Metrics for message processing
> 
> Hi list,
> 
> I am currently thinking about the metrics that we maintain for the messages we
> process. In the original design we had Hono Messaging as the central component
> that all protocol adapters had been connected to and which all downstream
> messages had to be sent to from the adapters. It therefore felt like the right place
> to implement the messaging metrics and, e.g., count the number of messages
> that have been forwarded successfully vs. the messages that had to be discarded
> due to a lack of credit.
> 
> With the deprecation of Hono Messaging, we are now maintaining the metrics in
> the protocol adapters directly. IMHO this is a good opportunity to think a little about
> the metrics we are maintaining as well.
> 
> Currently, we record metrics for "processed", "discarded" and "undeliverable"
> messages. However, we have never clearly defined these terms. That was
> probably because the only place where it was relevant was Hono Messaging and
> the way it was implemented there served as the "definition".
> 
> As such, we currently use something like the following in Hono Messaging:
> 
> "processed": message from device complies with all requirements and has been
> successfully forwarded to the downstream consumer
> 
> "discarded": message has been sent pre-settled (by the adapter) and there is no
> credit available for the message to be forwarded. The message is then silently
> discarded, i.e. the sender is not informed about the failure to deliver the message.
> The device cannot distinguish this case from the "processed" case.
> 
> "undeliverable": message has been sent unsettled (by the adapter) and there is no
> credit available for the message to be forwarded. The message is then released
> and the adapter will signal the failure to deliver to the device (if the transport
> protocol allows it). The device may or may not be able to distinguish this
> case from the "processed" case.
> 
> The first metric clearly is of interest in order to see the current throughput of the
> system. The other two metrics, however, are harder to understand in the context of
> a particular protocol adapter because they require an understanding of how the transport
> protocol is mapped to AMQP 1.0. For example, a telemetry message that is
> published for a tenant using QoS 0 and for which no consumer is connected, will
> end up in the "discarded" metric whereas the same message published using QoS
> 1 would end up in the "undeliverable" metric, despite the fact that the reason for
> the failure to deliver is the same in both cases: no credit.
> 
> After some discussion about this with our operations team, it became clear that
> from their perspective it is actually more interesting to get an indication of the
> reason for a problem in the metric itself. In particular, it is of interest to distinguish
> between cases where messages cannot be processed due to errors caused by the
> device, e.g. malformed headers, versus errors where a message cannot be
> processed due to problems in the back end infrastructure, e.g. a service not being
> available or the aforementioned lack of credit. In the former case we need to
> advise device developers how to fix the problem, in the latter case the ops team
> needs to get going themselves.
> 
> In addition to this coarse distinction, it is still helpful to know the ratio of credit used
> vs. credit available because this may serve as an indicator for scaling the
> infrastructure up or down.
> 
> I would therefore like to introduce additional (adapter specific) metrics that are
> better suited to cover these requirements. These metrics should be tagged with the
> protocol, host, tenant and message type (e.g. telemetry, event ...) if possible, e.g.
> a message might be unprocessable because it lacks tenant information. In such a
> case the problem could be recorded using the "UNKNOWN"
> tenant ...
> 
> "meter.hono.messages.processed" - message has been successfully processed.
> 
> "meter.hono.messages.unprocessable" - message cannot be processed because
> the message does not contain all required information, e.g. malformed topic name,
> missing header, not authorized etc. This metric is used by an adapter to record a
> message that it either discards silently or rejects (signaling the problem to the
> device). In no case will the message be processed.
> 
> "meter.hono.messages.undeliverable" -  message cannot be processed because of
> a problem not caused by the sender of the message (the device), e.g. Tenant
> service is not available, no credit available, etc. This metric is used by an adapter
> regardless of whether the transport protocol allows for signaling back the problem
> to the device or not. For instance, an MQTT message published using QoS 0
> doesn't allow the failure to be signaled back, whereas HTTP allows a status
> code to be sent back in the HTTP response. In no case will the message be processed.
> 
> "counter|meter.hono.messages.capacity" - the number of credits remaining for
> sending messages. TODO determine if a counter or a meter is more reasonable to
> use.
> 
> 
> We could then deprecate the existing protocol adapter specific metric(s) and
> eventually remove them together with Hono Messaging.
> 
> 
> WDYT?
> 
> --
> Mit freundlichen Grüßen / Best regards
> 
> Kai Hudalla
> Chief Software Architect
> 
> Bosch Software Innovations GmbH
> Ullsteinstr. 128
> 12109 Berlin
> GERMANY
> www.bosch-si.com
> 
> Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB
> 148411 B
> Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke; Managing Directors:
> Dr. Stefan Ferber, Michael Hahn
