Re: [hono-dev] Metrics for message processing

> If we also want to account for the adapters' ability to interact with other services
> then we end up with
> 
> hono.messages.processed (received from device and successfully forwarded)
> hono.messages.unprocessable (received from device but something is wrong with the data)
> hono.messages.undeliverable (successfully received and processed from device but could not be forwarded)
> hono.messages.capacity (link credits between adapter and AMQP network)
> hono.tenant.capacity (link credits between adapter and Tenant service)
> hono.credentials.capacity (link credits between adapter and Credentials service)
> hono.registration.capacity (link credits between adapter and Device Registration service)
> 
> with tags for host, tenant, type, protocol


Having a separate capacity metric for each of the other services addresses my proposal well.
It gives a first hint that a service might need to be scaled. As a first step this may just mean
increasing the maximum number of credits the service grants to the adapter; it does not necessarily
mean scaling the number of instances of the service. Possibly the number of credits was simply too limited.
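
To illustrate how the capacity metrics could serve as such a hint, here is a minimal sketch in plain Python. It assumes that, besides the remaining credits reported by the capacity metrics, the maximum number of credits granted by each service is also known. The names LinkCapacity, utilization, services_under_pressure and the 0.9 threshold are hypothetical and not part of Hono.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCapacity:
    """Credit snapshot for one link between an adapter and a service."""
    service: str            # e.g. "tenant", "credentials", "registration"
    remaining_credits: int  # value a hono.<service>.capacity metric would report
    max_credits: int        # assumption: credit limit granted by the service


def utilization(cap: LinkCapacity) -> float:
    """Fraction of granted credits in use (0.0 = idle, 1.0 = exhausted)."""
    if cap.max_credits <= 0:
        return 1.0  # no credit granted at all counts as exhausted
    return 1.0 - cap.remaining_credits / cap.max_credits


def services_under_pressure(caps, threshold=0.9):
    """Names of services whose credit utilization reaches the threshold."""
    return [c.service for c in caps if utilization(c) >= threshold]


caps = [
    LinkCapacity("tenant", remaining_credits=5, max_credits=100),
    LinkCapacity("credentials", remaining_credits=80, max_credits=100),
    LinkCapacity("registration", remaining_credits=0, max_credits=50),
]
print(services_under_pressure(caps))  # ['tenant', 'registration']
```

A service showing up in this list would first be a candidate for a higher credit limit, and only then for more instances.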


> 
> Can we agree on these metrics?

So I agree with these metrics.
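
As a rough sketch of how an adapter might record the three message metrics with the agreed tags, the following plain Python stand-in uses a collections.Counter instead of a real metrics library. It does not show Micrometer's API. The functions classify and record and the sample host name "adapter-1" are hypothetical; the metric names, the tag names and the "UNKNOWN" tenant are taken from this thread.

```python
from collections import Counter

# Metric names as agreed in this thread.
PROCESSED = "hono.messages.processed"
UNPROCESSABLE = "hono.messages.unprocessable"
UNDELIVERABLE = "hono.messages.undeliverable"

counters = Counter()


def classify(device_error: bool, backend_error: bool) -> str:
    """Pick the metric by the cause of the failure, not by QoS or settlement."""
    if device_error:
        return UNPROCESSABLE  # e.g. malformed topic, missing header, not authorized
    if backend_error:
        return UNDELIVERABLE  # e.g. no credit, Tenant service not available
    return PROCESSED


def record(host, tenant, msg_type, protocol,
           device_error=False, backend_error=False):
    """Increment the counter for the chosen metric and tag combination."""
    name = classify(device_error, backend_error)
    tags = (("host", host), ("tenant", tenant or "UNKNOWN"),
            ("type", msg_type), ("protocol", protocol))
    counters[(name, tags)] += 1
    return name


record("adapter-1", "acme", "telemetry", "mqtt")
record("adapter-1", None, "telemetry", "mqtt", device_error=True)
record("adapter-1", "acme", "event", "http", backend_error=True)
```

The point of the sketch is that an MQTT QoS 0 message and an HTTP request that both fail for lack of credit end up in the same metric, differing only in the protocol tag.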


Best regards

 Karsten Frank

(INST/ECS4) 
Bosch Software Innovations GmbH | Ullsteinstr. 128 | 12109 Berlin | GERMANY | www.bosch-si.com


Registered Office: Berlin, Registration Court: Amtsgericht Charlottenburg; HRB 148411 B
Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke; Managing Directors: Dr. Stefan Ferber, Michael Hahn




> -----Original Message-----
> From: hono-dev-bounces@xxxxxxxxxxx <hono-dev-bounces@xxxxxxxxxxx> On
> Behalf Of Hudalla Kai (INST/ECS4)
> Sent: Wednesday, 22 August 2018 15:27
> To: hono-dev@xxxxxxxxxxx
> Subject: Re: [hono-dev] Metrics for message processing
> 
> On Wed, 2018-08-22 at 09:58 +0200, Jens Reimann wrote:
> >
> >
> > On Tue, Aug 21, 2018 at 6:01 PM, Marc Pellmann <pellmann@xxxxxxxxx> wrote:
> > > Hi Kai,
> > >
> > > adapting the messaging metrics to our new setup, focused on adapters
> > > and modifying them according to feedback from ops colleagues is a good thing!
> > >
> > > We should also remove the meter/counter etc. in the name of the
> > > metric. This was there to allow Spring Boot metrics without a
> > > dependency on a specific library. It seems that Spring Boot has
> > > given up this approach with Micrometer and it doesn't make that
> > > much sense either.
> > >
> >
> > I would really appreciate dropping the meter/counter prefix.
> >
> > > According to [1], the naming of metrics and tags is a good match.
> > > So to sum it up we have
> > >
> > > hono.messages.processed (received from device and successfully forwarded)
> > > hono.messages.unprocessable (received from device but something is wrong with the data)
> > > hono.messages.undeliverable (successfully received and processed from device but could not be forwarded)
> > > hono.messages.capacity (link credits between adapter and AMQP network)
> > >
> > > with tags for host, tenant, type, protocol
> > >
> > > [1] https://micrometer.io/docs/concepts#_naming_meters
> > >
> 
> If we also want to account for the adapters' ability to interact with other services
> then we end up with
> 
> hono.messages.processed (received from device and successfully forwarded)
> hono.messages.unprocessable (received from device but something is wrong with the data)
> hono.messages.undeliverable (successfully received and processed from device but could not be forwarded)
> hono.messages.capacity (link credits between adapter and AMQP network)
> hono.tenant.capacity (link credits between adapter and Tenant service)
> hono.credentials.capacity (link credits between adapter and Credentials service)
> hono.registration.capacity (link credits between adapter and Device Registration service)
> 
> with tags for host, tenant, type, protocol
> 
> I also want to point out that we are discussing the names of the metrics we use
> with Micrometer.
> 
> Can we agree on these metrics?
> 
> > > Marc
> > >
> > >
> > > On Mon, Aug 20, 2018 at 5:10 PM Hudalla Kai (INST/ECS4)
> > > <kai.hudalla@bosch-si .com> wrote:
> > > > Hi list,
> > > >
> > > > I am currently thinking about the metrics that we maintain for the
> > > > messages we process. In the original design we had Hono Messaging
> > > > as the central component that all protocol adapters had been
> > > > connected to and which all downstream messages had to be sent to
> > > > from the adapters. It therefore felt like the right place to
> > > > implement the messaging metrics in and e.g. count the number of
> > > > messages that have been forwarded successfully vs. the messages
> > > > that had to be discarded due to a lack of credit.
> > > >
> > > > With the deprecation of Hono Messaging, we are now maintaining the
> > > > metrics in the protocol adapters directly. IMHO this is a good
> > > > opportunity to think a little about the metrics we are maintaining
> > > > as well.
> > > >
> > > > Currently, we record metrics for "processed", "discarded" and
> > > > "undeliverable" messages. However, we have never clearly defined
> > > > these terms. That
> > > > was probably because the only place where it was relevant was Hono
> > > > Messaging and the way it was implemented there served as the
> > > > "definition".
> > > >
> > > > As such, we currently use something like the following in Hono Messaging:
> > > >
> > > > "processed": message from device complies with all requirements
> > > > and has been successfully forwarded to the downstream consumer
> > > >
> > > > "discarded": message has been sent pre-settled (by the adapter)
> > > > and there is no credit available for the message to be forwarded.
> > > > The message is then silently discarded, i.e. the sender is not
> > > > informed about the failure to deliver the message. The device
> > > > cannot distinguish this case from the "processed" case.
> > > >
> > > > "undeliverable": message has been sent unsettled (by the adapter)
> > > > and there is no credit available for the message to be forwarded.
> > > > The message is then released and the adapter will signal the
> > > > failure to deliver to the device (if the transport protocol allows
> > > > it). The device may or may not be able to distinguish this
> > > > case from the "processed" case.
> > > >
> > > > The first metric clearly is of interest in order to see the
> > > > current throughput of the system. The other two metrics, however,
> > > > are harder to understand in the context of a particular protocol
> > > > adapter because they require an understanding of how the transport
> > > > protocol is mapped to AMQP 1.0. For example, a telemetry message
> > > > that is published for a tenant using QoS 0 and for which no
> > > > consumer is connected, will end up in the "discarded" metric
> > > > whereas the same message published using QoS 1 would end up in the
> > > > "undeliverable" metric, despite the fact that the reason for the
> > > > failure to deliver is the same in both cases: no credit.
> > > >
> > > > After some discussion about this with our operations team, it
> > > > became clear that from their perspective it is actually more
> > > > interesting to get an indication of the reason for a problem in
> > > > the metric itself. In particular, it is of interest to distinguish
> > > > between cases where messages cannot be processed due to errors
> > > > caused by the device, e.g. malformed headers, versus errors where
> > > > a message cannot be processed due to problems in the back end
> > > > infrastructure, e.g. a service not being available or the
> > > > aforementioned lack of credit. In the former case we need to
> > > > advise device developers how to fix the problem, in the latter
> > > > case the ops team needs to get going themselves.
> > > >
> > > > In addition to this coarse distinction, it is still helpful to
> > > > know the ratio of credit used vs. credit available because this
> > > > may serve as an indicator for scaling the infrastructure up or
> > > > down.
> > > >
> > > > I would therefore like to introduce additional (adapter specific)
> > > > metrics that are better suited to cover these requirements. These
> > > > metrics should be tagged with the protocol, host, tenant and
> > > > message type (e.g. telemetry, event ...) if possible. For example,
> > > > a message might be unprocessable because it lacks tenant
> > > > information. In such a case the problem could be recorded using
> > > > the "UNKNOWN" tenant ...
> > > >
> > > > "meter.hono.messages.processed" - message has been successfully
> > > > processed.
> > > >
> > > > "meter.hono.messages.unprocessable" - message cannot be processed
> > > > because the message does not contain all required information,
> > > > e.g. malformed topic name, missing header, not authorized etc.
> > > > This metric is used by an adapter to record a message that it
> > > > either discards silently or rejects (signaling the problem to the
> > > > device). In no case will the message be processed.
> > > >
> > > > "meter.hono.messages.undeliverable" -  message cannot be processed
> > > > because of a problem not caused by the sender of the message (the
> > > > device), e.g. Tenant service is not available, no credit
> > > > available, etc. This metric is used by an adapter regardless of
> > > > whether the transport protocol allows for signaling back the
> > > > problem to the device or not. For instance, an MQTT message
> > > > published using QoS 0 doesn't allow signaling the failure back
> > > > whereas HTTP allows sending back a status code in the HTTP
> > > > response. In no case will the message be processed.
> > > >
> > > > "counter|meter.hono.messages.capacity" - the number of credits
> > > > remaining for sending messages. TODO determine if a counter or a
> > > > meter is more reasonable to use.
> > > >
> > > >
> > > > We could then deprecate the existing protocol adapter specific
> > > > metric(s) and eventually remove them together with Hono Messaging.
> > > >
> > > >
> > > > WDYT?
> > > >
> > > > --
> > > > Best regards
> > > >
> > > > Kai Hudalla
> > > > Chief Software Architect
> > > >
> > > > Bosch Software Innovations GmbH
> > > > Ullsteinstr. 128
> > > > 12109 Berlin
> > > > GERMANY
> > > > www.bosch-si.com
> > > >
> > > > Registered Office: Berlin, Registration Court: Amtsgericht
> > > > Charlottenburg; HRB 148411 B
> > > > Chairman of the Supervisory Board: Dr.-Ing. Thorsten Lücke;
> > > > Managing Directors: Dr. Stefan Ferber, Michael Hahn
> > > > _______________________________________________
> > > > hono-dev mailing list
> > > > hono-dev@xxxxxxxxxxx
> > > > To change your delivery options, retrieve your password, or
> > > > unsubscribe from this list, visit
> > > > https://dev.eclipse.org/mailman/listinfo/hono-dev
> > >
> > >
> >
> >
> >
