Hi Christoph,
Sorry for jumping in the middle of your discussion with Wim. I'm
just starting my day here though and I hope to contribute to this.
On 2/1/2016 7:42 AM, Keimel, Christoph wrote:
Hi Wim,
Let’s see if I get this correctly:
Let’s say that A wants to know when a touch
sensor on B is getting pressed (true/false). What I am doing
right now is: A puts up a whiteboard service
(TouchSensorSniffer). This service is picked up by B using a
ServiceTracker [1]. B then holds on to the service and calls
TouchSensorSniffer#onStateChanged whenever the touch sensor
state changes. (Of course I also clear my internal cache
when the service gets removed.)
I’ll use this simple setup to describe my
situation: After both A and B are started everything is fine
and B has discovered the TouchSensorSniffer from A. Now I
disconnect B from the network by pulling the LAN cable. Both
A and B continue to run.
Yes they continue to run, but one question is: On B (svc consumer)
does the remote service proxy get unregistered after 30s/keepAlive
timeout? If you are using a ServiceTracker, this should result in the
removedService method being called. It won't happen immediately
(since the default keepAlive is 30s), but it should happen. This is
because the generic provider has failure detection.
If the state of the touch sensor changes at
this moment B would try to send this information over the
TouchSensorSniffer to A. But since B is disconnected from
the network, this request fails after the timeout. B thinks
this is a temporary error and just logs it.
B should probably do something other than just log this as a
temporary error.
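As a minimal sketch of what "something other than just log" could mean on B: count consecutive failed calls and stop using the listener once a threshold is reached. This is plain Java, not an ECF or OSGi API. The class name FailureCountingCaller and the threshold are hypothetical, and it assumes that a failed remote call surfaces on B as a RuntimeException.

```java
import java.util.function.Consumer;

/** Hypothetical helper: counts consecutive failed calls to one remote listener. */
class FailureCountingCaller<T> {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    FailureCountingCaller(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Runs the invocation against the target.
     * Returns true while the target should be kept, and false once it has
     * failed maxConsecutiveFailures times in a row.
     */
    synchronized boolean call(T target, Consumer<T> invocation) {
        try {
            invocation.accept(target);
            consecutiveFailures = 0; // any success resets the count
            return true;
        } catch (RuntimeException e) {
            // Assumption: a failed remote call shows up as a RuntimeException.
            consecutiveFailures++;
            System.err.println("remote call failed (" + consecutiveFailures + "): " + e);
            return consecutiveFailures < maxConsecutiveFailures;
        }
    }
}
```

When call returns false, B could drop its cached TouchSensorSniffer instead of waiting for the keepAlive timeout to unregister the proxy.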
If I reconnect the LAN cable after a couple of
seconds and then press my touch sensor again, B will again
use the TouchSensorSniffer service to send the state change.
This time everything works out because the network is back
up: Cool. But let’s assume I don’t reconnect right away but
I wait until the keepalive period (default 30 seconds) is
over. What happens now is that the TouchSensorSniffer is
unregistered in B which is ok, since we assume that the
connection is gone for good.
Right...this is referred to as 'fail stop'. One has to assume that
the connection is gone for good, because it may actually be gone for
good :).
If I touch the sensor now B sees that no
TouchSensorSniffer services are registered and therefore
doesn’t send this information anywhere. Also good. Now,
after 60 seconds, I reconnect the LAN cable. Both A and B
are still running but B doesn’t pick up on the
TouchSensorSniffer from A. They stay disconnected.
Right.
This last part is based on my observations, so
I’m not sure I understand this completely. Does my
description come close to the truth and is this the result
that is to be expected?
Yes, I think so.
Or would you expect the discovery on B to find
the TouchSensorSniffer from A again after the network
connection has been reestablished?
This is where the specifics of the discovery provider interact with
the specifics of the distribution provider. Wim is the expert on
zookeeper, but just because the network connection is reestablished
I don't believe that will trigger a rediscovery of a previously
discovered service.
Or is the problem that I am holding on to an
instance of TouchSensorSniffer on B?
I think that holding onto the instance of TouchSensorSniffer on B is
essentially assuming that this existing connection will be
reestablished *within 30s*, and I think that this is probably not a
reasonable assumption for your problematic network.
I could stop using a ServiceTracker and look
into the OSGi service registry directly to search for all
implementations of TouchSensorSniffer anytime the state
changes via BundleContext#getServiceReferences. I see that
this would change the situation slightly, because I would use
BundleContext#ungetService right after sending the
information and then getting the service again for the next
event. But I am not sure that this would change the basic
situation, since the registry itself is already caching the
available remote services. Or am I wrong about this?
The service registry is holding onto the remote service proxy's
ServiceReference, but this proxy will be unregistered when/if the
remote service is unregistered via the failure
detection/keepAlive/timeout (30s by default). This unregistration
of the proxy should result in removedService (ServiceTracker) and
unbind (DS). Basically you need something to notify your code
when the proxy becomes unregistered so that you can give up/stop
using the TouchSensorSniffer on B.
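The notification pattern described here can be sketched roughly as follows. To keep it self-contained, this is plain Java with no OSGi imports: in a real bundle, added and removed would be driven by ServiceTrackerCustomizer#addingService and #removedService (or by DS bind/unbind methods). The TouchSensorSniffer signature and the SnifferRegistry class are assumptions made for illustration only.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Stand-in for the listener A registers; the method signature is assumed. */
interface TouchSensorSniffer {
    void onStateChanged(boolean pressed);
}

/** Hypothetical holder on B for the currently known sniffers. */
class SnifferRegistry {
    // Copy-on-write so publish() can iterate while services come and go.
    private final List<TouchSensorSniffer> sniffers = new CopyOnWriteArrayList<>();

    /** Call when a (remote) sniffer proxy gets registered. */
    void added(TouchSensorSniffer sniffer) {
        sniffers.add(sniffer);
    }

    /** Call when the proxy gets unregistered, e.g. after the keepAlive timeout. */
    void removed(TouchSensorSniffer sniffer) {
        sniffers.remove(sniffer);
    }

    /** Sends the state to every known sniffer and returns how many were called. */
    int publish(boolean pressed) {
        int delivered = 0;
        for (TouchSensorSniffer sniffer : sniffers) {
            sniffer.onStateChanged(pressed);
            delivered++;
        }
        return delivered;
    }
}
```

Once removed has been called for a proxy, publish no longer touches it, which matches the "no TouchSensorSniffer registered, so nothing is sent" behavior described earlier in the thread.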
Now, one question is: once detected, what should B do to recover
from a network failure? This can be a difficult question to answer
in general, because the failure could be permanent (so there is no
use retrying), or it could be very short and heal very quickly.
Predicting the future is difficult :).
There are mechanisms to deal with these problems. One is
extending/customizing the OSGi Topology Manager, which would allow
implementing some recovery strategy for a service that has gone away
(e.g. import retry). The failure detection of the ECF generic
provider can also be tuned. Finally, the ECF generic provider (and
others, like the JMS provider) has some notion of communication
groups and group membership, which can be used to associate remote
services with each other.
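To make the "import retry" idea a bit more concrete, here is one possible shape for the retry timing: exponential backoff with a cap and a limit on attempts. This is only a sketch in plain Java. ImportRetryPolicy and its parameters are hypothetical and not part of the Topology Manager or ECF APIs; a customized topology manager would still have to schedule the actual re-import itself.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical policy: how long to wait before the n-th re-import attempt. */
class ImportRetryPolicy {
    private final Duration initialDelay;
    private final Duration maxDelay;
    private final int maxAttempts;

    ImportRetryPolicy(Duration initialDelay, Duration maxDelay, int maxAttempts) {
        this.initialDelay = initialDelay;
        this.maxDelay = maxDelay;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Returns the delay before the given attempt (1-based), doubling each
     * time up to maxDelay. Returns empty once maxAttempts is exceeded,
     * i.e. the failure is then treated as permanent.
     */
    Optional<Duration> delayBefore(int attempt) {
        if (attempt < 1 || attempt > maxAttempts) {
            return Optional.empty();
        }
        long factor = 1L << Math.min(attempt - 1, 30); // cap the shift
        Duration delay = initialDelay.multipliedBy(factor);
        return Optional.of(delay.compareTo(maxDelay) > 0 ? maxDelay : delay);
    }
}
```

With an initial delay of 5s, a cap of 60s and 6 attempts, this gives waits of 5s, 10s, 20s, 40s, 60s and 60s before giving up. Whether giving up is the right call depends on how permanent failures tend to be on the network in question.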
Scott