Hi guys,
In our client-server OSGi application we are using the ECF Zoodiscovery provider (v.1.0.100) for remote services discovery. When testing the application's resiliency, we noticed that after unplugging and re-plugging the network cable, the client in some cases doesn't get the remote OSGi services back from the server.
I started debugging this use case and found that on a session timeout both Zookeeper itself and Zoodiscovery try to reconnect simultaneously. This results in a connect-disconnect-connect sequence instead of a single connect, and leaves the client in an inconsistent state: the connection is eventually re-established, but the client doesn't ask the server for the remote services.
I think that Zoodiscovery should not trigger a disconnect/connect in cases where Zookeeper does it on its own. That, however, would require somehow differentiating the disconnect events, which doesn't seem to be possible at the moment, since they all come from Zookeeper.
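
To make the idea concrete, here is a rough sketch of the kind of differentiation I mean. It is not the actual Zoodiscovery code. The State enum is a local stand-in that mirrors the names of Zookeeper's Watcher.Event.KeeperState values (SyncConnected, Disconnected, Expired). The ReconnectPolicy class and the Action enum are hypothetical names made up for this example. The assumption is that after Disconnected the Zookeeper client reconnects by itself while the session is still valid, and only Expired requires the provider to create a new session.

```java
// Hypothetical sketch, not Zoodiscovery code: decide what the discovery
// provider should do for each connection state change.
final class ReconnectPolicy {

    /** Local stand-in mirroring names from Zookeeper's Watcher.Event.KeeperState. */
    enum State { SyncConnected, Disconnected, Expired }

    /** Hypothetical actions the provider could take. */
    enum Action {
        NONE,              // nothing to do
        WAIT_FOR_CLIENT,   // let the Zookeeper client reconnect on its own
        REDISCOVER,        // same session is back: re-query the remote services
        RECREATE_SESSION   // session is gone: provider does a full reconnect
    }

    private boolean sawDisconnect = false;

    synchronized Action onStateChange(State state) {
        switch (state) {
            case Disconnected:
                // Assumption: the client library retries by itself here,
                // so the provider must not start a second reconnect.
                sawDisconnect = true;
                return Action.WAIT_FOR_CLIENT;
            case SyncConnected:
                if (sawDisconnect) {
                    sawDisconnect = false;
                    return Action.REDISCOVER;
                }
                return Action.NONE;
            case Expired:
                // Only in this case does the provider reconnect itself.
                sawDisconnect = false;
                return Action.RECREATE_SESSION;
            default:
                return Action.NONE;
        }
    }
}
```

With something like this, a cable pull followed by a quick re-plug would produce WAIT_FOR_CLIENT and then REDISCOVER, and only one side would reconnect at any given time.
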
So, if anyone has encountered this or a similar issue, or has any suggestions for a possible solution or workaround, I'd appreciate your comments. I can provide references to the relevant blocks of code if need be.
Yuriy