Hi guys,
In our client-server OSGi application we are using the ECF
ZooDiscovery provider (v1.0.100) for remote service discovery. While testing the
application's resiliency, we noticed that after unplugging the network cable and
plugging it back in, the client in some cases does not get the remote OSGi services
back from the server.
I started debugging this use case and found that, on a session
timeout, both ZooKeeper itself and ZooDiscovery try to reconnect at the same time.
This results in a connect-disconnect-connect sequence instead of a single connect,
and leaves the client in an inconsistent state: the connection is eventually
re-established, but the client never asks the server for the remote services again.
I think that ZooDiscovery should not trigger its own disconnect/connect
when ZooKeeper is already reconnecting on its own. To do that, however, we would need
some way to tell the two kinds of disconnect event apart, which does not seem to be
possible at the moment, since the event comes from ZooKeeper.
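To illustrate the kind of differentiation I have in mind, here is a minimal,
self-contained Java sketch. It is not ZooDiscovery code and does not use the ZooKeeper
API: the SessionState enum is a stand-in I made up that loosely mirrors the
Disconnected / SyncConnected / Expired states ZooKeeper reports to its watchers, and
ReconnectSketch, Action and decide are hypothetical names. It assumes that the state
could be passed through to the discovery layer, that the ZooKeeper client retries by
itself after a plain disconnect, and that only an expired session needs a fresh
connection.

```java
public class ReconnectSketch {

    /** Hypothetical stand-in for the connection states reported by the ZooKeeper client. */
    enum SessionState { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

    /** Hypothetical reactions of the discovery layer. */
    enum Action { WAIT_FOR_CLIENT_RECONNECT, RECONNECT_AND_REDISCOVER, NONE }

    /** Decide how the discovery layer reacts to a state change (assumed policy). */
    static Action decide(SessionState state) {
        switch (state) {
            case DISCONNECTED:
                // Assumption: the client library is already retrying, so do not
                // start a second, competing disconnect/connect cycle.
                return Action.WAIT_FOR_CLIENT_RECONNECT;
            case EXPIRED:
                // Assumption: the old session is gone, so the discovery layer
                // must reconnect and ask the server for the remote services again.
                return Action.RECONNECT_AND_REDISCOVER;
            default:
                // Connected: nothing to do.
                return Action.NONE;
        }
    }

    public static void main(String[] args) {
        // Print the assumed reaction for every state.
        for (SessionState s : SessionState.values()) {
            System.out.println(s + " -> " + decide(s));
        }
    }
}
```

The point of the sketch is only that a single place decides who reconnects, so the two
layers never do it simultaneously.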
So, if anyone has run into this or a similar issue, or has any suggestions
for a possible solution or workaround, I'd appreciate your comments. I can
point to the relevant blocks of code if need be.
Yuriy