Re: [ecf-dev] service discovery working even if port misconfigured
Hi Peter,
On 4/24/2014 7:48 AM, Peter Hermsdorf wrote:
> Hi Scott,
>
>> I don't think I understand what exactly you mean by 'stopping the
>> host'. Do you mean just remote service unregistration?...or do you
>> mean unceremonious host shutdown (e.g. kill -9), or something in
>> between, or ?
>
> <deleted>
>
>> I'm not sure. I think it hinges on what you want WRT the 'stopping
>> the host' and the 'restart leading to new bind event'.
>
> short answer: in any case ;)
'Any case' would indeed be nice, but of course what we are talking about
is Byzantine fault tolerance [1]...a very hard set of distributed
systems problems.
> We have an RCP client using service(s) of a single server instance.
> When that server goes down (software update, network problem, crash,
> etc.) the client can continue to work (it just can't use these services
> during that time), but it needs a way to reconnect to/rediscover the
> service when the server is online again... (without restarting the client).
> In the end, the client needs to get an unbind when the service is not
> available and a bind when it is online again.
Ok I see.
I'm going to break this down a little bit...as it relates to remote
services...just to talk through the issues and choices that can be made
about discovery, distribution, and their combination for implementing
remote services. Please forgive if this seems a little long-winded,
but in truth there are no technical silver bullets here.
First...getting the client to 'unbind' (i.e. having the remote service
proxy go away when the underlying host crashes or the network
partitions) requires that the distribution provider do some failure
detection. The ECF generic provider does have this failure
detection, so when the host goes down (e.g. crashes), the generic
provider will detect this, and the remote service proxy will be
unregistered/go away/unbound...as you've already found in your tests.
Note this is not necessarily true of all distribution providers and/or
implementations of OSGi remote services...for example if your
distribution provider is based upon connectionless http, then the http
server may go down, and if the client already has a working proxy then
it may not be able to know that the remote service host has
crashed/become unavailable. But again, the ECF generic provider does do
such failure detection, and so the proxy unregistration upon host crash
does occur.
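To make the failure-detection point concrete, here is a minimal sketch of one common technique a connection-oriented provider can use: a heartbeat with a timeout. This is not ECF's code, and whether the generic provider works exactly this way is not stated here; the class HeartbeatFailureDetector, its parameters, and the on_failure callback (standing in for "unregister the remote service proxy") are all hypothetical. A connectionless http provider has no long-lived channel carrying such heartbeats, which is why it may not notice the host is gone until the next call fails.

```python
import time
from typing import Callable


class HeartbeatFailureDetector:
    """Hypothetical heartbeat/timeout failure detector (not ECF's actual code).

    Any message from the host counts as a heartbeat. If no heartbeat arrives
    within `timeout` seconds, the host is presumed dead and `on_failure` is
    called exactly once, e.g. to unregister the remote service proxy.
    """

    def __init__(self, timeout: float, on_failure: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_failure = on_failure
        self.clock = clock          # injectable so the sketch can be tested deterministically
        self.last_seen = clock()
        self.failed = False

    def heartbeat(self) -> None:
        # Once failed, the connection is considered closed; a recovered host
        # must be rediscovered and reconnected rather than silently revived.
        if not self.failed:
            self.last_seen = self.clock()

    def check(self) -> bool:
        """Return True while the host is considered alive."""
        if not self.failed and self.clock() - self.last_seen > self.timeout:
            self.failed = True
            self.on_failure()
        return not self.failed


# Usage with a fake clock: the proxy is "unbound" once the timeout lapses.
now = [0.0]
unbound = []
detector = HeartbeatFailureDetector(
    timeout=3.0, on_failure=lambda: unbound.append("proxy"), clock=lambda: now[0])
now[0] = 2.0
detector.heartbeat()
now[0] = 4.0
print(detector.check())   # True (last heartbeat 2s ago)
now[0] = 6.0
print(detector.check())   # False (4s of silence > 3s timeout)
print(unbound)            # ['proxy']
```

The timeout is the usual trade-off: shorter means faster unbinding, but also more false "crashes" on a slow or briefly partitioned network.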
Now...getting the client/service consumer to 'rebind' to the new
service...when the host recovers and it becomes available...means that
the new service instance metadata (edef) has to be communicated to the
consumer *at that time*...i.e. dynamically via some sort of network
discovery (zookeeper, etc.) rather than an edef file. This is why you
are not seeing the rebind happen with the static (or template-based)
edef...because that's completely initiated by the consumer/client...and
doesn't happen when the host recovers and makes a new instance of the
remote service available.
In short, I think what you probably need is *both* a distribution
provider with failure detection (generic, r-osgi, jms), and to use some
network discovery provider (e.g. zookeeper, dnssd, jslp, zeroconf).
Then the distribution provider can detect the host failure...to unbind
the remote service proxy when a crash happens...and the network
discovery can communicate the host's making a new remote service
available...*after* it becomes available.
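The combination can be sketched as a toy simulation. Everything in it is hypothetical: Endpoint is a pared-down stand-in for an edef (real endpoint descriptions carry many more properties), Discovery stands in for a network discovery provider such as zookeeper or jslp, and unbind_host stands in for the distribution provider's failure detection. One consumer gets its endpoint from a static edef read once at startup; the other subscribes to discovery.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical, pared-down stand-in for an edef endpoint description."""
    endpoint_id: str
    host: str


@dataclass
class Consumer:
    """Tracks which remote service proxies are currently bound."""
    proxies: Dict[str, Endpoint] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def bind(self, ep: Endpoint) -> None:
        self.proxies[ep.endpoint_id] = ep
        self.events.append(f"bind {ep.endpoint_id}")

    def unbind_host(self, host: str) -> None:
        # Stands in for the distribution provider detecting a host failure.
        for eid in [e for e, p in self.proxies.items() if p.host == host]:
            del self.proxies[eid]
            self.events.append(f"unbind {eid}")


class Discovery:
    """Hypothetical network discovery: hosts announce, subscribers are notified."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[Endpoint], None]] = []

    def subscribe(self, callback: Callable[[Endpoint], None]) -> None:
        self.subscribers.append(callback)

    def announce(self, ep: Endpoint) -> None:
        for cb in self.subscribers:
            cb(ep)


static_consumer, dynamic_consumer = Consumer(), Consumer()
discovery = Discovery()
discovery.subscribe(dynamic_consumer.bind)

first = Endpoint("svc-1", "server")
static_consumer.bind(first)     # edef file read once, at consumer start
discovery.announce(first)       # host announces its service on startup

# Host crashes: failure detection unbinds the proxy in both consumers.
for c in (static_consumer, dynamic_consumer):
    c.unbind_host("server")

# Host restarts with a new service instance and announces it.
discovery.announce(Endpoint("svc-2", "server"))

print(static_consumer.events)   # ['bind svc-1', 'unbind svc-1']
print(dynamic_consumer.events)  # ['bind svc-1', 'unbind svc-1', 'bind svc-2']
```

The static consumer is left with no proxy after the restart, which matches what you saw with static edefs; only the discovery subscriber learns about the new instance.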
Given your initial explanation of the remote service metadata (changing
a few of the edef property values), I had thought that using edef or
edef templates would meet your use case. But it seems you have some
additional requirements that make the dynamic aspect of network
discovery necessary...as I've outlined.
Hopefully this discussion is helpful. I do wish that the
failure/reliability properties of remote services could be entirely
hidden...but there's lots of distributed systems work that shows such
network transparency is not really possible (or at least not well
advised). IMO, one thing that OSGi remote services uniquely
provide...that makes them very attractive for remote services in
general...is the ability to map network-based failure to the dynamics of
OSGi and OSGi services (i.e. the service instances naturally come and go
at runtime).
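On the consumer side, the usual way to live with services that come and go is to treat the reference as optional and dynamic; in Declarative Services that corresponds to a reference with OPTIONAL cardinality and DYNAMIC policy. The Python below only mimics that pattern and is not OSGi code: Client, Greeter, and FakeGreeter are hypothetical names, and bind/unbind stand in for the callbacks the framework would invoke when the remote service proxy is registered or unregistered.

```python
import threading
from typing import Optional, Protocol


class Greeter(Protocol):
    """Hypothetical remote service interface."""
    def hello(self, name: str) -> str: ...


class Client:
    """Holds an optional, dynamically (re)bound service reference."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._svc: Optional[Greeter] = None

    def bind(self, svc: Greeter) -> None:
        # Invoked when a (new) remote service proxy becomes available.
        with self._lock:
            self._svc = svc

    def unbind(self, svc: Greeter) -> None:
        # Invoked when a proxy goes away; ignore stale unbinds for an older proxy.
        with self._lock:
            if self._svc is svc:
                self._svc = None

    def greet(self, name: str) -> str:
        with self._lock:
            svc = self._svc          # snapshot; don't hold the lock during the remote call
        if svc is None:
            return "service unavailable"   # keep working, just without the service
        return svc.hello(name)


class FakeGreeter:
    def hello(self, name: str) -> str:
        return f"hello, {name}"


client = Client()
print(client.greet("Peter"))   # service unavailable
first = FakeGreeter()
client.bind(first)
print(client.greet("Peter"))   # hello, Peter
client.unbind(first)           # host crashed: proxy unregistered
print(client.greet("Peter"))   # service unavailable
client.bind(FakeGreeter())     # host back: rediscovered proxy bound
print(client.greet("Peter"))   # hello, Peter
```

Even with this pattern a call can still fail in the window between a host crash and its detection, so calls made through the proxy should still be prepared for a communication error.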
> I hope that this explanation is somewhat clearer ;)
I'll see you and raise you on that hope :).
Scott