
Re: [rdf4j-dev] Backwards compatibility



On Sat, Apr 18, 2020 at 6:31 PM Håvard Ottestad <hmottestad@xxxxxxxxx> wrote:

I think we should go with keeping track of which interface changes we have made and remove default implementations when releasing major versions. If we split our methods into two categories: "nice to implement for performance" and "needs to be implemented for a feature" - then we can keep default implementations for performance methods but remove them for feature methods.

I think that makes sense, at the very least to keep track of. I suggest we use a single umbrella issue to keep track of this, planned against the next major release milestone. Let's put each default method in a bullet list in the description, classified as performance vs. feature, and also add the first release it was included in. Having this list will also be very useful in writing release notes for the major release.
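To make the performance-vs-feature distinction concrete, here is a minimal Java sketch. The interface and method names are invented for illustration - this is not the actual RDF4J API:

```java
// Illustrative only: Connection, size(), isEmpty() and explainQuery() are
// made-up names, not real RDF4J interfaces.
interface Connection {

    long size(); // an existing, mandatory core method

    // "Performance" category: the default is functionally complete because it
    // delegates to size(). An implementation can override it with something
    // cheaper, but nothing breaks if it doesn't. Safe to keep long-term.
    default boolean isEmpty() {
        return size() == 0;
    }

    // "Feature" category: a no-op default that silently reports "not
    // supported". Users get no query plan unless the implementation opts in,
    // so this is a candidate for removal in the next major release.
    default String explainQuery(String query) {
        return null; // no-op: no plan available
    }
}
```

With that split, removing the feature-category default in a major release forces each store to make an explicit choice, while the performance-category default can stay indefinitely.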

I've started to change my mind about the Java 9 module system. No one at conferences seems to be using the module system themselves. There is some support in Maven and JUnit, but it still seems like a big hassle. Essentially, most people just ignore it as they upgrade past Java 8.
 
Interesting point. I had actually started wondering about that myself - you don't really hear much from other projects that use it.

I believe that, besides possibly the module system (which, as you say, may not be as good an option as we thought), the only things we can really do are to make things (package-)private as much as possible and only expose things publicly when we need to - but then also pay the price and accept that when something is public and we want to modify it, we need to do so in a way that doesn't break existing code. I really don't like the notion we now have of marking things that are public as "it's public, but use at your own risk, we're not promising anything" - in the case of an experimental new feature I think that's acceptable, but even then we shouldn't be too liberal with it.

One thing we could start doing as a general pattern is using fewer separate (sub)packages. As an example, look at the MemoryStore SAIL implementation. Its main package is org.eclipse.rdf4j.sail.memory. But its store-specific implementations of the Value classes (MemIRI, MemLiteral, etc.) are in org.eclipse.rdf4j.sail.memory.model. The only reason these classes are public is that they technically have to be, because we chose to put them in a separate package. But now we're stuck with them being considered part of the public API (even if it's highly unlikely that anyone uses them directly). In hindsight, I believe it would have been better to just keep them in the main package, with package-private visibility. Packages will be bigger and potentially a bit harder to navigate for us as developers, but it's a lot clearer to users what they can and cannot reuse.
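A minimal sketch of that trade-off. The class names are illustrative, and the package declaration is omitted so the snippet stands alone - imagine both classes living in the same org.eclipse.rdf4j.sail.memory-style package:

```java
// Only the entry point is public; the internal model class is
// package-private and therefore never becomes part of the public API.
public class MemoryStoreSketch {

    // callers can use the store without ever seeing the internal class
    private final InternalValue value = new InternalValue("x");

    public String describe() {
        return value.label();
    }
}

// No access modifier = package-private: visible only within the same package.
// Had this class lived in a separate .model subpackage, it would have been
// forced to be public to remain usable by the store.
class InternalValue {

    private final String label;

    InternalValue(String label) {
        this.label = label;
    }

    String label() {
        return label;
    }
}
```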

I realize that is not a solution in all cases, but at the end of the day, I think the only thing we can do is take it on the chin, and accept that it's our responsibility as maintainers to keep public things compatible, even if that inconveniences us in how we'd ideally like to improve things.

And the reason why I'm bringing it up:

 - estimates and actuals in query plans are using no-op default methods in interfaces
 - support for retrieving the query plan is also changing some interfaces, again that will require more no-op default methods

These are very much features that users would expect to use regardless of underlying database. If the other databases don't want to implement all of them, that's very understandable, but it should be their explicit choice to do so and not a default option that we have chosen already.

In an ideal world, yes. In reality, however, vendors and users have their own priorities, and we can't force them to implement a particular feature. I'm not really sure that removing default implementations solves anything in that respect, especially since all they'll have to do is add their own default implementations. Then again, if there is a point at which it is opportune to do so, it is certainly the major release.
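To illustrate: if a no-op default were removed in a major release, restoring the old behaviour on the vendor side is a one-line override - which is why the removal's main effect is making the choice explicit rather than forcing an implementation. The names below are invented for illustration:

```java
// Hypothetical post-removal state: the framework interface no longer
// provides a default, so every implementation must now say something.
interface CostEstimator {
    double getCostEstimate(); // default implementation removed
}

// A vendor that doesn't want to support estimates simply reinstates the
// old no-op behaviour - but now it is their explicit, visible choice.
class VendorEstimator implements CostEstimator {
    @Override
    public double getCostEstimate() {
        return -1; // sentinel: estimates not supported by this store
    }
}
```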

On a more general note, some background on why we are so conservative about compatibility: RDF4J is a project that is now nearly 20 years old. That is a long stretch of time for any code base (even if we rewrote the whole thing from scratch in 2008). One of the main reasons for its continued success and uptake is, in my view, the very conservative approach we have taken to breaking changes, and the minimalist design approach we have towards the core interfaces. Keeping the core lean and simple, and choosing to add functionality in the form of additional wrappers or utilities rather than as part of the core interfaces/classes themselves, offers huge benefits for vendors in how they wish to couple their code with the framework.

All I'm saying is that if we choose to put something new into the core interfaces, we should be aware that we are potentially inconveniencing lots of different stakeholders, and we have to carefully consider how we add it in a way that causes the fewest headaches for them. Even with our conservative approach, we're seeing some pretty high-profile projects that never upgrade to the newest RDF4J (as a case in point, Protege's SPARQL plugin is still using Sesame 2), just because they don't have the cycles to fix compatibility issues. We can't hold their hands to make them upgrade, but what we can do is our stinking best to make it as easy as possible.

And just to be clear, with the query plans feature I think we have now found a good balance. Whether or not we eventually decide to remove the default implementations in the next major release, at the very least keeping track of this gives us the option to do so down the road. 

Cheers,

Jeen
