Re: [rdf4j-dev] planning ahead: 3.7 vs 4.0

From: "Jeen Broekstra" <jeen@xxxxxxxxxxxx>
Date: Fri, 12 Feb 2021 11:17:09 +1100

On Fri, 12 Feb 2021, at 10:37, Håvard Ottestad wrote:

Hi Jeen,

Things like SailSink don’t support inferred statements. You need one sink for explicit statements and one for inferred. This is the PR we have where we’ve discussed the implications of this. https://github.com/eclipse/rdf4j/pull/1937

Ah right - I hadn't made the connection but I remember now. I also remember that I did a quick spike on this and that it was a _lot_ of work. But perhaps now's the time to revisit this.

For the binding set it’s the hash-based one, and the work that Jerven has done to make it faster is here: https://github.com/eclipse/rdf4j/pull/2742

I see - Jerven's work there in general, if we can get it to a merge-ready state, would be really good to have in a next release, I agree.

As for the ShaclSail and an architecture meeting, that might be possible closer to summer. Maybe June. Having said that I think the AST is quite straightforward as a representation of the SHACL structure in Java. The SHACL to SPARQL approach essentially turns into a “compiler problem” of figuring out how to generate the correct SPARQL from the AST (and to figure out what the SPARQL should be). On the other hand I can see how the transactional validation aspect is hard to grasp, I’m struggling with it myself now while trying to simplify the use of DISTINCT operators in order to fix the issue I’ve dubbed “compressed target chains”.

Just to be clear I'm not suggesting that your code is hard to grasp (or harder to grasp than any other code for complex problems). However, diving into someone else's code is always a significant time and effort investment, no matter how clear and transparent it is. Having a broad picture before you dive in is a tremendous time saver.

What I'm looking for is a slightly higher-level description of the main architectural ideas behind the code: for example how do you get from SHACL shapes in the database to an AST. How do you then get from an AST to operations on the database. Why do we even have an AST for SHACL (and not just a direct conversion to SPARQL). Why does the ShaclSail juggle two or three different SailConnections internally. And so on. Doesn't have to be a meeting by the way, if that's hard to fit in, I'd be equally grateful for a quick writeup.

Jerven also brought up multithreading and I would like to add to that. 1. When Project Loom becomes available, it will revolutionize how we can use multithreading.

I know very little about Project Loom, to be honest, and it seems a bit far away for us to worry about now. We're still on Java 8 and in agony over maybe bumping to 11.

2. We had some discussions over on the Google group because a user was asking about our thoughts on things like RxJava. https://groups.google.com/g/rdf4j-users/c/6BpFD8qfbbE

This came up in a conversation I had not too long ago myself as well - not RxJava specifically but the notion of reactive programming (in particular push-based parsers, but I guess the concept applies more broadly). These kinds of things, like Loom-enhanced multithreading, sound further away though, and not necessarily something we could fit in a release this year. But then again, if we can find people who are willing to contribute and follow through, anything is possible of course :)

Jeen

Håvard

On 11 Feb 2021, at 23:56, Jeen Broekstra <jeen@xxxxxxxxxxxx> wrote:

On Fri, 12 Feb 2021, at 00:38, Håvard Ottestad wrote:

Hi Jeen,

Performance improvements for BindingSet. We’ve seen a 2x improvement during early testing. So that would be a great reason for users to upgrade.

Sounds good - which BindingSet are we talking about though? There are multiple implementations.

I’m with you on the SHACL validation of remote endpoints. That would be great to have. Also 100% SHACL compliance.

Currently, you're clearly the go-to SHACL expert - you're pretty much the only one of us who really understands the internal mechanics of the ShaclSail (both old and revised implementation). It would be good if we could somehow widen that, so that I and others can contribute more usefully to implementing some of this stuff, and it's not all on your shoulders.

A meeting with a few people where you can talk us through the architecture and design choices, and we can discuss approaches for further extensions/improvements, would be great. Would you be willing to organize something along those lines, with me, and perhaps Damyan or one of the other Ontotext devs if they're interested? Happy to help sort out the logistics if you want.

Rewrite of the Sail structures to support inferred triples all the way down, so we can finally fix that transaction isolation issue that’s been hanging around forever.

I am aware of the transaction isolation issue, and getting to the bottom of it would be very useful. I'm not sure what you mean by "supporting inferred triples all the way down" though - don't we do that already? And how is it related to this issue (sorry, it's been a while since I looked at the problem in depth)? Anyway, it's a good idea to give this some focus/priority; perhaps we should do a more in-depth discussion of possible solutions on the relevant ticket.

Cheers,

Jeen

Håvard

On 11 Feb 2021, at 01:53, Jeen Broekstra <jeen@xxxxxxxxxxxx> wrote:

Hi folks,

Looking ahead after the 3.6 release, we have a decision to make: do we want to focus on getting a new major release out the door, or do we want to continue the 3.x series for a while longer?

The main justification for a major release is that we can make breaking changes: remove deprecated code, do major refactors, or even decide to bump our minimally-supported Java version (I must admit I'd _really_ like to bump to Java 11, but I understand the hesitation from some of our vendor partners).

Typically, to make the pain of upgrading for a major version easier to swallow for our users, we would accompany that with some massively useful improvements and new features as well.

So the question is: do we have enough material to justify making the next release a major one?

Here's what's currently planned for the 4.0.0 milestone:

https://github.com/eclipse/rdf4j/milestone/30

Almost all of this is purely "housekeeping": removing obsolete code etc. Important, but not particularly exciting to users. Perhaps the most interesting item is the task to make each package live in only a single module (to allow for use as part of a Java 9 modular architecture).

We have a substantial backlog of feature requests and bugs. On my radar as big/important new features for 2021 are the use of SHACL validation against a (remote) repository / SPARQL endpoint, and further upgrades and extensions of our RDF* support. I'm also thinking that planning this for a major release at the outset gives us some freedom: we don't have to jump through so many hoops to make sure everything remains backward compatible (though of course even with a major release, if we can keep it compatible, we probably should).

Where do you stand on this? What are your big 2021 must-haves for RDF4J (if any)?

As an aside: regardless of whether the next release after 3.6 is a major or a minor, we will need to go through a release review for this one, as it's been a year since the last review.

Jeen
_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/rdf4j-dev

