Re: [rdf4j-dev] planning ahead: 3.7 vs 4.0

On Fri, 12 Feb 2021, at 06:42, Jerven Bolleman wrote:

Hi Jeen, All,

Just a thought that I was having while writing this e-mail.
How about a virtual developer get-together sometime, including 
downstream projects/companies like GraphDB and Ontop?

Good idea. I'll see about organizing something along those lines. Me being in Oz makes finding a decent time slot a little tricky, but we can figure something out I'm sure.


Now back to 3.x vs 4.0

I am fine either way. For those who are getting paid, I suspect 
commercial planning is a major concern. Yet I feel that in practice
upgrading a JVM version is not a major stumbling block: either 
more or less everything gets updated, or nothing gets updated at all.
For what it is worth, in our org we have all been on Java 11 for more 
than a year. Also, a commercial party can backport and maintain the 3 
series even if most development happens on a cleaned-up 4 branch.

We had a conversation about this about a year ago, when several people voiced some concern. We're a year on though, perhaps the situation's changed. See https://github.com/eclipse/rdf4j/issues/2046 . 


For major changes:

There are two things I would like to see for 4.0.
For the getStatements method on a SailConnection, I would like to expand 
the acceptable types. This is to enable what is called "predicate" or 
"filter" pushdown in the literature.

Currently we accept either a null or a Value. What I would like to have 
is a Value or a "VariableDescription".

Imagine the following query:

SELECT ?p
WHERE {
    ?z ?y ?p . # iteration 1
    ?q ?p ?x .
}

From the query we could analyze that ?p must be an IRI.
Yet when we call the getStatements method we have no way to pass in that 
knowledge. This means that each Literal in the store is pulled out of
storage and into memory, only to be thrown away immediately afterwards.
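To make the pushdown idea concrete, here is a minimal sketch. It does not use the real RDF4J model or Sail API: Pattern, VariableDescription, PushdownSketch and the simplified Value/Iri/Literal/Statement classes are all hypothetical stand-ins, and the store is just an in-memory list. It only shows how a type constraint on an unbound position could be handed to getStatements so that non-matching values are rejected inside the store.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the real org.eclipse.rdf4j.model types.
interface Pattern {}                 // assumed common supertype for one position of a pattern
interface Value extends Pattern {}

final class Iri implements Value {
    final String id;
    Iri(String id) { this.id = id; }
    @Override public boolean equals(Object o) { return o instanceof Iri && ((Iri) o).id.equals(id); }
    @Override public int hashCode() { return id.hashCode(); }
}

final class Literal implements Value {
    final String label;
    Literal(String label) { this.label = label; }
    @Override public boolean equals(Object o) { return o instanceof Literal && ((Literal) o).label.equals(label); }
    @Override public int hashCode() { return label.hashCode(); }
}

final class Statement {
    final Value subject;
    final Iri predicate;
    final Value object;
    Statement(Value subject, Iri predicate, Value object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical "VariableDescription": an unbound position plus what query analysis knows about it.
final class VariableDescription implements Pattern {
    final Class<? extends Value> requiredType;
    VariableDescription(Class<? extends Value> requiredType) { this.requiredType = requiredType; }
    boolean accepts(Value v) { return requiredType.isInstance(v); }
}

final class PushdownSketch {
    // null keeps its current meaning (wildcard); a Value must match exactly;
    // a VariableDescription matches any value that satisfies its constraint.
    static boolean matches(Pattern pattern, Value v) {
        if (pattern == null) return true;
        if (pattern instanceof VariableDescription) return ((VariableDescription) pattern).accepts(v);
        return pattern.equals(v);
    }

    // Toy store scan; a real store would apply the constraint during its index scan.
    static List<Statement> getStatements(List<Statement> store, Pattern s, Pattern p, Pattern o) {
        List<Statement> out = new ArrayList<>();
        for (Statement st : store) {
            if (matches(s, st.subject) && matches(p, st.predicate) && matches(o, st.object)) {
                out.add(st);
            }
        }
        return out;
    }
}
```

For the first pattern of the query above, the call would be something like getStatements(store, null, null, new VariableDescription(Iri.class)): because ?p is also used in predicate position, statements with a Literal object never leave the store.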

It's an interesting idea. This is the kind of thing where I'd really appreciate input from Ontotext (and other Sail implementors, e.g. the Halyard people) as well.

Could you raise a ticket for this idea, so we don't lose track of it, and keep in-depth conversations on the topic trackable?

[snip]

I don't think that we would have many of these cases for the first step, 
but the API change would be significant, although it could be done in a 
completely backwards compatible manner.

You'd have to introduce a new superclass or interface that covers both Values and "VariableDescription" though, so it would not be binary compatible. But yeah, we can make it relatively easy. 

The second change is that I would like to see getStatements return 
a CloseableIteration<List<? extends Statement>, SailException> instead 
of the current one-statement-at-a-time iterator. Being able to return 
blocks of values would really allow for some significant speed-ups. 
Again, this is easily done with a default method, so it doesn't need a 
major version.
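A rough sketch of the default-method approach, under simplifying assumptions: StatementSource and getStatementBatches are hypothetical names, and a plain java.util.Iterator stands in for CloseableIteration (so closing and SailException handling are left out). The default method wraps the existing single-statement iterator, which means existing implementations keep working unchanged, while a store that can read blocks natively would override it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified interface; NOT the real SailConnection.
interface StatementSource<T> {

    // Existing style: one statement at a time.
    Iterator<T> getStatements();

    // Proposed style: blocks of statements. This fallback only regroups the
    // single-statement iterator; it adds no speed-up by itself.
    default Iterator<List<T>> getStatementBatches(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Iterator<T> single = getStatements();
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return single.hasNext();
            }

            @Override
            public List<T> next() {
                if (!single.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Fill one block; the last block may be shorter than batchSize.
                List<T> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && single.hasNext()) {
                    batch.add(single.next());
                }
                return batch;
            }
        };
    }
}
```

Any gain would come from implementations that override the batch method to fetch a whole block from storage at once; callers written against the batch method would then benefit without further changes.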

Can I ask, with this and also the other ideas (parallelizing query execution): do you have a specific triplestore implementation in mind that would make use of it? For example, I don't think our own NativeStore is really set up to leverage these kinds of optimizations.

Cheers,

Jeen
