Re: [rdf4j-dev] planning ahead: 3.7 vs 4.0

From: "Jeen Broekstra" <jeen@xxxxxxxxxxxx>
Date: Fri, 12 Feb 2021 09:05:45 +1100

On Fri, 12 Feb 2021, at 06:42, jerven Bolleman wrote:

Hi Jeen, All,

Just a thought that I was having while writing this e-mail.
How about a virtual developer get together, sometime, including
downstream projects/companies like GraphDB and Ontop.

Good idea. I'll see about organizing something along those lines. Me being in Oz makes finding a decent time slot a little tricky, but we can figure something out I'm sure.

Now back to 3.x vs 4.0

I am fine either way. For those who are getting paid, I suspect
commercial planning is a major concern. Yet, I feel that in practice
upgrading a JVM version is not a major stumbling block. Either
everything gets updated, more or less, or nothing gets updated at all.
For what it is worth, in our org we have all been on Java 11 for more
than a year. Also, a commercial party can backport and maintain the 3
series even if most development happens on a cleaned-up 4 branch.

We had a conversation about this about a year ago, when several people voiced some concern. We're a year on though, so perhaps the situation has changed. See https://github.com/eclipse/rdf4j/issues/2046 .

For major changes:

There are two things I would like to see for 4.0.
For the getStatements method on a SailConnection, I would like to expand
the acceptable types. This is to enable what is called "predicate" or
"filter" pushdown in the literature.

Currently we accept either a null or a Value. What I would like to have
is a Value or a "VariableDescription".

Imagine the following query:

SELECT ?p
WHERE {
?z ?y ?p . # iteration 1
?q ?p ?x .
}

From the query we could analyze that ?p must be an IRI.
Yet when we call the getStatements method we have no way to pass in that
knowledge. This means that each Literal in the store is pulled out of
storage and into memory, only to be thrown out immediately afterwards.

It's an interesting idea. This is the kind of thing where I'd really appreciate input from Ontotext and from other Sail implementors (e.g. the Halyard people) as well.

Could you raise a ticket for this idea, so we don't lose track of it, and keep in-depth conversations on the topic trackable?

[snip]

I don't think that we would have many of these cases for the first step
but the API change would be significant, although it could be done in a
completely backwards compatible manner.

You'd have to introduce a new superclass or interface that covers both Values and "VariableDescription" though, so it would not be binary compatible. But yeah, we can make it relatively easy.
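
To make the shape of this proposal concrete, here is a minimal sketch of a common supertype covering both concrete values and a "VariableDescription". It is not RDF4J code: `Pattern`, `Term`, `Triple`, the enum constants `ANY` and `IRI_ONLY`, and this `getStatements` signature are all hypothetical stand-ins invented for illustration, and the "store" is just an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;

public class PushdownSketch {

    /** Hypothetical common supertype for concrete terms and variable descriptions. */
    interface Pattern {
        boolean matches(Term term);
    }

    /** Simplified stand-in for an RDF term; a real Sail would use RDF4J's Value types. */
    static final class Term implements Pattern {
        final String label;
        final boolean isIri;

        Term(String label, boolean isIri) {
            this.label = label;
            this.isIri = isIri;
        }

        // A concrete term used as a pattern matches only an equal term.
        @Override
        public boolean matches(Term other) {
            return isIri == other.isIri && label.equals(other.label);
        }
    }

    /** Assumed shape of a "VariableDescription": an unbound position plus a type constraint. */
    enum VariableDescription implements Pattern {
        ANY {
            @Override
            public boolean matches(Term t) { return true; }
        },
        IRI_ONLY {
            @Override
            public boolean matches(Term t) { return t.isIri; }
        }
    }

    static final class Triple {
        final Term s, p, o;

        Triple(Term s, Term p, Term o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** The constraint is checked inside the scan, before a statement reaches the caller. */
    static List<Triple> getStatements(List<Triple> store, Pattern s, Pattern p, Pattern o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : store) {
            if (s.matches(t.s) && p.matches(t.p) && o.matches(t.o)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Mirrors the first pattern of the example query: the object (?p) must be an IRI. */
    static int demo() {
        Term a = new Term("ex:a", true);
        Term knows = new Term("ex:knows", true);
        List<Triple> store = new ArrayList<>();
        store.add(new Triple(a, knows, new Term("ex:b", true)));
        store.add(new Triple(a, knows, new Term("some literal", false)));
        return getStatements(store, VariableDescription.ANY, VariableDescription.ANY,
                VariableDescription.IRI_ONLY).size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy version the filter only saves a list append. The gain hoped for in the proposal would come from a store that can evaluate the constraint on its internal representation before materializing a Literal at all.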

The second change is that I would like to see getStatements return
a CloseableIteration<List<? extends Statement>, SailException> instead
of the current one-statement-at-a-time iterator. Being able to return
blocks of values would really allow for some significant speed-ups.
Again, this is easily done with a default method, so it doesn't need a
major version.
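
As a rough illustration of the "default method" route, the sketch below wraps a one-at-a-time iterator into blocks. It deliberately does not use RDF4J types: `StatementSource`, `getStatementBlocks` and `blockSizes` are hypothetical names, plain `java.util.Iterator` stands in for `CloseableIteration`, and `String` stands in for `Statement`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class BatchSketch {

    /** Hypothetical, simplified statement source; not RDF4J's SailConnection. */
    interface StatementSource {

        /** The existing style of access: one statement at a time. */
        Iterator<String> getStatements();

        /**
         * Default fallback: groups the single-statement iterator into blocks of at
         * most batchSize. A store with a native block layout could override this.
         */
        default Iterator<List<String>> getStatementBlocks(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            Iterator<String> single = getStatements();
            return new Iterator<List<String>>() {
                @Override
                public boolean hasNext() {
                    return single.hasNext();
                }

                @Override
                public List<String> next() {
                    if (!single.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    List<String> block = new ArrayList<>(batchSize);
                    while (block.size() < batchSize && single.hasNext()) {
                        block.add(single.next());
                    }
                    return block;
                }
            };
        }
    }

    /** Helper for demonstration: returns the size of each block produced. */
    static List<Integer> blockSizes(List<String> data, int batchSize) {
        StatementSource source = data::iterator;
        List<Integer> sizes = new ArrayList<>();
        Iterator<List<String>> blocks = source.getStatementBlocks(batchSize);
        while (blocks.hasNext()) {
            sizes.add(blocks.next().size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(blockSizes(List.of("s1", "s2", "s3", "s4", "s5"), 2));
    }
}
```

Existing implementations would inherit the wrapper unchanged, which is why no major version would be required; the wrapper itself adds no speed, so any gain depends on a store overriding it with a genuinely block-oriented read.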

Can I ask, with this, and also the other ideas (parallelizing query execution): do you have a specific triplestore implementation in mind that would make use of this? For example, I don't think our own NativeStore is really set up to leverage these kinds of optimizations.

Cheers,

Jeen
