Re: [rdf4j-dev] planning ahead: 3.7 vs 4.0
  • From: jerven Bolleman <jerven.bolleman@sib.swiss>
  • Date: Thu, 11 Feb 2021 20:42:57 +0100

Hi Jeen, All,

Just a thought I had while writing this e-mail: how about a virtual developer get-together sometime, including downstream projects/companies like GraphDB and Ontop?

Now back to 3.x vs 4.0

I am fine either way. For those who are getting paid, I suspect commercial planning is a major concern. Yet I feel that in practice upgrading a JVM version is not a major stumbling block. Either everything gets updated, more or less, or nothing gets updated at all. For what it is worth, in our org we have all been on Java 11 for more than a year. Also, a commercial party can backport and maintain the 3 series even if most development happens on a cleaned-up 4 branch.

For major changes:

There are two things I would like to see for 4.0.
For the getStatements method on a SailConnection, I would like to expand the acceptable types. This is to enable what is called "predicate" or "filter" pushdown in the literature.

Currently we accept either a null or a Value. What I would like to have is a Value or a "VariableDescription".

Imagine the following query:

SELECT ?p
WHERE {
   ?z ?y ?p . # iteration 1
   ?q ?p ?x .
}

From the query we could analyze that ?p must be an IRI, since it is used in the predicate position of the second triple pattern.
Yet when we call the getStatements method we have no way to pass in that knowledge. This means that each Literal in the store is pulled out of storage and into memory, only to be thrown away immediately afterwards.

Having a "VariableDescription" instead of a null would allow the store to be smarter about where to start an iterator if things are in order.
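
A minimal sketch of the idea, using simplified stand-in types rather than the real RDF4J interfaces. Every name here (TermKind, Term, Triple, DescribedStore, and the shape of VariableDescription itself) is an assumption made up for illustration; only the concept of passing a description instead of a null comes from the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these types are RDF4J API.
enum TermKind { IRI, BNODE, LITERAL }

final class Term {
    final TermKind kind;
    final String label;

    Term(TermKind kind, String label) {
        this.kind = kind;
        this.label = label;
    }
}

final class Triple {
    final Term subj;
    final Term pred;
    final Term obj;

    Triple(Term subj, Term pred, Term obj) {
        this.subj = subj;
        this.pred = pred;
        this.obj = obj;
    }
}

// What the caller knows about an unbound position (would replace a bare null).
interface VariableDescription {
    boolean mayMatch(Term term);

    // Nothing known: behaves like a plain wildcard.
    VariableDescription ANY = term -> true;

    // The variable is known to bind only to terms of the given kind.
    static VariableDescription ofKind(TermKind kind) {
        return term -> term.kind == kind;
    }
}

class DescribedStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(Triple t) {
        triples.add(t);
    }

    // The object position takes a description, so terms that cannot match
    // are skipped inside the store instead of being handed to the caller.
    List<Triple> getStatements(VariableDescription objectDescription) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            if (objectDescription.mayMatch(t.obj)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

For the first pattern of the query above, the caller would pass VariableDescription.ofKind(TermKind.IRI) for the object position. A real store could go further than filtering and use the description to pick a starting point in an ordered index.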

SELECT ?p
WHERE {
   ?z ?y ?p . # iteration 1
   ?q ?p ?x .
   FILTER(?x > 10)
}

Here we could do a static analysis to attach to ?x that it must be greater than 10. A smarter getStatements method could then start the iterator past the value 10, and so on.
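
To illustrate what "starting the iterator past 10" could look like, here is a small sketch assuming the store keeps an index ordered by numeric object value. The class OrderedObjectIndex and its methods are hypothetical; a java.util.TreeMap stands in for whatever ordered structure a real store would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: an index ordered by numeric object value.
class OrderedObjectIndex {
    // object value -> subjects having that value (predicate omitted for brevity)
    private final NavigableMap<Integer, List<String>> byValue = new TreeMap<>();

    void add(String subject, int value) {
        byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(subject);
    }

    // No description available: every entry is visited and the caller filters.
    List<String> scanAll() {
        return collect(byValue);
    }

    // With a "greater than" description: seek once, then visit only matches.
    List<String> scanGreaterThan(int lowerBound) {
        // tailMap(key, false) is a view starting strictly after lowerBound.
        return collect(byValue.tailMap(lowerBound, false));
    }

    private static List<String> collect(Map<Integer, List<String>> view) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : view.entrySet()) {
            for (String subject : e.getValue()) {
                out.add(subject + "=" + e.getKey());
            }
        }
        return out;
    }
}
```

With FILTER(?x > 10) pushed down, the store would call something like scanGreaterThan(10) and never touch the entries at or below 10.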

I don't think we would have many of these cases in a first step, but the API change would be significant, although it could be done in a completely backwards-compatible manner.

The second change is that I would like getStatements to return a CloseableIteration<List<? extends Statement>, SailException> instead of the current one-statement-at-a-time iterator. Being able to return blocks of values would really allow for some significant speed-ups. Again, this is easily done with a default method, so it doesn't need a major version.
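
A sketch of how a default method could provide the batched variant on top of the existing one-at-a-time method. The types are simplified stand-ins: CloseableIter is not the RDF4J CloseableIteration, statements are plain Strings, and the names BatchingConnection, getStatementBatches and ListConnection are invented for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Minimal stand-in for a closeable iteration; not the RDF4J type.
interface CloseableIter<E> extends AutoCloseable {
    boolean hasNext();

    E next();

    @Override
    void close();
}

// Hypothetical connection interface; the statement type is simplified to String.
interface BatchingConnection {
    // Existing style: one statement at a time.
    CloseableIter<String> getStatements();

    // Batched variant as a default method, so existing implementations
    // keep compiling and get batching for free.
    default CloseableIter<List<String>> getStatementBatches(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        CloseableIter<String> single = getStatements();
        return new CloseableIter<List<String>>() {
            @Override
            public boolean hasNext() {
                return single.hasNext();
            }

            @Override
            public List<String> next() {
                if (!single.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<String> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && single.hasNext()) {
                    batch.add(single.next());
                }
                return batch;
            }

            @Override
            public void close() {
                single.close();
            }
        };
    }
}

// A toy implementation that only knows the one-at-a-time method.
class ListConnection implements BatchingConnection {
    private final List<String> data;

    ListConnection(List<String> data) {
        this.data = data;
    }

    @Override
    public CloseableIter<String> getStatements() {
        Iterator<String> it = data.iterator();
        return new CloseableIter<String>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public String next() {
                return it.next();
            }

            @Override
            public void close() {
                // nothing to release in this toy example
            }
        };
    }
}
```

The default implementation only regroups results, so it brings no speed-up by itself. The gain would come from stores that override the batched method and fill a whole block directly from their storage layer.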

I have also been looking into parallelizing query execution without too much pain throughout the layers.

Basically, only parallelize the execution of the dominating join. Parallel execution can really slow things down on queries with small result sets, just from the overhead of setting things up. Having batches helps with that: e.g. if the first result part fits in the batch size, don't multi-thread.
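
The batch-size heuristic could look roughly like the sketch below. This is not the GH-2741 code and not how the RDF4J evaluation stack works; BatchedJoin, the probe function, the usedThreads flag and the fixed pool of 4 threads are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: go parallel only when the left side of the
// dominating join does not fit into a single batch.
class BatchedJoin {
    // Records whether the last call used threads; for demonstration only.
    boolean usedThreads;

    <L, R> List<R> join(Iterator<L> left, Function<L, List<R>> probe, int batchSize) {
        List<L> first = nextBatch(left, batchSize);
        if (!left.hasNext()) {
            // Everything fit in the first batch: skip the thread set-up cost.
            usedThreads = false;
            return probeBatch(first, probe);
        }
        usedThreads = true;
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is arbitrary
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            List<L> batch = first;
            while (!batch.isEmpty()) {
                final List<L> current = batch;
                Callable<List<R>> task = () -> probeBatch(current, probe);
                futures.add(pool.submit(task));
                batch = nextBatch(left, batchSize);
            }
            // Collect in submission order so the result order stays stable.
            List<R> out = new ArrayList<>();
            for (Future<List<R>> f : futures) {
                out.addAll(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private static <L> List<L> nextBatch(Iterator<L> it, int size) {
        List<L> batch = new ArrayList<>(size);
        while (batch.size() < size && it.hasNext()) {
            batch.add(it.next());
        }
        return batch;
    }

    private static <L, R> List<R> probeBatch(List<L> batch, Function<L, List<R>> probe) {
        List<R> out = new ArrayList<>();
        for (L item : batch) {
            out.addAll(probe.apply(item));
        }
        return out;
    }
}
```

In a real evaluation stack the pool would be shared rather than created per join, and the probe function would have to be safe to call from several threads at once.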

In general I would love to sit down with someone and talk about what multi-threading is actually supported and used in the evaluation stack.

That said, I am still struggling to get the small join performance up with my existing code for GH-2741 :) and time is always in short supply.

Regards,
Jerven



On 11/02/2021 01:52, Jeen Broekstra wrote:
Hi folks,

Looking ahead after the 3.6 release, we have a decision to make: do we want to focus on getting a new major release out the door, or do we want to continue the 3.x series for a while longer?

The main justification for a major release is that we can make breaking changes: removing deprecated code, major refactors, or even deciding to bump our minimally-supported Java version (I must admit I'd _really_ like to bump to Java 11, but I understand the hesitation from some of our vendor partners).

Typically, to make the pain of upgrading for a major version easier to swallow for our users, we would accompany that with some massively useful improvements and new features as well.

So the question is: do we have enough material to justify making the next release a major one?

Here's what's currently planned for the 4.0.0 milestone:

https://github.com/eclipse/rdf4j/milestone/30

Almost all of this is purely "housekeeping": removing obsolete code etc. Important, but not really particularly exciting to users. Perhaps the most interesting thing is the task to make each package contained in only a single module (to allow for use as part of a Java 9 modular architecture).

We have a substantial backlog of feature requests and bugs. On my radar as big/important new features for 2021 are the use of SHACL validation against a (remote) repository / SPARQL endpoint, and further upgrades and extensions of our RDF* support. I'm also thinking that planning this for a major release at the outset gives us some freedom: we don't have to jump through so many hoops to make sure everything remains backward compatible (though of course even with a major release, if we _can_ keep it compatible, we probably should).

Where do you stand on this? What are your big 2021 must-haves for RDF4J (if any)?

As an aside: regardless of whether the next release after 3.6 is a major or a minor, we will need to go through a release review for this one, as it's been a year since the last review.

Jeen



--
	*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
Jerven.Bolleman@sib.swiss - www.sib.swiss


