Re: [rdf4j-dev] planning ahead: 3.7 vs 4.0
- From: jerven Bolleman <jerven.bolleman@sib.swiss>
- Date: Thu, 11 Feb 2021 20:42:57 +0100
Hi Jeen, All,
Just a thought I had while writing this e-mail: how about a virtual
developer get-together sometime, including downstream
projects/companies like GraphDB and Ontop?
Now back to 3.x vs 4.0
I am fine either way. For those who are getting paid, I suspect
commercial planning is a major concern. Yet I feel that in practice
upgrading a JVM version is not a major stumbling block: either more or
less everything gets updated, or nothing gets updated at all.
For what it is worth, in our org we have all been on Java 11 for more
than a year. Also, a commercial party can backport and maintain the 3
series even if most development happens on a cleaned-up 4 branch.
For major changes:
There are two things I would like to see for 4.0.
For the getStatements method on a SailConnection, I would like to expand
the acceptable types. This is to enable what is called "predicate" or
"filter" pushdown in the literature.
Currently we accept either a null or a Value. What I would like to have
is a Value or a "VariableDescription".
Imagine the following query:
SELECT ?p
WHERE {
  ?z ?y ?p . # iteration 1
  ?q ?p ?x .
}
From the query we could infer that ?p must be an IRI, because it is
also used in the predicate position of the second triple pattern.
Yet when we call the getStatements method we have no way to pass in that
knowledge. This means that each Literal in the store is pulled from
storage into memory, only to be thrown away immediately afterwards.
Having a "VariableDescription" instead of a null would allow the store
to be smarter about where to start an iterator if things are in order.
SELECT ?p
WHERE {
  ?z ?y ?p . # iteration 1
  ?q ?p ?x .
  FILTER(?x > 10)
}
Here static analysis could attach to ?x the constraint that it must be
greater than 10. A smarter getStatements method could then start the
iterator past the value 10, and so on.
I don't think we would have many of these cases in a first step, but
the API change would be significant, although it could be done in a
completely backwards-compatible manner.
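To make the idea concrete, here is a minimal sketch of how such an overload could be added with a default method. Everything in it is an assumption for illustration: Term, Triple, Store, ListStore and the shape of VariableDescription are simplified stand-ins and not RDF4J API, and literals are assumed to hold plain integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-ins for illustration; none of these types are RDF4J API.
final class PushdownSketch {

    /** Simplified RDF term: an IRI, or a literal whose lexical form is an integer. */
    static final class Term {
        final String lexical;
        final boolean isIri;

        Term(String lexical, boolean isIri) {
            this.lexical = lexical;
            this.isIri = isIri;
        }
    }

    static final class Triple {
        final Term s, p, o;

        Triple(Term s, Term p, Term o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Hypothetical "VariableDescription": what is known about an unbound position. */
    interface VariableDescription extends Predicate<Term> {
        VariableDescription IRI_ONLY = t -> t.isIri;

        static VariableDescription integerGreaterThan(int bound) {
            return t -> !t.isIri && Integer.parseInt(t.lexical) > bound;
        }
    }

    interface Store {
        /** Existing style: a concrete term, or null as a wildcard. */
        List<Triple> getStatements(Term s, Term p, Term o);

        /**
         * New overload as a default method, so existing stores keep compiling.
         * This fallback only filters after the fact; a smarter store would
         * override it and use the description to skip data it never needs to read.
         */
        default List<Triple> getStatements(Term s, Term p, VariableDescription o) {
            List<Triple> out = new ArrayList<>();
            // The cast is needed: a bare null is ambiguous between the two overloads.
            for (Triple t : getStatements(s, p, (Term) null)) {
                if (o.test(t.o)) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    /** Naive list-backed store that only implements the existing method. */
    static final class ListStore implements Store {
        private final List<Triple> data;

        ListStore(List<Triple> data) {
            this.data = data;
        }

        private static boolean matches(Term pattern, Term actual) {
            return pattern == null
                    || (pattern.isIri == actual.isIri && pattern.lexical.equals(actual.lexical));
        }

        @Override
        public List<Triple> getStatements(Term s, Term p, Term o) {
            List<Triple> out = new ArrayList<>();
            for (Triple t : data) {
                if (matches(s, t.s) && matches(p, t.p) && matches(o, t.o)) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        Term a = new Term("http://example.org/a", true);
        Term p = new Term("http://example.org/p", true);
        Store store = new ListStore(List.of(
                new Triple(a, p, a),
                new Triple(a, p, new Term("5", false)),
                new Triple(a, p, new Term("42", false))));
        // One object is an IRI, and one literal is greater than 10.
        System.out.println(store.getStatements(null, null, VariableDescription.IRI_ONLY).size());
        System.out.println(store.getStatements(null, null, VariableDescription.integerGreaterThan(10)).size());
    }
}
```

One wrinkle the sketch exposes: with an overload of the same name, a call that passes a bare null in the described position no longer compiles without a cast, so a differently named method might be the safer route for source compatibility.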
The second change is that I would like getStatements to return a
CloseableIteration<List<? extends Statement>, SailException> instead
of the current one-statement-at-a-time iterator. Being able to return
blocks of values would allow for some significant speed-ups. Again,
this is easily done with a default method, so it doesn't need a major
version.
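A rough sketch of what that default method could look like follows. It is deliberately simplified and rests on assumptions: it uses plain java.util.Iterator instead of CloseableIteration, and Source and getStatementBatches are hypothetical names, not RDF4J API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; Source and getStatementBatches are not RDF4J API.
final class BatchSketch {

    interface Source<T> {
        /** Existing style: one element at a time. */
        Iterator<T> getStatements();

        /**
         * Batched variant as a default method: it wraps the single-element
         * iterator, so existing implementations work unchanged. A store that
         * can read blocks natively would override this.
         */
        default Iterator<List<T>> getStatementBatches(int batchSize) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be at least 1");
            }
            Iterator<T> single = getStatements();
            return new Iterator<List<T>>() {
                @Override
                public boolean hasNext() {
                    return single.hasNext();
                }

                @Override
                public List<T> next() {
                    if (!single.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    // Fill the batch until it is full or the source runs dry.
                    List<T> batch = new ArrayList<>(batchSize);
                    while (batch.size() < batchSize && single.hasNext()) {
                        batch.add(single.next());
                    }
                    return batch;
                }
            };
        }
    }

    public static void main(String[] args) {
        Source<String> source = () -> List.of("st1", "st2", "st3", "st4", "st5").iterator();
        Iterator<List<String>> batches = source.getStatementBatches(2);
        while (batches.hasNext()) {
            System.out.println(batches.next());
        }
    }
}
```

The default wrapper by itself buys little speed; the gain would come from stores that override it and hand out blocks straight from their own pages or buffers.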
I have also been looking into parallelizing query execution without too
much pain throughout the layers.
Basically, only parallelize the execution of the dominating join.
Parallel execution can really slow things down on small query result
sets, just from the overhead of setting things up. Having batches helps
with that: e.g. if the first result part fits within the batch size,
don't multi-thread.
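As a sketch of that heuristic, under the assumption that the left side of the join arrives in batches and the right side is probed per left row: if the first batch is also the last, stay on one thread, otherwise probe each batch in parallel. ParallelJoinSketch, Result and the probe function are hypothetical names, and parallel streams stand in for whatever threading the evaluation stack would really use.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of "only go parallel when the input is large enough".
final class ParallelJoinSketch {

    static final class Result<R> {
        final List<R> rows;
        final boolean usedParallel;

        Result(List<R> rows, boolean usedParallel) {
            this.rows = rows;
            this.usedParallel = usedParallel;
        }
    }

    /**
     * Joins each left row against the right side via probe. If everything
     * fits in the first batch, it runs sequentially to avoid set-up overhead.
     */
    static <L, R> Result<R> join(Iterator<List<L>> leftBatches, Function<L, List<R>> probe) {
        List<R> rows = new ArrayList<>();
        if (!leftBatches.hasNext()) {
            return new Result<>(rows, false);
        }
        List<L> batch = leftBatches.next();
        if (!leftBatches.hasNext()) {
            // Small input: a single batch, so a plain loop is cheapest.
            for (L left : batch) {
                rows.addAll(probe.apply(left));
            }
            return new Result<>(rows, false);
        }
        // Larger input: probe the rows of each batch in parallel.
        // Collecting from an ordered stream keeps the encounter order.
        while (true) {
            rows.addAll(batch.parallelStream()
                    .flatMap(left -> probe.apply(left).stream())
                    .collect(Collectors.toList()));
            if (!leftBatches.hasNext()) {
                break;
            }
            batch = leftBatches.next();
        }
        return new Result<>(rows, true);
    }

    public static void main(String[] args) {
        List<List<Integer>> small = List.of(List.of(1, 2));
        List<List<Integer>> large = List.of(List.of(1, 2), List.of(3, 4), List.of(5));
        Function<Integer, List<String>> probe = x -> List.of("row-" + x);
        System.out.println(join(small.iterator(), probe).usedParallel);
        System.out.println(join(large.iterator(), probe).usedParallel);
    }
}
```

The probe function must be safe to call from several threads at once, which is exactly the kind of question about the evaluation stack raised below.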
In general I would love to sit down with someone and talk about what
multi-threading is actually supported and used in the evaluation stack.
That said, I am still struggling to get the small join performance up
with my existing code for GH-2741 :) and time is always in short supply.
Regards,
Jerven
On 11/02/2021 01:52, Jeen Broekstra wrote:
Hi folks,
Looking ahead after the 3.6 release, we have a decision to make: do we
want to focus on getting a new major release out the door, or do we want
to continue the 3.x series for a while longer?
The main justification for a major release is that we can do breaking
changes: removing deprecated code, major refactors, or even deciding to
bump our minimally-supported Java version (I must admit I'd _really_
like to bump to Java 11, but I understand the hesitation from some of
our vendor partners).
Typically, to make the pain of upgrading for a major version easier to
swallow for our users, we would accompany that with some massively
useful improvements and new features as well.
So the question is: do we have enough material to justify making the
next release a major one?
Here's what's currently planned for the 4.0.0 milestone:
https://github.com/eclipse/rdf4j/milestone/30
Almost all of this is purely "housekeeping": removing obsolete code etc.
Important, but not really particularly exciting to users. Perhaps the
most interesting thing is the task to make each package contained in
only a single module (to allow for use as part of a Java 9 modular
architecture).
We have a substantial backlog of feature requests and bugs. On my radar
as big/important new features for 2021 are the use of SHACL validation
against a (remote) repository / SPARQL endpoint, and further upgrades
and extensions of our RDF* support. I'm also thinking that planning this
for a major release at the outset gives us some freedom: we don't have
to jump through so many hoops to make sure everything remains backward
compatible (though of course even with a major release, if we /can/ keep
it compatible, we probably should).
Where do you stand on this? What are your big 2021 must-haves for RDF4J
(if any)?
As an aside: regardless of whether the next release after 3.6 is a major
or a minor, we will need to go through a release review for this one, as
it's been a year since the last review.
Jeen
--
*Jerven Tjalling Bolleman*
Principal Software Developer
*SIB | Swiss Institute of Bioinformatics*
1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
t +41 22 379 58 85
Jerven.Bolleman@sib.swiss - www.sib.swiss