Re: [rdf4j-dev] ShaclSail performance

It’s work done while trying to optimize for a large dataset. It was all in a single branch and I’ve tried to break it out into multiple PRs instead of a single large one like last month. 

One of the PRs is targeted at 4.0.0. There is also one branch that has some mixed changes where some of them might be better suited for 4.0.0. I’ll try to comment on that one. 

I could do a 3.8.0 release instead. That’s probably what I would usually do in this case, but since we wanted 4.0.0 to be the next release I thought it might be easier to do a 3.7.2 release.

Since I’ve split things into separate PRs we have a chance to discuss them separately :)

Håvard

On 26 Jul 2021, at 12:22, Jeen Broekstra <jeen@xxxxxxxxxxxx> wrote:


Ok so when you said you were working on performance optimizations, I didn't expect you to open up another 11 concurrent pull requests immediately afterwards :)

As I said I'm fine with doing performance fixes in 3.7.2, but please do keep in mind that a patch release is supposed to minimize the risk of introducing new bugs. Doing a lot of performance fixes in the same part of the code (even if each individual fix is small) compounds that risk. Not too worried about it at the moment as you've done an excellent job with test coverage on the ShaclSail, but just something to keep in the back of our mind.


Jeen

On Sun, 25 Jul 2021, at 21:08, Jeen Broekstra wrote:


On Sun, 25 Jul 2021, at 19:19, Håvard Ottestad wrote:
Hi,

I’m working on performance optimizations for the ShaclSail so that bulk validation of large datasets works as expected. At the moment the bulk validation ends up keeping far too much data in memory and a few of the validation plans are too slow for bulk validation. 

I would like to target most of these performance fixes at a new bug fix release (3.7.2), since there are no new features or user-facing changes.

Sounds good. With performance fixes, it kind of depends on the impact of the performance fix a little (in terms of amount of refactoring), but in principle I think it's fine to do that kind of thing in a patch release.


For 4.0.0 I am still aiming for even better bulk validation support. The fixes I have at the moment are just stop-gap measures to make bulk validation work again after I broke it with my big ShaclSail rewrite earlier this year. 

4.0.0 will include more thorough bulk validation support throughout the ShaclSail, as well as a “large datasets detection” feature that automatically switches to bulk validation.


Sounds exciting!

There are quite a few bug fixes already lined up for a 3.7.2 release, so we can release it as soon as you're ready, I guess.

Cheers,

Jeen
_______________________________________________
rdf4j-dev mailing list
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/rdf4j-dev


