
Re: [rdf4j-dev] validating a file with shacl sail

Hi,

 

The data I’ve been using is at https://github.com/Fedict/dcat/tree/master/all (the datagovbe.nt.gz file), though it is (or is supposed to be) updated weekly.
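The dump is in N-Triples, which puts one triple per line, so a rough size check (for example, to confirm the roughly 600K triples mentioned later in the thread) needs nothing beyond the JDK. This is a minimal sketch, assuming the file has been downloaded locally as datagovbe.nt.gz; the class and method names are hypothetical, and it counts lines rather than parsing them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

final class NTriplesCount {
    // Counts lines that look like triples: non-blank and not a '#' comment.
    // N-Triples has exactly one triple per line, so this is a cheap size estimate;
    // it does not check that each line is syntactically valid.
    static long countTriples(Path gzFile) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile));
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines()
                    .map(String::trim)
                    .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the dump sits in the working directory unless a path is given.
        Path file = Path.of(args.length > 0 ? args[0] : "datagovbe.nt.gz");
        System.out.println(countTriples(file));
    }
}
```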

 

Speed is somewhat less of an issue in my case (transforming the various sources and fully updating the website take a lot more wall-clock time).

User-friendly reporting is more important, though the compact notation of the DCAT-AP SHACL rules, with their blank nodes, makes that harder.

But of course I can turn the ValidationReport into an HTML page 😊
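A minimal sketch of that last step, assuming the validation results have already been flattened into (focus node, path, constraint) rows. The Result record and the class name are hypothetical; this is not RDF4J's ValidationReport API. Reporting the path and constraint component rather than the source shape keeps the output readable when the shapes themselves are blank nodes.

```java
import java.util.List;

final class ValidationReportHtml {
    // Hypothetical flattened view of one validation result (not an RDF4J type).
    record Result(String focusNode, String path, String constraint) {}

    // Escapes the characters that matter inside HTML text content.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders one table row per result; IRIs like <http://...> are escaped so they display.
    static String toHtml(List<Result> results) {
        StringBuilder html = new StringBuilder();
        html.append("<table>\n<tr><th>Focus node</th><th>Path</th><th>Constraint</th></tr>\n");
        for (Result r : results) {
            html.append("<tr><td>").append(escape(r.focusNode()))
                .append("</td><td>").append(escape(r.path()))
                .append("</td><td>").append(escape(r.constraint()))
                .append("</td></tr>\n");
        }
        html.append("</table>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        List<Result> results = List.of(new Result(
                "<http://example.org/dataset/1>", "dct:title", "sh:MinCountConstraintComponent"));
        System.out.print(toHtml(results));
    }
}
```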

 

Bart

 

From: rdf4j-dev-bounces@xxxxxxxxxxx <rdf4j-dev-bounces@xxxxxxxxxxx> On Behalf Of Håvard Ottestad
Sent: Friday 29 March 2019 9:32
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: Re: [rdf4j-dev] validating a file with shacl sail

 

I’ll fix that typo. 

 

The ShaclSail supports implicit targets when the subject is both a NodeShape or PropertyShape and an rdfs:Class. Maybe one of the owl:imports contains that information.
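The rule described here can be sketched as a check over the shapes graph: a subject typed as sh:NodeShape or sh:PropertyShape that is also typed as rdfs:Class implicitly targets its own instances. This is a minimal stand-alone sketch over plain string triples, not the ShaclSail's code; the class and record names are hypothetical. It only follows direct rdf:type triples (no rdfs:subClassOf reasoning) and only sees triples that are actually in the list, so an rdfs:Class declaration that lives solely in an owl:imports ontology would be missed unless that ontology is loaded as well.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ImplicitTargets {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class";
    static final String SH_NODE_SHAPE = "http://www.w3.org/ns/shacl#NodeShape";
    static final String SH_PROPERTY_SHAPE = "http://www.w3.org/ns/shacl#PropertyShape";

    // Returns the subjects typed both as a SHACL shape and as rdfs:Class,
    // i.e. shapes that implicitly target all instances of themselves.
    // Simplification: direct rdf:type triples only, no subclass inference.
    static Set<String> implicitClassTargets(List<Triple> shapesGraph) {
        Set<String> shapes = new LinkedHashSet<>();
        Set<String> classes = new LinkedHashSet<>();
        for (Triple t : shapesGraph) {
            if (!t.p().equals(RDF_TYPE)) {
                continue;
            }
            if (t.o().equals(SH_NODE_SHAPE) || t.o().equals(SH_PROPERTY_SHAPE)) {
                shapes.add(t.s());
            }
            if (t.o().equals(RDFS_CLASS)) {
                classes.add(t.s());
            }
        }
        shapes.retainAll(classes); // keep only shapes that are also classes
        return shapes;
    }
}
```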

 

13 or 11 minutes sounds decent, but I wonder how it would compare to, for instance, Stardog.

 

There are still a lot of performance-tuning options for the ShaclSail. I’ve been experimenting with a new memory store, but it’s too slow during indexing at the moment. Pushing joins to “before” the sorting might help. Maybe we can also optimize for minCount 1, and even for the combination of minCount 1 and maxCount 1, i.e. exactly 1.
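A rough illustration of why minCount 1 plus maxCount 1 could collapse into a single check: one pass that counts values per subject answers both constraints at once. This is a hypothetical stand-alone sketch with made-up names, not the ShaclSail's actual execution plan (which works on sorted iterators and joins); it assumes the (subject, value) pairs for the path are distinct, as triples in an RDF graph are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExactlyOneCheck {
    record Pair(String subject, String value) {}
    record Violation(String focusNode, int count) {}

    // One pass over the (subject, value) pairs of a single path builds the counts;
    // minCount 1 and maxCount 1 then reduce to one "count == 1" test per focus node.
    static List<Violation> check(List<String> focusNodes, List<Pair> pathValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (Pair p : pathValues) {
            counts.merge(p.subject(), 1, Integer::sum);
        }
        List<Violation> violations = new ArrayList<>();
        for (String node : focusNodes) {
            int n = counts.getOrDefault(node, 0);
            if (n != 1) {
                // n == 0 violates minCount 1, n > 1 violates maxCount 1
                violations.add(new Violation(node, n));
            }
        }
        return violations;
    }
}
```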

 

As you say, this isn’t the scenario I built the SHACL engine to handle. It’s supposed to be fast when making small changes to a large database. Single-shot validation of a file is still a big use case out there, so it’s worth the effort to make that fast too.
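The idea behind validating small changes can be sketched, in heavily simplified form, as narrowing the work to the nodes a transaction can affect. This is a hypothetical sketch with made-up names, assuming a single shape with one target class and simple forward predicate paths; it is not the ShaclSail's implementation, and a real engine also has to handle inverse paths, nested shapes and similar cases.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class IncrementalScope {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Collects the subjects whose validation result could change because of this
    // transaction: an added or removed triple uses one of the shape's paths, or an
    // rdf:type triple to the target class was added or removed (the node may enter
    // or leave the target). The caller would still keep only nodes that are
    // currently in the target before validating them.
    static Set<String> nodesToRevalidate(List<Triple> added, List<Triple> removed,
                                         Set<String> shapePaths, String targetClass) {
        Set<String> affected = new LinkedHashSet<>();
        for (List<Triple> delta : List.of(added, removed)) {
            for (Triple t : delta) {
                boolean pathChange = shapePaths.contains(t.p());
                boolean targetChange = t.p().equals(RDF_TYPE) && t.o().equals(targetClass);
                if (pathChange || targetChange) {
                    affected.add(t.s());
                }
            }
        }
        return affected;
    }
}
```

Only the returned nodes, rather than the whole database, then need to be checked against the shape, which is what keeps small transactions cheap.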

 

Is there a way you could share your data? E.g., is it public, with a license that would allow it to be included in the RDF4J repo as a benchmark file (or a smaller subset)?

 

Thank you for letting me know how it went!

Håvard


On 29 Mar 2019, at 00:05, Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx> wrote:

Hi Håvard,

 

 

Some initial feedback on the ShaclSail

 

Minor issue: there are typos in the exception messages in “sail/shacl/AST/ShaclProperties.java”; several messages contain “aleady” (the r is missing).

 

As mentioned before, I’m (ab)using the ShaclSail to validate a larger data set (600K triples loaded into a MemoryStore) all at once, instead of validating small changes,

using this set of rules from the European SEMIC/Joinup platform: https://github.com/SEMICeu/dcat-ap_shacl/blob/master/shacl/dcat-ap.shapes.ttl

 

It didn’t work initially, because the SHACL rules don’t explicitly set a targetClass: they rely on implicit class targets instead, which resulted in a NoShapesLoadedException.

So I added a targetClass for dcat:Catalog, dcat:Dataset and dcat:Distribution.
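What the added declarations do can be sketched as follows: each sh:targetClass selects as focus nodes every subject that has an rdf:type triple pointing at that class. This is a minimal stand-alone sketch over plain string triples, assuming prefixed names such as dcat:Dataset stand in for full IRIs; the names are hypothetical and this is not how the ShaclSail evaluates targets internally. SHACL also counts instances of subclasses (via rdfs:subClassOf), which this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TargetClassFocusNodes {
    record Triple(String s, String p, String o) {}

    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Maps each target class to the subjects typed with it in the data graph.
    // Classes with no instances get no entry. Subclass inference is ignored.
    static Map<String, Set<String>> focusNodesByClass(List<Triple> data, Set<String> targetClasses) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Triple t : data) {
            if (t.p().equals(RDF_TYPE) && targetClasses.contains(t.o())) {
                result.computeIfAbsent(t.o(), k -> new LinkedHashSet<>()).add(t.s());
            }
        }
        return result;
    }
}
```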

 

With the 3.0-snapshot it took about 13 minutes to load and validate on an old Core i5 Windows laptop, which is acceptable for my purposes.

With the experimental parallel validation enabled it took 11 minutes.

 

I’ll have to dig into the report to verify the validation errors, but it looks promising.

 

 

Best regards

 

Bart

 

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/rdf4j-dev

