Just a quick note and some thoughts.
I’m developing a stand-alone SHACL validator, nothing fancy, which is to be integrated into my data.gov.be toolchain.
Of course the SHACL part works like a charm, thanks Håvard
😉
Only a few minor issues, which will either be solved in 4.3 (severity level)
or are arguably issues with the SHACL files on semic.eu (a name on a node shape, and node shapes with an empty shacl:property).
I was wondering whether it would be hard (or interesting for other people) to collect statistics on:
(a) the number of times a shape did _not_ have validation issues, or how many times a shape matched in total;
(b) the number of different classes/properties/object values in a dataset.
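For (b), a minimal sketch with only the standard library, counting distinct predicates and object values from N-Triples input (the line parsing here is deliberately naive and assumes one simple triple per line, with no spaces inside literals; a real implementation would use a proper RDF parser):

```python
from collections import Counter

def count_terms(ntriples_lines):
    """Count predicate and object occurrences in naive N-Triples input.

    Assumes one triple per line, terms separated by single spaces and a
    trailing ' .' -- good enough for a sketch, not a real parser.
    """
    predicates, objects = Counter(), Counter()
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = line.rstrip(" .").split(" ", 2)
        predicates[p] += 1
        objects[o] += 1
    return predicates, objects

# Tiny illustrative input (made-up URIs)
data = [
    '<http://example.org/a> <http://purl.org/dc/terms/title> "A" .',
    '<http://example.org/b> <http://purl.org/dc/terms/title> "B" .',
    '<http://example.org/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Doc> .',
]
preds, objs = count_terms(data)
```

The distinct-value counts then fall out of `len(preds)` and `len(objs)`.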
The use case for (a) is mainly a metric for data quality (shape violations divided by total matches),
while (b) is useful for harmonizing data (e.g. reducing differences between datasets), but probably useful for optimizing queries / data storage as well.
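The metric for (a) could be as simple as this sketch (the zero-match convention is my own choice, not anything the validator prescribes):

```python
def quality_score(violations: int, total_matches: int) -> float:
    """Shape violations divided by total matches; 0.0 means no violations.

    Returns 0.0 for shapes that never matched, to avoid division by zero;
    one could equally argue such shapes should be reported separately.
    """
    if total_matches == 0:
        return 0.0
    return violations / total_matches
```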
For the time being I’m (ab)using data cubes for publishing the statistics in TTL, but perhaps there is a better vocabulary.
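For reference, a per-shape measurement published that way might look roughly like this (the `ex:` dataset, dimension, and measure properties below are invented for illustration; only the `qb:` terms come from the Data Cube vocabulary):

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix ex:  <http://example.org/stats#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:obs1 a qb:Observation ;
    qb:dataSet ex:shapeStats ;                          # hypothetical dataset
    ex:shape <http://example.org/shapes#PersonShape> ;  # hypothetical dimension
    ex:violations "3"^^xsd:integer ;                    # hypothetical measures
    ex:matches "120"^^xsd:integer .
```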
And I’m guessing some data stores already collect some of this data.
Happy to look into it myself, though hints on how to get started would be appreciated
😊