Re: [rdf4j-dev] shac, statistics on classes etc
  • From: "Bart Hanssens (BOSA)" <bart.hanssens@xxxxxxxxxxxx>
  • Date: Thu, 27 Apr 2023 21:12:00 +0000

Hi Matthew

 

It’s just the regular RDF4J SHACL Sail developed by Håvard,

plus a couple of minor tweaks to work around issues in SHACL files that arguably do not fully conform to the SHACL spec,

and some very basic SELECT queries and a Pebble template to produce an HTML report instead of, or alongside, an RDF/Turtle report.

 

So no additional SHACL features.

 

Best regards

 

Bart

 

 

From: rdf4j-dev <rdf4j-dev-bounces@xxxxxxxxxxx> On Behalf Of Matthew Nguyen via rdf4j-dev
Sent: Thursday 27 April 2023 22:36
To: rdf4j-dev@xxxxxxxxxxx
Cc: Matthew Nguyen <nguyenm9@xxxxxxx>
Subject: Re: [rdf4j-dev] shac, statistics on classes etc

 

Hi Bart, does this tool support SHACL-SPARQL constraints? (https://www.w3.org/TR/shacl-af/)

-----Original Message-----
From: Bart Hanssens (BOSA) via rdf4j-dev <rdf4j-dev@xxxxxxxxxxx>
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Cc: Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>
Sent: Thu, Apr 27, 2023 8:03 am
Subject: [rdf4j-dev] shac, statistics on classes etc

Hi,

 

Just a quick note and some thoughts.

 

I’m developing a stand-alone SHACL validator, nothing fancy, which is to be integrated in my data.gov.be toolchain

 

Of course the SHACL part works like a charm, thanks Håvard 😉

Only a few minor issues that will either be solved in 4.3 (severity level),

or are arguably issues with the SHACL files on semic.eu (name on nodeshape, and nodeshapes with empty shacl:property)

 

I was wondering if it would be hard (or interesting for other people) to collect statistics on

  a. number of times a shape did _not_ have validation issues, or how many times a shape matched in total
  b. number of different classes/properties/object values in a dataset

 

The use case for (a) is mainly a data quality metric (shape violations divided by total matches),

while (b) is useful for harmonizing data (e.g. reducing differences) and probably for optimizing queries / data storage as well.
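
To make the two statistics above concrete, here is a minimal sketch in plain Python. It is not RDF4J code and not how the validator works: triples are assumed to be simple (subject, predicate, object) string tuples, and the function names `dataset_stats` and `violation_ratio` as well as the sample data are purely illustrative.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def dataset_stats(triples):
    """Count distinct classes, properties and object values in (s, p, o) tuples."""
    classes = Counter()      # class IRI -> number of rdf:type statements
    properties = Counter()   # predicate IRI -> number of statements
    objects = set()          # distinct object values
    for _subject, predicate, obj in triples:
        properties[predicate] += 1
        objects.add(obj)
        if predicate == RDF_TYPE:
            classes[obj] += 1
    return {
        "classes": len(classes),
        "properties": len(properties),
        "objects": len(objects),
        "instances_per_class": dict(classes),
    }


def violation_ratio(violations, matched):
    """Shape violations divided by the total number of matches for that shape."""
    if matched == 0:
        return None  # the shape never matched, so the ratio is undefined
    return violations / matched


# Hypothetical sample data, using prefixed names as plain strings for brevity
triples = [
    ("ex:d1", RDF_TYPE, "dcat:Dataset"),
    ("ex:d1", "dct:title", "Roads"),
    ("ex:d2", RDF_TYPE, "dcat:Dataset"),
    ("ex:d2", "dct:title", "Rivers"),
    ("ex:c1", RDF_TYPE, "dcat:Catalog"),
]
stats = dataset_stats(triples)
print(stats)
print(violation_ratio(1, 4))
```

In a real store the same counts would more likely come from aggregate queries than from iterating over every statement in memory; the sketch only pins down what is being counted.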

 

For the time being I’m (ab)using data cubes for publishing the statistics in TTL, but perhaps there is a better vocabulary.

And I’m guessing some data stores already collect some of this data.
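
As a rough illustration of the data cube approach mentioned above, the following Python sketch renders one statistic as an RDF Data Cube observation in Turtle. The `qb:` and `sdmx-measure:` terms are from the published vocabularies; everything in the `ex:` namespace (including the `ex:shape` dimension and the `ex:shapeStats` dataset) and the function name `observation_ttl` are assumptions made up for this example, not the structure actually used in the toolchain.

```python
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/stats#> .
"""


def observation_ttl(obs_name, shape_iri, ratio):
    """Render one qb:Observation holding the violation ratio for one shape."""
    return (
        f"ex:{obs_name} a qb:Observation ;\n"
        f"    qb:dataSet ex:shapeStats ;\n"          # hypothetical qb:DataSet
        f"    ex:shape <{shape_iri}> ;\n"            # hypothetical dimension
        f'    sdmx-measure:obsValue "{ratio:.4f}"^^xsd:decimal .\n'
    )


# Hypothetical shape IRI and value
print(PREFIXES + observation_ttl(
    "obs1", "http://example.org/shapes#DatasetShape", 0.25))
```

A complete cube would also need a `qb:DataStructureDefinition` declaring the dimension and measure, which is omitted here.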

 

Happy to look into it myself, though hints on how to get started would be appreciated 😊

 

 

Best regards,

 

Bart

 

 
