Re: [rdf4j-dev] Multithreaded access to the MemoryStore

I’ll file an issue on GitHub. 

I ran the benchmarks on my computer at work, which is an Intel machine. There I can see a bigger difference between single-threaded and multithreaded validation: around 2x. Funnily enough, on my new laptop multithreaded validation is only as fast as single-threaded validation.

I did some profiling and found that the Intel machine runs into the same concurrency issues. Increasing the thread count makes it even more obvious: the first thread takes a few ms and the last thread takes 8 seconds.

The ARM CPU in my new laptop has a much wider architecture than the Intel CPU, but both run at similar clock frequencies (GHz). So I think both end up bound by the atomic operations used in the locks, as well as by the code that has to run sequentially within the locked sections.
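To make the lock argument concrete, here is a minimal, hypothetical sketch (not RDF4J code; the class and method names are illustrative assumptions) in which several threads repeatedly take the read lock of one shared java.util.concurrent.locks.ReentrantReadWriteLock and record how long each thread takes. Even acquiring and releasing a read lock atomically updates the lock's shared state, so the threads contend with each other although no writer is ever present. It only models the atomic-operation part, not the sequential code inside locked sections.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockContention {

    // A single lock shared by all workers, standing in for a store-wide read lock.
    private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

    /**
     * Starts {@code threads} workers that each acquire and release the shared
     * read lock {@code iterations} times; returns each worker's elapsed time in ms.
     */
    public static long[] run(int threads, int iterations) {
        long[] elapsedMillis = new long[threads];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long begin = System.nanoTime();
                for (int i = 0; i < iterations; i++) {
                    // lock() and unlock() each atomically update the lock's shared
                    // state, so the workers contend on it even without any writer.
                    LOCK.readLock().lock();
                    try {
                        // critical section intentionally left empty
                    } finally {
                        LOCK.readLock().unlock();
                    }
                }
                elapsedMillis[id] = (System.nanoTime() - begin) / 1_000_000;
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes each worker's write to elapsedMillis visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long[] times = run(threads, 1_000_000);
        System.out.println("Per-thread elapsed ms: " + Arrays.toString(times));
    }
}
```

If the shared lock state is the bottleneck, adding threads tends to stretch the per-thread times rather than shrink the total wall-clock time. A JMH benchmark would give more trustworthy numbers than this hand-rolled loop.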

Håvard

On 11 Nov 2021, at 00:40, Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx> wrote:



It looks like mvn -Pbenchmarks package does not put the benchmark TTL resources into jmh-benchmarks.jar…

Running the benchmarks on a laptop via NetBeans works, by the way.

 

Best regards

 

Bart

 

From: Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>
Sent: Wednesday, 10 November 2021 10:38
To: Håvard Ottestad <hmottestad@xxxxxxxxx>; rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: Re: [rdf4j-dev] Multithreaded access to the MemoryStore

 

Indeed, I first created the benchmark jars using mvn clean -Pbenchmarks package.

Then I ran java -jar target/jmh-benchmarks.jar and got exceptions like:

 

# Benchmark: org.eclipse.rdf4j.sail.shacl.benchmark.AddRemoveBenchmarkEmpty.shacl

# Run progress: 0.00% complete, ETA 09:17:40
# Fork: 1 of 1
# Warmup Iteration   1: <failure>

java.lang.IllegalArgumentException: Input stream must not be 'null'
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:121)
    at org.eclipse.rdf4j.rio.Rio.parse(Rio.java:299)
    at org.eclipse.rdf4j.rio.Rio.parse(Rio.java:265)
    at org.eclipse.rdf4j.rio.Rio.parse(Rio.java:237)
    at org.eclipse.rdf4j.sail.shacl.Utils.loadShapeData(Utils.java:44)
    at org.eclipse.rdf4j.sail.shacl.Utils.getInitializedShaclSail(Utils.java:87)

 

(I assume the benchmarks are looking for a TTL file in a specific directory on the file system instead of a Java resource inside the jar itself? Or maybe the build didn't include all the files...

I'll try to look into it this afternoon/evening.)
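For context on the error above: ClassLoader.getResourceAsStream returns null when a resource is not on the classpath, and handing that null to a parser produces a failure like "Input stream must not be 'null'". Below is a minimal sketch of a loader that fails with a clearer message instead, assuming the TTL files are meant to be packaged as classpath resources inside the jar. The class name and the resource path are hypothetical; this is not how RDF4J's Utils.loadShapeData is written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ShapeResourceLoader {

    private static final ClassLoader LOADER = ShapeResourceLoader.class.getClassLoader();

    /** Returns true if the resource is on the classpath (path relative to the root, no leading slash). */
    public static boolean exists(String path) {
        return LOADER.getResource(path) != null;
    }

    /**
     * Reads a classpath resource as UTF-8 text, failing with a descriptive
     * message instead of passing a null stream on to a parser.
     */
    public static String readResource(String path) {
        InputStream in = LOADER.getResourceAsStream(path);
        if (in == null) {
            throw new IllegalStateException(
                    "Resource not found on classpath: " + path + " (was it packaged into the jar?)");
        }
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical resource name, for illustration only.
        String path = "shacl/shapes.ttl";
        System.out.println(path + " on classpath: " + exists(path));
    }
}
```

Running such a check from the packaged jar (java -jar) versus from the IDE would show quickly whether the TTL files made it into the jar.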

 

Bart


From: Håvard Ottestad <hmottestad@xxxxxxxxx>
Sent: Wednesday, November 10, 2021 8:00
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Cc: Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>
Subject: Re: [rdf4j-dev] Multithreaded access to the MemoryStore

 

mvn -Pbenchmarks package

Was that the command you tried? Jeen recently made some adjustments to the process to stop the jars from being published.

 

Håvard



On 9 Nov 2021, at 23:20, Bart Hanssens (BOSA) via rdf4j-dev <rdf4j-dev@xxxxxxxxxxx> wrote:



Hmz, I get quite a few exceptions (java.lang.IllegalArgumentException: Input stream must not be 'null') when running the set of SHACL benchmarks…

Is this expected behavior? Or could this be because I’m running the tests from the command line (java -jar jmh-benchmarks.jar) and not inside a GUI?

 

Best regards

 

Bart

 

From: Bart Hanssens (BOSA)
Sent: Tuesday, 9 November 2021 12:49
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: RE: [rdf4j-dev] Multithreaded access to the MemoryStore

 

Hi Håvard,

 

I haven’t used multithreaded access to the MemoryStore yet…

I do have access to a cheap “server” (a consumer-grade AMD Ryzen 5 3600 with 6 cores/12 threads) and JDK 17, so I’ll give that a spin.

 

Best regards

 

Bart

 

From: rdf4j-dev <rdf4j-dev-bounces@xxxxxxxxxxx> On Behalf Of Håvard Ottestad
Sent: Tuesday, 9 November 2021 7:23
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: [rdf4j-dev] Multithreaded access to the MemoryStore

 

Hi,

 

A few days ago I upgraded my laptop to a shiny new MacBook with 8+2 ARM cores. It’s a nice upgrade from my 9-year-old MacBook with a quad-core Intel processor. 

 

Things are definitely snappier, but I’m struggling to get much of a performance improvement when comparing single-threaded SHACL validation to parallel validation. 

 

According to the profiler, the three major sources of overhead and contention are:

 - creating read locks

 - retrieving the mem-versions of values from the mem value store

 - keeping track of active iterators
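The third item can be surprisingly costly. Below is a minimal, hypothetical sketch (not RDF4J's actual implementation; the class names are illustrative assumptions) of one way a store might keep track of active iterators: each iterator is registered in a single shared concurrent set when it is opened and removed when it is closed, so every open and close, from every thread, writes to the same shared structure.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IteratorTracker {

    // Shared by all threads; every registration and deregistration writes here.
    private final Set<TrackedIterator> active = ConcurrentHashMap.newKeySet();

    /** An iterator handle that stays registered until it is closed. */
    public final class TrackedIterator implements AutoCloseable {
        private boolean closed;

        private TrackedIterator() {
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                active.remove(this); // write to shared state on close
            }
        }
    }

    /** Opens a new tracked iterator and registers it in the shared set. */
    public TrackedIterator open() {
        TrackedIterator it = new TrackedIterator();
        active.add(it); // write to shared state on open
        return it;
    }

    /** Number of iterators that have been opened but not yet closed. */
    public int activeCount() {
        return active.size();
    }

    public static void main(String[] args) {
        IteratorTracker tracker = new IteratorTracker();
        try (TrackedIterator a = tracker.open(); TrackedIterator b = tracker.open()) {
            System.out.println("open: " + tracker.activeCount()); // open: 2
        }
        System.out.println("open after close: " + tracker.activeCount()); // open after close: 0
    }
}
```

Each individual operation is cheap, but parallel validation opens and closes many short-lived iterators from every thread, which is how this kind of bookkeeping can show up as contention in a profiler.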


It might be that the ARM version of JDK 17 still needs optimizations to make locks faster, or maybe it’s due to the ARM memory model being more relaxed than the Intel one. 

 

Anyone else run into the same issues?

 

Or is there someone with an 8+ core Intel/AMD machine who would like to test it out? I’m using the benchmark linked below and not seeing much difference when enabling/disabling parallel validation. 

 

 

Cheers,

Håvard

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/rdf4j-dev

