Re: [rdf4j-dev] New Memory Store

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [rdf4j-dev] New Memory Store - Planning

From: Håvard Ottestad <hmottestad@xxxxxxxxx>
Date: Mon, 17 Jul 2017 16:13:30 +0200
Delivered-to: rdf4j-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/rdf4j-dev>
List-help: <mailto:rdf4j-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/options/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=unsubscribe>

Looks like Stratified B-trees is the way to go for supporting versioning. Copy-on-write is also another possibility, but it is slower.

Paper on Stratified B-trees:

https://www.usenix.org/event/hotstorage11/tech/final_files/Twigg.pdf

I have never looked at this type of tree before. So I think my first goal is to just get a working B+ tree.

I was thinking about using a lookup table, which would convert a string into a long. Then I can inline the long in the B-tree node, making memory fetching more performant and also making comparisons more performant (long comparison is faster than string). But using a lookup table would break sorting. This doesn’t actually affect the store in any way, but it does make it impossible to do merge join efficiently. I’ll leave this aside for the time being though.

Would be so much more pleasant to make a memory store that didn’t support transactions or parallel updates.

Håvard

On 17 Jul 2017, at 14:11, James Leigh <james.leigh@xxxxxxxxxxxx> wrote:

Hi Håvard,

This would be great! Although, transaction support needs to be included
early in the design if you want to have a faster implementation then
the existing MemoryStore. Your indexes should support a multi-root B+
tree to allow versioning and concurrent read/writes to different states
of the store. I'd recommend you take a close look at SailSource
interface and understand it's lifecycle before getting too far in the
design of a new store.

There is no need to have an explicit lookup table for in-memory stores.
The purpose of an id-lookup table, in disk based indexes, is a simple
form of compression to avoid repeating the string value each time.
However, since you will be using Java Objects/points anyway, you
essentially already have a lookup table (memory position maps to a
char[] ;).

My goal this year is to start a similar process for the nativerdf store
as it has poor performance primary due to the lack of support for
versioning in its indexes. So, I'm very interested in how this works
out.

When your ready to share some code, it should be put into a features
branch to allow others to create PRs and other suggestions.

Regards,
James

On Mon, 2017-07-17 at 10:13 +0200, Håvard Ottestad wrote:
Hi Jeen,

Thanks for the feedback.

I would like to improve the current memory store instead, but the way
it's built with half the logic in the sail source and the other half
in the value factory makes it really hard to pry apart.

My main motivation is to get O(log N) or better lookup. The current
memory store is closer to O(N), which I think also impacts the insert
performance since it looks like all inserts check with hasStatement()
first.

How about this plan as a middle ground:

- Develop new memory store
- Release it as experimental
- Add transaction support
- Beta release with new memory store as default (eg. rename the old
one)
- 3-6 months later full release

Adding transaction support would bring it in line with the current
memory store, so feature wise users wouldn't loose anything. Do you
know of anyone using the transaction support in the current memory
store today? Especially the read-committed and snapshot isolation?
And for what purpose?

As for a memory store that spills to disk, I think that would be
better to design as a memory mapped disk store where you can disable
fsync for performance. Does the current disk based store use memory
mapped files?

Håvard

On Mon, Jul 17, 2017 at 12:59 AM, Jeen Broekstra <jeen.broekstra@gmai
l.com> wrote:
Sounds intriguing, but between the Model implementations and the
existing in-memory store, where does your approach fit in? From a
user perspective I’m thinking that another in-memory option might
be overly confusing.

Perhaps you could approach your ideas as improvements/configuration
options in the existing store and/or Model classes instead?

Another thought: I’ve long thought that it would be useful to have
an in-memory store that can seamlessly switch over to disk-based
persistence when the size goes up - another way to think of it is
as a fast-access caching layer for the native store. Could your
ideas fit in with that kind of project?

Cheers,

Jeen

On 15 Jul 2017, at 19:04, Håvard M. Ottestad <hmottestad@xxxxxxxx
m> wrote:

Hi,

I’m look at creating a new Memory Store that is configurable so
that it can be optimised for any scenario.

Here is what I’m thinking of doing:

- B+ tree indexes (because they support range queries)
- Thread based indexing (async)
- Deletes in diff index (merged when index.size > some max)
- Hashmap index for exists(triple1) queries

I don’t want to have transactional support. I use transactions
with disk based databases, but the memory store I usually only use
for embedded purposes that are single threaded.

Scenarios I want to support:
- read heavy
- write heavy
- transforms (deserialise, query, update, serialise)

In general, are there any recommendations or requirements that
others might have before I get too committed?

I also know that a lot of triple stores convert IRIs and literals
to a hash/integer so that the hash/integer is stored in the indexes
and the IRIs/literals are stored in a lookup table. I’m considering
doing this too, but it might not be of much benefit unless I
migrate to manual memory management using the unsafe library. Any
thoughts on using sun.misc.Unsafe?

Håvard
_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev
_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

Follow-Ups:
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: James Leigh

References:
- [rdf4j-dev] New Memory Store - Planning
  - From: Håvard M. Ottestad
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: Jeen Broekstra
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: Håvard Ottestad
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: James Leigh

Prev by Date: Re: [rdf4j-dev] New Memory Store - Planning
Next by Date: Re: [rdf4j-dev] How is it going
Previous by thread: Re: [rdf4j-dev] New Memory Store - Planning
Next by thread: Re: [rdf4j-dev] New Memory Store - Planning
Index(es):
- Date
- Thread

Breadcrumbs