Re: [geomesa-users] Geohashing vs R-tree (Luca Morandini)

All,

Just a general question regarding GeoMesa's geohashing, big-data spatio-temporal indexing, etc. One very influential post I've read recently is this one:

http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/

The harsh criticism of R-trees and space-filling curves as indexing/sharding strategies is intriguing. I'm supposed to track down his "SpaceCurve" DB product demo to find out more eventually.

The question to GeoMesa list/authors: Can anyone address some of the arguments in that blog post against the design considerations of the tablet/hash strategy of GeoMesa?

Cheers,
Mike / Weft.io


On Tue, May 19, 2015 at 12:00 PM, <geomesa-users-request@xxxxxxxxxxxxxxxx> wrote:
Send geomesa-users mailing list submissions to
        geomesa-users@xxxxxxxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.locationtech.org/mailman/listinfo/geomesa-users
or, via email, send a message with subject or body 'help' to
        geomesa-users-request@xxxxxxxxxxxxxxxx

You can reach the person managing the list at
        geomesa-users-owner@xxxxxxxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of geomesa-users digest..."


Today's Topics:

   1. Geohashing vs R-tree (Luca Morandini)
   2. Re: Geohashing vs R-tree (Jim Hughes)


----------------------------------------------------------------------

Message: 1
Date: Tue, 19 May 2015 08:00:46 +1000
From: Luca Morandini <luca.morandini1@xxxxxxxxx>
To: Geomesa User discussions <geomesa-users@xxxxxxxxxxxxxxxx>
Subject: [geomesa-users] Geohashing vs R-tree
Message-ID: <555A610E.5010402@xxxxxxxxx>
Content-Type: text/plain; charset=utf-8; format=flowed

Folks,

I am drafting a presentation on Big Data spatial DBMSes, including GeoMesa. I read
some material, including the "Spatio-temporal Indexing in Non-relational
Distributed Databases" paper, but one question still lingers in my mind: isn't the
main advantage of geohashing over R-tree the absence of index rebalancing?

Regards,

Luca Morandini
Data Architect - AURIN project
Melbourne eResearch Group
Department of Computing and Information Systems
University of Melbourne
Tel. +61 03 903 58 380
Skype: lmorandini


------------------------------

Message: 2
Date: Mon, 18 May 2015 18:49:16 -0400
From: Jim Hughes <jnh5y@xxxxxxxx>
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Geohashing vs R-tree
Message-ID: <555A6C6C.3010600@xxxxxxxx>
Content-Type: text/plain; charset=utf-8; format=flowed

Luca,

I'd agree more or less.  When Accumulo splits a tablet, that is
basically rebalancing in terms of the B^+ tree structure being
used/presented.  Splitting/rebalancing for R-trees is generally more
interesting and computationally intensive.

To unpack it a little bit, one could consider the time to insert a
record and/or build the index from scratch.  For a B^+-tree, inserts are
O(log n).  For an R-tree, the worst case of O(n) can be realized with
hotspotting.  As a real-world example, that paper notes that millions
of GDELT records are located at a handful of points.  (Figure 10
shows that over 6 million points are associated with (0,0) and another 6
million with Washington, DC.)  That can give an R-tree some headaches.*
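
As a rough illustration of the point above (a minimal sketch, not
GeoMesa's actual key layout), the snippet below computes a standard
geohash by interleaving longitude and latitude bits and inserts the
resulting keys into a sorted list.  The function name `geohash`, the
`precision` parameter, and the in-memory `index` list are assumptions
made for this example; the list merely stands in for a sorted key-value
store.  The thing to notice is that repeated points such as (0,0) all
map to the same key, so each insert is an ordinary binary search on a
string key with no bounding boxes to adjust.

```python
import bisect

# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=7):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


# A sorted list stands in for the sorted key space of the store.
index = []
for lat, lon in [(0.0, 0.0), (57.64911, 10.40744), (0.0, 0.0), (0.0, 0.0)]:
    bisect.insort(index, geohash(lat, lon, precision=5))
```

The key is a pure function of the coordinates, so heavily duplicated
locations simply pile up under one key.  Locating the insert position
costs O(log n) comparisons regardless of how skewed the data is (the
Python list itself still shifts elements on insert, which a real
B^+-tree or tablet server would not).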

For my money, I'd love to see a distributed database toolkit which would
help one use more complex data structures.  It'd be quite a challenging
project though...

Great question; let us know what else we can expand on,

Jim

* While working on that paper, we also tried out PostGIS with the same
dataset.  We didn't report on that since we were rather unsure of our
results.  Building the GiST index took days.  We did try to jitter the
data to avoid hotspotting, and I think that may have helped some.
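
For readers unfamiliar with jittering, here is a minimal sketch of one
way it could be done.  The thread does not say how the data was actually
jittered, so the function name `jitter`, the uniform noise model, and
the `max_offset_deg` default are all assumptions for illustration.  The
idea is to add a small random offset to each coordinate so that millions
of records sharing one location no longer collapse onto a single point.

```python
import random


def jitter(lat, lon, max_offset_deg=1e-4, rng=random):
    """Return (lat, lon) shifted by uniform noise of at most max_offset_deg."""
    j_lat = lat + rng.uniform(-max_offset_deg, max_offset_deg)
    j_lon = lon + rng.uniform(-max_offset_deg, max_offset_deg)
    # Clamp so the jittered point stays inside valid coordinate ranges.
    j_lat = max(-90.0, min(90.0, j_lat))
    j_lon = max(-180.0, min(180.0, j_lon))
    return j_lat, j_lon


# Spread 1000 records that all sit at (0,0) over a tiny neighbourhood.
rng = random.Random(42)  # seeded only to make the example repeatable
points = [jitter(0.0, 0.0, rng=rng) for _ in range(1000)]
```

The offset bound trades positional accuracy for spread, so it would need
to be chosen with the dataset's real precision in mind.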

On 05/18/2015 06:00 PM, Luca Morandini wrote:
> Folks,
>
> I am drafting a presentation on Big Data spatial DBMSes, including
> GeoMesa. I read some material, including the "Spatio-temporal Indexing
> in Non-relational Distributed Databases" paper, but one question still
> lingers in my mind: isn't the main advantage of geohashing over R-tree
> the absence of index rebalancing?
>
> Regards,
>
> Luca Morandini
> Data Architect - AURIN project
> Melbourne eResearch Group
> Department of Computing and Information Systems
> University of Melbourne
> Tel. +61 03 903 58 380
> Skype: lmorandini
> _______________________________________________
> geomesa-users mailing list
> geomesa-users@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or
> unsubscribe from this list, visit
> http://www.locationtech.org/mailman/listinfo/geomesa-users



------------------------------


End of geomesa-users Digest, Vol 15, Issue 13
*********************************************

