Emilio,
Thanks for the response. I'll try out your suggestions.
John
Hi John,
I believe you're right that HBase doesn't process a large
number of scans especially well. In fact, we had to implement
the BatchScan functionality, as it doesn't exist in HBase. Our
current HBase implementation is brand new, and we don't have
the same real-world track record as we do with the Accumulo
implementation yet, so there may be some pain points. If there
are too many ranges being generated, you can set a 'target'
number using a system property: geomesa.scan.ranges.target. By
default, it's 2000. This isn't an absolute value, but an
estimate. Note that you might also need to set the
'looseBoundingBox' data store configuration option to false,
as fewer ranges will mean more false positives.
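As a rough illustration of the two settings mentioned above, here is a
minimal Java sketch. The property name and the 'looseBoundingBox' key come
from this message; the class and method names (ScanTuning, setRangesTarget,
withStrictBoundingBox) are hypothetical, and the assumptions are that the
property is read in the client JVM and that a Boolean value is accepted
for the option.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of GeoMesa.
final class ScanTuning {

    // System property named in this thread; the stated default is 2000.
    static final String RANGES_TARGET = "geomesa.scan.ranges.target";

    // Sets the approximate (not absolute) number of scan ranges to aim for.
    // Assumption: this must happen in the client JVM before queries are run.
    static void setRangesTarget(int target) {
        System.setProperty(RANGES_TARGET, Integer.toString(target));
    }

    // Returns a copy of the data store parameters with loose bounding boxes
    // turned off, so that extra rows picked up by coarser ranges are filtered.
    // Assumption: the option accepts a Boolean value.
    static Map<String, Serializable> withStrictBoundingBox(Map<String, Serializable> base) {
        Map<String, Serializable> params = new HashMap<>(base);
        params.put("looseBoundingBox", Boolean.FALSE);
        return params;
    }
}
```

The same property can also be passed on the command line as
-Dgeomesa.scan.ranges.target=500. The resulting map would then be handed
to the usual GeoTools data store lookup along with your other connection
parameters.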
Currently we don't use coprocessors with HBase, so all the
fine-grained filtering is done on the client. If you're
interested in that functionality, let us know.
Hope that helps, let us know either way!
Thanks,
Emilio
On 11/22/2016 08:42 AM, John Process wrote:
Hi,
I just started looking into GeoMesa and
am mainly interested in using HBase as the backing
datastore. I began by experimenting with
geomesa-quickstart-hbase to generate the point features,
insert them into HBase, and run the query on them. I got
this to work with my existing remotely running HBase
instance (running version 1.2.0) but one thing that
immediately became apparent to me was that the query was
taking a very long time, on the order of minutes (after I
increased the timeout).
I did some profiling and ultimately
tracked it down to
geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/utils/BatchScan.scala:71
in the HBase table getScanner method. It appears there are
many thousands of small scan ranges that it executes
getScanner on. This makes sense based on how I understand
the spatial indexing to work, but the problem I'm finding is
that HBase seems to handle this type of batch scan query
quite poorly. It doesn’t seem to support processing an
entire group of scans in the same way Accumulo does.
Just as a sanity check, I modified it to
scan the entire table instead of executing all of the
individual scans, and the query then completed very quickly.
Clearly this doesn't scale and defeats the purpose of
indexing, but it does help to demonstrate the problem.
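Thousands of small scans and a single full-table scan are the two ends of
a trade-off, and there is a middle ground: neighbouring ranges can be
merged until only a target number remain, at the cost of reading some keys
that fall between them. The Java sketch below illustrates that idea only.
It is not GeoMesa's range-planning code, and the names (RangeCoalescer,
Range, coalesce) and the use of long keys are assumptions made for the
example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: merges the closest neighbouring ranges until at most
// `target` ranges remain.
final class RangeCoalescer {

    // A closed interval [start, end] over a one-dimensional key space.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    static List<Range> coalesce(List<Range> input, int target) {
        if (target < 1) {
            throw new IllegalArgumentException("target must be >= 1");
        }
        List<Range> ranges = new ArrayList<>(input);
        ranges.sort(Comparator.comparingLong((Range r) -> r.start));

        while (ranges.size() > target) {
            // Find the adjacent pair separated by the smallest gap.
            int best = 0;
            long bestGap = Long.MAX_VALUE;
            for (int i = 0; i + 1 < ranges.size(); i++) {
                long gap = ranges.get(i + 1).start - ranges.get(i).end;
                if (gap < bestGap) {
                    bestGap = gap;
                    best = i;
                }
            }
            // Merge the pair; keys inside the gap become false positives
            // that have to be filtered out after the scan.
            Range a = ranges.get(best);
            Range b = ranges.get(best + 1);
            ranges.set(best, new Range(a.start, Math.max(a.end, b.end)));
            ranges.remove(best + 1);
        }
        return ranges;
    }
}
```

For example, coalescing [0,1], [3,4] and [100,101] down to two ranges
merges the first pair into [0,4], which reads the extra key 2 but saves
one scan.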
So I am curious if anyone has encountered
this or perhaps if this is a known problem with HBase?
Thanks,
John