Hi John,
I believe you're right that HBase doesn't process large numbers of
scans especially well. In fact, we had to implement the BatchScan
functionality, as it doesn't exist in HBase. Our current HBase
implementation is brand new, and we don't have the same real-world
track record as we do with the Accumulo implementation yet, so there
may be some pain points. If there are too many ranges being
generated, you can set a 'target' number using a system property:
geomesa.scan.ranges.target. By default, it's 2000. This isn't an
absolute value, but an estimate. Note that you might also need to
set the 'looseBoundingBox' data store configuration option to false,
as fewer ranges will mean more false-positives.
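
For illustration, here is a minimal sketch of setting both options from Java. The property name and the 'looseBoundingBox' key are the ones mentioned above; the class name, the value 500, and the choice of a Boolean value are assumptions made for the example, and the connection parameters are left out because they depend on your deployment.

```java
import java.util.HashMap;
import java.util.Map;

public class ScanTuning {
    // System property named in this thread; its stated default is 2000.
    static final String RANGES_TARGET = "geomesa.scan.ranges.target";

    // Lower the target so that fewer, wider scan ranges are generated.
    // This must run before the query is planned.
    static void setRangesTarget(int target) {
        System.setProperty(RANGES_TARGET, Integer.toString(target));
    }

    // Data store parameters with loose bounding boxes turned off.
    // Assumption: the option accepts a Boolean value; connection
    // parameters are intentionally omitted from this sketch.
    static Map<String, Object> dataStoreParams() {
        Map<String, Object> params = new HashMap<>();
        params.put("looseBoundingBox", false);
        return params;
    }

    public static void main(String[] args) {
        setRangesTarget(500); // 500 is an arbitrary illustrative value
        Map<String, Object> params = dataStoreParams();
        System.out.println(RANGES_TARGET + "=" + System.getProperty(RANGES_TARGET));
        System.out.println(params);
        // 'params', plus your connection settings, would then be handed to
        // whatever code creates the data store.
    }
}
```

The same property can also be passed on the JVM command line as -Dgeomesa.scan.ranges.target=500.
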
Currently we don't use coprocessors with HBase, so all the
fine-grained filtering is done on the client. If you're interested in
that functionality, let us know.
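
To make the trade-off concrete, here is a toy sketch of why fewer ranges mean more false positives for the client to filter out. It is not GeoMesa's actual range planning: the class, the method names, the smallest-gap-first merge rule, and the integer keys are all assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeMergeSketch {
    // Merge sorted, non-overlapping inclusive [start, end] ranges until at
    // most 'target' remain, always closing the smallest gap first.
    static List<long[]> mergeToTarget(List<long[]> sorted, int target) {
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            out.add(new long[] { r[0], r[1] });
        }
        while (out.size() > Math.max(target, 1)) {
            int best = 0;
            long bestGap = Long.MAX_VALUE;
            for (int i = 0; i + 1 < out.size(); i++) {
                long gap = out.get(i + 1)[0] - out.get(i)[1];
                if (gap < bestGap) {
                    bestGap = gap;
                    best = i;
                }
            }
            // Extend the left range over the gap and drop the right one.
            out.get(best)[1] = out.get(best + 1)[1];
            out.remove(best + 1);
        }
        return out;
    }

    // Number of keys the ranges cover, i.e. rows a scan could return.
    static long covered(List<long[]> ranges) {
        long n = 0;
        for (long[] r : ranges) {
            n += r[1] - r[0] + 1;
        }
        return n;
    }

    public static void main(String[] args) {
        List<long[]> exact = Arrays.asList(
            new long[] { 0, 3 }, new long[] { 6, 9 },
            new long[] { 20, 23 }, new long[] { 40, 41 });
        List<long[]> merged = mergeToTarget(exact, 2);
        long extra = covered(merged) - covered(exact);
        System.out.println("scans: " + exact.size() + " -> " + merged.size());
        System.out.println("extra keys to filter on the client: " + extra);
    }
}
```

In this toy case, four scans collapse into two, but the merged ranges cover keys that the exact ranges did not, and without coprocessors those extra rows have to be discarded on the client.
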
Hope that helps, let us know either way!
Thanks,
Emilio
On 11/22/2016 08:42 AM, John Process wrote:
Hi,
I just started looking into GeoMesa and am
mainly interested in using HBase as the backing datastore. I
began by experimenting with geomesa-quickstart-hbase to
generate the point features, insert them into HBase, and run
the query on them. I got this to work with my existing
remotely running HBase instance (running version 1.2.0) but
one thing that immediately became apparent to me was that the
query was taking a very long time, on the order of minutes
(after I increased the timeout).
I did some profiling and ultimately tracked
it down to
geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/utils/BatchScan.scala:71
in the HBase table getScanner method. It appears there are
many thousands of small scan ranges that it executes
getScanner on. This makes sense based on how I understand the
spatial indexing to work, but the problem I'm finding is that
HBase seems to handle this type of batch scan query quite
poorly. It doesn’t seem to support processing an entire group
of scans in the same way Accumulo does.
Just as a sanity check I modified it to
scan the entire table instead of executing all of the
individual scans and it completes the query very quickly.
Clearly this doesn't scale and defeats the purpose of indexing
but it does help to demonstrate the problem.
So I am curious if anyone has encountered
this or perhaps if this is a known problem with HBase?
Thanks,
John