Re: [geomesa-dev] batched scans in HBase

From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Tue, 22 Nov 2016 09:17:11 -0500
Delivered-to: geomesa-dev@xxxxxxxxxxxxxxxx
List-archive: <https://locationtech.org/mhonarc/lists/geomesa-dev>

Hi John,

I believe you're right that HBase doesn't handle large numbers of scans especially well. In fact, we had to implement the BatchScan functionality ourselves, as it doesn't exist in HBase. Our HBase implementation is brand new, and it doesn't yet have the real-world track record of the Accumulo implementation, so there may be some pain points. If too many ranges are being generated, you can set a 'target' number using a system property: geomesa.scan.ranges.target. By default, it's 2000. This is an estimate, not a hard limit. Note that you might also need to set the 'looseBoundingBox' data store configuration option to false, as fewer ranges will mean more false positives.
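As a rough sketch of how those two settings might be applied from client code: the property name and the 'looseBoundingBox' option come from the paragraph above, while the class name, the helper methods, and the value 500 are purely illustrative. Connection parameters are left out, and the map is assumed to be passed to whatever call creates the data store.

```java
import java.util.HashMap;
import java.util.Map;

class ScanTuning {
    // Property name as given in the message above; its default is 2000.
    static final String RANGES_TARGET = "geomesa.scan.ranges.target";

    // Sets the target number of ranges. This should happen before queries
    // are planned. The target is an estimate, not a hard limit.
    static void setRangesTarget(int target) {
        System.setProperty(RANGES_TARGET, Integer.toString(target));
    }

    // Builds the data store parameters discussed above. Only the
    // 'looseBoundingBox' option is shown; connection parameters
    // (table name and so on) are omitted from this sketch.
    static Map<String, String> dataStoreParams() {
        Map<String, String> params = new HashMap<>();
        params.put("looseBoundingBox", "false");
        return params;
    }

    public static void main(String[] args) {
        setRangesTarget(500); // illustrative value, not a recommendation
        System.out.println(RANGES_TARGET + "=" + System.getProperty(RANGES_TARGET));
        System.out.println(dataStoreParams());
    }
}
```

The same property can also be supplied on the JVM command line with -Dgeomesa.scan.ranges.target=500 instead of calling System.setProperty.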

Currently we don't use coprocessors with HBase, so all the fine-grained filtering is done on the client. If you're interested in that functionality, let us know.
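To illustrate what client-side filtering means here, a minimal sketch follows. It is not GeoMesa's code: the Point and BBox types and the refine method are hypothetical stand-ins. The idea is that coarse scan ranges return candidate rows, some of which are false positives, and the client checks each one against the exact query box.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ClientSideFilter {
    // Hypothetical stand-in for a point feature returned by a scan.
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Axis-aligned query bounding box.
    static final class BBox {
        final double minX, minY, maxX, maxY;
        BBox(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        boolean contains(Point p) {
            return p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY;
        }
    }

    // Keeps only the candidates that actually fall inside the query box.
    // Everything dropped here is a false positive from the coarse ranges.
    static List<Point> refine(List<Point> candidates, BBox query) {
        List<Point> hits = new ArrayList<>();
        for (Point p : candidates) {
            if (query.contains(p)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Candidates as they might come back from coarse scan ranges.
        List<Point> candidates = Arrays.asList(
            new Point(1, 1), new Point(2, 2), new Point(9, 9));
        BBox query = new BBox(0, 0, 5, 5);
        System.out.println(refine(candidates, query).size() + " of "
            + candidates.size() + " candidates match");
    }
}
```

With fewer, wider ranges, more rows like the third candidate cross the network only to be discarded, which is the trade-off behind the range target.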

Hope that helps, let us know either way!

Thanks,

Emilio

On 11/22/2016 08:42 AM, John Process wrote:

Hi,

I just started looking into geomesa and am mainly interested in using HBase as the backing datastore. I began by experimenting with geomesa-quickstart-hbase to generate the point features, insert them into HBase, and run the query on them. I got this to work with my existing remotely running HBase instance (running version 1.2.0) but one thing that immediately became apparent to me was that the query was taking a very long time, on the order of minutes (after I increased the timeout).

I did some profiling and ultimately tracked it down to geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/utils/BatchScan.scala:71 in the HBase table getScanner method. It appears there are many thousands of small scan ranges that it executes getScanner on. This makes sense based on how I understand the spatial indexing to work, but the problem I'm finding is that HBase seems to handle this type of batch scan query quite poorly. It doesn’t seem to support processing an entire group of scans in the same way Accumulo does.

Just as a sanity check, I modified it to scan the entire table instead of executing all of the individual scans, and with that change the query completes very quickly. Clearly this doesn't scale and defeats the purpose of indexing, but it does help to demonstrate the problem.

So I am curious if anyone has encountered this or perhaps if this is a known problem with HBase?

Thanks,

John