Hi,
I just started looking into GeoMesa and am mainly interested in using HBase as the backing datastore. I began by experimenting with geomesa-quickstart-hbase to generate the point features, insert them into HBase, and run the example query against them.
I got this to work with my existing, remotely running HBase instance (version 1.2.0), but it was immediately apparent that the query was very slow: it took on the order of minutes, and only completed after I increased the timeout.
I did some profiling and ultimately tracked it down to geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/utils/BatchScan.scala:71, in the call to the HBase table getScanner method. It appears that getScanner is executed once for each of many thousands of small scan ranges. This makes sense based on how I understand the spatial indexing to work, but the problem I'm finding is that HBase seems to handle this type of batch scan query quite poorly. It doesn't seem to support processing an entire group of scans in the same way Accumulo does.
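To make the cost concrete: if each range is opened as its own scanner, the number of scanner setups grows with the number of ranges, so one possible client-side mitigation is to merge ranges that overlap or touch before scanning. The following is only a minimal sketch under stated assumptions, not GeoMesa's or HBase's code. RangeCoalescer and Range are hypothetical names, and long values stand in for HBase's byte[] row keys. HBase 1.1 and later also ship a MultiRowRangeFilter that accepts a list of row ranges in a single Scan; whether that helps with this workload is an assumption that would need testing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of GeoMesa or HBase.
final class RangeCoalescer {

    // Half-open key range [start, end). Longs stand in for byte[] row keys.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Sorts the ranges by start key, then merges any that overlap or touch,
    // so that fewer (but wider) scans cover exactly the same keys.
    static List<Range> coalesce(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.start));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            int last = merged.size() - 1;
            if (last >= 0 && r.start <= merged.get(last).end) {
                // Overlapping or adjacent: extend the previous range.
                Range prev = merged.get(last);
                merged.set(last, new Range(prev.start, Math.max(prev.end, r.end)));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(10, 20));
        ranges.add(new Range(0, 5));
        ranges.add(new Range(5, 8));
        ranges.add(new Range(15, 30));
        // Four input ranges collapse to two: [[0, 8), [10, 30)]
        System.out.println(coalesce(ranges));
    }
}
```

Merging only helps when the index ranges are actually contiguous or overlapping; ranges separated by gaps remain separate scans, so this does not by itself remove the per-scan overhead described above.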
Just as a sanity check, I modified it to scan the entire table instead of executing all of the individual scans, and the query then completes very quickly. Clearly this doesn't scale and defeats the purpose of indexing, but it does help to demonstrate the problem.
So I am curious whether anyone else has encountered this, or whether it is a known problem with HBase.
Thanks,
John