Re: [geomesa-users] Geomesa Query Performance


From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Fri, 1 Sep 2017 09:00:58 -0400

Hi Sandeep,

In general, we still have to scan where there *might* be data, even if there isn't actually any data there. Opening a scan, even if it returns no data, takes some time. For temporal queries, the number of ranges tends to be even larger, hence the slower performance.
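As a rough illustration of why a larger bounding box means more places to scan, here is a toy sketch that covers a box with grid cells, orders the cells along a Z-order (Morton) curve, and counts the contiguous key ranges. The 8-bit grid, the function names, and the absence of any cap on range count are all assumptions made for the example; this is not GeoMesa's actual range planning.

```python
def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y into a single Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def count_ranges(min_lat, min_lon, max_lat, max_lon, bits=8):
    """Count contiguous Z-value ranges needed to cover a bounding box."""
    cells = 1 << bits

    def index(value, lo, hi):
        # Map a coordinate to a grid cell index, clamped to the grid.
        return min(cells - 1, max(0, int((value - lo) / (hi - lo) * cells)))

    x0, x1 = index(min_lon, -180.0, 180.0), index(max_lon, -180.0, 180.0)
    y0, y1 = index(min_lat, -90.0, 90.0), index(max_lat, -90.0, 90.0)
    zs = sorted(interleave(x, y, bits)
                for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))
    # A new range starts wherever consecutive Z-values are not adjacent.
    return 1 + sum(1 for a, b in zip(zs, zs[1:]) if b != a + 1)


small = count_ranges(30.0, 60.0, 30.1, 60.1)   # query (a) from the thread
large = count_ranges(10.0, 10.0, 30.1, 60.1)   # query (b) from the thread
print(small, large)
```

Each range stands for one place where data might live, so the wider box pays the scan-opening cost many more times even though the matching rows are identical.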

I believe that Accumulo handles this a bit better than HBase, as it has a concept of a batch scanner that accepts multiple ranges, and it has some knowledge of the data start/end. In HBase, we have to run multiple scans using a thread pool [1], so it's not as efficient. We could possibly leverage HBase metadata to improve things a bit for that scenario (as future work).
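The "one scan per range on a thread pool" pattern can be sketched as follows. This is only an illustration of the idea, not the code in AbstractBatchScan [1]: the in-memory `ROWS` list, the `scan` helper, and the `query_threads` argument are hypothetical stand-ins for a sorted table, a single range scan, and the thread count.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Toy sorted "table" of row keys; stands in for a sorted key-value store.
ROWS = sorted([5, 17, 18, 42, 77, 78, 90])


def scan(key_range):
    """Return all row keys in [start, end); hypothetical stand-in for one scan."""
    start, end = key_range
    return ROWS[bisect_left(ROWS, start):bisect_left(ROWS, end)]


def scan_all(ranges, query_threads=4):
    """Run one scan per range on a thread pool and merge the results."""
    with ThreadPoolExecutor(max_workers=query_threads) as pool:
        # map() preserves the order of the input ranges.
        per_range = list(pool.map(scan, ranges))
    return [row for rows in per_range for row in rows]


# Two of the five ranges contain no rows, but each still costs one scan.
ranges = [(0, 10), (10, 17), (17, 19), (50, 60), (70, 80)]
print(scan_all(ranges))
```

In a real store every `scan` call carries a fixed setup cost, so empty ranges are not free; more threads hide that latency but do not remove it.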

We also have the concept of data statistics, which we could leverage to only scan the ranges that have data. However, it hasn't been implemented for HBase yet, and our current query planning doesn't use it since it's an optional feature. As more future work, it would be nice to leverage those stats in query planning.
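The idea of using known data bounds to avoid scanning empty space can be sketched on the client side as a simple box intersection. This assumes the data extent is already known to the caller; the `clip_to_data` helper and the tuple layout are hypothetical, and the sketch does not use GeoMesa's statistics or query planner.

```python
def clip_to_data(query, data):
    """Intersect a query box with the known data extent.

    Boxes are (min_lat, min_lon, max_lat, max_lon). Returns None when the
    boxes do not overlap, in which case no scan is needed at all.
    """
    min_lat = max(query[0], data[0])
    min_lon = max(query[1], data[1])
    max_lat = min(query[2], data[2])
    max_lon = min(query[3], data[3])
    if min_lat > max_lat or min_lon > max_lon:
        return None
    return (min_lat, min_lon, max_lat, max_lon)


data_extent = (30.0, 60.0, 35.0, 65.0)      # extent of the inserted data
query_b = (10.0, 10.0, 30.1, 60.1)          # query (b) from the thread
print(clip_to_data(query_b, data_extent))   # same box as query (a)
```

Clipping query (b) this way reduces it to the box used in query (a), which is why the two return the same rows.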

To mitigate the issue, you can try increasing the "queryThreads" data store parameter, in order to use more threads during queries. You can also enable "looseBoundingBox", if you have currently disabled it. For temporal queries, increasing the temporal binning period may cause fewer ranges to be scanned [2]. However, this may result in slower queries for very small temporal ranges, so it should be tailored to your use case.
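To see why a longer binning period can reduce the number of ranges, the sketch below counts how many time bins a query touches for different periods. It is a simplified model with illustrative period names and a hypothetical `bins_spanned` helper; the actual setting and its allowed values are described in [2].

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def bin_index(day: date, period: str) -> int:
    """Index of the time bin containing `day`, counted from the epoch."""
    days = (day - EPOCH).days
    if period == "day":
        return days
    if period == "week":
        return days // 7
    if period == "month":
        return (day.year - 1970) * 12 + (day.month - 1)
    if period == "year":
        return day.year - 1970
    raise ValueError(f"unknown period: {period}")


def bins_spanned(start: date, end: date, period: str) -> int:
    """Number of time bins a query from `start` to `end` (inclusive) touches."""
    return bin_index(end, period) - bin_index(start, period) + 1


start, end = date(2017, 1, 1), date(2017, 12, 31)
for period in ("day", "week", "month", "year"):
    print(period, bins_spanned(start, end, period))
```

Fewer bins means fewer sets of ranges to scan for a long query, but each bin covers more time, so a query over a few minutes has to filter within a much larger bin.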

As a final note, make sure that you have the distributed coprocessors installed and enabled [3], especially if you are not using loose bounding boxes.

Thanks,

Emilio

[1] https://github.com/locationtech/geomesa/blob/master/geomesa-index-api/src/main/scala/org/locationtech/geomesa/index/utils/AbstractBatchScan.scala
[2] http://www.geomesa.org/documentation/user/datastores/index_config.html#configuring-z-index-time-interval
[3] http://www.geomesa.org/documentation/user/hbase/install.html#register-the-coprocessors

On 08/31/2017 04:59 PM, Sandeep Singh wrote:

Hi,

I am trying out the example tutorial with an HBase database. I am running the HBase QuickStart tutorial: http://www.geomesa.org/documentation/tutorials/geomesa-quickstart-hbase.html The tutorial runs fine, but I have noticed some problems with the performance of bounding box queries, described below.

I have inserted data with a (lat, lng) range of (30,60) to (35,65).

With this setup, I am running queries on my local machine:

a) In my first query, the bounding box is (30,60) to (30.1,60.1). It runs in less than a second on average and returns correct results.

b) In my second query, I changed the bounding box to (10,10) to (30.1,60.1). This query returns the same results as query (a), which is expected, but it takes around 3-4 seconds on average.

Both queries give me the same results, but one runs much faster than the other. I notice similar behavior in time-domain queries too, where the performance is even worse (10x slower or more) if the query time range does not match the time range of the inserted data. Below are some of my questions:

1) Is this expected behavior?

2) I know one solution is to rewrite the query so that it maps to the actual spatial and temporal ranges of the data inserted into GeoMesa, which would require me to maintain additional metadata about the data. But could a better solution be designed at the GeoMesa layer?

Do let me know if there are any settings that can affect this behavior. I have seen the same behavior when setting up GeoMesa on multiple other local machines and on cloud VMs.

regards,

Sandeep Singh.