Re: [geomesa-users] General Design Question - Geomesa + HBase

I'm not entirely sure about using Spark for interactive queries. I believe you can load an RDD once and keep it cached in a long-running Spark context to answer your queries. Others might have more experience.

Pagination suffers from the same problems as sorting: you have to sort to get a consistent order, and then you have to pull all the features back for each page request. You might want to look at enabling 'caching' in our data store config. With it enabled, the feature source returned by a given call to 'getFeatureSource' keeps the result set for a given query in memory. You have to be careful, of course, as it can quickly use up your heap space. It looks like sorting is currently still done on every invocation, which is something we might be able to optimize as well.
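To make the trade-off concrete, here is a minimal sketch in plain Python (not GeoMesa code) of paging over a cached, sorted result set. The class PagedQueryCache, the fetch function, and the "id" tie-breaker field are all hypothetical names used for illustration. The idea is that the query and the sort happen once per (query, sort attribute) pair, and each page request is then just a slice.

```python
from typing import Callable, Dict, List, Tuple


class PagedQueryCache:
    """Caches the fully sorted result of a query so later pages skip the re-fetch and re-sort."""

    def __init__(self, fetch: Callable[[str], List[dict]]):
        self._fetch = fetch  # hypothetical function that runs the query and returns all rows
        self._cache: Dict[Tuple[str, str], List[dict]] = {}
        self.fetch_count = 0  # how many times the underlying query actually ran

    def page(self, query: str, sort_by: str, page: int, size: int) -> List[dict]:
        key = (query, sort_by)
        if key not in self._cache:
            self.fetch_count += 1
            rows = self._fetch(query)
            # Break ties on "id" so the order is identical on every request.
            self._cache[key] = sorted(rows, key=lambda r: (r[sort_by], r["id"]))
        start = page * size
        return self._cache[key][start:start + size]


def fake_fetch(query: str) -> List[dict]:
    # Stand-in for a data store query; rows come back in no useful order.
    return [{"id": i, "score": (i * 7) % 10} for i in range(10)]


cache = PagedQueryCache(fake_fetch)
first = cache.page("INCLUDE", "score", page=0, size=3)
second = cache.page("INCLUDE", "score", page=1, size=3)
print([r["id"] for r in first], [r["id"] for r in second], cache.fetch_count)
```

Note that the cache holds the entire result set in memory, which is exactly the heap-space risk mentioned above; a real deployment would need some size limit or eviction policy.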

If you want to chat about implementing some of this, it would probably be easier to go back and forth in our public gitter: https://gitter.im/locationtech/geomesa. I'm generally on there during normal work hours, EST.

thanks,

Emilio


On 11/17/2017 01:26 PM, Sundaram, Rama wrote:

Hi Emilio,

  Appreciate your quick turnaround.

On the top-n items – yes, top-n items after sorting. In fact, we also need to paginate through ‘n’ results at a time after sorting. I will explore the statistical functions that you mentioned and find out whether they meet our requirements.

  Regarding the optimization of querying / (asc) sorting by attributes – sure. I am interested to find out what it takes to implement it.

  On Spark support – since our use cases are pretty much interactive, we are not sure that spinning up a Spark job would be a viable alternative, given the overhead involved. I am curious to know what options are available in a GeoMesa + Spark environment for running interactive queries fired by a web front end.

 

Thanks,

Rama Sundaram

 

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Emilio Lahr-Vivaz
Sent: Friday, November 17, 2017 12:14 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] General Design Question - Geomesa + HBase

 

Hello,

1. If you want queries against attributes to be fast, then you would have to index each of them. GeoMesa will still work without them being indexed, but it will have to scan all results. If you have an additional spatial and/or temporal predicate this may not be an issue, as those values are always indexed.

2. As results are not returned in any particular order from HBase due to batch scanning, sorting is done in memory on the client. When you say 'top' results, do you just mean the first 100 results after sorting? GeoMesa offers some statistical functions that you can run distributed (like top-k), if that is useful.
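As a rough illustration of why "first 100 after sorting" does not require sorting everything, here is a minimal sketch in plain Python. It is not GeoMesa's distributed top-k stat, just the general bounded-heap technique applied on the client; the function top_k and the feature layout (dicts with "id" and "value") are assumptions made for the example.

```python
import heapq
from typing import Iterable, Iterator, List


def top_k(features: Iterable[dict], attribute: str, k: int) -> List[dict]:
    """Return the k features with the smallest value of `attribute`, in ascending order.

    heapq.nsmallest consumes the iterable as a stream and keeps at most k
    candidates in memory, instead of materializing and sorting every feature.
    """
    return heapq.nsmallest(k, features, key=lambda f: f[attribute])


def make_features(n: int) -> Iterator[dict]:
    # Hypothetical stand-in for features arriving unordered from a batch scan.
    return ({"id": i, "value": (i * 37) % 101} for i in range(n))


result = top_k(make_features(1000), "value", 3)
print(result)
```

The features still have to be read from the store, so this saves client memory and sort time rather than scan time.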

Since GeoMesa attribute indices are stored by value, they are already naturally sorted in HBase. There is an obvious opportunity to optimize the case for querying by an attribute and sorting (ascending) by that same attribute, without pulling all the data back to the client first. If that is something you'd like to contribute to, we can provide pointers.
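The optimization described above can be sketched as follows, in plain Python rather than GeoMesa or HBase code. The class SortedAttributeIndex is a hypothetical in-memory stand-in for an attribute index whose rows are ordered by value; the point is that a scan started at the lower bound already yields results in ascending order, so the reader can stop after the first n rows.

```python
import bisect
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Entry = Tuple[int, str]  # (attribute value, feature id)


class SortedAttributeIndex:
    """In-memory stand-in for an index table whose rows are ordered by attribute value."""

    def __init__(self, entries: Iterable[Entry]):
        self._entries: List[Entry] = sorted(entries)

    def scan(self, lower: int) -> Iterator[Entry]:
        """Lazily yield entries with value >= lower, in ascending order."""
        # (lower,) sorts before any (lower, feature_id), so this finds the first match.
        start = bisect.bisect_left(self._entries, (lower,))
        for i in range(start, len(self._entries)):
            yield self._entries[i]


def first_n(index: SortedAttributeIndex, lower: int, n: int) -> List[Entry]:
    # islice stops pulling from the scan after n entries, so the rest is never read.
    return list(islice(index.scan(lower), n))


index = SortedAttributeIndex([(42, "f3"), (7, "f1"), (19, "f2"), (88, "f4"), (19, "f5")])
print(first_n(index, lower=10, n=2))
```

This only works for an ascending sort on the same attribute that the index is built on, which is the case singled out above; sorting by any other attribute still needs the full result set.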

Another approach would be to leverage GeoMesa's Spark support. This is generally the approach we advocate for types of analysis (like sorting) that don't align well with the underlying indices.

Thanks,

Emilio

On 11/17/2017 11:03 AM, Sundaram, Rama wrote:

Hi,

  We are familiarizing ourselves with GeoMesa and evaluating its suitability for spatial analyses of several medium-to-large sets of point data, ranging from 50K to 6-7 million points, stored in HBase. Some of the analysis patterns:

1.       Finding the intersection of a given set with one or more of several static sets of polygons stored in HBase, and sorting the results by ANY chosen attribute (each point has ~200+ attributes)

2.       Computing stats on the point sets based on any of the attribute data

 

These analyses will be done interactively over REST calls.

 

Some of the design questions that we are seeking your help to answer are

1.       Since we will be filtering and sorting by any of the ~200 attributes, do we need to add an attribute index for each of them?

2.       When we try to retrieve the top 100 results sorted by an indexed attribute, queries take ~17 seconds (on a 600K point set), whereas with a BBOX filter the same query returns in under 1 second. Is it because GeoMesa is fetching all the data to the client and sorting it there?

 

 

Thanks,

Rama Sundaram

 

 



This message is intended only for the use of the addressee and may contain
information that is PRIVILEGED AND CONFIDENTIAL.

If you are not the intended recipient, you are hereby notified that any
dissemination of this communication is strictly prohibited. If you have
received this communication in error, please erase all copies of the message
and its attachments and notify the sender immediately. Thank you.



_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users

 





