Hi Emilio,
Appreciate your quick turnaround.
On the top-n items: yes, we mean the top n items after sorting. In fact, we also need to paginate through ‘n’ results at a time after sorting. I will explore the statistical functions you mentioned and find out whether they meet our requirements.
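To be concrete about what we mean by top-n and paging, here is a minimal in-memory sketch of sorting on the client and slicing out one page at a time. The `page` helper and the record layout are hypothetical illustrations, not GeoMesa API.

```python
from operator import itemgetter


def page(records, attr, n, page_index):
    """Return the page_index-th block of n records after sorting by attr.

    Hypothetical helper: it sorts the whole result set in memory, which is
    what a client has to do when results arrive in arbitrary order.
    """
    ordered = sorted(records, key=itemgetter(attr))
    start = page_index * n
    return ordered[start:start + n]


# Hypothetical point records with a single numeric attribute.
points = [{"id": i, "pop": p} for i, p in enumerate([42, 7, 19, 88, 3, 56])]

top3 = page(points, "pop", 3, 0)   # first page is the top-n
next3 = page(points, "pop", 3, 1)  # second page
```

Note that in this sketch every page re-sorts the full result set, so later pages cost as much as the first unless the sorted result is cached somewhere.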
Regarding the optimization of querying and (ascending) sorting by the same attribute: sure, I am interested in finding out what it takes to implement it.
On Spark support: since our use cases are mostly interactive, we are not sure that the overhead of spinning up a Spark job would make it a viable alternative. I am curious to know what options are available in a GeoMesa + Spark environment for running interactive queries fired from a web front end.
Thanks,
Rama Sundaram
Hello,
1. If you want queries against attributes to be fast, you will have to index each of them. GeoMesa will still work without the indices, but it will have to scan all of the data. If you have an additional spatial and/or temporal predicate, this may not be an issue, as those values are always indexed.
2. Because HBase batch scans do not return results in any particular order, sorting is done in memory on the client. When you say 'top' results, do you just mean the first 100 results after sorting? GeoMesa offers some statistical functions that can be run in a distributed fashion (such as top-k), if that is useful.
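The reason a "first n after sorting" can be pushed out to the servers while a full sort cannot is partial aggregation: each scan keeps only its own n best rows, and the client merges those small results. A minimal sketch of that general pattern, with each region modeled as a plain Python list (hypothetical; this is not GeoMesa's distributed stats code, and GeoMesa's own top-k stat may be defined differently, e.g. over value frequencies, so check its documentation):

```python
import heapq
from itertools import chain


def local_first_n(region_rows, attr, n):
    # Would run next to the data (e.g. once per region): keep only the
    # n rows with the smallest values of attr from this region.
    return heapq.nsmallest(n, region_rows, key=lambda r: r[attr])


def merged_first_n(regions, attr, n):
    # Client side: merge the small per-region partial results. The global
    # n smallest rows are guaranteed to be among the per-region n smallest.
    partials = (local_first_n(rows, attr, n) for rows in regions)
    return heapq.nsmallest(n, chain.from_iterable(partials), key=lambda r: r[attr])


# Hypothetical data split across three "regions".
regions = [
    [{"id": 1, "v": 9}, {"id": 2, "v": 2}],
    [{"id": 3, "v": 5}, {"id": 4, "v": 1}],
    [{"id": 5, "v": 7}],
]
top = merged_first_n(regions, "v", 3)
```

The client receives at most n rows per region instead of every matching row, which is what makes this shape of query cheap to distribute.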
Since GeoMesa attribute indices are stored by value, they are already naturally sorted in HBase. There is an obvious opportunity to optimize the case of querying by an attribute and sorting (ascending) by that same attribute, without pulling all the data back to the client first. If that is something you'd like to contribute, we can provide pointers.
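A toy illustration of why that optimization helps, assuming a Python list of (value, id) pairs sorted by value stands in for the value-ordered attribute index in HBase (all names here are hypothetical, not GeoMesa API). The first function models the current behavior: fetch every match, sort on the client, then truncate. The second seeks to the first qualifying value and reads forward, stopping after n rows with no sort at all.

```python
import bisect


def scan_and_sort(rows, attr, lo, hi, n):
    # Current path: fetch every matching row, sort on the client, truncate.
    matches = [r for r in rows if lo <= r[attr] <= hi]
    matches.sort(key=lambda r: r[attr])
    return [r["id"] for r in matches[:n]]


def build_attr_index(rows, attr):
    # Toy attribute index: (value, id) pairs kept sorted by value,
    # standing in for value-ordered row keys.
    return sorted((r[attr], r["id"]) for r in rows)


def index_first_n(index, lo, hi, n):
    # Optimized path: seek to the first value >= lo, then read forward.
    # Rows already arrive in ascending order, so no sort is needed and the
    # scan stops after n hits or once values pass hi.
    out = []
    i = bisect.bisect_left(index, (lo,))
    while i < len(index) and len(out) < n and index[i][0] <= hi:
        out.append(index[i][1])
        i += 1
    return out


rows = [
    {"id": "p1", "height": 12},
    {"id": "p2", "height": 30},
    {"id": "p3", "height": 5},
    {"id": "p4", "height": 21},
    {"id": "p5", "height": 18},
]
index = build_attr_index(rows, "height")
slow = scan_and_sort(rows, "height", 10, 25, 2)
fast = index_first_n(index, 10, 25, 2)
```

Both return the same ids here; the difference is that the second touches only the rows it returns, which is the work saved when the sort attribute is also the indexed query attribute.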
Another approach would be to leverage GeoMesa's Spark support. This is generally the approach we advocate for kinds of analysis (like sorting) that don't align well with the underlying indices.
Thanks,
Emilio
On 11/17/2017 11:03 AM, Sundaram, Rama wrote:
Hi,
We are familiarizing ourselves with GeoMesa and evaluating its suitability for spatial analysis of several medium-to-large sets of point data, ranging from 50K to 6-7 million points, stored in HBase. Some of the analysis patterns are:
1. Find the intersection of a given set with one or more of several static sets of polygons stored in HBase, and sort the results by ANY chosen attribute (each point has ~200+ attributes)
2. Compute statistics on the point sets based on any of the attribute data
These analyses will be done interactively over REST calls.
Some of the design questions we are seeking your help with:
1. Since we will be filtering and sorting by any of the ~200 attributes, do we need to add an attribute index for each of them?
2. When we try to retrieve the top 100 records sorted by an indexed attribute, we see queries taking ~17 seconds (on a 600K point set), whereas the same query with a BBOX filter completes in under 1 second. Is this because GeoMesa is fetching all the data to the client and sorting it?
Thanks,
Rama Sundaram
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users