Hi Joel and others,
As a follow-up to this, I wanted to share that we're working on a fix
for a related sorting issue. I'm hopeful that we'll have something
turned around in a few days.
Thanks for mentioning the DataTables API; it looks like a fun way to
show a reasonably sized dataset.
Jim
On 05/03/2015 03:46 PM, Jim Hughes wrote:
Joel,
This is a great question. To reframe the question, it sounds like
you'd like to be able to sort a query by a column (ascending and
descending) and page through the results.
In full generality, this is a tall order for a database layer
living on top of a distributed key-value store. GeoMesa uses
sharding for our spatial index to distribute data evenly across
the cloud. To be as efficient as possible, queries use multiple
threads to read from several tablet servers at a time. This means
that two subsequent queries will very likely get back results in
different orders (hence paging is hard).
I think you are on the right track with caching/storing queries to
serve up. Assuming that users are going to interact with the same
query for a few minutes, could you possibly cache the queries in
memory with a timeout of a minute or two? A load request would
hit GeoMesa, but the subsequent sort and page requests could work
against the data in memory. If the user leaves and comes back,
their query may have to be re-requested.
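To make the caching idea a bit more concrete, here is a minimal
sketch in Python. Everything in it is hypothetical (the QueryCache
class, the run_query callback, the two-minute default); it is not
GeoMesa code, just one way the "load once, then sort and page in
memory" flow could look.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class QueryCache:
    """Hypothetical in-memory cache of query results with a timeout."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # query id -> (time the rows were loaded, the rows themselves)
        self._entries: Dict[str, Tuple[float, List[dict]]] = {}

    def load(self, query_id: str,
             run_query: Callable[[], Iterable[dict]]) -> List[dict]:
        """Return cached rows, re-running the query if missing or expired."""
        now = self._clock()
        entry = self._entries.get(query_id)
        if entry is None or now - entry[0] > self._ttl:
            # This is the only place that touches the backing store.
            rows = list(run_query())
            self._entries[query_id] = (now, rows)
            return rows
        return entry[1]

    def page(self, query_id: str,
             run_query: Callable[[], Iterable[dict]],
             sort_by: str, descending: bool = False,
             start: int = 0, length: int = 10) -> List[dict]:
        """Sort the cached rows in memory and return one page of them."""
        rows = self.load(query_id, run_query)
        ordered = sorted(rows, key=lambda r: r[sort_by], reverse=descending)
        return ordered[start:start + length]
```

The first page request for a query id pays for the full query; later
sort and page requests within the timeout only touch the list in
memory. Once the entry expires, the next request re-runs the query.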
For GeoMesa, we have worked a little bit with caching in the
GeoTools layer, but we haven't ironed out all the issues. To give
it a spin, add 'caching -> true' in the DataStore params. As I
experimented with caching just now, I noticed that we don't look
at the sorting part of the query. This should be an incredibly
easy fix.* If in-memory caching is a suitable solution, I can
help add a few lines to get sorting to work with caching. Other
than that, it might be good to think through what cache settings
we could expose to the user to make caching viable.
The obvious downside is that if there are too many users relative
to available memory, this plan will fail. As a more complex
possibility, one could imagine writing a user's query results to a
'temporary' Accumulo table**. Records in this table could be
indexed by session id / user / query id. During the first write,
one would be able to pick a column and sort order. From there,
paging might make sense. Reversing the sort order or sorting on
another column would require sorting in memory or creating another
temporary copy of the data.***
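As a rough illustration of the temporary-table idea, the Python
sketch below fakes a table that keeps its entries sorted by row key
(which is what Accumulo does) and builds keys from session id, query
id, and the chosen sort value. It does not use the Accumulo API; the
SortedTable class, the key layout, and the zero-padding width are all
assumptions made up for this example.

```python
import bisect
from typing import Dict, List

SEP = "\x00"  # separator; assumed never to appear inside the ids


def prefix_for(session_id: str, query_id: str) -> str:
    """Common prefix shared by every row of one user's query."""
    return session_id + SEP + query_id + SEP


def row_key(session_id: str, query_id: str,
            sort_value: int, record_id: str) -> str:
    # Zero-pad so string order matches numeric order
    # (assumes non-negative integers below 10**12).
    return (prefix_for(session_id, query_id)
            + f"{sort_value:012d}" + SEP + record_id)


class SortedTable:
    """Toy stand-in for a table whose entries stay sorted by row key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: str,
                    start: int = 0, length: int = 10) -> List[dict]:
        """Return one page of rows under a prefix, in key order."""
        first = bisect.bisect_left(self._keys, prefix) + start
        page = []
        for key in self._keys[first:first + length]:
            if not key.startswith(prefix):
                break  # ran past the end of this query's rows
            page.append(self._values[key])
        return page
```

Because the sort value is part of the key, a page is just a short
range read. Against a real table one would more likely resume from
the last key seen than skip by offset, but the key layout is the
point here.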
Thanks,
Jim
* The code for the Caching Feature Collection is here:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloFeatureSource.scala#L111-154
** Rather than actually trying to figure out separate tables for
each user and when it is safe to delete them, one could configure
Accumulo's AgeOffFilter for the table. Copies of queries would be
deleted after a configurable time.
*** Now that I'm thinking of it, assuming that query results are
small-ish (5k records), if there are only a few columns (say under
10), one could write entries which would be sorted (forwards and
backwards) on each column to the temporary table. It would
require a tad of custom Accumulo work, but it'd be relatively
straightforward.
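For what it's worth, here is a small Python sketch of what "sorted
forwards and backwards on each column" could mean in terms of keys.
The key layout, the "/" separator, and the fixed-width integer
encoding are assumptions for illustration only, and a plain sorted()
call stands in for the table's key ordering. With 5k records and 10
columns this writes 5,000 x 10 x 2 = 100,000 entries per query.

```python
from typing import Dict, List

WIDTH = 12
MAX_VALUE = 10 ** WIDTH - 1


def encode(value: int, descending: bool) -> str:
    """Fixed-width encoding whose string order is the requested order."""
    if not 0 <= value <= MAX_VALUE:
        raise ValueError("sketch only handles small non-negative integers")
    # For descending order, store the complement so larger values sort first.
    shown = MAX_VALUE - value if descending else value
    return f"{shown:0{WIDTH}d}"


def index_entries(query_id: str, record_id: str,
                  record: Dict[str, int]) -> List[str]:
    """One key per (column, direction) pair for a single record."""
    keys = []
    for column, value in record.items():
        for direction, desc in (("asc", False), ("desc", True)):
            keys.append("/".join(
                [query_id, column, direction, encode(value, desc), record_id]))
    return keys


def page(keys: List[str], query_id: str, column: str, direction: str,
         start: int = 0, length: int = 10) -> List[str]:
    """Record ids for one page, read in key order under one prefix."""
    prefix = "/".join([query_id, column, direction]) + "/"
    # sorted() stands in for the table keeping its keys in order.
    matching = [k for k in sorted(keys) if k.startswith(prefix)]
    return [k.rsplit("/", 1)[1] for k in matching[start:start + length]]
```

Any sort on any column, in either direction, then becomes a range
read under a single prefix, at the cost of writing every record
several times.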
On 05/01/2015 04:42 PM, Joel Folkerts wrote:
Good afternoon. I am working on a project that is serving GeoMesa
results to users through a web interface by means of a REST
API. Currently, the users construct a geospatial query, the
API in turn sends this query to GeoMesa, which then returns
all of the records back through the API to the user. We run
into problems when the returned dataset is over 5,000
records (which it normally is) and we end up crashing the
user's browser.
What we're trying to avoid is writing GeoMesa search results to
HDFS and then layering Impala on top of it. While this would solve
the problem, we risk wasting a tremendous amount of HDFS space.
Our ultimate goal is to connect a DataTables UI to Accumulo/GeoMesa
and be able to retrieve only the data that we want, i.e. 10 records
out of 100,000 records.
Any ideas, design patterns, or code samples would be very much
appreciated.
Thank you in advance!
-Joel
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users