Joel,
This is a great question. To reframe it, it sounds like you'd like
to sort the results of a query by a column (ascending or
descending) and page through them.
In full generality, this is a tall order for a database layer living
on top of a distributed key-value store. GeoMesa uses sharding for
our spatial index to distribute data evenly across the cloud. To be
as efficient as possible, queries use multiple threads to read from
several tablet servers at a time. This means that two successive
runs of the same query will very likely return results in different
orders, which is why paging is hard.
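To make the ordering problem concrete, here is a minimal Python sketch. It is a simulation only and does not call GeoMesa or Accumulo: `parallel_scan` is a hypothetical stand-in for a multi-threaded scan, and the seed models thread timing. It shows that offset-based paging is only meaningful once a total order has been imposed on the results.

```python
import random

def parallel_scan(shards, seed):
    """Simulate a multi-threaded scan: records from different shards arrive
    interleaved in an order that depends on timing (modeled by the seed)."""
    rng = random.Random(seed)
    cursors = [list(s) for s in shards if s]
    out = []
    while cursors:
        i = rng.randrange(len(cursors))   # whichever "thread" delivers next
        out.append(cursors[i].pop(0))
        if not cursors[i]:
            cursors.pop(i)
    return out

def page(records, offset, size):
    return records[offset:offset + size]

shards = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
run_a = parallel_scan(shards, seed=1)   # first request
run_b = parallel_scan(shards, seed=2)   # second request, same query

# Naive paging: page 1 comes from one run and page 2 from another, so the
# combined pages may repeat some records and miss others.
naive = page(run_a, 0, 5) + page(run_b, 5, 5)

# Sorting each run first gives every request the same order, so pages line up.
stable = page(sorted(run_a), 0, 5) + page(sorted(run_b), 5, 5)
```

Both runs contain the same records; only the sorted variant is guaranteed to page through each record exactly once.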
I think you are on the right track with caching or storing query
results to serve up. Assuming that users are going to interact with
the same query for a few minutes, could you cache the results in
memory with a timeout of a minute or two? A load request would hit
GeoMesa, but the subsequent sort and page requests could work
against the data in memory. If the user leaves and comes back,
their query may have to be re-requested.
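As a rough illustration of this idea, the following Python sketch keeps query results in memory with a timeout and serves sort and page requests from the cached copy. It is a sketch under assumptions, not GeoMesa code: `QueryResultCache`, `loader`, and `fake_loader` are hypothetical names, and `loader` stands for whatever function in your REST layer actually runs the query against GeoMesa.

```python
import time

class QueryResultCache:
    """Holds query results in memory and drops them after ttl_seconds."""

    def __init__(self, loader, ttl_seconds=120.0, clock=time.monotonic):
        self._loader = loader      # hypothetical: runs the query against GeoMesa
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the timeout can be tested
        self._entries = {}         # query key -> (expiry time, list of rows)

    def _rows(self, query):
        now = self._clock()
        # Drop every expired entry so abandoned queries do not pin memory.
        for key in [k for k, (expiry, _) in self._entries.items() if expiry <= now]:
            del self._entries[key]
        entry = self._entries.get(query)
        if entry is None:
            # Miss (or expired): this is the only place the backend is hit.
            entry = (now + self._ttl, list(self._loader(query)))
            self._entries[query] = entry
        return entry[1]

    def page(self, query, sort_by, descending=False, start=0, length=10):
        """Sort the cached rows by one column and return a single page."""
        rows = sorted(self._rows(query), key=lambda r: r[sort_by], reverse=descending)
        return rows[start:start + length]

# Usage with a fake loader standing in for the real query.
calls = []
def fake_loader(query):
    calls.append(query)
    return [{"id": i, "name": f"n{i % 3}"} for i in range(25)]

now = [0.0]
cache = QueryResultCache(fake_loader, ttl_seconds=120.0, clock=lambda: now[0])
first = cache.page("bbox-query", sort_by="id", start=0, length=10)
last = cache.page("bbox-query", sort_by="id", descending=True, start=0, length=10)
```

The `start` and `length` parameters are named after the paging parameters a DataTables front end typically sends. Re-sorting on every request is acceptable for result sets of a few thousand rows; for larger ones the sorted order per column could be cached as well.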
For GeoMesa, we have worked a little bit with caching in the
GeoTools layer, but we haven't ironed out all the issues. To give
it a spin, add 'caching -> true' in the DataStore params. As I
experimented with caching just now, I noticed that we don't look at
the sorting part of the query. This should be an incredibly easy
fix.* If in-memory caching is a suitable solution, I can help add a
few lines to get sorting to work with caching. Other than that, it
might be good to think through what cache settings we could expose
to the user to make caching viable.
The obvious downside is that if there are too many users relative to
available memory, this plan will fail. As a more complex
possibility, one could imagine writing a user's query results to a
'temporary' Accumulo table.** Records in this table could be indexed
by session id / user / query id. During the first write, one would
be able to pick a column and sort order. From there, paging might
make sense. Reversing the sort order or sorting on another column
would require sorting in memory or creating another temporary copy
of the data.***
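The temporary-table idea can be sketched without Accumulo by using a sorted in-memory list as a stand-in for a table that keeps rows ordered by key. Everything here is hypothetical (`row_key`, `SortedTable`, the key layout, the separator); the point is only that once the sort value is part of the row key, a page is a short range scan that resumes after the last key of the previous page.

```python
import bisect

SEP = "\x00"  # separator that sorts before any printable character

def row_key(session_id, query_id, sort_value, record_id):
    # Assumes sort_value is a string whose lexicographic order matches the
    # desired order (for example, zero-padded numbers).
    return SEP.join([session_id, query_id, sort_value, record_id])

class SortedTable:
    """Stand-in for a table that keeps its rows sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, prefix, after=None, limit=10):
        """Return up to `limit` (key, value) pairs under `prefix`,
        resuming just after the key `after` when it is given."""
        if after is None:
            start = bisect.bisect_left(self._keys, prefix)
        else:
            start = bisect.bisect_right(self._keys, after)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) == limit:
                break
            out.append((key, self._values[key]))
        return out

table = SortedTable()
for i in [7, 3, 12, 1, 9]:
    table.put(row_key("s1", "q1", f"{i:08d}", f"rec{i}"), {"value": i})
table.put(row_key("s2", "q1", f"{5:08d}", "rec5"), {"value": 5})  # another session

prefix = "s1" + SEP + "q1" + SEP
page1 = table.scan(prefix, limit=2)
page2 = table.scan(prefix, after=page1[-1][0], limit=2)
```

Resuming from the last key seen avoids counting through skipped rows, but it only supports next-page navigation; jumping to an arbitrary page number would need an offset or a separate position index.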
Thanks,
Jim
* The code for the Caching Feature Collection is here:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/data/AccumuloFeatureSource.scala#L111-154
** Rather than actually trying to figure out separate tables for
each user and when it is safe to delete them, one could configure
Accumulo's AgeOffFilter for the table. Copies of queries would be
deleted after a configurable time.
*** Now that I'm thinking of it, assuming that query results are
small-ish (5k records), if there are only a few columns (say under
10), one could write entries that sort (forwards and backwards) on
each column to the temporary table. It would require a bit of
custom Accumulo work, but it'd be relatively straightforward.
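One way to picture the entries described in the last footnote: for every record and every column, write one key whose bytes sort ascending by value and one whose bytes sort descending. With 5,000 records and 10 columns that is 5,000 x 10 x 2 = 100,000 small entries. The Python sketch below simulates this with a sorted list; the key layout and the byte-inversion trick are illustrative assumptions, not an existing GeoMesa or Accumulo encoding.

```python
def encode(value):
    """Encode a value so that byte order matches natural order.
    Assumes non-negative ints below 2**64, or strings without NUL characters;
    a column must not mix the two types."""
    if isinstance(value, int):
        return value.to_bytes(8, "big")
    return value.encode("utf-8")

def index_entries(query_id, record_id, record):
    """Yield (key, record_id) twice per column: an ascending key and a
    descending key built by flipping every byte of the encoded value."""
    for column, value in record.items():
        enc = encode(value)
        base = query_id.encode() + b"\x00" + column.encode() + b"\x00"
        rid = record_id.encode()
        # The terminator after the value keeps prefixes ordered correctly:
        # 0x00 sorts "a" before "ab", 0xff sorts inverted "ab" before "a".
        yield base + b"a\x00" + enc + b"\x00" + rid, record_id
        yield base + b"d\x00" + bytes(255 - b for b in enc) + b"\xff" + rid, record_id

def scan(table, query_id, column, descending=False):
    """Return record ids in key order for one column and direction."""
    direction = b"d" if descending else b"a"
    prefix = query_id.encode() + b"\x00" + column.encode() + b"\x00" + direction + b"\x00"
    return [rid for key, rid in sorted(table) if key.startswith(prefix)]

records = {
    "r1": {"name": "ab", "count": 300},
    "r2": {"name": "a", "count": 2},
    "r3": {"name": "b", "count": 70000},
}
table = [entry for rid, rec in records.items() for entry in index_entries("q1", rid, rec)]

by_name = scan(table, "q1", "name")
by_name_desc = scan(table, "q1", "name", descending=True)
by_count_desc = scan(table, "q1", "count", descending=True)
```

Because both directions are materialized, every sort a user can request becomes a forward scan over a key prefix, at the cost of storing each record's id 2 x (number of columns) times.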
On 05/01/2015 04:42 PM, Joel Folkerts wrote:
Good afternoon. I am working on a project that is serving GeoMesa
results to users through a web interface by means of a REST API.
Currently, the users construct a geospatial query, the API in turn
sends this query to GeoMesa, which then returns all of the records
back through the API to the user. We run into problems when the
returned dataset is over 5,000 records (which it normally is) and
we end up crashing the user's browser.
What we're trying to avoid is writing GeoMesa search results to
HDFS and then layering Impala on top of it. While this would solve
the problem, we risk wasting a tremendous amount of HDFS space.
Our ultimate goal is to connect a DataTables UI to Accumulo/GeoMesa
and be able to retrieve only the data that we want, e.g. 10 records
out of 100,000 records.
Any ideas, design patterns, or code samples would be very much
appreciated.
Thank you in advance!
-Joel