Re: [geomesa-users] issue with KNN efficiency
Dear Elad,
Thanks for writing in! The KNN algorithm is a relatively
new addition to the analytics built into GeoMesa and I believe this is
the first time we've received community feedback regarding it.
The KNNQuery does indeed exploit the geohash index in order to increase
efficiency and avoid pulling large numbers of false positives. Starting
from a geohash containing your reference point, the process executes
a series of small queries in an outward spiral of geohashes until K
neighbors are found, and then all remaining geohashes that may contain
nearer neighbors are also swept to ensure the nearest neighbors have
been found.
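To make the search pattern above concrete, here is a minimal sketch in Python. It is not the KNNQuery code: it assumes a flat square grid as a stand-in for fixed-precision geohashes, an in-memory dict as a stand-in for the index, and Euclidean distance. The names CELL, cell_of, build_index, ring and knn are all hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import defaultdict

CELL = 1.0  # assumed cell size; stands in for a fixed-precision geohash


def cell_of(x, y):
    """Return the grid cell (our stand-in for a geohash) containing a point."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def build_index(points):
    """Group points by cell, mimicking a spatial index keyed by geohash."""
    index = defaultdict(list)
    for p in points:
        index[cell_of(*p)].append(p)
    return index


def ring(center, r):
    """Cells at Chebyshev distance r from the center cell (one 'loop' outward)."""
    cx, cy = center
    if r == 0:
        return [center]
    return [(cx + dx, cy + dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if max(abs(dx), abs(dy)) == r]


def knn(index, query, k, max_ring=1000):
    """Return (k nearest points, number of cells queried)."""
    center = cell_of(*query)
    best = []  # max-heap on distance, stored as (-distance, point)
    cells_visited = 0
    for r in range(max_ring + 1):
        # Every point in ring r lies more than (r - 1) * CELL from the query,
        # so once the k-th best distance is within that bound, no unvisited
        # cell can hold a nearer neighbor and the sweep can stop.
        if len(best) == k and -best[0][0] <= (r - 1) * CELL:
            break
        for cell in ring(center, r):
            cells_visited += 1
            for p in index.get(cell, ()):  # one small query per cell
                heapq.heappush(best, (-math.dist(query, p), p))
                if len(best) > k:
                    heapq.heappop(best)  # drop the current farthest candidate
    neighbors = [p for _, p in sorted(best, reverse=True)]
    return neighbors, cells_visited
```

In this sketch the work done depends on how many cells must be visited before the stopping bound is met, not on the total number of features in the index, which is the behavior the geohash-based approach is meant to provide.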
As this is a WPS process, the issue could be with GeoMesa core, the
plugin, or GeoServer. I'd like to ask a few questions to collect some
more information to assist you in debugging the problem:
1) What are the indications that the entire layer is being extracted?
2) Did you build GeoMesa from source, and if so, what is the git commit
hash? If you're using a jar pulled from Nexus, when was it pulled?
3) Can you please send the relevant portions of your GeoServer logs,
showing the log output from the KNNQuery process?
4) This is somewhat redundant, but can you please send the parameters
you fed into the KNN process? The XML file would be fabulous.
5) Can you please describe the geographic distribution of the data a
bit? For example, how widely distributed are those 200k Features? And
given K, over how large an area would you expect the K nearest
neighbors to be spread?
6) Can you please send the details of your layer? In particular, I'd like
to know whether caching is enabled in the Accumulo Feature Store, and
what the fields in the Coordinate Reference Systems portion of the
layer info contain.
Best regards, and thanks again for the feedback,
Michael Ronquest
On 03/12/2015 09:47 AM, Elad Katz wrote:
Hi,
I am trying to use KNNQuery to find the KNN of a point from a big
layer (~200,000 features) and it seems GeoMesa extracts the entire
layer from the server in order to find the nearest neighbor.
This makes the algorithm impossible to use with big layers, and makes
no sense - the geohash index should be used to find the nearest
neighbors efficiently (in O(output size), not O(layer size)).
I think the relevant line is:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L77
Thanks