Re: [geomesa-users] issue with KNN efficiency

Dear Elad,
Thanks for writing in! The KNN algorithm is a relatively new addition to the analytics built into GeoMesa and I believe this is the first time we've received community feedback regarding it.

The KNNQuery does indeed exploit the geohash index in order to increase efficiency and avoid pulling large numbers of false positives. Starting from a geohash containing your reference point, the process executes a series of small queries in an outward spiral of geohashes until K neighbors are found, and then all remaining geohashes that may contain nearer neighbors are also swept to ensure the nearest neighbors have been found.
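To make the idea concrete, here is a minimal Python sketch of that search pattern. It is not the GeoMesa implementation: it assumes a uniform square grid as a stand-in for geohashes, planar Euclidean distance, and an in-memory dictionary instead of Accumulo queries. The names `build_index`, `ring_cells`, and `knn` are hypothetical and chosen only for illustration.

```python
import heapq
import math
from collections import defaultdict


def build_index(points, cell):
    """Bucket (x, y) points into square cells of side `cell` (a stand-in for geohashes)."""
    index = defaultdict(list)
    for x, y in points:
        index[(math.floor(x / cell), math.floor(y / cell))].append((x, y))
    return index


def ring_cells(cx, cy, r):
    """Yield the cells at Chebyshev distance exactly r from cell (cx, cy)."""
    if r == 0:
        yield (cx, cy)
        return
    for dx in range(-r, r + 1):          # bottom and top rows of the ring
        yield (cx + dx, cy - r)
        yield (cx + dx, cy + r)
    for dy in range(-r + 1, r):          # left and right columns, corners excluded
        yield (cx - r, cy + dy)
        yield (cx + r, cy + dy)


def knn(index, cell, query, k):
    """Return (the k nearest points sorted by distance, number of points scanned)."""
    if not index:
        return [], 0
    qx, qy = query
    cx, cy = math.floor(qx / cell), math.floor(qy / cell)
    # Farthest occupied ring, so the loop ends even when fewer than k points exist.
    max_ring = max(max(abs(ix - cx), abs(iy - cy)) for ix, iy in index)
    best = []      # max-heap on distance, stored as (-distance, point)
    scanned = 0
    for r in range(max_ring + 1):
        for c in ring_cells(cx, cy, r):
            for p in index.get(c, ()):   # one small "query" per cell
                scanned += 1
                d = math.hypot(p[0] - qx, p[1] - qy)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
        # Every point outside rings 0..r is more than r * cell away from the query,
        # so once the k-th best distance is within that bound nothing nearer remains.
        if len(best) == k and -best[0][0] <= r * cell:
            break
    return [p for _, p in sorted(best, key=lambda t: -t[0])], scanned
```

Note that the loop does not stop as soon as K candidates are found. It keeps sweeping rings until the K-th best distance falls within the radius already covered, which corresponds to the final sweep described above. For reasonably uniform data, the number of points scanned should depend on K and the local density rather than on the size of the layer.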

As this is a WPS process, the issue could be with GeoMesa core, the plugin, or GeoServer. I'd like to ask a few questions to collect some more information to assist you in debugging the problem:

1) What are the indications that the entire layer is being extracted?
2) Did you build GeoMesa from source, and if so, what is the git commit hash? If you're using a jar pulled from Nexus, when was it pulled?
3) Can you please send the relevant portions of your GeoServer logs, showing the log output from the KNNQuery process?
4) This is somewhat redundant, but can you please send the parameters you fed into the KNN process? The XML file would be fabulous.
5) Can you please describe the geographic distribution of the data a bit? For example, how widely distributed are those 200k Features? And given K, how large an area would you expect the K nearest neighbors to fall within?
6) Can you please send the details of your layer? In particular, I'd like to know whether caching is enabled in the Accumulo Feature Store, and what the fields in the Coordinate Reference Systems portion of the layer info contain.

Best regards, and thanks again for the feedback,
Michael Ronquest





On 03/12/2015 09:47 AM, Elad Katz wrote:
Hi,
I am trying to use KNNQuery to find the KNN of a point from a big layer (~200,000 features) and it seems GeoMesa extracts the entire layer from the server in order to find the nearest neighbor. This makes the algorithm impossible to use with big layers, and makes no sense - the geohash index should be used to find the nearest neighbors efficiently (in O(output size), not O(layer size)).
I think the relevant line is:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L77
Thanks


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users


