Re: [geomesa-users] issue with KNN efficiency

From: Jim Hughes <jnh5y@xxxxxxxx>
Date: Thu, 12 Mar 2015 10:13:25 -0400

Hi Elad,

Mike Ronquest may respond with more details since he wrote that code. Check a few lines higher: https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L51

There, we call a static method that builds up a range query which will hopefully contain just the K nearest neighbors.

Unfortunately, evaluating KNN for a database with little additional index structure is challenging and generally requires repeated, expanding queries. Our KNN method does take some hints (https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L41-46), namely, searchDistanceInMeters and maxDistanceInMeters.

The first is a hint for how big an initial box to consider; the second stops the algorithm from chugging through all the points in the layer.
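To make the roles of the two hints concrete, here is a minimal sketch of an expanding-box nearest-neighbor search. It is not GeoMesa's implementation: the function name `knn_expanding`, the in-memory point list, the planar distances, and the doubling step are all assumptions chosen for illustration. `search_distance` and `max_distance` stand in for searchDistanceInMeters and maxDistanceInMeters.

```python
import math


def knn_expanding(points, query, k, search_distance, max_distance):
    """Return (neighbors, rounds) using repeated, expanding box queries.

    points: list of (x, y) tuples, a stand-in for the indexed layer.
    search_distance: half-width of the first box (illustrative analogue
        of searchDistanceInMeters).
    max_distance: upper bound on the box half-width (illustrative
        analogue of maxDistanceInMeters).
    """
    if search_distance <= 0 or max_distance <= 0:
        raise ValueError("distances must be positive")
    qx, qy = query
    radius = search_distance
    rounds = 0
    while True:
        rounds += 1
        radius = min(radius, max_distance)
        # Box query; in a real store this would be an index range scan.
        in_box = [p for p in points
                  if abs(p[0] - qx) <= radius and abs(p[1] - qy) <= radius]
        # Only hits within `radius` are confirmed: every point outside the
        # box is farther than `radius`, so it cannot displace them.
        confirmed = sorted(
            (math.hypot(p[0] - qx, p[1] - qy), p) for p in in_box
        )
        confirmed = [(d, p) for d, p in confirmed if d <= radius]
        if len(confirmed) >= k or radius >= max_distance:
            return [p for _, p in confirmed[:k]], rounds
        # A miss: too few confirmed neighbors, so widen the box and retry.
        radius *= 2
```

In this sketch, a `search_distance` that is too small costs extra rounds, one that is too large pulls back far more candidates than needed, and `max_distance` bounds the work when the layer has fewer than k points near the query.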

If you set the log level to TRACE for the package org.locationtech.geomesa.core.process.knn, you should see output from this line (https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L85), which will tell you how many misses the KNN algorithm is making.

From there, you can twiddle the searchDistanceInMeters.

More generally, once GeoMesa has selectivity estimates and a sketch of the two-dimensional distribution, we can probably guesstimate the searchDistanceInMeters fairly effectively. That feature is on the roadmap, and is likely several months away.
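Until such estimates exist, a rough starting value can be worked out by hand. The sketch below assumes features are spread uniformly over a known area, which real layers rarely satisfy; the function name `estimate_search_distance` and the `safety_factor` parameter are hypothetical and not part of GeoMesa. It solves pi * r^2 * density = k for r.

```python
import math


def estimate_search_distance(k, feature_count, area_m2, safety_factor=1.5):
    """Guess an initial search radius in meters for a k-nearest query.

    Assumes a uniform density of feature_count / area_m2, so a circle of
    radius r is expected to hold pi * r**2 * density features. Solving for
    k features gives r = sqrt(k / (pi * density)). safety_factor pads the
    result to reduce the chance of a miss on the first query.
    """
    if k <= 0 or feature_count <= 0 or area_m2 <= 0:
        raise ValueError("k, feature_count and area_m2 must be positive")
    density = feature_count / area_m2
    return safety_factor * math.sqrt(k / (math.pi * density))
```

For example, 200,000 features spread over an assumed 100 km by 100 km extent and k = 10 give roughly 400 m before the safety factor is applied. Clustered data will need a smaller value in dense areas and a larger one in sparse areas.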

Thanks,

Jim

On 03/12/2015 09:47 AM, Elad Katz wrote:

Hi,

I am trying to use KNNQuery to find the KNN of a point from a big layer (~200,000 features) and it seems GeoMesa extracts the entire layer from the server in order to find the nearest neighbor.

This makes the algorithm impossible to use with big layers, and makes no sense - the geohash index should be used to find the nearest neighbors efficiently (in O(output size), not O(layer size)).
I think the relevant line is:
https://github.com/locationtech/geomesa/blob/accumulo1.5.x/1.x/geomesa-core/src/main/scala/org/locationtech/geomesa/core/process/knn/KNNQuery.scala#L77

Thanks