Re: [geomesa-users] KNN-Queries

From: Michael Ronquest <ronquest@xxxxxxxx>
Date: Tue, 14 Jul 2015 14:18:35 -0400

Hi Marcel,

Thanks for writing in, and for your interest in the KNN method in GeoMesa. Once things are working for you, I'd be *very* interested in receiving additional feedback, as well as hearing a bit about your use case.

In short, the KNN algorithm begins by searching in a geohash that contains your point of reference, with the spatial scale of the geohash set in the query process. Once all features in that central geohash are processed, the algorithm then begins to "spiral" out to neighboring geohashes as needed to either find k neighbors, or to ensure the current k "best" neighbors are indeed the k nearest neighbors.
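The spiral idea can be sketched in plain Java. This is not GeoMesa's implementation: it assumes planar x/y coordinates and a square grid whose cells stand in for geohashes, and the class and method names (SpiralKnnSketch, nearest) are made up for illustration. It shows the two stopping conditions described above: the k best candidates are confirmed once no unvisited ring could hold anything closer, and the search gives up once the next ring lies beyond a maximum distance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative ring-by-ring k-nearest-neighbor search on a square grid (not GeoMesa code). */
final class SpiralKnnSketch {

    /** Returns up to k points nearest to (qx, qy), closest first, none farther than maxDistance. */
    static List<double[]> nearest(List<double[]> points, double qx, double qy,
                                  int k, double cellSize, double maxDistance) {
        if (k <= 0) return new ArrayList<>();

        // Bucket every point by grid cell; each cell stands in for one geohash.
        Map<String, List<double[]>> cells = new HashMap<>();
        for (double[] p : points) {
            cells.computeIfAbsent(key(cellOf(p[0], cellSize), cellOf(p[1], cellSize)),
                    unused -> new ArrayList<>()).add(p);
        }
        long cx = cellOf(qx, cellSize);
        long cy = cellOf(qy, cellSize);

        // Max-heap on distance, so the worst of the current candidates sits on top.
        Comparator<double[]> byDistance = Comparator.comparingDouble(p -> dist(p, qx, qy));
        PriorityQueue<double[]> best = new PriorityQueue<>(byDistance.reversed());

        for (long ring = 0; ; ring++) {
            // Visit only the cells on the border of the (2 * ring + 1)-wide square.
            for (long dx = -ring; dx <= ring; dx++) {
                for (long dy = -ring; dy <= ring; dy++) {
                    if (Math.max(Math.abs(dx), Math.abs(dy)) != ring) continue;
                    for (double[] p : cells.getOrDefault(key(cx + dx, cy + dy), List.of())) {
                        if (dist(p, qx, qy) > maxDistance) continue;
                        best.add(p);
                        if (best.size() > k) best.poll(); // drop the farthest candidate
                    }
                }
            }
            // Every point in the next ring is at least ring * cellSize away.
            double nextRingMin = ring * cellSize;
            boolean confirmed = best.size() == k && dist(best.peek(), qx, qy) <= nextRingMin;
            if (confirmed || nextRingMin > maxDistance) break;
        }
        List<double[]> result = new ArrayList<>(best);
        result.sort(byDistance);
        return result;
    }

    private static long cellOf(double v, double cellSize) {
        return (long) Math.floor(v / cellSize);
    }

    private static String key(long x, long y) {
        return x + ":" + y;
    }

    private static double dist(double[] p, double qx, double qy) {
        return Math.hypot(p[0] - qx, p[1] - qy);
    }
}
```

Note that with an unbounded maxDistance and fewer than k points in the data, this loop would never terminate, which is the same runaway behavior a distance cap is meant to prevent.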

Your instinct regarding the KNNQuery is correct: that is what you want to use. Apologies for the "magic" parameters: KNNQuery is used by the KNearestNeighborSearchProcess, and the parameters are better explained there. Note: the KNNSearchProcess class is used by GeoServer WPS processes, with a good deal of related boilerplate, so stay away from that.


The runNewKNNQuery method has these parameters:

source: SimpleFeatureSource ===> where your data reside: note this really should be a GeoMesa Source as we attempt to exploit its geospatial index in the algorithm

query: Query ===> your "base" query which would include filters on attributes, time and space.


numDesired: Int ===> this is simply "k", how many points you seek

searchDistanceInMeters: Double ===> this is the "typical" distance within which you'd expect to find k points in your data. It serves as an "initial guess" for the search and defines the spatial scale at which the iterative query by GeoHash will run. If I were looking for 1000 tweets in Manhattan over the course of a day, I'd set this to ~500 meters, while if I were looking for 1000 tweets around Nageezi, New Mexico, I'd set this to 100000 meters or more. The search is iterative, so err toward smaller distances (at the potential cost of a slower process, as more "geohash queries" will need to be made).

maxDistanceInMeters: Double ===> this is the maximum distance at which the algorithm will search and acts almost like an additional predicate on your Query: this prevents runaway queries. For example, imagine in your case that you ask for k=1000 when you only have 100 Features around Beijing. The KNN process would then "spiral" out from Beijing, geohash by geohash, querying GeoMesa each time for additional Features. If you only have sparse data outside of Beijing, then the KNN algorithm may churn for a great while, perhaps over the entire planet! So this parameter prevents that. It is possible to get edge effects here, so err on the side of much larger distances.

aFeatureForSearch: SimpleFeature ===> this is the reference point around which to search.
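One way to picture maxDistanceInMeters acting "almost like an additional predicate" is as a great-circle distance cutoff around the reference point. The following is only a sketch: the class and method names are hypothetical, the haversine formula and the mean Earth radius are my assumptions, and GeoMesa's own distance computation may well differ.

```java
/** Illustrative distance cutoff around a reference point (not GeoMesa code). */
final class MaxDistanceSketch {

    // Assumption: a spherical Earth with the mean radius, in meters.
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    /** Great-circle distance in meters between two latitude/longitude pairs given in degrees. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if the candidate lies no farther than maxDistanceInMeters from the reference point. */
    static boolean withinMaxDistance(double refLat, double refLon,
                                     double candLat, double candLon,
                                     double maxDistanceInMeters) {
        return haversineMeters(refLat, refLon, candLat, candLon) <= maxDistanceInMeters;
    }
}
```

For example, withinMaxDistance(39.9042, 116.4074, 31.2304, 121.4737, 2500000.0) compares central Beijing with Shanghai, which are roughly 1,070 km apart, and returns true, whereas a candidate in London would be rejected by the same limit.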



With the parameters defined, you'd then do something like this:

        Query theQuery = new Query("gdelt", timeFilter, new String[] {"SQLDATE", "geom"});

        // want 100 points
        int k = 100;

        // Beijing is dense....
        double guessedDistance = 1000.0;

        // very roughly the "radius" of China
        double maxLimitDistance = 2500000.0;

        NearestNeighbors neighbors = KNNQuery.runNewKNNQuery(fs, theQuery, k, guessedDistance, maxLimitDistance, beijingCenter);

where fs and timeFilter are as you've previously defined them and beijingCenter is a SimpleFeature with your point as its geometry.
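To get a feel for the trade-off between the initial guess and the maximum distance, here is a back-of-the-envelope sketch. It assumes square cells whose side equals the search distance and a search that has to visit every ring out to the maximum distance; both are simplifications of mine rather than a description of GeoMesa's internals, and the names are hypothetical. Treat the result as a loose upper bound on the number of "geohash queries", not a prediction.

```java
/** Illustrative worst-case cell count for a ring-by-ring search (not GeoMesa code). */
final class SearchScaleSketch {

    /** Cells visited if every ring out to maxDistanceInMeters must be searched. */
    static long worstCaseCells(double searchDistanceInMeters, double maxDistanceInMeters) {
        // Number of rings needed to reach the maximum distance at this cell size.
        long rings = (long) Math.ceil(maxDistanceInMeters / searchDistanceInMeters);
        // A square of rings around the center cell is (2 * rings + 1) cells on a side.
        long side = 2 * rings + 1;
        return side * side;
    }
}
```

With a 1000.0 meter guess and a 2500000.0 meter limit, worstCaseCells returns 25010001, which is why a sensible maximum matters when the data are sparse. In dense data the search would normally stop after the first few rings.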


Hopefully this will help. Please report back on further issues or success.

Cheers,
Mike
