Re: [geomesa-users] KNN-Queries

Hi Marcel,
Thanks for writing in and for your interest in the KNN method in GeoMesa. Once things are working for you, I'd be *very* interested in receiving additional feedback, as well as hearing a bit about your use case.

In short, the KNN algorithm begins by searching in a geohash that contains your point of reference, with the spatial scale of the geohash set in the query process. Once all features in that central geohash are processed, the algorithm then begins to "spiral" out to neighboring geohashes as needed to either find k neighbors, or to ensure the current k "best" neighbors are indeed the k nearest neighbors.
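To make the "spiral" idea concrete, here is a minimal sketch on a flat grid of square cells standing in for geohashes. It is not GeoMesa's implementation: the class SpiralKnnSketch, its nearest method, and the use of planar coordinates in meters are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: square cells on a plane stand in for geohashes.
public class SpiralKnnSketch {

    private static String cellKey(long cx, long cy) {
        return cx + ":" + cy;
    }

    private static double dist(double[] p, double qx, double qy) {
        return Math.hypot(p[0] - qx, p[1] - qy);
    }

    /** Returns up to k points ({x, y} in meters) nearest to (qx, qy), closest first. */
    public static List<double[]> nearest(List<double[]> points, double qx, double qy,
                                         int k, double cellSize, double maxDistance) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Bucket the points by cell; this plays the role of the spatial index.
        Map<String, List<double[]>> cells = new HashMap<>();
        for (double[] p : points) {
            long cx = (long) Math.floor(p[0] / cellSize);
            long cy = (long) Math.floor(p[1] / cellSize);
            cells.computeIfAbsent(cellKey(cx, cy), key -> new ArrayList<>()).add(p);
        }
        long qcx = (long) Math.floor(qx / cellSize);
        long qcy = (long) Math.floor(qy / cellSize);

        Comparator<double[]> byDistance = Comparator.comparingDouble(p -> dist(p, qx, qy));
        // Max-heap: the head is the farthest of the current k best candidates.
        PriorityQueue<double[]> best = new PriorityQueue<>(byDistance.reversed());

        for (long ring = 0; ; ring++) {
            // Every point in this ring is at least (ring - 1) cells away from the query.
            double ringLowerBound = Math.max(0, ring - 1) * cellSize;
            if (ringLowerBound > maxDistance) {
                break; // the maximum distance stops a runaway search
            }
            if (best.size() == k && dist(best.peek(), qx, qy) <= ringLowerBound) {
                break; // nothing farther out can beat the current k best
            }
            // Visit only the border cells of this ring (a simple, unoptimized scan).
            for (long dx = -ring; dx <= ring; dx++) {
                for (long dy = -ring; dy <= ring; dy++) {
                    if (Math.max(Math.abs(dx), Math.abs(dy)) != ring) {
                        continue;
                    }
                    List<double[]> cell =
                        cells.getOrDefault(cellKey(qcx + dx, qcy + dy), Collections.emptyList());
                    for (double[] p : cell) {
                        if (dist(p, qx, qy) > maxDistance) {
                            continue;
                        }
                        best.add(p);
                        if (best.size() > k) {
                            best.poll(); // drop the farthest candidate
                        }
                    }
                }
            }
        }
        List<double[]> result = new ArrayList<>(best);
        result.sort(byDistance);
        return result;
    }
}
```

In this sketch cellSize sets the spatial scale of each step outward and maxDistance caps how far the search may go, which is roughly how the two distance parameters of the real query behave.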

Your instinct regarding the KNNQuery is correct: that is what you want to use. Apologies for the "magic" parameters: KNNQuery is used by the KNearestNeighborSearchProcess, and the parameters are better explained there. Note: the KNearestNeighborSearchProcess class is meant for GeoServer WPS processes and carries a good deal of related boilerplate, so stay away from calling it directly.

The runNewKNNQuery method has these parameters:
source: SimpleFeatureSource ===> where your data reside: note this really should be a GeoMesa Source as we attempt to exploit its geospatial index in the algorithm

query: Query ===> your "base" query which would include filters on attributes, time and space.

numDesired: Int ===> this is simply "k", how many points you seek

searchDistanceInMeters: Double ===> this is the "typical" distance within which you'd expect to find k points in your data. It serves as an "initial guess" for the search and defines the spatial scale at which the iterative query by GeoHash will run. If I were looking for 1000 tweets in Manhattan over the course of a day, I'd set this to ~500 meters, while if I were looking for 1000 tweets around Nageezi, New Mexico, I'd set this to 100000 meters or more. The search is iterative, so err toward smaller distances (at the potential cost of a slower process, as more "geohash queries" will need to be made).

maxDistanceInMeters: Double ===> this is the maximum distance at which the algorithm will search. It acts almost like an additional predicate on your Query and prevents runaway queries. For example, imagine in your case that you ask for k=1000 when you only have 100 Features around Beijing. The KNN process would then "spiral" out from Beijing, geohash by geohash, querying GeoMesa each time for additional Features. If you only have sparse data outside of Beijing, then the KNN algorithm may churn for a great while, perhaps over the entire planet! This parameter prevents that. It is possible to get edge effects, so err on the side of much larger distances here.

aFeatureForSearch: SimpleFeature ===> this is the reference point around which to search.
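
If you have a rough idea of how dense your data are, one way to pick a starting value for searchDistanceInMeters is the radius of a circle expected to contain k points. The sketch below is only a rule of thumb under the assumption of uniform density; the class SearchDistanceSketch and the method initialGuessMeters are hypothetical helpers and are not part of GeoMesa.

```java
// Illustrative only: a rule-of-thumb starting value, assuming uniformly spread points.
public class SearchDistanceSketch {

    /**
     * Radius in meters of a circle expected to hold k points,
     * given an assumed density in points per square kilometer.
     */
    public static double initialGuessMeters(int k, double pointsPerSquareKm) {
        if (k <= 0 || pointsPerSquareKm <= 0) {
            throw new IllegalArgumentException("k and density must be positive");
        }
        // Expected count in a circle: k = density * pi * r^2, solved for r.
        double radiusKm = Math.sqrt(k / (Math.PI * pointsPerSquareKm));
        return radiusKm * 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical densities: a dense city versus a sparse rural area.
        System.out.println(initialGuessMeters(1000, 1500.0));
        System.out.println(initialGuessMeters(1000, 0.05));
    }
}
```

Real data are rarely uniform, so treat the result as a starting point only: shrink it a little for searchDistanceInMeters, and choose maxDistanceInMeters well above it.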


With the parameters defined, you'd then do something like this:

    Query theQuery = new Query("gdelt", timeFilter, new String[] { "SQLDATE", "geom" });

    // want 100 points
    int k = 100;

    // Beijing is dense....
    double guessedDistance = 1000.0;

    // very roughly the "radius" of China
    double maxLimitDistance = 2500000.0;

    NearestNeighbors neighbors = KNNQuery.runNewKNNQuery(fs, theQuery, k, guessedDistance, maxLimitDistance, beijingCenter);

where fs and timeFilter are as you've previously defined them and beijingCenter is a SimpleFeature with your point as its geometry.

Hopefully this will help. Please report back on further issues or success.

Cheers,
Mike




