Re: [geomesa-users] KNN-Queries

From: Michael Ronquest <ronquest@xxxxxxxx>
Date: Wed, 15 Jul 2015 11:23:42 -0400

Hi Marcel,

It is interesting if you are seeing a performance difference between the two methods: runNewKNNQuery just creates the GeoHashSpiral and NearestNeighbors for you, and then runs the runKNNQuery method. Do you think you could quantify the performance difference? Also, what parameters are you currently using for "k", "searchDistanceInMeters" and "maxDistanceInMeters"?

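You can run your query without a filter by using the ECQL filter INCLUDE, which includes everything. Specifically, org.opengis.filter.Filter.INCLUDE from GeoTools is what you want. Passing it in place of null, e.g. new Query("gdelt", Filter.INCLUDE, new String[]{"SQLDATE", "geom"}), should do it.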

It sounds like you've got an interesting thesis topic on your hands! In the future we'd be interested to hear about your results!


All the best,
Mike



On 07/15/2015 07:12 AM, Marcel wrote:

Hey Mike,
thanks for the detailed answer. With it I was able to get my KNN query working. I tested both the KNNQuery.runKNNQuery and the KNNQuery.runNewKNNQuery methods and decided to go with the first, because its performance seems to be a little better. Is there any way to run my query without a filter? I don't want to filter on time, but when I create something like new Query("gdelt", null, new String[]{"SQLDATE", "geom"}) (filter set to null) the program won't finish.
I'm currently working on my master's thesis, which focuses on storing and querying geotemporal data in the Hadoop ecosystem. That's why I'm examining some technologies in detail. I don't have a specific use case, so I'm happy working with the GDELT dataset (I noticed that the column "url" was discarded).
Regards,
Marcel.


On 14.07.2015 20:18, Michael Ronquest wrote:
Hi Marcel,
Thanks for writing in, as well as your interest in the KNN method in GeoMesa. Once things are working for you, I'd be *very* interested in receiving additional feedback, as well as hearing a bit about your use case.
In short, the KNN algorithm begins by searching in a geohash that contains your point of reference, with the spatial scale of the geohash set in the query process. Once all features in that central geohash are processed, the algorithm then begins to "spiral" out to neighboring geohashes as needed to either find k neighbors, or to ensure the current k "best" neighbors are indeed the k nearest neighbors.
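The spiral idea can be sketched in plain Java. This is only an illustration under loud assumptions, not GeoMesa's implementation: it uses a flat x/y plane in meters and square grid cells instead of geohashes, scans an in-memory list instead of querying an index, and the names SpiralKnnSketch, knn and cellSize are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: square cells on a flat plane stand in for geohashes.
class SpiralKnnSketch {

    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** Returns up to k points nearest to ref (closest first), never farther than maxDist. */
    static List<double[]> knn(List<double[]> data, double[] ref, int k,
                              double cellSize, double maxDist) {
        // Max-heap on distance: the head is the worst of the current best k.
        PriorityQueue<double[]> best = new PriorityQueue<>(
                Comparator.comparingDouble((double[] p) -> dist(p, ref)).reversed());
        long cx = (long) Math.floor(ref[0] / cellSize);
        long cy = (long) Math.floor(ref[1] / cellSize);

        for (long ring = 0; ; ring++) {
            // Any point in ring r lies more than (r - 1) * cellSize from ref.
            double ringMinDist = Math.max(0, ring - 1) * cellSize;
            if (ringMinDist > maxDist) break;                       // hard limit reached
            if (best.size() == k && ringMinDist >= dist(best.peek(), ref)) {
                break;                                              // no ring can improve on the best k
            }
            // "Query" this ring: here, a scan for points whose cell lies on the ring.
            for (double[] p : data) {
                long ix = (long) Math.floor(p[0] / cellSize);
                long iy = (long) Math.floor(p[1] / cellSize);
                boolean onRing = Math.max(Math.abs(ix - cx), Math.abs(iy - cy)) == ring;
                if (onRing && dist(p, ref) <= maxDist) {
                    best.add(p);
                    if (best.size() > k) best.poll();               // drop the current worst
                }
            }
        }
        List<double[]> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble((double[] p) -> dist(p, ref)));
        return result;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
                new double[]{10, 10}, new double[]{50, 0}, new double[]{300, 0},
                new double[]{-120, 40}, new double[]{5000, 5000});
        for (double[] p : knn(data, new double[]{0, 0}, 3, 100.0, 1000.0)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Note the two exits from the loop: one once the k best found so far cannot be beaten by any farther ring, and one once the rings pass the maximum distance. The second is what stops the search when fewer than k points exist nearby.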
Your instinct regarding the KNNQuery is correct: that is what you want to use. Apologies for the "magic" parameters: KNNQuery is used by the KNearestNeighborSearchProcess, and the parameters are better explained there. Note: the KNNSearchProcess class is used by GeoServer WPS processes, with a good deal of related boilerplate, so stay away from that.
The runNewKNNQuery method has these parameters:
source: SimpleFeatureSource ===> where your data reside: note this really should be a GeoMesa Source as we attempt to exploit its geospatial index in the algorithm
query: Query ===> your "base" query which would include filters on attributes, time and space.
numDesired: Int ===> this is simply "k", how many points you seek
searchDistanceInMeters: Double ===> this is the "typical" distance within which you'd expect to find k points in your data. It serves as an "initial guess" for the search and defines the spatial scale at which the iterative query by GeoHash will run. If I were looking for 1000 tweets in Manhattan over the course of a day, I'd set this to ~500 meters, while if I were looking for 1000 tweets around Nageezi, New Mexico, I'd set this to 100000 meters or more. The search is iterative, so err toward smaller distances here (at the potential cost of a slower process, as more "geohash queries" will need to be made).
maxDistanceInMeters: Double ===> this is the maximum distance at which the algorithm will search and acts almost like an additional predicate on your Query: this prevents runaway queries. For example, imagine in your case if you ask for k=1000 when you only have 100 Features around Beijing. The KNN process would then "spiral" out from Beijing, geohash by geohash, querying GeoMesa each time for additional Features. If you only have sparse data outside of Beijing, then the KNN algorithm may churn for a great while, perhaps over the entire planet! So this parameter prevents that. It is possible to get edge effects here, so err on the side of much larger distances.
aFeatureForSearch: SimpleFeature ===> this is the reference point around which to search.
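To get a feel for how the two distances trade off, here is a rough back-of-the-envelope sketch. It assumes a simplified model that is not GeoMesa's actual geohash sizing: square cells whose side equals searchDistanceInMeters, searched ring by ring out to maxDistanceInMeters. The class and method names are hypothetical.

```java
// Hypothetical cost model: square cells of side searchDistanceInMeters.
class KnnCostSketch {

    /** Rings of cells needed to reach maxDistanceInMeters from the centre cell. */
    static long worstCaseRings(double searchDistanceInMeters, double maxDistanceInMeters) {
        return (long) Math.ceil(maxDistanceInMeters / searchDistanceInMeters);
    }

    /** Cells in the square block made of the centre cell plus the given number of rings. */
    static long worstCaseCells(long rings) {
        long side = 2 * rings + 1;
        return side * side;
    }

    public static void main(String[] args) {
        long rings = worstCaseRings(1000.0, 2500000.0);
        System.out.println(rings + " rings, " + worstCaseCells(rings) + " cells");
    }
}
```

Under this model, a 1000 m guess with a 2500000 m limit allows up to 2500 rings, or 5001 x 5001 = 25,010,001 cells, if the search never finds k neighbors early. A 100000 m guess with the same limit allows only 25 rings (2601 cells), at the price of much larger individual queries. In practice the search stops as soon as the k nearest neighbors are confirmed, so these are upper bounds only.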
With the parameters defined, you'd then do something like this:

    Query theQuery = new Query("gdelt", timeFilter, new String[]{"SQLDATE", "geom"});

    // want 100 points
    int k = 100;

    // Beijing is dense....
    double guessedDistance = 1000.0;

    // very roughly the "radius" of China
    double maxLimitDistance = 2500000.0;

    // runNewKNNQuery takes the six parameters listed above; it builds the
    // GeoHashSpiral and NearestNeighbors and then calls runKNNQuery itself
    NearestNeighbors neighbors = KNNQuery.runNewKNNQuery(fs, theQuery, k, guessedDistance, maxLimitDistance, beijingCenter);

where fs and timeFilter are as you've previously defined them and beijingCenter is a SimpleFeature with your point as its geometry.
Hopefully this will help. Please report back on further issues or success.
Cheers,
Mike



_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users
