Re: [geomesa-users] Transformations, Edge effects?
Hi Marcel,
The KNN query works by repeatedly querying small spiralling regions
around the center point until it has found K neighbors. Theoretically,
each region should be exclusive, so there shouldn't be any duplicates
(assuming Point geometries in your features). However, there isn't any
explicit de-duplicating, so it's entirely possible there is a subtle
bug. It was also written to work against our older data index, whereas
now the queries are likely going to a different index, which might
introduce bugs.
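Until the defect is resolved, a client-side guard against repeated results is one option. The following is only a minimal sketch, not part of GeoMesa: the `Dedup` class and `distinctById` helper are hypothetical names, and the sketch assumes every result can be mapped to a stable ID string (for a GeoTools feature that would typically be the feature ID).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not a GeoMesa class.
final class Dedup {
    /**
     * Returns the results in their original order, keeping only the first
     * occurrence of each ID. idOf extracts the ID from a result.
     */
    static <T> List<T> distinctById(Iterable<T> results, Function<T, String> idOf) {
        Set<String> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        for (T result : results) {
            // Set.add returns false if the ID was already present.
            if (seen.add(idOf.apply(result))) {
                unique.add(result);
            }
        }
        return unique;
    }
}
```

Note that filtering after the fact can leave fewer than K results, so it may be necessary to request more than K neighbors and trim afterwards.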
I've filed a defect to track the issue here:
https://geomesa.atlassian.net/browse/GEOMESA-910
Unfortunately, the developer who wrote the KNN query has moved on to
another project. However, he mentioned that you might want to set your
search distance to a smaller value, especially if you are only trying to
retrieve 10 features. I believe that the search distance should be a
best guess as to the radius that contains your K neighbors.
Additionally, you are likely running into the same issues I mentioned
previously with large date ranges. Increasing your memory or chunking up
your time frames might alleviate that.
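Chunking a long date range could look roughly like the sketch below. It is plain `java.time` code with no GeoMesa calls; the `DateChunks` class and its `split` method are hypothetical names, and the window size in months is an assumed tuning knob.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a GeoMesa class.
final class DateChunks {
    /**
     * Splits the inclusive range [start, end] into consecutive,
     * non-overlapping windows of at most `months` months each.
     * Each element is a two-entry array {from, to}, both inclusive.
     */
    static List<LocalDate[]> split(LocalDate start, LocalDate end, int months) {
        if (months <= 0) {
            throw new IllegalArgumentException("months must be positive");
        }
        List<LocalDate[]> chunks = new ArrayList<>();
        LocalDate from = start;
        while (!from.isAfter(end)) {
            LocalDate to = from.plusMonths(months).minusDays(1);
            if (to.isAfter(end)) {
                to = end; // clamp the last window to the requested end
            }
            chunks.add(new LocalDate[] { from, to });
            from = to.plusDays(1);
        }
        return chunks;
    }
}
```

Each window would then be used as the bounds of its own time filter. The per-window results still have to be merged, and the K nearest re-selected from the merged set, because the nearest neighbors of one window are not necessarily the nearest overall.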
Thanks,
Emilio
On 09/09/2015 12:20 PM, Marcel wrote:
Hey,
Okay, I checked these functions and found that distance() calculates
the Euclidean distance, which is not suitable for my purposes. So I
wrote a workaround: I create a circle with a certain radius and number
of edges, calculate the vertex positions using GeodeticCalculator, and
finally build a polygon from these points. While iterating over my
data I can then calculate the distance.
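The circle construction described above can be sketched without any library as follows. This is not the code from the workaround: it swaps the ellipsoidal GeodeticCalculator for the spherical destination-point formula, so the positions are approximate. The `GeodesicCircle` class, its method names, and the mean Earth radius constant are all assumptions of this sketch.

```java
// Hypothetical helper; uses a spherical Earth model instead of GeodeticCalculator.
final class GeodesicCircle {
    // Mean Earth radius in meters (assumed value for the spherical model).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Returns {lon, lat} in degrees reached from (lon, lat) after distanceM along bearingDeg. */
    static double[] destination(double lon, double lat, double bearingDeg, double distanceM) {
        double phi1 = Math.toRadians(lat);
        double lambda1 = Math.toRadians(lon);
        double theta = Math.toRadians(bearingDeg);
        double delta = distanceM / EARTH_RADIUS_M; // angular distance in radians
        double sinPhi2 = Math.sin(phi1) * Math.cos(delta)
                + Math.cos(phi1) * Math.sin(delta) * Math.cos(theta);
        double phi2 = Math.asin(Math.max(-1.0, Math.min(1.0, sinPhi2)));
        double lambda2 = lambda1 + Math.atan2(
                Math.sin(theta) * Math.sin(delta) * Math.cos(phi1),
                Math.cos(delta) - Math.sin(phi1) * sinPhi2);
        // Normalize longitude to the range [-180, 180).
        double lonDeg = ((Math.toDegrees(lambda2) + 540.0) % 360.0) - 180.0;
        return new double[] { lonDeg, Math.toDegrees(phi2) };
    }

    /** Returns a closed ring of {lon, lat} vertices at radiusM around the center. */
    static double[][] ring(double lon, double lat, double radiusM, int vertices) {
        if (vertices < 3) {
            throw new IllegalArgumentException("a ring needs at least 3 vertices");
        }
        double[][] ring = new double[vertices + 1][];
        for (int i = 0; i < vertices; i++) {
            ring[i] = destination(lon, lat, 360.0 * i / vertices, radiusM);
        }
        ring[vertices] = ring[0].clone(); // repeat the first vertex to close the ring
        return ring;
    }
}
```

The resulting vertex array could be converted to JTS coordinates and passed to a GeometryFactory to obtain the polygon.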
The duplicate entry occurs for a KNN query with a temporal constraint.
If you want to reproduce the issue, I have added my query below. I
think it's a bug within the KNN query. The resulting globaleventid,
which occurs four times, is 253015471.
/**
 * find top 10 events where the usa investigated anything
 * (eventrootcode = 09) in the years from 2004 to 2014 with washington as
 * origin.
 */
private static Iterator<SimpleFeatureWithDistance> getResultsForQuery18(
        Map<String, String> dsConf) {
    SimpleFeatureSource featureSource =
            SimpleFeatureSourceFactory.getSimpleFeatureSource(dsConf);
    GeometryFactory geomFactory = new GeometryFactory();

    // coordinates for washington
    double[] coordinates = { -77.0145665, 38.8993488 };
    Coordinate coord = new Coordinate(coordinates[0], coordinates[1]);
    Point point = geomFactory.createPoint(coord);

    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    Date start = null;
    Date end = null;
    try {
        start = df.parse("2004-01-01");
        end = df.parse("2014-12-31");
    } catch (java.text.ParseException e) {
        e.printStackTrace();
    }

    FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
    Filter timeFilter = ff.between(ff.property(GDELTConstants.DATE),
            ff.literal(start), ff.literal(end));

    ArrayList<Filter> attributeFilters = new ArrayList<Filter>();
    Filter attributeFilter1 = ff.equal(
            ff.property(GDELTConstants.EVENT_ROOT_CODE), ff.literal("09"), false);
    attributeFilters.add(attributeFilter1);
    Filter attributeFilter2 = ff.equal(
            ff.property(GDELTConstants.ACTOR1_COUNTRY_CODE), ff.literal("USA"), false);
    attributeFilters.add(attributeFilter2);
    Filter attributeFilter3 =
            ff.not(ff.isNull(ff.property(GDELTConstants.ACTOR1GEO_LONG)));
    attributeFilters.add(attributeFilter3);
    Filter attributeFilter4 =
            ff.not(ff.isNull(ff.property(GDELTConstants.ACTOR1GEO_LAT)));
    attributeFilters.add(attributeFilter4);
    Filter attributeFilterCombined = ff.and(attributeFilters);
    Filter completeFilter = ff.and(timeFilter, attributeFilterCombined);

    int numberOfResults = 10;
    NearestNeighbors neighborsPrepare =
            NearestNeighbors.apply(point, numberOfResults);
    // initial guess for getting k points - assuming that one day
    // will not result k points
    double searchDistanceInMeters = 21000000;
    // maximum distance between two points on earth
    double maximumdistanceInMeters = 40075160;
    GeoHashSpiral spiral = GeoHashSpiral.apply(point,
            searchDistanceInMeters, maximumdistanceInMeters);

    Query q = new Query(
            dsConf.get(AccumuloDataStoreConfiguration.FEATURE_NAME),
            completeFilter,
            new String[] { GDELTConstants.GLOBAL_EVENTID,
                    GDELTConstants.DATE, GDELTConstants.GEOM });
    NearestNeighbors neighbors =
            KNNQuery.runKNNQuery(featureSource, q, spiral, neighborsPrepare);
    return JavaConversions.asJavaIterator(neighbors.getK().iterator());
}
I wrote another query that just returns the event record with
globaleventid = 253015471. Only one record was returned.
Also, this query is very slow. Do you have any ideas for speeding it up
by changing some parameters like searchDistanceInMeters or
maximumdistanceInMeters?
Thanks,
Marcel Jacob.
On 04.09.2015 20:24, Jim Hughes wrote:
Hi Marcel,
The functions you are calling are actually GeoTools methods. To see
a list of the available functions, you can check the WFS
GetCapabilities document from your GeoServer (1) under the
ogc:Function_Names tag.
Distance is a tricky thing: when one asks for a distance
calculation, the units are determined by the Coordinate Reference
System (CRS). GeoMesa assumes that all your data is
longitude/latitude, which is EPSG:4326. In that CRS, the unit of
measurement is degrees.
In order to get a 'more helpful' answer, libraries like the GeoTools
GeodeticCalculator or our GeoMesa wrapper (2) can take two points
specified in lon-lat and return the distance in meters. Those
libraries use the haversine formula or Vincenty's formulae (3).
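To illustrate how lon-lat degrees turn into meters, here is a minimal haversine sketch. It is not the GeoTools or GeoMesa implementation: it treats the Earth as a sphere with an assumed mean radius, so results differ slightly from an ellipsoidal calculation. The `Haversine` class and `distanceMeters` method are hypothetical names.

```java
// Hypothetical helper; spherical approximation, not the GeoTools GeodeticCalculator.
final class Haversine {
    // Mean Earth radius in meters (assumed value for the spherical model).
    static final double EARTH_RADIUS_M = 6_371_008.8;

    /** Great-circle distance in meters between two lon-lat points given in degrees. */
    static double distanceMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // Clamp to 1.0 to guard against rounding just above the asin domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

Under this model, one degree of longitude along the equator comes out at roughly 111.2 km, which shows why a raw distance in degrees is hard to interpret: the same one-degree difference in longitude covers fewer meters at higher latitudes.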
The Point(0,0) lies on a corner of a GeoHash. In our
implementation, GeoHashes contain their bottom and left edges, so this
point is in the 's' 5-bit GeoHash.
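The corner case can be checked with a small encoder for a single geohash character (5 bits). This is a sketch of the standard geohash bit interleaving, not GeoMesa's code; the `GeoHashCell` class and `encodeOneChar` method are hypothetical names, and the `>=` comparison is how this sketch models "cells contain their bottom and left edges".

```java
// Hypothetical helper; standard geohash interleaving, not GeoMesa's implementation.
final class GeoHashCell {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns the 1-character (5-bit) geohash of a lon-lat point given in degrees. */
    static char encodeOneChar(double lon, double lat) {
        double lonMin = -180, lonMax = 180, latMin = -90, latMax = 90;
        int bits = 0;
        for (int i = 0; i < 5; i++) {
            if (i % 2 == 0) { // bits 0, 2, 4 halve the longitude range
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { // a point on the split line goes to the upper half
                    bits = (bits << 1) | 1;
                    lonMin = mid;
                } else {
                    bits <<= 1;
                    lonMax = mid;
                }
            } else { // bits 1, 3 halve the latitude range
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) {
                    bits = (bits << 1) | 1;
                    latMin = mid;
                } else {
                    bits <<= 1;
                    latMax = mid;
                }
            }
        }
        return BASE32.charAt(bits);
    }
}
```

With this convention, `encodeOneChar(0, 0)` yields 's', while points nudged slightly west, south, or south-west of the origin fall into the cells 'e', 'k', and '7' respectively. The point itself belongs to exactly one cell.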
One easy way to see duplicate data in GeoMesa is if you have entered
the same data multiple times without specifying the feature id. If
that's not what has happened, you may have found a bug. If you can
write up some steps to reproduce, I'm happy to check things out.
Thanks,
Jim
1. For example:
http://your-server/geoserver/ows?service=wfs&version=1.0.0&request=GetCapabilities
2. GeoMesa's Scala wrapper around the GeoTools GeodeticCalculator:
https://github.com/locationtech/geomesa/blob/master/geomesa-utils/src/main/scala/org/locationtech/geomesa/utils/geohash/GeomDistance.scala
Unit tests/examples of use:
https://github.com/locationtech/geomesa/blob/master/geomesa-utils/src/test/scala/org/locationtech/geomesa/utils/geohash/GeomDistanceTest.scala
3.
https://en.wikipedia.org/wiki/Haversine_formula
https://en.wikipedia.org/wiki/Vincenty's_formulae
On 09/04/2015 12:54 PM, Marcel wrote:
Hey,
I played around with some GeoMesa transformations like strConcat()
and distance(). The latter returns the distance in degrees, which
looks kind of unfamiliar to me. Is there a transformation that
returns the distance in meters or kilometers (given two points)?
Which distance do you calculate (Euclidean distance, distance using
the haversine formula, or one based on an ellipsoid)?
Looking at the results of another query I noticed that one record
occurs 4 times (Point(0, 0)). I could imagine that this point lies on
a geohash boundary and intersects all four of the surrounding
geohashes. Do I have to remove these duplicates afterwards?
Thanks again,
Marcel.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users