Re: [geomesa-users] Queries on multiple features

From: Jim Hughes <jnh5y@xxxxxxxx>
Date: Mon, 5 Dec 2016 10:31:56 -0500

Hi Andrey,

Just to jump in...

If you are doing this in GeoServer and the data sizes are reasonable, you might try out this GeoServer plugin: http://docs.geoserver.org/stable/en/user/extensions/querylayer/index.html. It is glue code that does the obvious thing: it collects the geometries returned from the first layer and builds a spatial request against the second layer from them.

For Spark SQL, we are working internally to investigate and implement some improvements. The work that Emilio linked to is from a previous effort. As part of this work, I'm actively looking into how spatial joins can be handled efficiently. We are implementing spatial predicates and functions, so that should help with building up SQL queries to do what you are describing.

As an early suggestion/note, as you use Spark, there are ways to distribute and broadcast the smaller dataset. I've found that can be handy for optimizing joins.
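To make the broadcast idea concrete, here is a minimal plain-Python sketch with no Spark dependency. It assumes planar coordinates in meters and simple polygons without holes; the function names, the sample data, and the list-of-partitions stand-in for an RDD are all illustrative, not GeoMesa or Spark APIs. In Spark itself, the small polygon list would roughly correspond to a value passed to `SparkContext.broadcast`, and `join_partition` to a function handed to `RDD.mapPartitions`.

```python
import math

def _point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def _inside(px, py, ring):
    """Ray-casting point-in-polygon test for a closed ring of (x, y) tuples."""
    inside = False
    for (ax, ay), (bx, by) in zip(ring, ring[1:]):
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

def distance_to_polygon(point, ring):
    """0.0 if the point lies inside the ring, else distance to the nearest edge."""
    px, py = point
    if _inside(px, py, ring):
        return 0.0
    return min(_point_segment_distance(px, py, ax, ay, bx, by)
               for (ax, ay), (bx, by) in zip(ring, ring[1:]))

def join_partition(pois, broadcast_polygons, max_dist):
    """Map-side join: each partition checks its POIs against the full small dataset."""
    for name, point in pois:
        if any(distance_to_polygon(point, ring) <= max_dist
               for ring in broadcast_polygons):
            yield name

# Small dataset (water polygons): cheap to copy to every worker.
water = [[(0, 0), (0, 1000), (1000, 1000), (1000, 0), (0, 0)]]

# Large dataset (POI), shown here as two hypothetical partitions.
partitions = [
    [("cafe", (1500, 500)), ("museum", (5000, 5000))],
    [("pier", (500, 500))],
]

nearby = [name for part in partitions
          for name in join_partition(part, water, 1000.0)]
print(nearby)
```

The point of the pattern is that the large dataset is never shuffled: only the small one is copied, and each partition does its own filtering locally.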

Generally, big data joins are hard, and we are working to support them as fully (and sensibly) as possible.

Cheers,

Jim

On 12/05/2016 10:17 AM, Andrey Morskoy wrote:

Thanks Emilio.
On Mon, Dec 5, 2016 at 5:14 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
If you are going to use spark, we have a spark sql integration that you may find useful:

http://www.geomesa.org/documentation/user/web_data.html#analytic-web-service

It may not work well for joining on spatial predicates though - spatial predicates are handled by our GeoMesa layer, and non-spatial predicates are handled by the spark sql layer.
On 12/05/2016 10:04 AM, Andrey Morskoy wrote:
Thanks Emilio.
It seems that GeoMesa does not implement spatial joins, so a pre-join in Spark looks like the easiest way. That is classic batch pre-aggregation, which I was trying to avoid.

Another approach I could try (again with an offline ETL job) is to build a merged feature schema containing both POI (as points) and Naturals (as MultiPolygons). I will check whether that is possible at all.
On Mon, Dec 5, 2016 at 4:25 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
I don't believe that there is a CQL method to join two schemas. As you mentioned, a spark join would probably be the best approach. Another option would be to do a two-part query, first against natural objects and then building up a filter against POI using the returned geometries. If you're using geoserver, you might look into 'application schema' and complex features, which let you merge two schemas into a single 'view' (it might be overly complex and also pretty slow though):

http://docs.geoserver.org/stable/en/user/data/app-schema/complex-features.html
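The two-part query mentioned above could be sketched roughly as follows. This only builds the two filter strings; running them against the data store is left out. The attribute names `type` and `geom`, the helper names, and the sample WKT are assumptions for illustration. The `DWITHIN(attribute, geometry, distance, units)` form follows ECQL syntax, but whether the units argument is honored depends on the data store and the CRS, so treat the 1000-meter distance as something to verify.

```python
def natural_filter(natural_type, type_attr="type"):
    """First query: select natural objects of one type (e.g. water)."""
    escaped = natural_type.replace("'", "''")  # escape quotes for the CQL literal
    return f"{type_attr} = '{escaped}'"

def poi_filter_from_geometries(wkts, geom_attr="geom", distance_m=1000):
    """Second query: POI within distance_m of any geometry returned by the first."""
    if not wkts:
        return "EXCLUDE"  # nothing came back, so match nothing
    clauses = [f"DWITHIN({geom_attr}, {wkt}, {distance_m}, meters)" for wkt in wkts]
    return " OR ".join(clauses)

# Hypothetical geometries returned by the first query, as WKT strings.
water_wkts = [
    "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))",
    "POLYGON ((5 5, 5 6, 6 6, 6 5, 5 5))",
]

print(natural_filter("water"))
print(poi_filter_from_geometries(water_wkts))
```

With many or very detailed polygons the second filter can become large, which is one reason this approach only suits reasonably small result sets from the first query.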

Thanks,

Emilio

On 12/05/2016 07:02 AM, Andrey Morskoy wrote:
I have what seems to be a common case, but I am not sure I understand how to handle it correctly.

First of all, I have two feature types stored in the geomesa.example Accumulo catalog:

* Natural objects (polygons, with an attribute "type" = water/forest)

* POI (points, with an attribute "name")

What is the correct way to query "Give me the list of POI within 1 km of Natural objects of type=water"?

It looks like a Spark RDD join, but I am actually looking for a native CQL query for this case.

Thanks
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users
