Hi Andrey,
Just to jump in...
If you are doing this in GeoServer and the data sizes are
reasonable, you might try out this GeoServer plugin:
http://docs.geoserver.org/stable/en/user/extensions/querylayer/index.html
It is glue code that does the obvious thing: it queries the first
layer, aggregates the resulting geometries, and uses them to build
the spatial filter for the request.
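As a rough illustration, a WMS GetMap request that uses the plugin's
filter functions might be built like this. The server URL, the layer
names (sf:roads, sf:bugsites), the geometry attribute the_geom and the
200 m distance are all placeholders. queryCollection and
collectGeometries are the functions described on the page above, but
please check the docs for your GeoServer version before relying on
the exact syntax.

```python
from urllib.parse import urlencode

# Placeholder endpoint; replace with your own GeoServer WMS URL.
GEOSERVER_WMS = "http://localhost:8080/geoserver/wms"


def build_cql_filter(source_layer, source_geom="the_geom",
                     target_geom="the_geom", distance_m=200):
    """CQL filter: target features within distance_m metres of any
    geometry in source_layer (layer/attribute names are hypothetical)."""
    # queryCollection(...) pulls geometries from the first layer;
    # collectGeometries(...) merges them into one GeometryCollection
    # that DWITHIN can test the target layer against.
    return (
        f"DWITHIN({target_geom}, "
        f"collectGeometries(queryCollection('{source_layer}',"
        f"'{source_geom}','INCLUDE')), {distance_m}, meters)"
    )


def build_getmap_url(target_layer, source_layer):
    """WMS 1.1.1 GetMap URL rendering target_layer filtered by source_layer."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": target_layer,
        "bbox": "-180,-90,180,90",
        "width": 768,
        "height": 384,
        "srs": "EPSG:4326",
        "format": "image/png",
        "CQL_FILTER": build_cql_filter(source_layer),
    }
    # urlencode percent-encodes the quotes, commas and parentheses.
    return f"{GEOSERVER_WMS}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_getmap_url("sf:bugsites", "sf:roads"))
```

Because the first layer's geometries are collected into a single
filter, this works best when that layer is reasonably small.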
For Spark SQL, we are investigating and implementing some
improvements internally. The work that Emilio linked to is from a
previous effort. As part of this work, I'm actively looking into
how spatial joins can be handled efficiently. We are implementing
spatial predicates and functions, which should make it easier to
build SQL queries that do what you are describing.
As an early suggestion: in Spark, you can broadcast the smaller
dataset to every executor so that the larger dataset does not have
to be shuffled. I've found that handy for optimizing joins.
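To make the broadcast idea concrete, here is a plain-Python
simulation (not Spark code, and not how we are implementing it): the
large dataset stays split into partitions, while the small dataset
is copied whole to each partition, so every partition can evaluate
the spatial predicate locally. The point and bounding-box data and
the bbox_contains predicate are hypothetical. In PySpark itself, the
corresponding hint is wrapping the smaller DataFrame in
pyspark.sql.functions.broadcast(...) before the join.

```python
from typing import Iterable, List, Tuple

Point = Tuple[str, float, float]                 # (id, x, y)
BBox = Tuple[str, float, float, float, float]    # (id, minx, miny, maxx, maxy)


def bbox_contains(box: BBox, pt: Point) -> bool:
    """Hypothetical spatial predicate: is the point inside the box?"""
    _, minx, miny, maxx, maxy = box
    _, x, y = pt
    return minx <= x <= maxx and miny <= y <= maxy


def join_partition(partition: Iterable[Point],
                   small: List[BBox]) -> List[Tuple[str, str]]:
    """Work done 'on one executor': join local points to the whole small side."""
    return [(pt[0], box[0])
            for pt in partition
            for box in small
            if bbox_contains(box, pt)]


def broadcast_join(partitions: List[List[Point]],
                   small: List[BBox]) -> List[Tuple[str, str]]:
    """Simulated broadcast join: each partition of the large side gets a
    full copy of the small side, so the large side never moves."""
    results: List[Tuple[str, str]] = []
    for partition in partitions:
        broadcast_copy = list(small)  # stands in for Spark shipping the small side
        results.extend(join_partition(partition, broadcast_copy))
    return results


if __name__ == "__main__":
    points = [[("p1", 1.0, 1.0), ("p2", 5.0, 5.0)], [("p3", 9.0, 9.0)]]
    regions = [("r1", 0.0, 0.0, 2.0, 2.0), ("r2", 4.0, 4.0, 10.0, 10.0)]
    print(broadcast_join(points, regions))
    # prints [('p1', 'r1'), ('p2', 'r2'), ('p3', 'r2')]
```

A real spatial join would also want a spatial index on the broadcast
side and exact geometry tests rather than bounding boxes; the sketch
only shows which side of the join moves.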
Generally, big data joins are hard, and we are working to support
them as fully (and sensibly) as possible.
Cheers,
Jim
On 12/05/2016 10:17 AM, Andrey Morskoy wrote: