Hi Andrey,
      
      Just to jump in...  
      
      If you are doing this in GeoServer and if the data sizes are
      reasonable, you might try out this GeoServer plugin: 
http://docs.geoserver.org/stable/en/user/extensions/querylayer/index.html. 
      It is glue code that does the obvious thing: it builds a spatial
      filter from the features of the first layer and applies it to the
      second.
      
      For Spark SQL, we are working internally to investigate and
      implement some improvements.  The work that Emilio linked to is
      from a previous effort.  As part of this work, I'm actively
      looking into how spatial joins can be handled efficiently.  We are
      implementing spatial predicates and functions, so that should help
      with building up SQL queries to do what you are describing.
      
      As an early suggestion: since you are using Spark, you can
      broadcast the smaller dataset to every executor, so that each
      partition of the larger dataset is joined locally without a
      shuffle.  I've found that handy for optimizing joins.
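
      To make the broadcast idea concrete, here is a rough sketch in
      plain Python.  It is not Spark or GeoMesa code: the partitions are
      ordinary lists, the "broadcast" is just passing the same small
      table to every partition, and the names (join_partition,
      broadcast_join, the sample regions and points) are made up for
      illustration.  Regions are simplified to bounding boxes.  In Spark
      itself, the small side would be shipped with something like
      SparkContext.broadcast or the broadcast() hint on a DataFrame.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, float, float]          # (point_id, x, y)
Box = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)


def contains(box: Box, x: float, y: float) -> bool:
    """Return True if (x, y) lies inside the box, edges included."""
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def join_partition(points: Iterable[Point],
                   regions: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Join one partition of the large dataset against the full small table."""
    return [
        (point_id, region_id)
        for point_id, x, y in points
        for region_id, box in regions.items()
        if contains(box, x, y)
    ]


def broadcast_join(partitions: Iterable[Iterable[Point]],
                   regions: Dict[str, Box]) -> List[Tuple[str, str]]:
    """Give every partition its own view of the small table and join locally."""
    matches: List[Tuple[str, str]] = []
    for partition in partitions:
        # Each partition only needs its own points plus the small table,
        # so the large dataset never has to be repartitioned.
        matches.extend(join_partition(partition, regions))
    return matches


# Hypothetical sample data: two regions (small side), four points in
# two partitions (large side).
regions = {
    "west": (0.0, 0.0, 10.0, 10.0),
    "east": (10.5, 0.0, 20.0, 10.0),
}
partitions = [
    [("p1", 1.0, 1.0), ("p2", 15.0, 5.0)],
    [("p3", 25.0, 5.0), ("p4", 9.0, 9.0)],
]

result = broadcast_join(partitions, regions)
print(result)
```

      This only pays off while the smaller dataset fits comfortably in
      memory on each executor; with many regions you would also want a
      spatial index on the small side instead of the linear scan above.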
      
      Generally, big data joins are hard, and we are working to support
      them as fully (and sensibly) as possible.
      
      Cheers,
      
      Jim
      
      On 12/05/2016 10:17 AM, Andrey Morskoy wrote: