
Re: [geomesa-users] query that checks "between" times does not work for GeoMesaSpark

Hi Serge,

As a follow-up, we filed a bug (1), Emilio sorted out a fix, and it was merged into master today. It should address this issue.

If you have a chance to try it out, that'd be great. 

Thanks,

Jim

1. https://geomesa.atlassian.net/browse/GEOMESA-881

On 07/30/2015 07:34 PM, James Hughes wrote:
Hi Serge,

Thanks for writing in with this error.  GeoMesa Spark handles queries a little differently than the export tool and the GeoServer UI.  In the latter two tools, we can gather all the data into one JVM.  

The GeoMesa query planner sometimes takes a query and splits it into multiple Accumulo scan plans.  That can lead to duplicate results, and hence in the typical use case, GeoMesa deduplicates the results.
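To illustrate the duplicate problem, here is a simplified sketch (not GeoMesa's actual code). The `Feature` case class, the two overlapping "scan plan" results and the `dedupById` helper are hypothetical stand-ins. When two plans cover overlapping ranges, the same feature comes back twice, and a client that sees every result in one JVM can drop repeats by feature ID:

```scala
// Hypothetical stand-in for a SimpleFeature: just an ID and a timestamp.
final case class Feature(id: String, epochMillis: Long)

object Dedup {
  // Results of two hypothetical scan plans whose ranges overlap,
  // so feature "b" is returned by both plans.
  val planA: Seq[Feature] = Seq(Feature("a", 1L), Feature("b", 2L))
  val planB: Seq[Feature] = Seq(Feature("b", 2L), Feature("c", 3L))

  // Keep the first occurrence of each feature ID. This only works because
  // every result passes through this one set; results split across Spark
  // partitions would first have to be brought together (e.g. via a shuffle).
  def dedupById(results: Iterator[Feature]): List[Feature] = {
    val seen = scala.collection.mutable.HashSet.empty[String]
    // HashSet.add returns false if the ID was already present.
    results.filter(f => seen.add(f.id)).toList
  }
}
```

For example, `Dedup.dedupById((Dedup.planA ++ Dedup.planB).iterator).map(_.id)` yields `List("a", "b", "c")`: four raw results, three distinct features.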

For GeoMesa's Spark support, deduplication could cause serious trouble and performance problems. For that reason, when a query is turned into multiple scan plans, we raise the error you see. We have recently introduced a better spatio-temporal index, and it is being used to satisfy your query. That new index splits up data by week and creates different scan plans for the first, middle, and last weeks of the queried time range.
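A rough sketch of that week-based split, to show why a single BETWEEN clause still yields more than one scan plan. This is not GeoMesa's planner: it assumes, for illustration only, that weeks are numbered from the Unix epoch (1970-01-01T00:00Z) and that the inclusive BETWEEN range is cut at week boundaries into a partial first week, a run of whole middle weeks, and a partial last week. `WeekSplit`, `Partial`, `WholeWeeks` and `split` are hypothetical names.

```scala
import java.time.{Duration, Instant}

object WeekSplit {
  // Assumption for illustration: week bins are counted from the Unix epoch.
  val WeekMillis: Long = Duration.ofDays(7).toMillis

  def weekOf(t: Instant): Long = Math.floorDiv(t.toEpochMilli, WeekMillis)

  def weekStart(week: Long): Instant = Instant.ofEpochMilli(week * WeekMillis)

  sealed trait Plan
  // Partial week: scan only [from, to] (both inclusive) within one week bin.
  final case class Partial(week: Long, from: Instant, to: Instant) extends Plan
  // Whole weeks: every time in these bins matches, so no time bound is needed.
  final case class WholeWeeks(firstWeek: Long, lastWeek: Long) extends Plan

  /** Splits the inclusive range [start, end] into first, middle and last plans. */
  def split(start: Instant, end: Instant): List[Plan] = {
    require(!end.isBefore(start), "end must not precede start")
    val w0 = weekOf(start)
    val w1 = weekOf(end)
    if (w0 == w1) List(Partial(w0, start, end)) // one plan if both ends share a week
    else {
      // First plan runs to the last millisecond of week w0.
      val first = Partial(w0, start, weekStart(w0 + 1).minusMillis(1))
      val last = Partial(w1, weekStart(w1), end)
      val middle: List[Plan] =
        if (w1 - w0 > 1) List(WholeWeeks(w0 + 1, w1 - 1)) else Nil
      first :: middle ::: List(last)
    }
  }
}
```

Applied to the SQLDATE range from the original query, `WeekSplit.split(Instant.parse("2012-02-01T00:00:00Z"), Instant.parse("2015-05-02T00:00:00Z"))` produces three plans (partial, whole weeks, partial), whereas a range inside a single day produces one. Because every instant falls in exactly one week bin, these plans cover disjoint time ranges.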

In this case, we are being too cautious; we'll add some special handling for it. It may take a few days, since I am out on vacation.

Thanks again; I'll try back once we have a fix sorted.

Cheers,

Jim

----- Original Message -----
From: "Geomesa User discussions" <geomesa-users@xxxxxxxxxxxxxxxx>
To: <geomesa-users@xxxxxxxxxxxxxxxx>
Cc:
Sent: Thu, 30 Jul 2015 18:51:05 -0400
Subject: [geomesa-users] query that checks "between" times does not work for GeoMesaSpark


I am running the following query on the gdelt_Ukraine example from Spark:

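    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.geotools.data.{DataStoreFinder, Query}
    import org.geotools.filter.text.cql2.CQL
    import org.locationtech.geomesa.accumulo.data.AccumuloDataStore
    import org.locationtech.geomesa.compute.spark.GeoMesaSpark

    // params (Accumulo connection parameters) and logger are defined elsewhere
    val ds = DataStoreFinder.getDataStore(params).asInstanceOf[AccumuloDataStore]

    // Spatial (bbox) AND temporal (BETWEEN) filter on the "gdelt" feature type
    val cqlFilter = CQL.toFilter("[[bbox(geom, 34, 46, 35, 45.8)] AND [SQLDATE BETWEEN '2012-02-01T00:00:00.000Z' AND '2015-05-02T00:00:00.000Z']]")
    val q = new Query("gdelt", cqlFilter)

    // Configure Spark
    val conf = new Configuration
    val sparkConf = new SparkConf(true)
      .setMaster("local")
      .setAppName("testSpark")
      .set("spark.executor.memory", "1g")
    val sconf = GeoMesaSpark.init(sparkConf, ds)
    val sc = new SparkContext(sconf)

    // Create an RDD from a query
    val queryRDD = GeoMesaSpark.rdd(conf, sc, params, q)
    logger.info("Count queryRDD: " + queryRDD.count())

The resulting RDD count() is 0.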


The error I see on the console is:

[2015-07-30 17:34:54,932] ERROR org.locationtech.geomesa.compute.spark.GeoMesaSpark$: The query being executed requires multiple scans, which is not currently supported by geomesa. Your result set will be partially incomplete. This is most likely due to an OR clause in your query. Query: BBOX(geom, 34.0,45.8,35.0,46.0) AND SQLDATE BETWEEN '2012-02-01T00:00:00.000Z' AND '2015-05-02T00:00:00.000Z'

but I do not have an OR clause in my query.

If I run it with only the bbox filter, CQL.toFilter("[bbox(geom, 34, 46, 35, 45.8)]"),

everything works and the results are filtered correctly.

If I run it with only the date filter, CQL.toFilter("[SQLDATE BETWEEN '2012-02-01T00:00:00.000Z' AND '2015-05-02T00:00:00.000Z']"),

I see the same problem as above.

Also, everything works using:

geomesa export -u user -p password -c gdelt_Ukraine -fn gdelt -fmt csv -max 50 -q "[[SQLDATE BETWEEN '2014-02-01T00:00:00.000Z' AND '2014-05-02T00:00:00.000Z'] AND [bbox(geom, 34, 46, 35, 45.8)]]"

or from the GeoServer UI.

Please let me know what the problem is.







_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

