Re: [geomesa-users] Compilation error in Java when applying "map" to an RDD generated via GeoMesaSpark.rdd

Luca,

Sorry for the trouble; I'll respond inline.

On 05/09/2015 10:01 PM, Luca Morandini wrote:
On 09/05/15 22:19, James Hughes wrote:

Language issues aside, for your case, it sounds like Spark 1.3.1 may work out. I missed the details, but some folks who are using GeoMesa internally also ran into
issues with Spark 1.3.0.

Since Spark is a compiled dependency and we don't package it, I think you'll be able to update the Spark versions in your pom and see your tests work. (That's
the very best case scenario.)

Unfortunately, it did not work. I installed Spark 1.3.1 in place of 1.3.0-SNAPSHOT and re-ran the integration test, to no avail.

From reading https://issues.apache.org/jira/browse/SPARK-6152 again, I noticed the suggested work-around:

"Workaround: Don't compile Scala classes to Java 8, Scala 2.11 does not support nor require any Java 8 features."

Perhaps things will work out if you build GeoMesa with Java 6/7?

On the other hand, since bumping the version of Spark didn't work out, maybe managing the version of reflectasm in your project's pom would. If you specify the correct version of reflectasm, it should be used in the unit test. If this works, you may end up needing to try building a custom version of Spark which uses that updated reflectasm. It would be a positive path, but may make your deployment a tad more finicky.
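If you want to try pinning reflectasm, a dependencyManagement entry along these lines is the usual mechanism. Note that the coordinates and version below are assumptions, not verified against Spark's dependency tree; run `mvn dependency:tree` first to see exactly which reflectasm artifact Spark's kryo/chill chain pulls in, and pin that artifactId (older kryo releases use a `-shaded` reflectasm artifact, which a pin on a different artifactId will not override).

```xml
<!-- Sketch only: pin a newer reflectasm so the test classpath prefers it
     over the one Spark's kryo pulls in transitively.
     groupId/artifactId/version below are ASSUMPTIONS; verify them
     against `mvn dependency:tree` before relying on this. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.esotericsoftware</groupId>
      <artifactId>reflectasm</artifactId>
      <version>1.11.0</version> <!-- assumed Java-8-aware release -->
    </dependency>
  </dependencies>
</dependencyManagement>
```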


If upgrading Spark doesn't help, you may have to try compiling with Java 6 or 7 (assuming that you don't have too many lambdas to rewrite), or compile your functional Java code with Java 8 and try to compile the rest with Java 7.
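For what it's worth, rewriting a Java 8 lambda into Java 7-compatible form is mechanical: each lambda becomes an anonymous inner class implementing the same functional interface. The `MapFunction` interface below is a stand-in I defined so the snippet compiles without Spark on the classpath; in real code you'd implement Spark's `org.apache.spark.api.java.function.Function` the same way.

```java
// Stand-in for Spark's Function<T, R>, so this sketch is self-contained.
interface MapFunction<T, R> {
    R call(T input);
}

public class LambdaRewrite {
    // Java 8 style (the kind of bytecode SPARK-6152's reflectasm chokes on):
    //   MapFunction<String, Integer> length = s -> s.length();

    // Java 7 style: the same function as an anonymous inner class.
    static final MapFunction<String, Integer> LENGTH =
        new MapFunction<String, Integer>() {
            @Override
            public Integer call(String input) {
                return input.length();
            }
        };

    public static void main(String[] args) {
        System.out.println(LENGTH.call("geomesa")); // prints 7
    }
}
```

The anonymous-class form is wordier but produces ordinary pre-Java-8 class files, which sidesteps the reflectasm bytecode issue entirely.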

It looks like pain to me.

For the time being I patched my code with this utterly stolid (and doomed to fail with any sizeable amount of data) method:

  public static JavaRDD<Tweet> loadFromGeoMesaTable(JavaSparkContext sc,
      GeoMesaOptions options) throws IOException, SchemaException {

    // Materialize every feature on the driver: this cannot scale,
    // but serves as a stopgap until the RDD compilation issue is solved.
    List<Tweet> featList = new ArrayList<Tweet>();
    SimpleFeatureIterator featIter = TweetFeatureStore.getFeatureType(options)
        .getFeatures().features();
    try {
      while (featIter.hasNext()) {
        featList.add(new Tweet(featIter.next()));
      }
    } finally {
      featIter.close();
    }

    // featList is already an ArrayList, so no defensive copy is needed.
    return sc.parallelize(featList);
  }

I know it sucks... but is there a better way to build an RDD out of a FeatureType using parallelize?

In general, there are some options we can explore. Assuming that Spark executors are on Accumulo tablet servers, we could cook up some functions to figure out where code is running and then retrieve local entries based on a query.

It'd be a tad fragile, so let's put that on the back burner. ;)

Thanks again for your time & patience,

Likewise; thank you very much for reporting the issue and keeping at it. GeoMesa lives at an exciting intersection of technologies, and this discussion will help document solutions!

Thanks,

Jim

Luca Morandini
Data Architect - AURIN project
Melbourne eResearch Group
Department of Computing and Information Systems
University of Melbourne
Tel. +61 03 903 58 380
Skype: lmorandini
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users
