Re: [geomesa-users] Advice on Geomesa data streaming

Hi José,

It sounds like you want to process all the results in a single thread. If so, the regular GeoTools API provides exactly that functionality: the features are returned as an iterator that lazily loads results from Accumulo, so you don't have to worry about memory. Our Spark libraries use the data store under the hood, so you just have to access it slightly differently. To get a data store, follow the example here:

http://www.geomesa.org/documentation/user/accumulo/usage.html#creating-a-data-store
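As a minimal sketch of that step, the data store is looked up from a map of connection parameters. The parameter keys below match the GeoMesa 1.3 Accumulo data store factory, but the values are placeholders you would replace with your cluster's details:

```java
import java.util.HashMap;
import java.util.Map;

public class DataStoreParams {
    // Connection parameters for the GeoMesa Accumulo data store.
    // All values here are placeholders for illustration only.
    static Map<String, String> accumuloParams() {
        Map<String, String> params = new HashMap<>();
        params.put("instanceId", "myInstance");       // Accumulo instance name
        params.put("zookeepers", "zoo1,zoo2,zoo3");   // comma-separated ZooKeeper hosts
        params.put("user", "myUser");
        params.put("password", "myPassword");
        params.put("tableName", "myCatalog");         // GeoMesa catalog table
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = accumuloParams();
        // With geomesa-accumulo-datastore on the classpath, you would then call:
        // DataStore ds = DataStoreFinder.getDataStore(params);
        System.out.println("params=" + params.size());
    }
}
```

The `DataStoreFinder` call is commented out because it needs the GeoMesa jars and a live Accumulo cluster; the linked documentation shows the full version.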

To get a feature iterator, follow the example here (if you want all features, use Filter.INCLUDE instead of the CQL filter shown in the example):

https://github.com/geomesa/geomesa-tutorials/blob/master/geomesa-quickstart-accumulo/src/main/java/com/example/geomesa/accumulo/AccumuloQuickStart.java#L239-L257
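To show why this keeps memory bounded, here is a sketch of the consumption pattern. The lazy iterator below is a hypothetical stand-in for GeoMesa's SimpleFeatureIterator (the real call chain, noted in the comments, is `dataStore.getFeatureSource(typeName).getFeatures(query).features()`), so the streaming loop itself runs without a cluster:

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class StreamFeatures {
    // Stand-in for GeoMesa's lazy SimpleFeatureIterator: it yields one
    // element at a time, so memory stays bounded no matter how many rows
    // the store holds. (Hypothetical mock; the real iterator comes from
    // dataStore.getFeatureSource(typeName).getFeatures(query).features().)
    static Iterator<String> lazyFeatureIterator(int rowCount) {
        return IntStream.range(0, rowCount)
                        .mapToObj(i -> "feature-" + i)
                        .iterator();
    }

    public static void main(String[] args) {
        Iterator<String> features = lazyFeatureIterator(80); // small demo count
        long processed = 0;
        try {
            while (features.hasNext()) {
                String feature = features.next(); // one row in memory at a time
                // update the GraphHopper graph here, then let the row be GC'd
                processed++;
            }
        } finally {
            // The real SimpleFeatureIterator must be closed in a finally
            // block to release the underlying Accumulo scanner:
            // features.close();
        }
        System.out.println("processed=" + processed);
    }
}
```

The key point for the 80M-row case is that only one feature is held at a time, unlike RDD.collect(), which materializes the whole result set.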

Thanks,

Emilio

On 03/24/2017 12:25 PM, Jose Bujalance wrote:
Hi,

I have a Geomesa-Accumulo datastore containing 80M rows, and I would like to use that data to update a routing graph (GraphHopper).
In order to do that, I have to get all the Geomesa data row by row. I am working in Java with the Geomesa-Spark API (version 1.3.1). I can't instantiate a Java list with 80M elements using the RDD.collect() method because of obvious memory limitations.
So I am looking for a way to stream my Geomesa data row by row in an efficient way.
Right now, I am thinking about tools like Spark Streaming, Geomesa-Stream, or Storm, but I have never used any of them, so I don't know whether that's what I need or which tool would be best suited to my problem.
Any idea?

Thanks a lot,

José


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
