Re: [geomesa-users] Advice on Geomesa data streaming

Hi José,

It sounds like you want to process all the results in a single thread. If so, the regular GeoTools API provides exactly that functionality: the features are returned as an iterator that lazily loads results from Accumulo, so you don't have to worry about memory. Our Spark libraries use the data store under the hood, so you just have to access it slightly differently. To get a data store, follow the example here:

http://www.geomesa.org/documentation/user/accumulo/usage.html#creating-a-data-store
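As a minimal sketch of that step, the data store is looked up from a map of connection parameters. The parameter keys below match the GeoMesa 1.3 Accumulo data store factory, but the values are placeholders you would replace with your cluster's details:

```java
import java.util.HashMap;
import java.util.Map;

public class DataStoreParams {
    // Connection parameters for the GeoMesa Accumulo data store.
    // All values here are placeholders for illustration only.
    static Map<String, String> accumuloParams() {
        Map<String, String> params = new HashMap<>();
        params.put("instanceId", "myInstance");       // Accumulo instance name
        params.put("zookeepers", "zoo1,zoo2,zoo3");   // comma-separated ZooKeeper hosts
        params.put("user", "myUser");
        params.put("password", "myPassword");
        params.put("tableName", "myCatalog");         // GeoMesa catalog table
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = accumuloParams();
        // With geomesa-accumulo-datastore on the classpath, you would then call:
        // DataStore ds = DataStoreFinder.getDataStore(params);
        System.out.println("params=" + params.size());
    }
}
```

The `DataStoreFinder` call is commented out because it needs the GeoMesa jars and a live Accumulo cluster; the linked documentation shows the full version.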

To get a feature iterator, follow the example here (if you want all features, use Filter.INCLUDE instead of the CQL filter shown in the example):

https://github.com/geomesa/geomesa-tutorials/blob/master/geomesa-quickstart-accumulo/src/main/java/com/example/geomesa/accumulo/AccumuloQuickStart.java#L239-L257
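To show why this keeps memory bounded, here is a sketch of the consumption pattern. The lazy iterator below is a hypothetical stand-in for GeoMesa's SimpleFeatureIterator (the real call chain, noted in the comments, is `dataStore.getFeatureSource(typeName).getFeatures(query).features()`), so the streaming loop itself runs without a cluster:

```java
import java.util.Iterator;
import java.util.stream.IntStream;

public class StreamFeatures {
    // Stand-in for GeoMesa's lazy SimpleFeatureIterator: it yields one
    // element at a time, so memory stays bounded no matter how many rows
    // the store holds. (Hypothetical mock; the real iterator comes from
    // dataStore.getFeatureSource(typeName).getFeatures(query).features().)
    static Iterator<String> lazyFeatureIterator(int rowCount) {
        return IntStream.range(0, rowCount)
                        .mapToObj(i -> "feature-" + i)
                        .iterator();
    }

    public static void main(String[] args) {
        Iterator<String> features = lazyFeatureIterator(80); // small demo count
        long processed = 0;
        try {
            while (features.hasNext()) {
                String feature = features.next(); // one row in memory at a time
                // update the GraphHopper graph here, then let the row be GC'd
                processed++;
            }
        } finally {
            // The real SimpleFeatureIterator must be closed in a finally
            // block to release the underlying Accumulo scanner:
            // features.close();
        }
        System.out.println("processed=" + processed);
    }
}
```

The key point for the 80M-row case is that only one feature is held at a time, unlike RDD.collect(), which materializes the whole result set.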

Thanks,

Emilio

On 03/24/2017 12:25 PM, Jose Bujalance wrote:
Hi,

I have a Geomesa-Accumulo datastore containing 80M rows, and I would like to use that data to update a routing graph (GraphHopper).
In order to do that, I have to get all the Geomesa data row by row. I am working in Java with the Geomesa-Spark API (version 1.3.1). I can't instantiate a Java list with 80M elements using the RDD.collect() method because of obvious memory limitations.
So I am looking for a way to stream my Geomesa data row by row in an efficient way.
Right now, I am thinking about tools like Spark Streaming, Geomesa-Stream, or Storm, but I have never used any of them, so I don't know whether that's what I need or which tool would be best suited to my problem.
Any idea?

Thanks a lot,

José


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
