Re: [geomesa-users] Improving geomesa performance with MapReduce

From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Tue, 22 Oct 2019 12:21:49 -0400

Can you say more about your setup? What back-end database are you using? Are you using geoserver for querying, or something else?

Thanks,

Emilio

On 10/22/19 11:23 AM, Udstrand, Will M wrote:

Problem Description:

Currently in our platform we are using geomesa to store large amounts of geographical and time-sensitive metadata, and we are experiencing very poor performance (i.e. high latency) with our system's current configuration. The primary bottleneck is the large amount of data returned by geomesa, so we are actively pursuing ways to reduce the size of the responses. We have been investigating the use of MapReduce within the system, but have run into some knowledge gaps due to the lack of documentation. The idea behind our MapReduce use case is to either intercept queries coming into our cluster, or run jobs periodically to combine and reduce the primary dataset and place the results into a separate table. Ideally we would intercept the queries, since the reduction depends on the parameters of each query, which complicates computing it ahead of time.

MapReduce Options

·        When intercepting queries coming into our cluster, we’d have them trigger jobs that combine and reduce the query’s raw metadata into a smaller set of formatted/processed data points, which is then returned to our backend services as the result of the query.

·        Run a job periodically, or have events such as a write to a table trigger a job, that processes and reduces the primary dataset and writes the result to our new “query” table.

Questions

·        Can MapReduce jobs be triggered by events in the database?

·        Can one intercept the queries written to a geomesa instance?

·        How are MapReduce Jobs initiated, and can they be triggered programmatically?

·        Can we send back the results of a MapReduce Job as the result of a query?

·        Are there any other options to reduce the latency incurred by large responses from the database?

We were hoping that you'd be able to give us some insight into our problems and additional help in terms of the plausibility for our MapReduce and geomesa use case.