Problem Description:
Our platform currently uses GeoMesa to store large amounts of geographical and time-sensitive metadata, and we are seeing very poor performance (i.e. high latency) with our system's current configuration. The primary bottleneck is the large amount of data returned by GeoMesa, so we are actively pursuing ways to reduce the size of the responses. We have been investigating the use of MapReduce within the system, but have run into some knowledge gaps due to the lack of documentation. The idea behind our MapReduce use case is either to intercept queries coming into our cluster, or to run jobs periodically that combine and reduce the primary dataset and place the results into a separate table. Ideally we would intercept the queries, because the data reduction is complicated: the reduction depends on the parameters of each query.
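To illustrate why the reduction depends on the query, here is a minimal, hypothetical sketch in plain Python (it uses no GeoMesa or Hadoop APIs). The map phase keeps only raw records inside a query's bounding box and time window and keys each one by a grid cell whose size is itself a query parameter; the reduce phase collapses each cell into one summary point. The record layout `(lon, lat, t)`, the `Query` fields, `cell_deg`, and the count/mean summary are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw record layout: (lon, lat, epoch_seconds)
Record = Tuple[float, float, int]
Key = Tuple[int, int]  # grid cell index (x, y) relative to the query box


@dataclass(frozen=True)
class Query:
    """Hypothetical query parameters; the reduced output depends on every field."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    start: int
    end: int
    cell_deg: float  # grid resolution requested by the caller (assumed parameter)


def map_phase(records: Iterable[Record], q: Query) -> Iterator[Tuple[Key, Record]]:
    """Emit (cell, record) for each record inside the query's box and time window."""
    for lon, lat, t in records:
        in_box = q.min_lon <= lon <= q.max_lon and q.min_lat <= lat <= q.max_lat
        if in_box and q.start <= t < q.end:
            cell = (int((lon - q.min_lon) // q.cell_deg),
                    int((lat - q.min_lat) // q.cell_deg))
            yield cell, (lon, lat, t)


def reduce_phase(pairs: Iterable[Tuple[Key, Record]]) -> Dict[Key, dict]:
    """Collapse all records in a cell into one summary point (count and mean position)."""
    groups: Dict[Key, List[Record]] = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return {
        key: {
            "count": len(recs),
            "lon": sum(r[0] for r in recs) / len(recs),
            "lat": sum(r[1] for r in recs) / len(recs),
        }
        for key, recs in groups.items()
    }


def answer(records: Iterable[Record], q: Query) -> Dict[Key, dict]:
    """Run the map and reduce phases for a single query."""
    return reduce_phase(map_phase(records, q))


if __name__ == "__main__":
    raw = [(0.1, 0.1, 10), (0.2, 0.3, 20), (1.5, 1.5, 30), (5.0, 5.0, 40)]
    print(answer(raw, Query(0.0, 0.0, 2.0, 2.0, start=0, end=100, cell_deg=1.0)))
    print(answer(raw, Query(0.0, 0.0, 2.0, 2.0, start=0, end=100, cell_deg=2.0)))
```

The same raw data yields a different reduced result when only `cell_deg` changes, so a result computed for one query cannot simply be reused for another.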
MapReduce Options
· When intercepting queries coming into our cluster, we would have them trigger jobs that combine and reduce each query's raw metadata into a smaller set of formatted/processed data points, which are then returned to our backend services as the result of the query.
· Periodically, or when triggered by events such as a write to a table, run a job that processes and reduces the primary dataset and writes the result to our new “query” table.
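A minimal, hypothetical sketch of the trade-off in the periodic/event-driven option, again in plain Python with no GeoMesa or Hadoop APIs: a batch job pre-aggregates raw records into a small table keyed by a fixed base grid cell and time bucket, and a later query is answered by rolling those cells up. The fixed resolution (`BASE_DEG`), the hourly bucket (`BUCKET_SECS`), and the count-only summary are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[float, float, int]  # hypothetical (lon, lat, epoch_seconds)
CellKey = Tuple[int, int, int]     # (lon cell, lat cell, time bucket)

BASE_DEG = 0.5      # assumed fixed spatial resolution of the "query" table
BUCKET_SECS = 3600  # assumed fixed time bucket (one hour)


def precompute(records: Iterable[Record]) -> Dict[CellKey, int]:
    """Batch job (run periodically or on a write event): count records per base cell and hour."""
    table: Dict[CellKey, int] = defaultdict(int)
    for lon, lat, t in records:
        key = (int(lon // BASE_DEG), int(lat // BASE_DEG), t // BUCKET_SECS)
        table[key] += 1
    return dict(table)


def query_table(table: Dict[CellKey, int], factor: int,
                start_bucket: int, end_bucket: int) -> Dict[Tuple[int, int], int]:
    """Answer a query from the small table by merging base cells into coarser cells.

    Only resolutions that are integer multiples (`factor`) of BASE_DEG and time ranges
    aligned to whole buckets can be answered exactly; anything finer needs the raw data.
    """
    out: Dict[Tuple[int, int], int] = defaultdict(int)
    for (cx, cy, tb), n in table.items():
        if start_bucket <= tb < end_bucket:
            out[(cx // factor, cy // factor)] += n
    return dict(out)


if __name__ == "__main__":
    raw = [(0.1, 0.1, 10), (0.6, 0.2, 20), (1.2, 1.2, 4000), (0.3, 0.4, 7300)]
    table = precompute(raw)
    print(table)
    print(query_table(table, factor=2, start_bucket=0, end_bucket=2))
```

The precomputed table is small and fast to read, but it fixes the reduction parameters in advance; queries whose parameters do not line up with the base grid and buckets still have to go back to the primary dataset.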
Questions
· Can MapReduce jobs be triggered by events in the database?
· Can one intercept the queries sent to a GeoMesa instance?
· How are MapReduce jobs initiated, and can they be triggered programmatically?
· Can we send back the results of a MapReduce job as the result of a query?
· Are there any other options to reduce the latency incurred by large responses from the database?
We were hoping you could give us some insight into these problems, as well as help in assessing the plausibility of our MapReduce and GeoMesa use case.