Re: [geomesa-users] Newer geomesa user, data aggregation and visualization question

Analytics are a great way to reduce the amount of data being passed around. We do distributed processing through WPS in our HBase and Accumulo back-ends, but Cassandra doesn't support that kind of custom code, so Spark is a great fallback (and very flexible).
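The original question below proposes aggregating readings into spatial bins and displaying only the aggregates. Here is a minimal plain-Python sketch of the bin-then-reduce-by-key shape such a Spark job would have. It does not use GeoMesa or Spark APIs. The names `bin_key`, `partial_aggregate`, `merge`, and `finalize` are hypothetical, and the fixed lon/lat grid with a `cell_deg` cell size is an assumption.

```python
import math
from collections import defaultdict

# Hypothetical helpers: a plain-Python stand-in for the "bin, then reduce by key"
# shape a Spark job would use. Not a GeoMesa or Spark API.

def bin_key(lon, lat, cell_deg):
    """Map a point to the integer (column, row) of a fixed-size lon/lat grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def partial_aggregate(readings, cell_deg):
    """Aggregate (lon, lat, value) readings into {cell: (count, total)} for one partition."""
    acc = defaultdict(lambda: (0, 0.0))
    for lon, lat, value in readings:
        key = bin_key(lon, lat, cell_deg)
        count, total = acc[key]
        acc[key] = (count + 1, total + value)
    return dict(acc)

def merge(a, b):
    """Combine two partial aggregates; (count, total) pairs add component-wise."""
    out = dict(a)
    for key, (count, total) in b.items():
        c0, t0 = out.get(key, (0, 0.0))
        out[key] = (c0 + count, t0 + total)
    return out

def finalize(partials, cell_deg):
    """Turn {cell: (count, total)} into rows of (center_lon, center_lat, count, mean)."""
    rows = []
    for (col, row), (count, total) in sorted(partials.items()):
        rows.append(((col + 0.5) * cell_deg, (row + 0.5) * cell_deg, count, total / count))
    return rows

# Two "partitions" of readings, aggregated separately and then merged.
part1 = partial_aggregate([(0.2, 0.2, 10.0), (0.7, 0.3, 20.0)], 1.0)
part2 = partial_aggregate([(0.5, 0.5, 30.0), (1.5, 0.5, 5.0)], 1.0)
print(finalize(merge(part1, part2), 1.0))  # [(0.5, 0.5, 3, 20.0), (1.5, 0.5, 1, 5.0)]
```

Storing (count, total) per cell instead of a mean lets partial results from different partitions be merged in any order. That is the property a reduce-by-key step relies on.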

Another thing to consider is scaling your WMS server. Since the back-end is already parallelized, the next bottleneck is usually the server. If you're using GeoServer, you can parallelize requests using tiling. There are some details here: http://geoserver.geo-solutions.it/educational/en/clustering/index.html
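Tiling turns one viewport into several independent tile requests that can be served in parallel. The sketch below shows that split. It assumes the common XYZ Web Mercator tile scheme with row 0 at the top, which GeoServer/GeoWebCache gridsets may define differently. `render_tile` is a hypothetical stand-in for whatever issues the actual WMS or tile request.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Sketch assuming the common XYZ ("slippy map") Web Mercator tiling scheme, with
# row 0 at the top. GeoWebCache gridsets may number rows differently.

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the tile containing (lon, lat) at this zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so edge coordinates (lon = 180, extreme latitudes) stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """List every (zoom, x, y) tile needed to cover a viewport bounding box."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # top-left corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # bottom-right corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

def fetch_all(tiles, render_tile, workers=8):
    """Issue one request per tile concurrently; render_tile is a hypothetical callable."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_tile, tiles))

tiles = tiles_for_bbox(-180, -85, 180, 85, 1)
print(len(tiles))  # 4
```

Because each tile is rendered independently, the tiles can be spread across several server instances as well as several threads. Once rendered, they can be cached.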

GeoMesa also supports some data formats that help the client process large amounts of vector data. We have a custom binary data format, and we recently added support for returning Apache Arrow vectors. The visualizations you see on our home page (http://www.geomesa.org/) use those formats with a custom JavaScript web application. If you're interested in that, let us know; some of the GeoServer code hasn't been publicly released yet, as we need to work out software licenses.
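A fixed-width binary encoding helps the client because each point costs a constant number of bytes and decodes without text parsing. The sketch below illustrates this with a hypothetical record layout: int32 track id, int32 epoch seconds, float32 lat, float32 lon, little-endian. This is not a statement of GeoMesa's actual binary format. Apache Arrow applies a similar idea, but column-wise.

```python
import struct

# Hypothetical fixed-width record: int32 track id, int32 epoch seconds,
# float32 lat, float32 lon, little-endian. Illustrates the idea of a compact
# binary format; it is not claimed to be GeoMesa's actual layout.
RECORD = struct.Struct("<iiff")

def encode(points):
    """Pack (track_id, epoch_seconds, lat, lon) tuples into one bytes buffer."""
    buf = bytearray(RECORD.size * len(points))
    for i, (track_id, secs, lat, lon) in enumerate(points):
        RECORD.pack_into(buf, i * RECORD.size, track_id, secs, lat, lon)
    return bytes(buf)

def decode(buf):
    """Unpack a buffer back into tuples; lat/lon come back at float32 precision."""
    return list(RECORD.iter_unpack(buf))

points = [(7, 1496251080, 38.9072, -77.0369), (7, 1496251140, 38.9080, -77.0360)]
binary = encode(points)
print(len(binary))  # 32: two 16-byte records
```

The trade-off is precision: float32 coordinates keep only about seven significant digits. That is usually plenty for display, but it is lossy compared with the stored doubles.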

Thanks,

Emilio

On 05/31/2017 01:38 PM, David Boyd wrote:

Michael:

So, you don't say what you are using for visualization/display (thick client or web application), but the core
issue is the same. The approach you describe is one good way of tackling the problem. You may also
want to look at using Web Processing Service (WPS). GeoMesa has support for it and will distribute the work
across the cluster. I have had some success with the GeoMesa heat map, but have not tried
any other WPS functions.

The end problem for the visualization is twofold:

1. Getting the data from the database to the client
2. Rendering the data so that it is something other than a blob

Typically, even if you can accomplish #1 quickly, most front-end UI tools tend
to get overwhelmed by all the data; hence server-side rendering. The harder
part is rendering something other than a blob.
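Here is a minimal sketch of server-side rendering in the spirit of a heat map. It counts points per pixel, smooths the counts, and scales them to 0-255, so the client receives a fixed-size image no matter how many points fall in the view. The box blur is a simple stand-in chosen for brevity. GeoMesa's heat map process may use a different kernel and parameters, and all names here are hypothetical.

```python
# Minimal server-side heat map sketch: count points per pixel, smooth with a
# box blur (a simple stand-in for whatever kernel a real heat map uses), and
# scale to 0-255 so the client receives a small image instead of raw points.

def rasterize(points, bbox, width, height):
    """Count (lon, lat) points per pixel; row 0 is the top (northernmost) row."""
    min_lon, min_lat, max_lon, max_lat = bbox
    grid = [[0] * width for _ in range(height)]
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # outside the requested map extent
        col = min(int((lon - min_lon) / (max_lon - min_lon) * width), width - 1)
        row = min(int((max_lat - lat) / (max_lat - min_lat) * height), height - 1)
        grid[row][col] += 1
    return grid

def box_blur(grid, radius=1):
    """Replace each pixel with the mean of its (2*radius+1)^2 neighborhood, clipped at edges."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total, n = 0, 0
            for rr in range(max(r - radius, 0), min(r + radius + 1, h)):
                for cc in range(max(c - radius, 0), min(c + radius + 1, w)):
                    total += grid[rr][cc]
                    n += 1
            out[r][c] = total / n
    return out

def to_intensity(grid):
    """Scale values to 0-255 relative to the hottest pixel."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    return [[round(255 * v / peak) for v in row] for row in grid]

points = [(0.5, 0.5)] * 3 + [(9.5, 9.5)]
counts = rasterize(points, (0.0, 0.0, 10.0, 10.0), 10, 10)
image = to_intensity(box_blur(counts))
```

The payload sent to the client is width × height values, independent of the number of input points. This is what keeps the front end from being overwhelmed.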

The root of the question is: what task is the analyst trying to solve with the data,
and what in that task requires all of that data to be visualized?

Typically, for other big data visualizations, you would run some sort of server-side
analytic that reduces the data volume by limiting the display to what is of
interest to the analyst. For example, on one project with AIS (ship location data), if
you just plotted it you got a blob. Using statistical clustering approaches, we limited
the display to just the anomalies (those tracks that deviated from normal shipping
lanes).
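The following is a much simpler stand-in than the statistical clustering described above. It is sketched only to illustrate showing just the anomalies. It builds a per-cell density of historical positions, treats rarely visited cells as off-lane, and flags tracks with too large a share of off-lane points. The grid size, the `min_count` and `max_off_share` thresholds, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Much simpler stand-in for the statistical clustering described above: learn
# how often historical positions fall in each grid cell, then treat cells that
# are rarely visited as "off-lane". All names and thresholds are hypothetical.

def cell_of(lon, lat, cell_deg):
    """Map a position to the integer (column, row) of its grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def lane_density(history, cell_deg):
    """Count historical (lon, lat) positions per grid cell."""
    return Counter(cell_of(lon, lat, cell_deg) for lon, lat in history)

def anomalous_tracks(tracks, density, cell_deg, min_count, max_off_share=0.2):
    """Return ids of tracks whose share of points in sparse cells exceeds max_off_share."""
    flagged = []
    for track_id, pts in tracks.items():
        if not pts:
            continue
        # Counter returns 0 for cells never seen in the history.
        off_lane = sum(1 for lon, lat in pts
                       if density[cell_of(lon, lat, cell_deg)] < min_count)
        if off_lane / len(pts) > max_off_share:
            flagged.append(track_id)
    return flagged

history = [(0.05 + 0.001 * i, 0.05) for i in range(50)]  # a busy "lane" in one cell
density = lane_density(history, 0.1)
tracks = {"a": [(0.06, 0.05), (0.08, 0.05)],
          "b": [(0.06, 0.05), (3.0, 3.0), (4.0, 4.0)]}
print(anomalous_tracks(tracks, density, 0.1, min_count=5))  # ['b']
```

Only the flagged tracks would then be sent to the display, which is what keeps the picture from becoming a blob.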

The technical tools are out there to support this (Spark, WPS), but you would need to
define the goals of your visualization.


On 5/31/17 12:53 PM, Michael Bowen wrote:
Hello All,

Been playing with GeoMesa for the last month or so. My primary use case is visualizing a large amount of geospatial sensor data (anywhere from a few hundred gigabytes to a few terabytes). Currently, I'm ingesting the data into a Cassandra data store as a simple feature type (lat/lon pairs with sensor readings over time).

When I don't use GeoWebCache, I can display a few hundred thousand readings from the Cassandra data store and interact with them dynamically in the browser. With GeoWebCache, I can scale to a few million points. This is all on a single node, with a Cassandra replication factor of 2.

I would like to be able to display possibly billions of points. My overarching question is about scalability and performance: I'm fairly new to displaying geospatial data at this scale and am wondering what routes I should explore next in GeoMesa to visualize these vast stores of data. My initial thought is to query the data into a GeoMesa Spark spatial RDD, aggregate the readings into spatial bins, write the aggregated results to a separate data store, and then display them. Any tips or advice are greatly appreciated!

Cheers,

Mike


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users

-- 
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd                     
VP,  Data Solutions       
10432 Balls Ford, Suite 240  
Manassas, VA 20109         
office:   +1-703-552-2862        
cell:     +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org


