
Re: [geomesa-users] Which backend to pick?

Hi John,

Great question. Benchmarks are likely the most challenging to come by. In terms of capabilities, I'll add a bit more to Anthony's response. GeoMesa on Accumulo is the most mature. GeoMesa on HBase is next. HBase and Accumulo both allow for rich server side programming, and this spring, the GeoMesa team ported over most of the Accumulo iterator work to HBase coprocessors.

From what I can see, C* has a much more limited server-side programming paradigm, and we haven't explored it fully. GeoMesa just uses C* as a key-value store, so I don't believe there is any need to install GeoMesa jars on the C* cluster. (Shout if I'm missing anything! Also, if there's community support or funding for maintaining C* 2.x support, I'm happy to discuss that further.)

The FS datastore is a new, exciting option. Since it is new, there hasn't been much work on expanding its features yet. That said, one of the big goals is to provide 'cheap' access to large volumes of data without needing to manage an explicit database. I expect this will be handy for making otherwise 'off-line' data useful.

Cheers,

Jim

On 09/04/2017 08:26 AM, Anthony Fox wrote:
John,

For non-point geometries, your best bet right now is
Accumulo/HBase/Cassandra.  I think Accumulo is the most mature but HBase
and Cassandra are catching up.  Geodocker recently got some attention
from a user who was interested in on-premise instantiations.  Check out
https://anitagraser.com/2017/08/27/getting-started-with-geomesa-using-geodocker/
for a tutorial.

The FS data store is the most lightweight implementation and one that I
think will continue to gain traction.  As of now, it only supports
points, as you discovered.  It is also intended for relatively static data
sets, ones for which you can manage ingest and compaction manually.  I don't
think it would be a stretch to get it to support linestrings and
polygons, as the xz index produces an integer index the same as the z2
index.  Attribute indexes may be a bit more of a challenge, as the range of
values is not known a priori.  We are adding manual compaction tools
which could be used to manage attribute (and other) indexes more
effectively.  Basically, you can have multiple feature writers appending
data to an FS store by writing to unique files in the store.  Then, when
ingest is complete, you can run a compaction job to merge files.  At
this stage, I could envision the compaction job also distributing keys
by discovering the range of values of an attribute and writing out new
files accordingly.
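To make the compaction idea concrete, here is a minimal Python sketch of a pass that discovers the range of an attribute's values across many appended files and then redistributes the records into new files, each covering a contiguous value range. It is not GeoMesa's compaction tool: representing files as lists of dicts, the function names `discover_boundaries` and `compact`, and the equal-count (sorted-quantile) split are all hypothetical assumptions for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def discover_boundaries(values, num_files):
    """Pick num_files - 1 split points so each output file receives roughly
    the same number of records (a simple sorted-quantile split)."""
    ordered = sorted(values)
    if not ordered:
        return []
    n = len(ordered)
    return [ordered[(i * n) // num_files] for i in range(1, num_files)]


def compact(input_files, attribute, num_files):
    """Merge records from many small appended files (hypothetical: each file
    is a list of dicts) into num_files outputs. Output i holds values v with
    bounds[i-1] <= v < bounds[i], so the outputs cover contiguous ranges."""
    records = [rec for f in input_files for rec in f]
    bounds = discover_boundaries([r[attribute] for r in records], num_files)
    outputs = defaultdict(list)
    for rec in records:
        outputs[bisect_right(bounds, rec[attribute])].append(rec)
    # Sort within each output so a range scan on the attribute can stop early.
    return [sorted(outputs[i], key=lambda r: r[attribute]) for i in range(num_files)]


files = [[{"name": "d"}, {"name": "a"}], [{"name": "c"}, {"name": "b"}, {"name": "e"}]]
merged = compact(files, "name", 2)
```

Because each output file covers a known value range, a query on the attribute only needs to open the files whose range overlaps the query; the equal-count split is just one simple way to choose those ranges.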

One interesting way to deploy the FS data store is similar to what you
hinted at with respect to heavy caching on GeoServers.  While I have
tested this out on AWS, it should work similarly with on-premise HDFS
clusters.  Basically, ingest data into S3 and then deploy s3fs-fuse with
caching on Geoservers.  This has proven to be a very cost-effective and
elastic deployment mode.

-Anthony.

John Meehan <john_n_meehan@xxxxxxxxx> writes:

Is there a comparison somewhere of the performance and feature sets of the various backends?  I’ve tried, in a limited capacity, Accumulo (via geodocker), Cassandra (hacking the source to get it working with v2), and FS (HDFS), though not to the extent of evaluating their performance characteristics.

I’m in a large corporate setting where it’s tenuous to convince ops teams to install Accumulo on their managed Hadoop clusters or unfamiliar jars into their Cassandra instances. But they’re all potentially options.  I like the simplicity of geomesa-fs on HDFS, heavily caching reads in a pool of GeoServers/GeoWebCache, while also supporting batch Spark analytics.  I just discovered that it doesn’t (yet?) support anything but Points in org.locationtech.geomesa.fs.storage.common.Z2Scheme.
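For context on why a point-only scheme is the easy case, here is a minimal Python sketch of a Z2-style key: a longitude/latitude point is normalized onto an integer grid and the bits of the two coordinates are interleaved (Z-order) into a single integer. This is an illustration, not the code behind `Z2Scheme`; the 31-bit precision and the names `normalize`, `interleave`, and `z2_key` are assumptions. A point maps to exactly one cell, whereas a linestring or polygon spans a range of cells, which is why non-point geometries need an extent-aware curve such as the xz index Anthony mentions.

```python
def normalize(value, lo, hi, bits):
    """Map a coordinate in [lo, hi] onto an integer in [0, 2**bits - 1]."""
    max_index = (1 << bits) - 1
    scaled = int((value - lo) / (hi - lo) * max_index)
    return min(max(scaled, 0), max_index)  # clamp out-of-range input


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y: x bits land in even
    positions, y bits in odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2_key(lon, lat, bits=31):
    """Z-order key for a single point (assumed 31 bits per dimension)."""
    return interleave(normalize(lon, -180.0, 180.0, bits),
                      normalize(lat, -90.0, 90.0, bits), bits)
```

Nearby points tend to share high-order key bits, so a file-based store could, for example, partition on a prefix of the key; that partitioning choice is likewise an assumption here.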

My main goal is indexing terabytes of shapefiles (of various geometry types), primarily for spatial queries to serve vector (and possibly raster) tiles via GeoServer.  But getting temporal and attribute indices would be a nice bonus, as would supporting Spark analytics.

-John

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users