Re: [geomesa-users] Which backend to pick?

John,

For non-point geometries, your best bet right now is
Accumulo/HBase/Cassandra.  I think Accumulo is the most mature but HBase
and Cassandra are catching up.  Geodocker recently got some attention
from a user who was interested in on-premise instantiations.  Check out
https://anitagraser.com/2017/08/27/getting-started-with-geomesa-using-geodocker/
for a tutorial.

The FS data store is the most lightweight implementation and one that I
think will continue to gain traction.  As of now, it only supports
points, as you discovered, and it is intended for relatively static data
sets, ones for which you can manage ingest and compaction manually.  I
don't think it would be a stretch to get it to support linestrings and
polygons, as the XZ index produces an integer index just as the Z2
index does.  Attribute indexes may be a bit more of a challenge, as
the range of values is not known a priori.  We are adding manual compaction tools
which could be used to manage attribute (and other) indexes more
effectively.  Basically, you can have multiple feature writers appending
data to an FS store by writing to unique files in the store.  Then, when
ingest is complete, you can run a compaction job to merge files.  At
this stage, I could envision the compaction job also distributing keys
by discovering the range of values of an attribute and writing out new
files accordingly.
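
To make the idea concrete, here is a rough sketch in Python.  It is not
GeoMesa code and does not use the FS data store API; the file names, the
CSV layout, and the functions append_batch and compact are all made up for
illustration.  Each writer appends by creating its own uniquely named
file, and the compaction step reads everything back, discovers the
attribute values actually present, and rewrites the rows into files split
on value ranges.

```python
import bisect
import os
import uuid


def append_batch(store_dir, rows):
    """Hypothetical writer: append by creating a new, uniquely named file,
    so concurrent writers never touch the same file."""
    path = os.path.join(store_dir, f"part-{uuid.uuid4().hex}.csv")
    with open(path, "w") as f:
        for fid, value in rows:
            f.write(f"{fid},{value}\n")
    return path


def compact(store_dir, out_dir, num_buckets):
    """Hypothetical compaction: merge all part files, discover the range of
    the attribute, and write one output file per value range.

    Assumes integer attribute values and feature ids without commas.
    Everything is held in memory, which a real job would avoid.
    """
    rows = []
    for name in sorted(os.listdir(store_dir)):
        with open(os.path.join(store_dir, name)) as f:
            for line in f:
                fid, value = line.rstrip("\n").split(",")
                rows.append((fid, int(value)))
    if not rows:
        return []

    # Sort by attribute value; the sorted values give us the split points.
    rows.sort(key=lambda r: r[1])
    values = [v for _, v in rows]
    # Quantile-style splits so each bucket holds roughly the same row count.
    splits = [values[(i * len(values)) // num_buckets]
              for i in range(1, num_buckets)]

    buckets = [[] for _ in range(num_buckets)]
    for fid, value in rows:
        # bisect_right sends a value equal to a split into the upper bucket.
        buckets[bisect.bisect_right(splits, value)].append((fid, value))

    os.makedirs(out_dir, exist_ok=True)
    for i, bucket in enumerate(buckets):
        with open(os.path.join(out_dir, f"bucket-{i}.csv"), "w") as f:
            for fid, value in bucket:
                f.write(f"{fid},{value}\n")
    return splits
```

The returned split points are what a reader would need in order to pick
the right file for an attribute query, so a real job would presumably
persist them alongside the data.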

One interesting way to deploy the FS data store is similar to what you
hinted at with respect to heavy caching on GeoServer.  While I have
tested this out on AWS, it would work similarly with on-premise HDFS
clusters.  Basically, ingest data into S3 and then deploy s3fs-fuse with
caching on the GeoServer nodes.  This has proven to be a very
cost-effective and elastic deployment mode.

-Anthony.

John Meehan <john_n_meehan@xxxxxxxxx> writes:

> Is there a comparison somewhere of the performance and feature sets of the various backends?  I’ve tried, in a limited capacity, Accumulo (via geodocker), Cassandra (hacking the source to get it working with v2), and FS (HDFS) though not to the extent to evaluate their performance characteristics.
>
> I’m in a large corporate setting where it’s tenuous to convince ops teams to install Accumulo on their managed Hadoop clusters or unfamiliar jars into their Cassandra instances. But they’re all potentially options.  I like the simplicity of geomesa-fs on HDFS, and heavily caching reads in a pool of Geoservers/Geowebcache, but also supporting batch Spark analytics.  I just discovered that it doesn’t (yet?) support anything but Points in org.locationtech.geomesa.fs.storage.common.Z2Scheme.
>
> My main goal is indexing terabytes of shape files (of various geometry types), primarily for spatial queries to serve vector (and possibly raster) tiles via Geoserver.  But getting temporal and attribute indices would be a nice bonus, as would supporting Spark analytics.
>
> -John

