Re: [geomesa-users] Which backend to pick?

From: Jim Hughes <jnh5y@xxxxxxxx>
Date: Mon, 4 Sep 2017 10:58:12 -0400

Hi John,

Great question. Benchmarks are likely the most challenging to come by. In terms of capabilities, I'll add a bit more to Anthony's response. GeoMesa on Accumulo is the most mature. GeoMesa on HBase is next. HBase and Accumulo both allow for rich server-side programming, and this spring, the GeoMesa team ported over most of the Accumulo iterator work to HBase coprocessors.

From what I can see, C* has a much more limited server programming paradigm and we haven't explored it fully. GeoMesa just uses C* as a key-value store, so I don't believe there is any need to install GeoMesa jars on the C* cluster. (Shout if I'm missing anything! Also, if there's community support or funding for maintaining C* 2.x support, I'm happy to discuss that further.)

The FS datastore is a new, exciting option. Since it is new, there hasn't been much work on expanding the features yet. That said, one of the big goals is to provide 'cheap' access to volumes of data without needing to manage an explicit database. I expect this'll be handy for making data which would otherwise be 'off-line' useful.


Cheers,

Jim

On 09/04/2017 08:26 AM, Anthony Fox wrote:

John,

For non-point geometries, your best bet right now is
Accumulo/HBase/Cassandra.  I think Accumulo is the most mature but HBase
and Cassandra are catching up.  Geodocker recently got some attention
from a user who was interested in on-premise instantiations.  Check out
https://anitagraser.com/2017/08/27/getting-started-with-geomesa-using-geodocker/
for a tutorial.

The FS data store is the most lightweight implementation and one that I
think will continue to gain traction.  As of now, it only supports
points, as you discovered.  It is intended for relatively static data
sets, ones for which you can manage ingest and compaction manually.  I don't
think it would be a stretch to get it to support linestrings and
polygons, as the xz index produces an integer index the same as the z2
index.  Attribute indexes may be a bit more of a challenge as the range of
values is not known a priori.  We are adding manual compaction tools
which could be used to manage attribute (and other) indexes more
effectively.  Basically, you can have multiple feature writers appending
data to an FS store by writing to unique files in the store.  Then, when
ingest is complete, you can run a compaction job to merge files.  At
this stage, I could envision the compaction job also distributing keys
by discovering the range of values of an attribute and writing out new
files accordingly.
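
The append-then-compact flow described above can be sketched roughly as follows. This is only an illustration in plain Python, not GeoMesa code: the `Record` layout, the `compact` function, and the equal-count split strategy are all assumptions made for the example. Each inner list stands in for one writer's file, and the compaction step discovers the attribute range from the data before redistributing records.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical record layout: (feature id, attribute value).
Record = Tuple[str, int]


def compact(files: List[List[Record]], num_partitions: int) -> Dict[int, List[Record]]:
    """Merge per-writer files, then redistribute records by attribute range."""
    # Step 1: merge every writer's file into one list sorted by attribute value.
    merged = sorted((rec for f in files for rec in f), key=lambda rec: rec[1])
    if not merged:
        return {}

    # Step 2: discover split points from the data itself (equal-count
    # quantiles), since the range of values is not known before ingest.
    values = [rec[1] for rec in merged]
    splits = [values[(i * len(values)) // num_partitions]
              for i in range(1, num_partitions)]

    # Step 3: assign each record to the partition whose range holds its value.
    partitions: Dict[int, List[Record]] = {i: [] for i in range(num_partitions)}
    for rec in merged:
        partitions[bisect_right(splits, rec[1])].append(rec)
    return partitions
```

Calling `compact` with two writer files and `num_partitions=2` yields two output groups split at a value taken from the merged data. A real compaction job would read and write files in the store rather than in-memory lists.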

One interesting way to deploy the FS data store is similar to what you
hinted at with respect to heavy caching on GeoServers.  While I have
tested this out on AWS, it would work similarly with on-premise HDFS
clusters.  Basically, ingest data into S3 and then deploy s3fs-fuse with
caching on GeoServers.  This has proven to be a very cost-effective and
elastic deployment mode.

-Anthony.

John Meehan <john_n_meehan@xxxxxxxxx> writes:

Is there a comparison somewhere of the performance and feature sets of the various backends?  I’ve tried, in a limited capacity, Accumulo (via geodocker), Cassandra (hacking the source to get it working with v2), and FS (HDFS), though not to the extent needed to evaluate their performance characteristics.

I’m in a large corporate setting where it’s difficult to convince ops teams to install Accumulo on their managed Hadoop clusters or unfamiliar jars into their Cassandra instances. But they’re all potentially options.  I like the simplicity of geomesa-fs on HDFS, and heavily caching reads in a pool of GeoServers/GeoWebCache, but also supporting batch Spark analytics.  I just discovered that it doesn’t (yet?) support anything but Points in org.locationtech.geomesa.fs.storage.common.Z2Scheme.
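
For context on why a Z2-style scheme is naturally tied to points: a Z-order (Morton) index interleaves the bits of a single x/y coordinate pair into one integer, so a geometry with extent has no single key. The sketch below is a generic Z-order encoder in Python, not the GeoMesa implementation. The function names and the 16-bit default resolution are assumptions chosen for illustration.

```python
def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    # Clamp so that the upper bound itself lands in the last cell.
    return min(max(idx, 0), cells - 1)


def interleave(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z2(lon: float, lat: float, bits: int = 16) -> int:
    """Z-order index of a lon/lat point (illustrative resolution only)."""
    return interleave(normalize(lon, -180.0, 180.0, bits),
                      normalize(lat, -90.0, 90.0, bits), bits)
```

Points that are close on the map tend to share high-order bits of the resulting integer, which is what makes range scans over the index useful for spatial queries.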

My main goal is indexing terabytes of shapefiles (of various geometry types), primarily for spatial queries to serve vector (and possibly raster) tiles via GeoServer.  But getting temporal and attribute indices would be a nice bonus, as would supporting Spark analytics.

-John

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users

