Hi Damiano,
No problem, more replies inline.
Thanks,
Emilio
On 01/23/2017 02:42 PM, Damiano Albani wrote:
Hello Emilio,
First, thanks for taking the time to write such a
detailed answer!
Because we're an Eclipse project, anything we host has to be blessed
by Eclipse for provenance and licensing. As we haven't gotten this
sign-off on all the BigTable dependencies yet, we unfortunately
can't bundle that support - it can still be used as a plugin, but
you have to build it yourself. Hopefully we will be able to get it
approved soon.
Since you're doing a one-time bulk ingest, map/reduce could be a
good fit. Depending on your inputs, our tools should make it fairly
easy (with the classpath caveat I mentioned). If you have a cluster
to run on, and your inputs are flat files, it will handle all the
multi-threading and load-balancing for you.
GeoTools does have a lot of different ways to accomplish the same
thing. The main underlying abstraction for writing is the
FeatureWriter (either Append or Modify) - if you look at the
addFeatures method, we just use a feature writer:
https://github.com/locationtech/geomesa/blob/master/geomesa-index-api/src/main/scala/org/locationtech/geomesa/index/geotools/GeoMesaFeatureStore.scala#L31-L53
Buffering is implementation dependent - for GeoMesa HBase/BigTable,
we use an underlying org.apache.hadoop.hbase.client.BufferedMutator.
You can control the batch size through the system property
'geomesa.hbase.write.batch'. If you want finer control, you can also
cast a FeatureWriter to
org.locationtech.geomesa.hbase.data.HBaseAppendFeatureWriter, which
includes a 'flush' method (you can get a feature writer through
datastore.getFeatureWriterAppend).
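To picture the batch-and-flush behavior described above, here is a minimal sketch in plain Java with no GeoMesa or HBase dependencies. The class and field names are made up for illustration; it just mimics what a BufferedMutator-backed writer does: records accumulate until the batch size is hit, then the batch is flushed, with an explicit flush() available for finer control.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the batch-and-flush behavior of a
// BufferedMutator-backed writer. All names here are hypothetical.
public class BatchingWriter {
    private final int batchSize;          // analogous to 'geomesa.hbase.write.batch'
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;              // flush count, for demonstration

    public BatchingWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    // Buffer a record; flush automatically once the batch fills up.
    public void write(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Explicit flush, analogous to the 'flush' method mentioned above.
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // A real writer would push the buffered batch to the store here.
        buffer.clear();
        flushes++;
    }

    public int flushCount() {
        return flushes;
    }

    public static void main(String[] args) {
        BatchingWriter writer = new BatchingWriter(2);
        writer.write("feature-1");
        writer.write("feature-2"); // auto-flush: batch is full
        writer.write("feature-3");
        writer.flush();            // explicit flush of the partial batch
        System.out.println(writer.flushCount()); // prints 2
    }
}
```

The real writer works the same way conceptually: tune the system property for throughput, or call flush explicitly when you need durability at a known point.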
As I mentioned before, all methods of writing through GeoMesa will
end up funneling through that feature writer class, so this applies
across the board.
As for the input side, we use a combination of GeoTools data stores
and custom code. Our converter framework is designed to convert flat
files into simple features in a streaming fashion, and I can attest
that it handles memory well. Other GeoTools data stores may work
differently (e.g. loading the entire file into memory at once) - I'm
not entirely sure about those.
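The streaming style matters for memory: a sketch of the idea, in plain Java with no GeoMesa dependencies (the record format and names are invented), is to parse a flat file one line at a time and hand each record downstream as soon as it is read, rather than materializing the whole file first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

// Illustrative only: converts a delimited flat file to records one
// line at a time, so memory use stays bounded regardless of file
// size. This mimics the streaming style of a converter framework;
// the "id,value" format here is made up.
public class StreamingConverter {
    // Parse each line and pass the record to the sink immediately,
    // instead of collecting everything into one big list.
    public static int convert(BufferedReader reader, Consumer<String[]> sink)
            throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            sink.accept(line.split(",", -1));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String input = "1,foo\n2,bar\n\n3,baz\n";
        int n = convert(
                new BufferedReader(new java.io.StringReader(input)),
                record -> System.out.println(record[0] + " -> " + record[1]));
        System.out.println(n); // prints 3
    }
}
```

A data store that instead loads the whole file up front would hold every record in memory at once, which is where the difference shows on large inputs.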
That may be reasonable depending on your input data (using a
GeoTools query implies that you already have your data in a GeoTools
data store). Feature writers and readers are all single-threaded,
though, so you would want to load five separate feature collections
by splitting your data on some queryable attribute (e.g. by month,
if your data has timestamps).
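The parallelization pattern this implies can be sketched in plain Java (no GeoTools dependencies; the class names are invented, and a list stands in for the single-threaded writer): partition the input on the attribute, then give each partition its own thread and its own writer, never sharing a writer across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: parallel ingest by partitioning on a queryable
// attribute (month, here), one single-threaded "writer" per
// partition. The writer is a plain list standing in for a real
// feature writer, which must not be shared across threads.
public class PartitionedLoad {
    public static Map<Integer, List<String>> load(
            Map<Integer, List<String>> partitions) throws InterruptedException {
        Map<Integer, List<String>> written = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        for (Map.Entry<Integer, List<String>> part : partitions.entrySet()) {
            pool.submit(() -> {
                // Each task owns its own writer; nothing is shared.
                List<String> writer = new ArrayList<>();
                for (String feature : part.getValue()) {
                    writer.add(feature);
                }
                written.put(part.getKey(), writer);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Integer, List<String>> byMonth = Map.of(
                1, List.of("jan-1", "jan-2"),
                2, List.of("feb-1"));
        Map<Integer, List<String>> result = load(byMonth);
        System.out.println(result.get(1).size() + result.get(2).size()); // prints 3
    }
}
```

In the real case, each task would run its own query for its slice of the data and write through its own append writer, so the partitions load concurrently without any writer being touched by two threads.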
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users