Hello,
I've been successfully ingesting Avro-formatted data into Bigtable using the command line program.
This was done via a MapReduce job targeting Avro files located on GCS, thanks to the Google Cloud Storage connector for Hadoop.
By the way, don't you think it would be appropriate to include a dependency on this connector in the geomesa-bigtable-tools module by default?
A related change would be to add "gs://" to the list of distPrefixes in AbstractIngest.
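For illustration, here is a self-contained sketch of the prefix check I have in mind. The DIST_PREFIXES list mirrors the idea of distPrefixes in AbstractIngest, but the class, method name, and exact set of schemes below are hypothetical, not GeoMesa's actual code:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the prefix check: an input path is treated as
// "distributed" when it starts with one of these URL schemes. Adding
// "gs://" would let GCS paths take the same route as HDFS/S3 paths.
public class DistPrefixCheck {

    // "gs://" added alongside the usual distributed-filesystem schemes
    static final List<String> DIST_PREFIXES =
            Arrays.asList("hdfs://", "s3://", "s3n://", "s3a://", "gs://");

    static boolean isDistributedUrl(String path) {
        return DIST_PREFIXES.stream().anyMatch(path::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isDistributedUrl("gs://my-bucket/data/features.avro")); // true
        System.out.println(isDistributedUrl("/local/data/features.avro"));         // false
    }
}
```

With that one entry added, a gs:// input URL would be recognized as distributed without any other change to the ingest flow.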
I've used Google Cloud Dataproc (i.e., a hosted Hadoop environment) to run the MapReduce job.
The issue I ran into was that Dataproc requires a JAR file (or several JARs) to run the job, so I couldn't simply tell it to call "geomesa-bigtable convert ...".
The solution I came up with was to build a shaded JAR of geomesa-bigtable-tools.
Do you think it would be a good idea to provide such a JAR by default for Hadoop usage?
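In case it helps, this is roughly how I built the shaded JAR; it is only a minimal Maven Shade plugin sketch (the plugin version and module layout are assumptions). The ServicesResourceTransformer is the important part, since GeoMesa discovers converters via SPI files under META-INF/services and those must be merged, not overwritten:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- merge META-INF/services entries from all shaded dependencies -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```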
Last point I wanted to mention: it looks like the input of the MapReduce job was not split, even though I was using Avro files on purpose (they are meant to be splittable).
Should AvroFileInputFormat thus simply override isSplitable to return true? (I haven't tested how GeoMesa would react.)
I suppose the TSV and CSV input formats should also be marked as splittable, by the way, shouldn't they?
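To illustrate the override I mean: in Hadoop, FileInputFormat has a protected isSplitable(JobContext, Path) method that controls whether an input file may be split, and Avro container files are safe to split because readers can resynchronize at the next sync marker after a split point. The sketch below uses stub classes in place of the real Hadoop types so it is self-contained (the stub methods are public for the sake of the example; Hadoop's is protected). Only the override pattern is the point:

```java
// Stub standing in for org.apache.hadoop.mapreduce.lib.input.FileInputFormat,
// whose isSplitable(JobContext, Path) controls input splitting.
abstract class StubFileInputFormat {
    // Mirrors a format that conservatively refuses to split its input
    public boolean isSplitable(String path) {
        return false;
    }
}

// Avro container files carry sync markers, so splitting mid-file is safe:
// the reader skips ahead to the next marker after the split boundary.
class SplittableAvroInputFormat extends StubFileInputFormat {
    @Override
    public boolean isSplitable(String path) {
        return true;
    }
}

public class IsSplitableSketch {
    public static void main(String[] args) {
        StubFileInputFormat format = new SplittableAvroInputFormat();
        System.out.println(format.isSplitable("gs://bucket/features.avro")); // true
    }
}
```

If the base format defaults to not splitting, this single override should be enough to get one map task per block instead of one per file.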
Thanks,
--