
Re: [geomesa-users] Ingesting Avro files into GeoMesa using Hadoop on Google Dataproc

Hi Damiano,

Nice! Let us know if you get any more performance tips.

We do already attempt to pre-split the tables at table creation time:

https://github.com/locationtech/geomesa/blob/master/geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/index/HBaseFeatureIndex.scala#L53

The default for feature IDs is designed for UUID strings, so you should be good there. But since Bigtable is a black box, it's hard to say whether this makes a difference, or even whether we're doing it correctly.

Thanks,

Emilio

On 02/28/2017 11:25 AM, Damiano Albani wrote:
Hello,

On Tue, Feb 21, 2017 at 4:46 PM, Damiano Albani <damiano.albani@xxxxxxxxx> wrote:
Now the remaining issue is that I don't understand the overall behavior of the MapReduce job on Google Dataproc: only 1 worker node (e.g. out of 2) gets tasks (albeit correctly 1 task per vCPU) and, even more surprising, I don't see any performance boost in Bigtable write throughput.

For the record, using the preview version of the Dataproc environment fixed my issue somehow.
MapReduce ingest jobs are now fully split over all nodes, and they run so fast that I think Bigtable is now the bottleneck.
I mean, at least when starting from an empty Bigtable instance.

This comment on Stack Overflow made me think that it could be preferable to pre-split Bigtable before ingesting the data.
(Bigtable will eventually reorganize those splits if I understand correctly.)
Given that I use UUID strings as feature identifiers, I suppose I could use split prefixes going from "0" to "f"?
Anyway, I'll report back on whether that improves the performance.
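
To make the prefix idea concrete, here is a rough Python sketch of how such split points could be generated. It is an illustration only, not GeoMesa's or Bigtable's actual splitting code. The names hex_split_keys and range_index are hypothetical, and the sketch assumes that row keys begin with the lowercase hex UUID string, which may not hold if an index prepends other bytes to the key. It emits the 15 split points "1" to "f" rather than 16, since a split at the lowest prefix "0" would only create an empty first range.

```python
import bisect
from typing import List


def hex_split_keys(prefix_len: int = 1) -> List[bytes]:
    """Return sorted split points that divide a lowercase-hex key space evenly.

    With prefix_len=1 this yields b"1" .. b"f", i.e. 15 split points
    and therefore 16 key ranges.
    """
    if prefix_len < 1:
        raise ValueError("prefix_len must be at least 1")
    total = 16 ** prefix_len
    # Start at 1: a split at the all-zero prefix would leave an empty first range.
    return [format(i, "0{}x".format(prefix_len)).encode("ascii")
            for i in range(1, total)]


def range_index(row_key: bytes, split_keys: List[bytes]) -> int:
    """Return the index of the key range that row_key falls into."""
    return bisect.bisect_right(split_keys, row_key)


if __name__ == "__main__":
    splits = hex_split_keys()
    print(len(splits), "split points:", splits)
    # Hypothetical feature ID used only for illustration.
    fid = b"3f2b8c1e-0d4a-4c6e-9b1a-7e5d2c9f0a11"
    print("range index:", range_index(fid, splits))
```

Because random UUIDs are close to uniformly distributed over their leading hex digit, each of the 16 ranges should receive a similar share of the writes under these assumptions.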

Regards,

--
Damiano Albani
Geodan



