Re: [geomesa-users] Ingesting Avro files into GeoMesa using Hadoop on Google Dataproc

Hello,

On Mon, Feb 20, 2017 at 3:23 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Awesome! I think that Avro files are not splittable in our input format because they have a defined header and format that must be read by a single mapper. My understanding is that it's like XML: if you arbitrarily split an XML document, each piece will no longer be valid. I could be wrong, though, and there may be better workarounds as well.

Indeed, I agree that the reason Avro files aren't split lies in GeoMesa's input format, or at least the current one.
I applied what I mentioned previously: overriding AvroFileInputFormat so that isSplitable returns true.
And I can report that the input was indeed split (e.g. 58 splits for a 3+ GB Avro file):
17/02/21 11:01:13 INFO mapreduce.JobSubmitter: number of splits:58
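
For reference, the override itself is tiny. Here is a rough sketch, using Avro's stock AvroKeyInputFormat as a stand-in base class (the actual class to subclass depends on how your job is wired up):

import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.JobContext

// Sketch only: AvroKeyInputFormat is a stand-in; swap in the input
// format your job actually uses. FileInputFormat consults this hook
// per file, and returning true lets Hadoop cut the file into
// block-sized splits instead of handing it whole to a single mapper.
class SplittableAvroInputFormat[T] extends AvroKeyInputFormat[T] {
  override protected def isSplitable(context: JobContext, filename: Path): Boolean =
    true
}

This is safe despite the file-level header because Avro container files carry sync markers, so a record reader can resynchronize at any split boundary.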

Now the remaining issue is that I don't understand the overall behavior of the MapReduce job on Google Dataproc: only one worker node (out of the two) gets tasks (albeit, correctly, one task per vCPU) and, even more surprisingly, I don't see any improvement in Bigtable write throughput.
That's not particularly GeoMesa-specific, I suppose, but if you have any idea about what's going on, I'm interested!

Regards,

--
Damiano Albani
Geodan
