Re: [geomesa-users] Tips on ingesting a lot of data into GeoMesa

Hi Emilio (and everyone else),

On Wed, Jan 25, 2017 at 3:19 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Awesome, let us know how it goes!

I've tried ingesting the GDELT (1.0 Event) dataset, following the nice tutorial that is provided.
Apart from a significant number of rows being rejected by the ingest tool (a data format issue?), it worked as expected.
One remark, though, regarding the threading functionality: setting a high thread count didn't seem to make any difference to performance.
Running a second geomesa-bigtable ingest process, on the other hand, did increase the ingestion speed.
Speaking of performance, I reached ~20 MB/s write throughput on a "default" BigTable instance (i.e. 3 nodes on SSD), and ~30 MB/s with the second process mentioned above.
I don't know if you have used BigTable in this context, but does that match the expected performance?
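For reference, what I did was roughly the following (a sketch with hypothetical catalog/type names and input paths; the exact flag names may differ by GeoMesa version, so check `geomesa-bigtable ingest --help`): one process per half of the input files, since a single process plateaued regardless of its thread count.

```shell
# Sketch only: run two ingest processes in parallel, each over half of the
# GDELT input files. Catalog table, SFT name, converter name, and paths are
# placeholders; verify the flags against `geomesa-bigtable ingest --help`.
geomesa-bigtable ingest -c gdelt -s gdelt -C gdelt -t 8 /data/gdelt/part1/*.csv &
geomesa-bigtable ingest -c gdelt -s gdelt -C gdelt -t 8 /data/gdelt/part2/*.csv &
wait  # block until both background ingest processes finish
```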

The second step in my testing of GeoMesa is to try using Hadoop MapReduce jobs to further improve performance. Is that a reasonable expectation, by the way?
I actually followed the tutorial found on GitHub which, after adapting it to the Google Cloud environment, sort of worked: the "only" problem being that the performance was atrocious!
I suppose that has to do with my (limited) knowledge of Hadoop, MapReduce and the Google Dataproc product.
If you have any experience with that, I'd be happy to hear any advice or things I should pay attention to.
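In case it helps diagnose things, the adaptation essentially amounted to submitting the tutorial's MapReduce ingest jar to a Dataproc cluster along these lines (cluster name, bucket, and jar name below are placeholders, not the tutorial's actual artifacts):

```shell
# Sketch only: submit a MapReduce ingest job to an existing Dataproc cluster.
# The cluster name, bucket, jar, and job arguments are all placeholders.
gcloud dataproc jobs submit hadoop \
    --cluster=geomesa-ingest \
    --jar=gs://my-bucket/jars/gdelt-ingest.jar \
    -- gs://my-bucket/gdelt/*.csv
```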

Thanks,

--
Damiano Albani
Geodan
