Re: [geomesa-dev] Ingest performance issues with newest version of geomesa

Blake,

I am still working through some of the numbers here on ingest of points
v. non-point geometries, but will write you back as soon as I have
something cogent.

Thanks!

Sincerely,
  -- Chris Eichelberger


On Fri, 2014-05-30 at 19:35 -0400, Anthony Fox wrote:
> Blake, I don't think it's you. Chris has some preliminary performance
> stats on non-point geometry features and something has impacted
> ingest. We are still looking into it. Will get back to you when we
> know more. 
> 
> On May 30, 2014, at 5:59 PM, "Peno, Blake"
> <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
> 
> 
> > When I use a MapReduce job, I use a split for each layer of my
> > dataset, which comes to about 340 splits. I’m getting ingest speeds
> > of around 75/s with this MapReduce job, but it’s actually much faster
> > for me to just push features one at a time without any of the
> > MapReduce machinery, so I have to assume I’m doing something
> > incorrectly, but I’m not really sure what. You’ll have to forgive
> > me, as I’m not very well versed in Hadoop in general, so working
> > with GeoMesa is a bit of a learning experience for me.
> > 
> >  
> > 
> > If you could send me some numbers on how fast you can ingest
> > polygons, I can confirm that the problem is on my end and keep
> > learning and fixing things over here. I just want to make sure that
> > I’m the only one seeing these speed issues.
> > 
> >  
> > 
> > Blake
> > 
> >  
> > 
> > From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx
> > [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony
> > Fox
> > Sent: Thursday, May 29, 2014 7:48 AM
> > To: Discussions between GeoMesa committers
> > Subject: Re: [geomesa-dev] Ingest performance issues with newest
> > version of geomesa
> > 
> >  
> > 
> > Blake, we're running some tests against polygons and will let you
> > know the result.  Can you tell me how many map tasks were
> > instantiated by your MapReduce job?
> > 
> > 
> > Thanks,
> > Anthony
> > 
> > 
> >  
> > 
> > On Wed, May 28, 2014 at 5:31 PM, Peno, Blake
> > <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
> > 
> > The MapReduce job from the example gives me about the same ingest
> > speed: according to the Accumulo overview page, I’m averaging around
> > 120 ingests per second. Let me know if your polygons get better
> > performance than this and I’m just doing something wrong.
> > 
> >  
> > 
> > From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx
> > [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony
> > Fox
> > Sent: Wednesday, May 28, 2014 2:29 PM
> > 
> > 
> > To: Discussions between GeoMesa committers
> > Subject: Re: [geomesa-dev] Ingest performance issues with newest
> > version of geomesa
> > 
> > 
> >  
> > 
> > Blake,
> > 
> > This is a good mailing list to contact us; you can also use the
> > users mailing list (geomesa-users@xxxxxxxxxxxxxxxx). We benchmarked
> > against point data. I'll test area and line ingest and let you know
> > some numbers. I'd recommend creating MapReduce jobs for your ingest
> > (or a Storm job if the data is streaming). That way you get lots of
> > parallelism, and because the index requires no communication between
> > ingest tasks, the parallelism scales well. Check out the tutorial here:
> > 
> > http://geomesa.github.io/2014/04/17/geomesa-gdelt-analysis/
> > 
> > 
> > The code referenced in that tutorial (available on GitHub)
> > demonstrates MapReduce-based ingest. For Storm, check out:
> > 
> > http://geomesa.github.io/2014/05/16/geomesa-osm-analysis/
> > 
> > 
> > Let me know if this helps.
> > 
> > 
> > Thanks,
> > Anthony
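Anthony's recommendation rests on splitting the input and letting independent writers work on each piece without talking to one another. The following is a minimal single-JVM sketch of that idea using a thread pool; it is an analogy only, not MapReduce or Storm, and not GeoMesa's API. `ParallelIngest`, its `Writer` interface, and the string "features" are hypothetical stand-ins; in a real job each map task would open its own data store writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: split the input into partitions and let each worker
// write its own partition with no coordination between workers.
final class ParallelIngest {

    /** Hypothetical sink; a real job would write to the data store here. Must be thread-safe if shared. */
    interface Writer {
        void write(String feature);
    }

    /** Splits features round-robin into one partition per worker. */
    static List<List<String>> partition(List<String> features, int workers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < features.size(); i++) {
            parts.get(i % workers).add(features.get(i));
        }
        return parts;
    }

    /** Writes every partition on its own thread and returns the number of features written. */
    static long ingest(List<String> features, int workers, Writer writer) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong written = new AtomicLong();
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (List<String> part : partition(features, workers)) {
                // Each task touches only its own partition.
                tasks.add(pool.submit(() -> {
                    for (String f : part) {
                        writer.write(f);
                        written.incrementAndGet();
                    }
                }));
            }
            for (Future<?> t : tasks) {
                t.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> features = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            features.add("feature-" + i);
        }
        // No-op writer, just to exercise the partitioning and threading.
        long n = ingest(features, 8, f -> { });
        System.out.println("wrote " + n + " features");
    }
}
```

Because no worker waits on another, throughput in this model grows with the number of workers until the sink itself becomes the bottleneck.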
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > 
> >  
> > 
> > On Wed, May 28, 2014 at 2:23 PM, Peno, Blake
> > <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
> > 
> > Sorry, forgot to mention our cluster is 14 nodes.
> > 
> >  
> > 
> > From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx
> > [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Peno,
> > Blake
> > Sent: Wednesday, May 28, 2014 1:21 PM
> > 
> > 
> > To: Discussions between GeoMesa committers
> > Subject: Re: [geomesa-dev] Ingest performance issues with newest
> > version of geomesa
> > 
> > 
> >  
> > 
> > I’m using Java to push features as described in the documentation
> > PDF: I get a FeatureSource from the DataStore and call its
> > addFeatures method. 500k/second is about 50k times faster than what
> > I’ve been getting recently. Even before updating to the latest
> > version I wasn’t getting anywhere near that. It seems to be much
> > faster with point data, of course, but most of my data is area and
> > line features.
> > 
> >  
> > 
> > Also, side note: is this the mailing list I should be using? I know
> > I’m not a developer of GeoMesa per se, but I didn’t know how else
> > to contact you easily.
> > 
> >  
> > 
> > From: geomesa-dev-bounces@xxxxxxxxxxxxxxxx
> > [mailto:geomesa-dev-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Anthony
> > Fox
> > Sent: Wednesday, May 28, 2014 11:54 AM
> > To: Discussions between GeoMesa committers
> > Subject: Re: [geomesa-dev] Ingest performance issues with newest
> > version of geomesa
> > 
> >  
> > 
> > Blake,
> > 
> > We recently switched from a text-based encoding to an Avro binary
> > encoding. This should actually have sped up your ingest
> > significantly; it performed very well in tests we ran during
> > development of the binary encoding. As a point of reference, we
> > have been able to ingest about 500K records per second on a 21-node
> > cluster using a map/reduce job. Can you give a bit more detail
> > about how you are performing your ingest?
> > 
> > 
> > Thanks,
> > Anthony
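To put the rates from this thread side by side, here is a back-of-the-envelope per-node calculation. It assumes load is spread evenly across nodes, and note that the 500K records/s benchmark was measured on point data while Blake's ~120/s is for mostly area and line features, so the ratio mixes geometry types. `IngestRates` and `perNode` are illustrative names only.

```java
// Back-of-the-envelope comparison of ingest rates quoted in this thread.
// Per-node figures assume load is spread evenly across the cluster.
final class IngestRates {

    /** Records per second per node, assuming an even spread across nodes. */
    static double perNode(double recordsPerSecond, int nodes) {
        return recordsPerSecond / nodes;
    }

    public static void main(String[] args) {
        // Anthony: map/reduce ingest of point data on a 21-node cluster.
        double benchmark = perNode(500_000, 21);
        // Blake: the MapReduce example on his 14-node cluster (mostly area/line features).
        double observed = perNode(120, 14);

        System.out.printf("benchmark: %.0f records/s per node%n", benchmark);
        System.out.printf("observed:  %.1f records/s per node%n", observed);
        System.out.printf("ratio:     about %.0fx%n", benchmark / observed);
    }
}
```

With these inputs the program reports about 23810 records/s per node for the benchmark, 8.6 for the observed rate, and a ratio of about 2778x.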
> > 
> > 
> >  
> > 
> > On Wed, May 28, 2014 at 12:48 PM, Peno, Blake
> > <Blake.Peno@xxxxxxxxxxxxxxx> wrote:
> > 
> > Hi all,
> > 
> >  
> > 
> > I recently upgraded to the newest version of GeoMesa on GitHub, and
> > I’ve noticed that my performance when pushing features to GeoMesa has
> > dropped drastically. At this rate it’s going to take about a week to
> > upload all of my data. Has something changed that would cause this,
> > or am I missing something simple?
> > 
> > 
> > 
> > _______________________________________________
> > geomesa-dev mailing list
> > geomesa-dev@xxxxxxxxxxxxxxxx
> > http://locationtech.org/mailman/listinfo/geomesa-dev
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> > 
> > 
