Re: [geomesa-users] Geomesa map/reduce ingest speed query

Hi Ben,

Awesome.  I'm glad the GeoMesa ingest tool is working for you.  For M/R ingest, we assumed the data was available on HDFS.  In general, if the files to be ingested are very large, having them on HDFS should help provide splits for the mappers.
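[For context, a rough sketch of why HDFS helps here: Hadoop's FileInputFormat assigns roughly one map task per HDFS block of a splittable input file, so mapper parallelism scales with file size divided by block size. This is plain arithmetic, not GeoMesa code; the class and method names are illustrative.]

```java
// Sketch (not GeoMesa code): for a splittable file on HDFS, the number of
// input splits -- and hence mappers -- is roughly ceil(fileSize / blockSize).
public class SplitEstimate {
    static long approxSplits(long fileSizeBytes, long blockSizeBytes) {
        // Ceiling division without floating point.
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;
        long blk128MiB = 128L * 1024 * 1024;
        // A 10 GiB file with 128 MiB blocks yields ~80 splits/mappers.
        System.out.println(approxSplits(tenGiB, blk128MiB)); // 80
    }
}
```

A single local file, by contrast, gives the job no natural way to fan out across mappers.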

At the moment, given the recent emails, I think our M/R ingest code needs to be reviewed.  I'd suggest using the local ingest tool for the time being.

Thanks,

Jim

On 08/04/2015 01:16 PM, Ben Southall wrote:

Looks like geomesa ingest is working nicely, thank you.

 

Is there any advantage to having the file being ingested on hdfs?

 

Thanks,

 

Ben

 

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Ben Southall
Sent: Monday, August 03, 2015 5:12 PM
To: Geomesa User discussions
Subject: Re: [geomesa-users] Geomesa map/reduce ingest speed query

 

Thanks Jim – I’ll try Geomesa tools ingest. My map/reduce ingest is working from .csv anyway.

 

Ben

 

 

From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Jim Hughes
Sent: Monday, August 03, 2015 5:08 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Geomesa map/reduce ingest speed query

 

Hi Ben,

I don't believe we have seen those exceptions for ingest.  Looking at the GDELT M/R ingest example, I think the code is pretty rough.  I'm going to try to bring that up around the water cooler to see if we can share something better.

As a guess, I'm wondering if your mappers are running out of heap and dying, and when they come back, Hadoop is trying to re-secure/re-write a file which records which process is doing what.  Otherwise, there may be a problem with some of our code that creates directories for the MR sample code.
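[The heap guess above can be made concrete: the TabletServerBatchWriter visible in the stack trace below buffers mutations in the client JVM (here, the mapper) until a memory threshold forces a flush, so a large write buffer competes with the rest of a 1 GB mapper heap. The sketch below illustrates that buffer-until-threshold pattern in plain Java; the names are hypothetical, not the Accumulo API.]

```java
// Illustrative only -- hypothetical names, not the Accumulo API. An Accumulo
// BatchWriter accumulates mutations in client-side heap and flushes to the
// tablet servers once a configured memory threshold would be exceeded.
public class BufferedWriterSketch {
    private final long maxBufferBytes;
    private long buffered = 0;
    int flushes = 0;

    BufferedWriterSketch(long maxBufferBytes) { this.maxBufferBytes = maxBufferBytes; }

    void write(long mutationBytes) {
        // Flush first if accepting this mutation would exceed the buffer.
        if (buffered + mutationBytes > maxBufferBytes) flush();
        buffered += mutationBytes;
    }

    void flush() {
        if (buffered > 0) { flushes++; buffered = 0; }
    }

    public static void main(String[] args) {
        BufferedWriterSketch w = new BufferedWriterSketch(1024); // 1 KiB buffer
        for (int i = 0; i < 100; i++) w.write(100);              // ~10 KB of mutations
        w.flush();                                               // drain the remainder
        System.out.println(w.flushes); // 10
    }
}
```

If a mapper dies mid-buffer, the buffered mutations are lost and the task is retried from scratch, which matches the MutationsRejectedException-on-close in the trace below.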

As a general suggestion, you might try the 'export' and 'ingest' commands directly (http://www.geomesa.org/geomesa-tools-ingest-export/).  If you have a working set of GeoMesa tools for the old version, you can export to CSV and then use a new version of the tools to ingest into a different table.  This approach definitely isn't high-tech, but it should be fairly easy to do, and it avoids any JVM memory-sizing issues and any other MR issues.

Other than that, we'll be supporting Scala 2.11 in the near future.  When we do, that may help with the Function22/Tuple22 limits.

Thanks,

Jim

On 08/03/2015 03:24 PM, Ben Southall wrote:

Hello,

 

We’ve been transitioning from a version of GeoMesa from before the ‘z3’ index was introduced to 1.1.0_rc.2. We tried an in-place upgrade of our 1.0.x tables, but unfortunately it didn’t work (I think the problem relates to my Scala compiler topping out at Function22, while I have 30+ attributes in my table).

 

Anyway, I figured I could just re-ingest the data, since that was typically something I could do overnight, and I was going to be out for a few days anyway.

 

My ingestion code uses Map/Reduce and is based on the old geomesa.org GDELT Map/Reduce ingestion example; with version 1.0.x it worked fine. Now, after just over a week of processing, I’m only 21% of the way through a dataset of around 9 million features with point geometry (each feature has 30+ attributes, one timestamp, one POINT geometry, and 3 secondary indexes). Each map task has a 1GB heap (which I have room to increase if necessary), and I have plentiful space on HDFS.

 

It seems that my map tasks are repeatedly failing with a number of different errors (I’ve listed them at the bottom of this email). I tried ingesting a larger number of points (~43 million) with fewer (7) non-geometry attributes and ran into similar issues.

 

Any suggestions?

 

Thanks!

 

Ben

 

--

Error: Java heap space

--

java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs

                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1451)

                at org.apache.hadoop.mapred.Child.main(Child.java:262)

Caused by: java.security.PrivilegedActionException: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1

                at java.security.AccessController.doPrivileged(Native Method)

                at javax.security.auth.Subject.doAs(Subject.java:415)

                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

                ... 1 more

Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1

                at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:536)

                at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:353)

                at org.apache.acc

--


org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists

        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:178)

        at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)

        at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)

        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:415)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

        at org.apache.hadoop.mapred.Child.main(Child.java:262)

Caused by: EEXIST: File exists

        at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)

        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)

        ... 7 more

--

 



_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

 




