Re: [geomesa-users] Geomesa map/reduce ingest speed query

Hi Ben,

I don't believe we have seen those exceptions for ingest.  From looking at the GDELT M/R ingest example, I think the code is pretty rough.  I'm going to try and bring that up around the water cooler to see if we can share something better.

As a guess, I'm wondering if your mappers are running out of heap, dying, and when they come back, Hadoop is trying to re-secure/re-write a file which tracks what each process is doing (that would line up with the SecureIOUtils AlreadyExistsException from TaskLog in your traces).  Otherwise, there may be a problem with some of our code which creates directories for the M/R sample code.
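If heap is the culprit, bumping the mapper JVM size is a cheap first experiment.  Roughly, for MRv1 (which matches the org.apache.hadoop.mapred.Child frames in your traces; this is just a sketch, and on YARN the property is mapreduce.map.java.opts instead):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.Job

    // Sketch: raise the per-task JVM heap before submitting the ingest job.
    // "mapred.child.java.opts" is the MRv1 property; Hadoop 2 / YARN splits
    // it into mapreduce.map.java.opts and mapreduce.reduce.java.opts.
    val conf = new Configuration()
    conf.set("mapred.child.java.opts", "-Xmx2048m")
    val job = new Job(conf, "geomesa-reingest") // job name is illustrative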

As a general suggestion, you might try the 'export' and 'ingest' commands directly (http://www.geomesa.org/geomesa-tools-ingest-export/).  If you have a working set of the GeoMesa tools for the old version, you can export to CSV, and then use the new version of the tools to ingest into a different table.  This approach definitely isn't high-tech, but it should be fairly easy to do, and it avoids JVM heap sizing and any other M/R issues.
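If you'd rather script that round trip than shell out to the tools, the same copy can be done through the GeoTools DataStore API.  A minimal sketch (the connection parameters and type name are placeholders, and the parameter keys are the 1.x AccumuloDataStore ones, so double-check them against your version):

    import java.io.Serializable
    import java.util.{HashMap => JMap}
    import org.geotools.data.{DataStoreFinder, Query, Transaction}

    // Placeholder connection parameters -- fill in for your cluster
    def params(table: String): JMap[String, Serializable] = {
      val m = new JMap[String, Serializable]
      m.put("instanceId", "myInstance")
      m.put("zookeepers", "zoo1,zoo2,zoo3")
      m.put("user", "user")
      m.put("password", "***")
      m.put("tableName", table)
      m
    }

    val source = DataStoreFinder.getDataStore(params("old_catalog"))
    val target = DataStoreFinder.getDataStore(params("new_catalog"))

    val typeName = "MyFeatureType" // placeholder
    target.createSchema(source.getSchema(typeName)) // recreate under the new version

    val reader = source.getFeatureReader(new Query(typeName), Transaction.AUTO_COMMIT)
    val writer = target.getFeatureWriterAppend(typeName, Transaction.AUTO_COMMIT)
    while (reader.hasNext) {
      val copy = writer.next() // blank feature to populate
      copy.setAttributes(reader.next().getAttributes)
      writer.write()           // note: feature IDs are regenerated on append
    }
    reader.close()
    writer.close()

Like the CSV route, this runs outside of M/R, so it sidesteps the task-level failures entirely.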

Other than that, we'll be supporting Scala 2.11 in the near future.  When we do, that may help with the Function22/Tuple22 limits.
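For context on that limit: in Scala 2.10, tuples, function literals, and case classes all stop at arity 22, so a 30-attribute schema can't round-trip through a single tuple or case class.  Scala 2.11 lifts the case-class ceiling (Tuple22/Function22 remain the largest tuple and function types).  A tiny illustration, not GeoMesa code:

    // Scala 2.10 rejects this with "Implementation restriction: case
    // classes cannot have more than 22 parameters".  Scala 2.11+ compiles
    // it, though a >22-field case class gets no tupled/unapply, since
    // Tuple22 is still the largest tuple.
    case class WideRow(
      a1: Int,  a2: Int,  a3: Int,  a4: Int,  a5: Int,  a6: Int,
      a7: Int,  a8: Int,  a9: Int,  a10: Int, a11: Int, a12: Int,
      a13: Int, a14: Int, a15: Int, a16: Int, a17: Int, a18: Int,
      a19: Int, a20: Int, a21: Int, a22: Int, a23: Int
    )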

Thanks,

Jim

On 08/03/2015 03:24 PM, Ben Southall wrote:

Hello,

 

We’ve been transitioning from a version of GeoMesa predating the ‘z3’ index to 1.1.0_rc.2. We tried an in-place upgrade of our 1.0.x tables, but unfortunately it didn’t work (I think the problem relates to my Scala compiler topping out at Function22, since I have 30+ attributes in my table).

 

Anyway, I figured I could just re-ingest the data, since that was typically something I could do overnight, and I was going to be out for a few days in any case.

 

My ingestion code uses Map/Reduce and is based upon the old geomesa.org GDELT Map/Reduce ingestion example; with version 1.0.x it worked fine. Now, after just over a week of processing, I’m only 21% of the way through a dataset of around 9 million features (each feature has 30+ attributes, one timestamp, one POINT geometry, and 3 secondary indexes). Each Map task has a 1GB heap (which I have room to increase if necessary), and I have plentiful space on HDFS.
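For reference, one knob that interacts with that 1GB heap: the data store's Accumulo BatchWriter buffers mutations in each mapper's memory before flushing.  A hedged sketch of capping it through the data store parameters; "writeMemory"/"writeThreads" are the 1.x AccumuloDataStore parameter names as best I can tell, so verify them for your version:

    import java.io.Serializable
    import java.util.{HashMap => JMap}
    import org.geotools.data.DataStoreFinder

    // Hedged sketch: shrink the BatchWriter buffer so it doesn't compete
    // with feature parsing for the 1GB task heap.  All values below are
    // placeholders; "writeMemory" is in bytes.
    val params = new JMap[String, Serializable]
    params.put("instanceId", "myInstance")
    params.put("zookeepers", "zoo1,zoo2,zoo3")
    params.put("user", "user")
    params.put("password", "***")
    params.put("tableName", "new_catalog")
    params.put("writeMemory", "10000000") // ~10 MB instead of the default
    params.put("writeThreads", "4")
    val ds = DataStoreFinder.getDataStore(params)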

 

It seems that my map tasks are repeatedly failing with a number of different errors (I’ve listed them at the bottom of the email). I tried an ingestion of a larger number of points (~43 million) with fewer (7) non-geometry attributes, and came across similar issues.

 

Any suggestions?

 

Thanks!

 

Ben

 

--

Error: Java heap space

--

java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs

                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1451)

                at org.apache.hadoop.mapred.Child.main(Child.java:262)

Caused by: java.security.PrivilegedActionException: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1

                at java.security.AccessController.doPrivileged(Native Method)

                at javax.security.auth.Subject.doAs(Subject.java:415)

                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

                ... 1 more

Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1

                at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:536)

                at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:353)

                at org.apache.acc

--


org.apache.hadoop.io.SecureIOUtils$AlreadyExistsException: EEXIST: File exists

        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:178)

        at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:310)

        at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:383)

        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:415)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)

        at org.apache.hadoop.mapred.Child.main(Child.java:262)

Caused by: EEXIST: File exists

        at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)

        at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)

        ... 7 more

--

 



_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users

