Re: [geomesa-users] Accumulo Durability

One thing to note: with newer Hadoop versions, deleted WALs by default get moved to the HDFS trash rather than being removed outright. There is an Accumulo setting to have them deleted directly, or you can set up Hadoop to checkpoint and empty the trash on a regular basis.
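
As a rough sketch (assuming Accumulo 1.6+ and a stock Hadoop setup; check the property names against your versions), the relevant knobs look like:

  # Accumulo shell: have the garbage collector delete files directly
  # instead of moving them to the HDFS trash
  config -s gc.trash.ignore=true

  # Hadoop core-site.xml (as <property> entries, values in minutes):
  fs.trash.interval=60
  fs.trash.checkpoint.interval=15

You can also empty the trash by hand with "hdfs dfs -expunge".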

I found my disk filling up quickly when doing ingests.


On 4/6/17 8:08 AM, Jose Bujalance wrote:
Hi Andrew,

Thanks for the advice.

José

2017-04-05 18:17 GMT+02:00 Andrew Hulbert <ahulbert@xxxxxxxx>:

José,

Note that in general you'll want to keep some free disk space around because of compactions as well. As a rule of thumb, don't let HDFS get above about 90% full. Even though WALs are off for your GeoMesa tables, the metadata table should still have a WAL.
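
To keep an eye on usage, a couple of standard HDFS commands (the /apps/accumulo path is the one used in this thread; adjust for your install):

  hdfs dfsadmin -report            # overall capacity and per-datanode usage
  hdfs dfs -du -h /apps/accumulo   # space used by the Accumulo data directory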

If you want to decrease usage (and don't care as much about failures), you can also set the HDFS replication of the tables to 2 instead of 3, which cuts your disk usage by a third, but makes you more likely to lose data in the event of an HDFS datanode failure.
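
As an illustration, the per-table setting in the Accumulo shell would be something like the following ("foo" is a placeholder table name, and table.file.replication only affects files written after the change):

  config -t foo -s table.file.replication=2

Files that already exist can be re-replicated directly in HDFS, e.g. "hdfs dfs -setrep -R 2 /apps/accumulo/tables".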

Andrew

On 04/05/2017 12:12 PM, Jose Bujalance wrote:
Hi Emilio,

Thanks for your answer. My Accumulo was in a very bad state, so after doing some research I decided to reinitialize it. I'd only recommend this when the situation is critical, but there's no need to reinstall Accumulo (which I find quite painful):

1/ Delete the HDFS Accumulo data directory (/apps/accumulo)
2/ Reinitialize Accumulo using the "accumulo init" command
3/ In case of problems, verify that there are no residual instance entries in ZooKeeper (see the sketch below)
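
A minimal sketch of those steps (assuming a default ZooKeeper on localhost; the /apps/accumulo path matches this thread):

  # 1/ remove the Accumulo data directory from HDFS
  hadoop fs -rm -r /apps/accumulo
  # 2/ reinitialize (prompts for a new instance name and root password)
  accumulo init
  # 3/ check ZooKeeper for leftover instance entries
  zkCli.sh -server localhost:2181
  ls /accumulo/instances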

Everything is working fine for me now. I have also changed the durability to "none" in order to avoid this potential issue again.

Thanks,
José

2017-04-04 22:18 GMT+02:00 Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>:
We occasionally disable WALs to speed up ingestion. You can configure the durability on a per-table basis through the shell:

https://accumulo.apache.org/1.7/accumulo_user_manual#_table_durability

Something like:

config -t foo -s table.durability=none

You'd have to do that for each of your GeoMesa index tables.
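
For example, if your catalog table were named "mycatalog", it would look something like the following (the index table suffixes are illustrative - run "tables" in the shell to see the ones GeoMesa actually created):

  config -t mycatalog -s table.durability=none
  config -t mycatalog_records -s table.durability=none
  config -t mycatalog_z2 -s table.durability=none
  config -t mycatalog_z3 -s table.durability=none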

I'm not sure exactly what would happen if you deleted the existing WALs - hopefully Accumulo would clean those up for you once they aren't needed. But if you're in a bad state you might also need to delete the references to them in the metadata table (see below). I believe this comes up from time to time on the Accumulo user list.
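
To see whether any WAL references remain, one way to check (assuming Accumulo 1.7, where WAL references live under the "log" column family of the metadata table):

  scan -t accumulo.metadata -c log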

Thanks,

Emilio


On 04/04/2017 12:27 PM, Jose Bujalance wrote:
Hi,

I am having some problems with my HDFS free disk space, and I think it can be related to the Accumulo durability and the WAL (Write-ahead Log).

When I ingest or delete any feature or catalog in GeoMesa, my free disk space starts decreasing until the disk is 100% full. I don't fully understand how Accumulo's durability works, but I think this might be because the first time I tried to delete my GeoMesa catalog, the deletion didn't fully succeed because of a RAM problem; I then started ingesting new features under the same catalog name, so Accumulo seems to interpret this as an update and keeps generating logs in the WAL.

In order to regain some disk space, I would like to clean the WAL logs in HDFS, and also set Accumulo's durability to "none" (https://accumulo.apache.org/1.7/accumulo_user_manual#_durability).

However, I don't know what the consequences of deleting the HDFS folder /apps/accumulo/data/wal would be, and I haven't found a way to change the default durability.

Has anyone done this before?

Thanks, 
José


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
-- 
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd
VP, Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office: +1-703-552-2862
cell:   +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org

The information contained in this message may be privileged 
and/or confidential and protected from disclosure.  
If the reader of this message is not the intended recipient 
or an employee or agent responsible for delivering this message 
to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication 
is strictly prohibited.  If you have received this communication 
in error, please notify the sender immediately by replying to 
this message and deleting the material from any computer.

