Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Hi Takashi,

In geomesa-1.2.3 there are 2 different parameters you might be referencing: generateStats and collectQueryStats. collectQueryStats was named collectStats in older versions, but was renamed to make it clearer. generateStats will store summary statistics about your data, which are then used for query planning. Since you are deleting the data every few minutes, you probably want to disable this, as it will introduce needless overhead. collectQueryStats is a simple form of auditing that will log every query to Accumulo, in the '<catalog>_queries' table.
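
For example, to disable both when you connect (a minimal sketch, assuming the 1.2.3 parameter keys; the connection values are placeholders):

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.Map;
    import org.geotools.data.DataStore;
    import org.geotools.data.DataStoreFinder;

    Map<String, Serializable> params = new HashMap<>();
    params.put("instanceId", "myInstance"); // placeholder connection values
    params.put("zookeepers", "zoo1,zoo2,zoo3");
    params.put("user", "myUser");
    params.put("password", "myPassword");
    params.put("tableName", "myCatalog");
    // skip summary statistics - needless overhead for short-lived data
    params.put("generateStats", Boolean.FALSE);
    // skip per-query audit logging to the '<catalog>_queries' table
    params.put("collectQueryStats", Boolean.FALSE);
    DataStore ds = DataStoreFinder.getDataStore(params);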

When you delete your data, how are you doing it? It might be the rapid deleting and re-creating that is causing Accumulo problems - table state is managed by a single master process, and that often seems to cause contention. Depending on your use case, you might consider the Kafka data store instead - it excels at real-time data, and the hardware costs are considerably lower. It doesn't provide some of the more advanced features of the Accumulo data store, but that might not be a problem for you.
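
If you are currently dropping and re-creating the tables (or the whole schema), deleting just the features avoids most of that master contention. A minimal sketch with the GeoTools API, reusing 'ds' from the sketch above ('mySft' is a placeholder type name):

    import org.geotools.data.simple.SimpleFeatureStore;
    import org.opengis.filter.Filter;

    SimpleFeatureStore store = (SimpleFeatureStore) ds.getFeatureSource("mySft");
    // removes all features but keeps the tables and schema in place
    store.removeFeatures(Filter.INCLUDE);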

Another tip to reduce your hardware requirements is to disable any indices that you aren't using. It sounds like all your queries are against the z2 index (i.e. they have a spatial component) - if so, you could disable the other indices. See here for instructions: http://www.geomesa.org/documentation/1.2.3/user/data_management.html#customizing-index-creation
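
Roughly like this, sketched from memory - please verify the exact user-data key and index names against the linked docs for your version:

    import org.locationtech.geomesa.utils.interop.SimpleFeatureTypes;
    import org.opengis.feature.simple.SimpleFeatureType;

    // set before calling createSchema; 'table.indexes.enabled' is my
    // recollection of the 1.2.x key - check the documentation link above
    SimpleFeatureType sft = SimpleFeatureTypes.createType("mySft",
        "name:String,*geom:Point:srid=4326");
    sft.getUserData().put("table.indexes.enabled", "records,z2");
    ds.createSchema(sft);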

Thanks,

Emilio

On 12/14/2016 01:47 AM, Takashi Sasaki wrote:
Oops, I forgot to mention something important.

I'm actually not ingesting the data with mapreduce; I'm using
multithreaded programming.
So I delete and re-ingest seven tables in parallel (roughly as sketched
below).
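
Something along these lines (a simplified illustration, not my exact code):

    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // one task per feature type; the names are placeholders
    List<String> typeNames = Arrays.asList("type1", "type2", "type3",
        "type4", "type5", "type6", "type7");
    ExecutorService pool = Executors.newFixedThreadPool(typeNames.size());
    for (String typeName : typeNames) {
        pool.submit(() -> {
            // delete the existing features for typeName, then re-ingest the new batch
        });
    }
    pool.shutdown();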

2016-12-14 15:23 GMT+09:00 Takashi Sasaki <tsasaki609@xxxxxxxxx>:
Hi Andrew,

Answer:
I'm ingesting the data from an Apache Spark program, which is mapreduce,
but the ingest happens after collecting the data to the driver node, so
it is effectively a single Java program.

Note:
I'm not using a timestamp attribute, so GeoMesa creates a Z2 index table.
I delete and re-ingest the data every few minutes for near-realtime
image rendering in GeoServer.

Additional question:
The Accumulo (GeoMesa) data store has a "collectStats" parameter.
What is the benefit of using it?
I tried to find documentation for it, but could not.

Thank you for the reply,
Takashi

2016-12-14 13:22 GMT+09:00 Andrew <ahulbert@xxxxxxxx>:
Trying again...

Hi Takashi,

I'll try to look into question #1, but note that I generally use EC2
instead, so I'll have to see how it handles zookeepers ... Question: how
are you ingesting the data? Is it mapreduce or a single Java program?

But in the meantime, for #2 you should certainly be able to use r3.xlarge
instances with 4 CPUs and 30 GB of RAM. We have built many clusters on EC2
this way. I would try to use at least 5 or 10 nodes if you can. Just make
sure to spread the Accumulo master, the Hadoop namenode, and the 3
zookeepers across separate nodes. I'll update that wiki page with some
more examples.

Also note that you can use EBS storage which can be cheaper.
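
As a very rough starting point on a 30 gig box (illustrative numbers only - tune against your workload, and size the tserver JVM heap to match in accumulo-env.sh), something like this in accumulo-site.xml:

    <property>
      <name>tserver.memory.maps.max</name>
      <value>4G</value>
    </property>
    <property>
      <name>tserver.cache.data.size</name>
      <value>2G</value>
    </property>
    <property>
      <name>tserver.cache.index.size</name>
      <value>1G</value>
    </property>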

Andrew




-------- Original message --------
From: Takashi Sasaki <tsasaki609@xxxxxxxxx>
Date: 12/13/16 20:19 (GMT-05:00)
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Hello,

I use GeoMesa 1.2.3 with Accumulo 1.7.2 on AWS EMR release emr-4.7.0,
with Hadoop 2.7.2 and ZooKeeper-Sandbox 3.4.8.
I have two problems with system management. One is that the Accumulo
Tablet Server suddenly crashes, and the other is that the GeoMesa
(Accumulo) hardware requirements are too expensive for my company.

1.
Writing features to the GeoMesa (Accumulo) data store became very slow
when an Accumulo Tablet Server crashed, and GeoServer image rendering
response also slowed down.
When the Tablet Server is alive, writing features takes one or two
minutes, but in the above state it takes eight minutes or more.

The cause is probably waiting for the Accumulo Master server to
rebalance tablets.

[Accumulo Master Server log]
2016-12-13 03:34:26,008 [master.Master] WARN : Lost servers [ip-10-24-83-37:9997[358b52962d101f8]]

[Accumulo Tablet Server log]
2016-12-13 03:34:47,943 [hdfs.DFSClient] WARN : DFSOutputStream ResponseProcessor exception for block BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:734)
2016-12-13 03:34:47,943 [zookeeper.ClientCnxn] WARN : Client session timed out, have not heard from server in 53491ms for sessionid 0x358b52962d101f8
2016-12-13 03:34:47,944 [hdfs.DFSClient] WARN : Error Recovery for block BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 in pipeline DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK], DatanodeInfoWithStorage[10.24.83.39:50010,DS-f66dcecb-1b44-431c-83fc-ab343339c485,DISK]: bad datanode DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK]
2016-12-13 03:34:48,444 [hdfs.DFSClient] WARN : DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
2016-12-13 03:34:48,445 [hdfs.DFSClient] WARN : Error while syncing
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
2016-12-13 03:34:48,445 [log.DfsLogger] WARN : Exception syncing
java.lang.reflect.InvocationTargetException
2016-12-13 03:34:48,463 [zookeeper.ClientCnxn] WARN : Unable to reconnect to ZooKeeper service, session 0x358b52962d101f8 has expired
2016-12-13 03:34:48,546 [log.DfsLogger] WARN : Exception syncing
java.lang.reflect.InvocationTargetException
2016-12-13 03:34:48,546 [log.DfsLogger] ERROR: Failed to close log file
org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    at org.apache.hadoop.ipc.Client.call(Client.java:1475)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
2016-12-13 03:34:48,564 [zookeeper.ZooReader] WARN : Saw (possibly) transient exception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tservers/ip-10-24-83-37:9997
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
    at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
2016-12-13 03:34:48,565 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tables/1n/namespace
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
    at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:295)
    at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:272)
    at org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:171)
    at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:323)
    at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:259)
    at org.apache.accumulo.core.client.impl.Tables.getNamespaceId(Tables.java:235)
    at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:600)
    at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
    at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
    at com.sun.proxy.$Proxy19.startMultiScan(Unknown Source)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2330)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2314)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:745)
---
(retry log many times)
---
2016-12-13 03:34:48,844 [tserver.TabletServer] ERROR: Lost tablet server lock (reason = SESSION_EXPIRED), exiting.

According to this log, the Tablet Server was accessing HDFS when its
ZooKeeper session expired, and the server shut itself down.

How can I avoid the Tablet Server crashing?


2.
I use AWS EMR with r3.8xlarge instances for the Accumulo Tablet Servers,
on 5 core nodes.
They are high performance (32 vCPU / 244 GB memory / 2 x 320 GB SSD), but
too expensive.

In the GeoMesa "Tuning Accumulo" wiki page,
https://geomesa.atlassian.net/wiki/display/GEOMESA/Tuning+Accumulo#TuningAccumulo-SmallCluster,LargeServers
the example hardware requirements fit the r3.8xlarge almost exactly.

But I want to reduce the cost.
Is there a "Small Cluster, Small Servers" example hardware requirement
somewhere?

If possible, I want to use r3.xlarge (4 vCPU / 30.5 GB memory / 1 x 80 GB
SSD) for the Accumulo Tablet Servers.
What do you think?


Thank you,
Takashi
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from
this list, visit
https://www.locationtech.org/mailman/listinfo/geomesa-users