Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Oops, I forgot to mention something important.

I'm actually not ingesting the data with MapReduce; I'm using
multithreaded programming.
So I delete and re-ingest seven tables in parallel, roughly as sketched
below.
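
For reference, the pattern is roughly this (a minimal sketch against the
GeoTools DataStore API, assuming "dataStore" is already connected to
GeoMesa; imports from java.util.concurrent and GeoTools are omitted, and
the type names and the buildFeatures() helper are hypothetical stand-ins
for my real code):

    // one worker per feature type: drop all old features, then write fresh ones
    ExecutorService pool = Executors.newFixedThreadPool(7);
    for (String typeName : Arrays.asList("type1", "type2", "type3", "type4",
                                         "type5", "type6", "type7")) {
        pool.submit(() -> {
            SimpleFeatureStore store =
                (SimpleFeatureStore) dataStore.getFeatureSource(typeName);
            store.removeFeatures(Filter.INCLUDE);        // delete everything
            store.addFeatures(buildFeatures(typeName));  // re-ingest new data
            return null;  // Callable, so checked exceptions propagate
        });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);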

2016-12-14 15:23 GMT+09:00 Takashi Sasaki <tsasaki609@xxxxxxxxx>:
> Hi Andrew,
>
> Answer:
> I'm ingesting the data from an Apache Spark program, which is
> MapReduce-like, but the ingest happens after collecting the data to
> the driver node, so it is effectively a single Java program.
>
> Note:
> I'm not using a timestamp attribute, so GeoMesa creates a Z2 index table.
> I delete and re-ingest the data every few minutes for near-realtime
> image rendering by GeoServer.
>
> Additional Question:
> The Accumulo (GeoMesa) Data Store has a "collectStats" parameter.
> What is the benefit of using it?
> I tried to find documentation for it, but could not.
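>
> For context, I enable it through the normal parameter map (a sketch;
> "collectStats" is the parameter in question, and the other keys are
> the standard GeoMesa Accumulo connection parameters with placeholder
> values):
>
>     Map<String, Serializable> params = new HashMap<>();
>     params.put("instanceId", "myInstance");
>     params.put("zookeepers", "zoo1,zoo2,zoo3");
>     params.put("user", "myUser");
>     params.put("password", "myPassword");
>     params.put("tableName", "geomesa.catalog");
>     params.put("collectStats", Boolean.TRUE);  // the parameter I'm asking about
>     DataStore dataStore = DataStoreFinder.getDataStore(params);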
>
> Thank you for the reply,
> Takashi
>
> 2016-12-14 13:22 GMT+09:00 Andrew <ahulbert@xxxxxxxx>:
>> Trying again...
>>
>> Hi Takashi,
>>
>> I'll try to look into question #1, but note that I generally use EC2
>> directly instead, so I'll have to see how EMR handles the ZooKeepers
>> ... Question: how are you ingesting the data? Is it MapReduce or a
>> single Java program?
>>
>> But in the meantime, for #2 you should certainly be able to use the
>> r3.xlarge instances with 4 CPUs and 30 GB of RAM. We have built many
>> clusters on EC2 this way. I would try to use at least 5 or 10 nodes if
>> you can. Just make sure to spread the Accumulo master, the Hadoop
>> namenode, and the 3 ZooKeepers across separate nodes. I'll update that
>> wiki page with some more examples.
>>
>> Also note that you can use EBS storage which can be cheaper.
>>
>> Andrew
>>
>>
>>
>>
>> -------- Original message --------
>> From: Takashi Sasaki <tsasaki609@xxxxxxxxx>
>> Date: 12/13/16 20:19 (GMT-05:00)
>> To: geomesa-users@xxxxxxxxxxxxxxxx
>> Subject: [geomesa-users] How to avoid Tablet Server crashing, and server
>> cost problem...
>>
>> Hello,
>>
>> I use GeoMesa 1.2.3 with Accumulo 1.7.2 on AWS EMR release emr-4.7.0,
>> with Hadoop 2.7.2 and ZooKeeper-Sandbox 3.4.8.
>> I have two problems with managing the system. One is that an Accumulo
>> Tablet Server suddenly crashes, and the other is that the GeoMesa
>> (Accumulo) hardware requirements are too expensive for my company.
>>
>> 1.
>> Writing features to the GeoMesa (Accumulo) DataStore becomes very slow
>> when an Accumulo Tablet Server crashes, and GeoServer's image
>> rendering response also slows down.
>> While the Tablet Servers are alive, writing features takes one or two
>> minutes, but in the state above it takes eight minutes or more.
>>
>> The cause is probably waiting for the Accumulo Master server to
>> rebalance the tablets.
>>
>> [Accumulo Master Server log]
>> 2016-12-13 03:34:26,008 [master.Master] WARN : Lost servers
>> [ip-10-24-83-37:9997[358b52962d101f8]]
>>
>> [Accumulo Tablet Server log]
>> 2016-12-13 03:34:47,943 [hdfs.DFSClient] WARN : DFSOutputStream
>> ResponseProcessor exception for block
>> BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460
>> java.io.EOFException: Premature EOF: no length prefix available
>> at
>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
>> at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:734)
>> 2016-12-13 03:34:47,943 [zookeeper.ClientCnxn] WARN : Client session
>> timed out, have not heard from server in 53491ms for sessionid
>> 0x358b52962d101f8
>> 2016-12-13 03:34:47,944 [hdfs.DFSClient] WARN : Error Recovery for
>> block BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460
>> in pipeline
>> DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK],
>> DatanodeInfoWithStorage[10.24.83.39:50010,DS-f66dcecb-1b44-431c-83fc-ab343339c485,DISK]:
>> bad datanode
>> DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK]
>> 2016-12-13 03:34:48,444 [hdfs.DFSClient] WARN : DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException):
>> BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does
>> not exist or is not under Constructionblk_1074003283_264798
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
>> 2016-12-13 03:34:48,445 [hdfs.DFSClient] WARN : Error while syncing
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException):
>> BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does
>> not exist or is not under Constructionblk_1074003283_264798
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
>> 2016-12-13 03:34:48,445 [log.DfsLogger] WARN : Exception syncing
>> java.lang.reflect.InvocationTargetException
>> 2016-12-13 03:34:48,463 [zookeeper.ClientCnxn] WARN : Unable to
>> reconnect to ZooKeeper service, session 0x358b52962d101f8 has expired
>> 2016-12-13 03:34:48,546 [log.DfsLogger] WARN : Exception syncing
>> java.lang.reflect.InvocationTargetException
>> 2016-12-13 03:34:48,546 [log.DfsLogger] ERROR: Failed to close log file
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException):
>> BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does
>> not exist or is not under Constructionblk_1074003283_264798
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>> at
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>> at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>> at
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
>> 2016-12-13 03:34:48,564 [zookeeper.ZooReader] WARN : Saw (possibly)
>> transient exception communicating with ZooKeeper
>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>> KeeperErrorCode = Session expired for
>> /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tservers/ip-10-24-83-37:9997
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>> at
>> org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
>> at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
>> at
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
>> 2016-12-13 03:34:48,565 [zookeeper.ZooCache] WARN : Saw (possibly)
>> transient exception communicating with ZooKeeper, will retry
>> org.apache.zookeeper.KeeperException$SessionExpiredException:
>> KeeperErrorCode = Session expired for
>> /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tables/1n/namespace
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>> at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:295)
>> at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:272)
>> at
>> org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:171)
>> at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:323)
>> at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:259)
>> at
>> org.apache.accumulo.core.client.impl.Tables.getNamespaceId(Tables.java:235)
>> at
>> org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:600)
>> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at
>> org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
>> at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
>> at com.sun.proxy.$Proxy19.startMultiScan(Unknown Source)
>> at
>> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2330)
>> at
>> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2314)
>> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>> at
>> org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>> at
>> org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>> at
>> org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at
>> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>> at java.lang.Thread.run(Thread.java:745)
>> ---
>> (this retry log repeats many times)
>> ---
>> 2016-12-13 03:34:48,844 [tserver.TabletServer] ERROR: Lost tablet
>> server lock (reason = SESSION_EXPIRED), exiting.
>>
>> According to this log, the Tablet Server was accessing HDFS when its
>> ZooKeeper session expired, and the server shut itself down.
>>
>> How can I avoid the Tablet Server crashing?
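>>
>> (From the timestamps, the tserver could not heartbeat to ZooKeeper for
>> about 53 seconds, which exceeds the session timeout. One knob I am
>> considering is the session timeout in accumulo-site.xml; a sketch,
>> where 60s is only an example value, and ZooKeeper's maxSessionTimeout
>> would also have to be raised to allow it:)
>>
>>     <property>
>>       <name>instance.zookeeper.timeout</name>
>>       <value>60s</value>
>>     </property>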
>>
>>
>> 2.
>> I use AWS EMR with r3.8xlarge instances for the Accumulo Tablet
>> Servers on 5 core nodes.
>> They are high performance (32 vCPU / 244 GB memory / 2 x 320 GB SSD),
>> but too expensive.
>>
>> On the GeoMesa "Tuning Accumulo" page,
>> https://geomesa.atlassian.net/wiki/display/GEOMESA/Tuning+Accumulo#TuningAccumulo-SmallCluster,LargeServers
>> the example hardware spec requirement almost matches an r3.8xlarge.
>>
>> But I want to bring the cost down.
>> Is there a "Small Cluster, Small Servers" example hardware spec
>> somewhere?
>>
>> If possible, I would like to use r3.xlarge (4 vCPU / 30.5 GB memory /
>> 1 x 80 GB SSD) instances for the Accumulo Tablet Servers, with
>> scaled-down tserver settings along the lines sketched below.
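>>
>> (For illustration only; the property names are standard Accumulo
>> tserver settings in accumulo-site.xml, but the values for a 30 GB node
>> are my own guesses, not numbers from the tuning page:)
>>
>>     <property>
>>       <name>tserver.memory.maps.max</name>
>>       <value>4G</value>
>>     </property>
>>     <property>
>>       <name>tserver.cache.data.size</name>
>>       <value>2G</value>
>>     </property>
>>     <property>
>>       <name>tserver.cache.index.size</name>
>>       <value>1G</value>
>>     </property>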
>> What do you think?
>>
>>
>> Thank you,
>> Takashi
>> _______________________________________________
>> geomesa-users mailing list
>> geomesa-users@xxxxxxxxxxxxxxxx
>> To change your delivery options, retrieve your password, or unsubscribe from
>> this list, visit
>> https://www.locationtech.org/mailman/listinfo/geomesa-users

