Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Hi Andrew,

Answer:
I'm ingesting the data from an Apache Spark program, which is
MapReduce-like, but the ingest runs after collecting the data to the
driver node, so effectively it is a single Java program.
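
For reference, the write path is roughly like the sketch below:
everything is collect()-ed to the driver and then written through a
single GeoTools FeatureWriter. (This is a simplified illustration; the
connection parameter values and the "mytype" schema are placeholders,
not my real configuration.)

import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.FeatureWriter;
import org.geotools.data.Transaction;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

public class DriverSideIngest {
    // 'collected' is the result of collect() on the Spark driver
    public static void ingest(List<SimpleFeature> collected) throws Exception {
        // GeoMesa Accumulo data store connection parameters (placeholder values)
        Map<String, Serializable> params = new HashMap<>();
        params.put("instanceId", "myInstance");
        params.put("zookeepers", "zoo1,zoo2,zoo3");
        params.put("user", "root");
        params.put("password", "secret");
        params.put("tableName", "geomesa.catalog");

        DataStore ds = DataStoreFinder.getDataStore(params);
        FeatureWriter<SimpleFeatureType, SimpleFeature> writer =
            ds.getFeatureWriterAppend("mytype", Transaction.AUTO_COMMIT);
        try {
            for (SimpleFeature src : collected) {
                SimpleFeature dst = writer.next(); // new empty feature to fill in
                dst.setAttributes(src.getAttributes());
                writer.write();                    // batched by Accumulo's BatchWriter
            }
        } finally {
            writer.close();
            ds.dispose();
        }
    }
}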

Note:
I'm not using a timestamp attribute, so GeoMesa creates a Z2 index
table. I delete and re-ingest the data every few minutes for
near-real-time image rendering by GeoServer.
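
In other words, the feature type is spatial-only, something like the
sketch below (type and attribute names are placeholders). With no Date
attribute, GeoMesa does not build its spatio-temporal (Z3) index:

import org.geotools.data.DataUtilities;
import org.opengis.feature.simple.SimpleFeatureType;

// A geometry but no Date attribute -> GeoMesa indexes it with Z2 only.
// DataUtilities.createType throws org.geotools.feature.SchemaException.
SimpleFeatureType sft =
    DataUtilities.createType("mytype", "name:String,*geom:Point:srid=4326");
ds.createSchema(sft); // 'ds' is the DataStore from the first sketch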

Additional Question:
The Accumulo (GeoMesa) Data Store has a "collectStats" parameter.
What is the benefit of using it? I tried to find documentation for it,
but could not.
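
(My assumption is that it is simply switched on in the parameter map,
as below; what it actually gains me is exactly what I am asking:)

// Assumed usage; 'params' is the map from the first sketch.
params.put("collectStats", "true");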

Thank you for your reply,
Takashi

2016-12-14 13:22 GMT+09:00 Andrew <ahulbert@xxxxxxxx>:
> Trying again...
>
> Hi Takashi,
>
> I'll try to look into question #1, but note that I generally use EC2
> instead, so I'll have to see how EMR sets up its zookeepers ...
> Question: How are you ingesting the data? Is it mapreduce or a single
> java program?
>
> But in the meantime, for #2 you should certainly be able to use the
> r3.xlarge instances with 4 CPUs and 30 GB of RAM. We have built many
> clusters on EC2 this way. I would try to use at least 5 or 10 nodes if
> you can. Just make sure to spread the Accumulo master, Hadoop namenode,
> and 3 zookeepers across separate nodes. I'll update that wiki page with
> some more examples.
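>
> For illustration, one possible layout in that spirit (hostnames are
> hypothetical):
>
>   node1:       accumulo master
>   node2:       hadoop namenode
>   node3,4,5:   zookeeper (one each)
>   core nodes:  hdfs datanode + accumulo tablet server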
>
> Also note that you can use EBS storage which can be cheaper.
>
> Andrew
>
>
>
>
> -------- Original message --------
> From: Takashi Sasaki <tsasaki609@xxxxxxxxx>
> Date: 12/13/16 20:19 (GMT-05:00)
> To: geomesa-users@xxxxxxxxxxxxxxxx
> Subject: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...
>
> Hello,
>
> I use GeoMesa 1.2.3 with Accumulo 1.7.2 on AWS EMR release emr-4.7.0
> (Hadoop 2.7.2, ZooKeeper-Sandbox 3.4.8).
> I have two problems with managing the system. One is that an Accumulo
> Tablet Server suddenly crashes, and the other is that the GeoMesa
> (Accumulo) hardware requirements are too expensive for my company.
>
> 1.
> Writing features to the GeoMesa (Accumulo) data store became very slow
> when an Accumulo Tablet Server crashed, and GeoServer's image rendering
> response also slowed down.
> While the Tablet Server is alive, writing features takes one or two
> minutes, but in that state it takes eight minutes or more.
>
> The cause is probably waiting for the Accumulo Master server to
> rebalance the tablets.
>
> [Accumulo Master Server log]
> 2016-12-13 03:34:26,008 [master.Master] WARN : Lost servers [ip-10-24-83-37:9997[358b52962d101f8]]
>
> [Accumulo Tablet Server log]
> 2016-12-13 03:34:47,943 [hdfs.DFSClient] WARN : DFSOutputStream ResponseProcessor exception for block BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:244)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:734)
> 2016-12-13 03:34:47,943 [zookeeper.ClientCnxn] WARN : Client session timed out, have not heard from server in 53491ms for sessionid 0x358b52962d101f8
> 2016-12-13 03:34:47,944 [hdfs.DFSClient] WARN : Error Recovery for block BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 in pipeline DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK], DatanodeInfoWithStorage[10.24.83.39:50010,DS-f66dcecb-1b44-431c-83fc-ab343339c485,DISK]: bad datanode DatanodeInfoWithStorage[10.24.83.37:50010,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK]
> 2016-12-13 03:34:48,444 [hdfs.DFSClient] WARN : DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-12-13 03:34:48,445 [hdfs.DFSClient] WARN : Error while syncing
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-12-13 03:34:48,445 [log.DfsLogger] WARN : Exception syncing
> java.lang.reflect.InvocationTargetException
> 2016-12-13 03:34:48,463 [zookeeper.ClientCnxn] WARN : Unable to reconnect to ZooKeeper service, session 0x358b52962d101f8 has expired
> 2016-12-13 03:34:48,546 [log.DfsLogger] WARN : Exception syncing
> java.lang.reflect.InvocationTargetException
> 2016-12-13 03:34:48,546 [log.DfsLogger] ERROR: Failed to close log file
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-1424542533-10.24.83.115-1481002292587:blk_1074003283_262460 does not exist or is not under Constructionblk_1074003283_264798
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(FSNamesystem.java:6239)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(FSNamesystem.java:6306)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(NameNodeRpcServer.java:805)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolServerSideTranslatorPB.java:955)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1475)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(ClientNamenodeProtocolTranslatorPB.java:901)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1173)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:876)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:402)
> 2016-12-13 03:34:48,564 [zookeeper.ZooReader] WARN : Saw (possibly) transient exception communicating with ZooKeeper
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tservers/ip-10-24-83-37:9997
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>         at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java:132)
>         at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
> 2016-12-13 03:34:48,565 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tables/1n/namespace
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1102)
>         at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:295)
>         at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:272)
>         at org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:171)
>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:323)
>         at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:259)
>         at org.apache.accumulo.core.client.impl.Tables.getNamespaceId(Tables.java:235)
>         at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:600)
>         at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(RpcServerInvocationHandler.java:46)
>         at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(RpcWrapper.java:74)
>         at com.sun.proxy.$Proxy19.startMultiScan(Unknown Source)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2330)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2314)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:516)
>         at org.apache.accumulo.server.rpc.CustomNonBlockingServer$1.run(CustomNonBlockingServer.java:78)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>         at java.lang.Thread.run(Thread.java:745)
> ---
> (the retry is logged many times)
> ---
> 2016-12-13 03:34:48,844 [tserver.TabletServer] ERROR: Lost tablet server lock (reason = SESSION_EXPIRED), exiting.
>
> According to this log, the Tablet Server was accessing HDFS when its
> ZooKeeper session expired, and the server shut itself down.
>
> How can I avoid the Tablet Server crashing?
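>
> (For reference, I believe the session timeout involved here is
> Accumulo's instance.zookeeper.timeout property, 30s by default, set in
> accumulo-site.xml as below; whether simply raising it is the right fix
> is part of my question:)
>
>   <property>
>     <name>instance.zookeeper.timeout</name>
>     <value>30s</value>
>   </property>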
>
>
> 2.
> I use AWS EMR with r3.8xlarge instances for the Accumulo Tablet
> Servers on 5 core nodes.
> They are high performance (32 vCPU / 244 GB memory / 2 x 320 GB SSD),
> but too expensive.
>
> The example hardware requirements on the GeoMesa "Tuning Accumulo"
> wiki page,
> https://geomesa.atlassian.net/wiki/display/GEOMESA/Tuning+Accumulo#TuningAccumulo-SmallCluster,LargeServers
> almost match the r3.8xlarge.
>
> But I want to reduce the cost.
> Is there an example hardware requirement somewhere for a small
> cluster with "small" servers?
>
> If possible, I would like to use r3.xlarge (4 vCPU / 30.5 GB memory /
> 1 x 80 GB SSD) for the Accumulo Tablet Servers.
> What do you think?
>
>
> Thank you,
> Takashi
> _______________________________________________
> geomesa-users mailing list
> geomesa-users@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from
> this list, visit
> https://www.locationtech.org/mailman/listinfo/geomesa-users

