Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Hi Andrew,

I ingest the data from an Apache Spark program (MapReduce-style), but the
write happens after the data is collected to the driver node, so it is
effectively a single Java program.

I'm not using a timestamp attribute, so GeoMesa creates a Z2 index table.
I delete and re-ingest the data every few minutes for near-real-time
image rendering by GeoServer.

Additional question:
The Accumulo (GeoMesa) data store has a "collectStats" parameter.
What is the benefit of using it?
I looked for documentation but could not find any.
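
For context, here is a minimal sketch of where such a parameter would be set. It only shows the mechanics of passing "collectStats" in a GeoTools-style parameter map and says nothing about what the parameter does. The other key names ("instanceId", "zookeepers", "user", "password", "tableName") and all values are assumptions used for illustration, and the class and method names are hypothetical.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class CollectStatsParams {

    // Builds a connection parameter map of the kind a GeoTools DataStore
    // factory consumes. All keys except "collectStats" are assumed names,
    // and every value is a placeholder.
    static Map<String, Serializable> buildParams(boolean collectStats) {
        Map<String, Serializable> params = new HashMap<>();
        params.put("instanceId", "myInstance");          // placeholder
        params.put("zookeepers", "zoo1,zoo2,zoo3");      // placeholder
        params.put("user", "myUser");                    // placeholder
        params.put("password", "myPassword");            // placeholder
        params.put("tableName", "myCatalog");            // placeholder
        // The parameter asked about above, passed as a string flag.
        params.put("collectStats", String.valueOf(collectStats));
        return params;
    }

    public static void main(String[] args) {
        Map<String, Serializable> params = buildParams(false);
        // With GeoTools and GeoMesa on the classpath, this map would be
        // handed to org.geotools.data.DataStoreFinder.getDataStore(params).
        System.out.println("collectStats = " + params.get("collectStats"));
    }
}
```

Building the map in one helper keeps the flag in a single place, so ingest runs with and without it can be compared.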

Thank you for your reply,

2016-12-14 13:22 GMT+09:00 Andrew <ahulbert@xxxxxxxx>:
> Trying again...
> Hi Takashi,
> I'll try to look into question #1, but note that I generally use EC2 instead,
> so I'll have to see how it handles ZooKeepers ... Question: How are you
> ingesting the data? Is it MapReduce or a single Java program?
> But in the meantime, for #2 you should certainly be able to use the r3.xlarge
> instances with 4 CPUs and 30 GB of RAM. We have built many clusters on EC2 this
> way. I would try to use at least 5 or 10 nodes if you can. Just make sure to
> spread the Accumulo master, Hadoop NameNode, and 3 ZooKeepers across separate
> nodes. I'll update that wiki page with some more examples.
> Also note that you can use EBS storage, which can be cheaper.
> Andrew
> -------- Original message --------
> From: Takashi Sasaki <tsasaki609@xxxxxxxxx>
> Date: 12/13/16 20:19 (GMT-05:00)
> To: geomesa-users@xxxxxxxxxxxxxxxx
> Subject: [geomesa-users] How to avoid Tablet Server crashing, and server
> cost problem...
> Hello,
> I use GeoMesa 1.2.3 with Accumulo 1.7.2 on AWS EMR, release
> emr-4.7.0, with Hadoop 2.7.2 and ZooKeeper-Sandbox 3.4.8.
> I have two system management problems. One is that the Accumulo Tablet
> Server suddenly crashes; the other is that the GeoMesa (Accumulo) hardware
> spec requirement is too expensive for my company.
> 1.
> When an Accumulo Tablet Server crashes, writing features to the GeoMesa
> (Accumulo) data store becomes very slow, and GeoServer image rendering
> responses also slow down.
> While the Tablet Servers are alive, writing features takes one or two minutes,
> but it takes eight minutes or more in the state above.
> The probable cause is waiting for the Accumulo Master to rebalance
> tablets.
> [Accumulo Master Server log]
> 2016-12-13 03:34:26,008 [master.Master] WARN : Lost servers
> [ip-10-24-83-37:9997[358b52962d101f8]]
> [Accumulo Tablet Server log]
> 2016-12-13 03:34:47,943 [hdfs.DFSClient] WARN : DFSOutputStream
> ResponseProcessor exception for block
> BP-1424542533-
> Premature EOF: no length prefix available
> at
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$
> 2016-12-13 03:34:47,943 [zookeeper.ClientCnxn] WARN : Client session
> timed out, have not heard from server in 53491ms for sessionid
> 0x358b52962d101f8
> 2016-12-13 03:34:47,944 [hdfs.DFSClient] WARN : Error Recovery for
> block BP-1424542533-
> in pipeline
> DatanodeInfoWithStorage[,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK],
> DatanodeInfoWithStorage[,DS-f66dcecb-1b44-431c-83fc-ab343339c485,DISK]:
> bad datanode
> DatanodeInfoWithStorage[,DS-43dbba0e-2bbd-4f8e-8a07-3f869123925c,DISK]
> 2016-12-13 03:34:48,444 [hdfs.DFSClient] WARN : DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(
> BP-1424542533- does
> not exist or is not under Constructionblk_1074003283_264798
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> at org.apache.hadoop.ipc.RPC$
> at org.apache.hadoop.ipc.Server$Handler$
> at org.apache.hadoop.ipc.Server$Handler$
> at Method)
> at
> at
> at org.apache.hadoop.ipc.Server$
> at
> at
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> at java.lang.reflect.Method.invoke(
> at
> at
> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$
> 2016-12-13 03:34:48,445 [hdfs.DFSClient] WARN : Error while syncing
> org.apache.hadoop.ipc.RemoteException(
> BP-1424542533- does
> not exist or is not under Constructionblk_1074003283_264798
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> at org.apache.hadoop.ipc.RPC$
> at org.apache.hadoop.ipc.Server$Handler$
> at org.apache.hadoop.ipc.Server$Handler$
> at Method)
> at
> at
> at org.apache.hadoop.ipc.Server$
> at
> at
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> at java.lang.reflect.Method.invoke(
> at
> at
> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$
> 2016-12-13 03:34:48,445 [log.DfsLogger] WARN : Exception syncing
> java.lang.reflect.InvocationTargetException
> 2016-12-13 03:34:48,463 [zookeeper.ClientCnxn] WARN : Unable to
> reconnect to ZooKeeper service, session 0x358b52962d101f8 has expired
> 2016-12-13 03:34:48,546 [log.DfsLogger] WARN : Exception syncing
> java.lang.reflect.InvocationTargetException
> 2016-12-13 03:34:48,546 [log.DfsLogger] ERROR: Failed to close log file
> org.apache.hadoop.ipc.RemoteException(
> BP-1424542533- does
> not exist or is not under Constructionblk_1074003283_264798
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkUCBlock(
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.updateBlockForPipeline(
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> at org.apache.hadoop.ipc.RPC$
> at org.apache.hadoop.ipc.Server$Handler$
> at org.apache.hadoop.ipc.Server$Handler$
> at Method)
> at
> at
> at org.apache.hadoop.ipc.Server$
> at
> at
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
> at com.sun.proxy.$Proxy15.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.updateBlockForPipeline(
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> at java.lang.reflect.Method.invoke(
> at
> at
> at com.sun.proxy.$Proxy16.updateBlockForPipeline(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(
> at
> org.apache.hadoop.hdfs.DFSOutputStream$
> 2016-12-13 03:34:48,564 [zookeeper.ZooReader] WARN : Saw (possibly)
> transient exception communicating with ZooKeeper
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tservers/ip-10-24-83-37:9997
> at org.apache.zookeeper.KeeperException.create(
> at org.apache.zookeeper.KeeperException.create(
> at org.apache.zookeeper.ZooKeeper.exists(
> at
> org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(
> at org.apache.accumulo.fate.zookeeper.ZooLock.process(
> at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(
> at org.apache.zookeeper.ClientCnxn$
> 2016-12-13 03:34:48,565 [zookeeper.ZooCache] WARN : Saw (possibly)
> transient exception communicating with ZooKeeper, will retry
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for
> /accumulo/3a6635fa-2c60-4860-b8aa-56a2d654b419/tables/1n/namespace
> at org.apache.zookeeper.KeeperException.create(
> at org.apache.zookeeper.KeeperException.create(
> at org.apache.zookeeper.ZooKeeper.exists(
> at org.apache.accumulo.fate.zookeeper.ZooCache$
> at org.apache.accumulo.fate.zookeeper.ZooCache$
> at
> org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(
> at org.apache.accumulo.fate.zookeeper.ZooCache.get(
> at org.apache.accumulo.fate.zookeeper.ZooCache.get(
> at
> org.apache.accumulo.core.client.impl.Tables.getNamespaceId(
> at
> org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(
> at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> at java.lang.reflect.Method.invoke(
> at
> org.apache.accumulo.core.trace.wrappers.RpcServerInvocationHandler.invoke(
> at org.apache.accumulo.server.rpc.RpcWrapper$1.invoke(
> at com.sun.proxy.$Proxy19.startMultiScan(Unknown Source)
> at
> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(
> at
> org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(
> at org.apache.thrift.ProcessFunction.process(
> at org.apache.thrift.TBaseProcessor.process(
> at
> org.apache.accumulo.server.rpc.TimedProcessor.process(
> at
> org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(
> at
> org.apache.accumulo.server.rpc.CustomNonBlockingServer$
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> at
> java.util.concurrent.ThreadPoolExecutor$
> at
> at
> ---
> (the retry log repeats many times)
> ---
> 2016-12-13 03:34:48,844 [tserver.TabletServer] ERROR: Lost tablet
> server lock (reason = SESSION_EXPIRED), exiting.
> According to this log, the ZooKeeper session expired while the Tablet Server
> was accessing HDFS, and the server shut itself down.
> How can I avoid the Tablet Server crashing?
> 2.
> I use AWS EMR with r3.8xlarge instances for the Accumulo Tablet Servers on 5
> core nodes.
> They are high performance (32 vCPU / 244 GB memory / 2 x 320 GB SSD), but too
> expensive.
> According to the GeoMesa "Tuning Accumulo" page,
> the example hardware spec roughly matches r3.8xlarge.
> But I want to reduce costs.
> Is there an example hardware spec for a small cluster of "small" servers
> somewhere?
> If possible, I would like to use r3.xlarge (4 vCPU / 30.5 GB memory / 1 x 80 GB SSD)
> for the Accumulo Tablet Servers.
> What do you think?
> Thank you,
> Takashi
> _______________________________________________
> geomesa-users mailing list
> geomesa-users@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from
> this list, visit
