Hi,
I have a GeoMesa table hosted on an HBase cluster. After
switching from GeoMesa 2.4.1 to 3.0.0, some queries started to
fail with a "ZooKeeper session timeout".
Stack trace from my app:
...
1604525339987,"java.util.NoSuchElementException: Could not
obtain the next
feature:org.apache.hadoop.hbase.client.RetriesExhaustedException:
Failed after attempts=3, exceptions:"
1604525339987,"Wed Nov 04 20:55:03 UTC 2020,
RpcRetryingCaller{globalStartTime=1604523178583, pause=100,
retries=3}, java.io.IOException: Call to
ip-10-0-22-145.ec2.internal/
10.0.22.145:16020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
Call id=5864, waitTime=73235, rpcTimetout=60000"
1604525339987,"Wed Nov 04 20:56:19 UTC 2020,
RpcRetryingCaller{globalStartTime=1604523178583, pause=100,
retries=3},
org.apache.hadoop.hbase.client.RetriesExhaustedException:
Can't get the location for replica 0"
1604525339987,"Wed Nov 04 20:57:48 UTC 2020,
RpcRetryingCaller{globalStartTime=1604523178583, pause=100,
retries=3}, java.io.IOException: Call to
ip-10-0-22-145.ec2.internal/
10.0.22.145:16020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
Call id=5869, waitTime=66664, rpcTimetout=60000"
1604525339987, at
org.geotools.feature.FeatureReaderIterator.next(FeatureReaderIterator.java:75)
~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
1604525339987, at
org.geotools.feature.FeatureReaderIterator.next(FeatureReaderIterator.java:42)
~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
1604525339987, at
org.geotools.feature.collection.DelegateFeatureIterator.next(DelegateFeatureIterator.java:52)
~[geomesa-hbase-spark-runtime-hbase1_2.11-3.0.0.jar:?]
...
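For reference, here is a minimal Python sketch (not part of my app; the
helper name overruns is hypothetical) that pulls waitTime and the RPC
timeout out of the CallTimeoutException lines above and shows how far
each call ran past the 60000 ms limit. The two sample lines are copied
from the trace, joined onto one line each.

```python
import re

# Two CallTimeoutException fragments copied from the stack trace above.
log_lines = [
    "Call id=5864, waitTime=73235, rpcTimetout=60000",
    "Call id=5869, waitTime=66664, rpcTimetout=60000",
]

# "rpcTimetout" is the spelling that appears in this trace; the optional
# "t" lets the pattern also match the corrected spelling "rpcTimeout".
pattern = re.compile(r"Call id=(\d+), waitTime=(\d+), rpcTimet?out=(\d+)")


def overruns(lines):
    """Return (call id, milliseconds past the RPC timeout) for each match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        if m:
            call_id, wait_ms, timeout_ms = (int(g) for g in m.groups())
            results.append((call_id, wait_ms - timeout_ms))
    return results


print(overruns(log_lines))  # [(5864, 13235), (5869, 6664)]
```

Both calls exceeded the 60 s RPC timeout, by roughly 13 s and 7 s.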
I also found these warnings in the log:
...
1604523293568,"04 Nov 2020 20:54:11,817 [WARN]
(pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181))
org.apache.zookeeper.ClientCnxn: Client session timed out,
have not heard from server in 62688ms for sessionid
0x47045cd3d"
1604523335819,"04 Nov 2020 20:55:35,795 [WARN]
(pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181))
org.apache.zookeeper.ClientCnxn: Client session timed out,
have not heard from server in 42227ms for sessionid
0x47045cd3d"
1604523368963,"04 Nov 2020 20:56:08,963 [WARN]
(pool-9-thread-13-SendThread(ip-10-0-21-114.ec2.internal:2181))
org.apache.zookeeper.ClientCnxn: Unable to reconnect to
ZooKeeper service, session 0x47045cd3d has expired"
1604523368963,"04 Nov 2020 20:56:08,963 [WARN]
(pool-9-thread-13-EventThread)
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation:
This client just lost it's session with ZooKeeper, closing it.
It will be recreated next time someone needs it"
1604523379744,"04 Nov 2020 20:56:19,742 [WARN]
(pool-9-thread-18) org.apache.hadoop.hbase.zookeeper.ZKUtil:
hconnection-0x31aa9b01-0x47045cd3d,
quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase
Unable to get data of znode
/hbase/table/test_TestTable_xz3_geom_timestamp_v2"
1604523379748,"04 Nov 2020 20:56:19,742 [WARN]
(pool-9-thread-13) org.apache.hadoop.hbase.zookeeper.ZKUtil:
hconnection-0x31aa9b01-0x47045cd3d,
quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase
Unable to get data of znode
/hbase/table/test_TestTable_xz3_geom_timestamp_v2"
1604523379749,"04 Nov 2020 20:56:19,742 [WARN]
(pool-9-thread-19) org.apache.hadoop.hbase.zookeeper.ZKUtil:
hconnection-0x31aa9b01-0x47045cd3d,
quorum=ip-10-0-21-114.ec2.internal:2181, baseZNode=/hbase
Unable to get data of znode
/hbase/table/test_TestTable_xz3_geom_timestamp_v2"
...
The query that triggers this issue:
INTERSECTS(geom,POLYGON ((-100.78857422 28.58452172,
-100.78857422 31.273855990000005, -93.71337890999999
31.273855990000005, -93.71337890999999 28.58452172,
-100.78857422 28.58452172))) AND timestamp <= '2020-10-23
16:52:20' AND timestamp > '2019-08-01 00:00:00'
The test_TestTable_xz3_geom_timestamp_v2 table is around
272 GB (GZ compressed), and this query returns around 1.7 GB
of data (uncompressed).
I can reproduce the issue with this query fairly
consistently. The query succeeds if I just replace the
GeoMesa jar on the classpath (3.0.0 or 3.1.0) with 2.4.1.
I will keep looking into what changed between the
releases, but I would like to know whether others are
experiencing similar issues or can offer any insight.
Regards,
Jun Cai