I am seeing a serialization failure when writing features of a certain length with
GeoMesaFeatureWriter. Here is part of the stack trace:
Suppressed: java.lang.IllegalArgumentException: Error indexing feature '1804882973#10760608:MULTIPOLYGON
...
at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$class.writeFeature(GeoMesaFeatureWriter.scala:55) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$TableFeatureWriter.writeFeature(GeoMesaFeatureWriter.scala:141) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$GeoMesaAppendFeatureWriter$class.write(GeoMesaFeatureWriter.scala:227) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.index.geotools.GeoMesaFeatureWriter$$anon$3.write(GeoMesaFeatureWriter.scala:108) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.utils.geotools.FeatureUtils$.write(FeatureUtils.scala:141) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 131072
at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization$class.writeFeature(KryoFeatureSerialization.scala:91) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.features.kryo.impl.KryoFeatureSerialization$class.serialize(KryoFeatureSerialization.scala:42) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.features.kryo.KryoFeatureSerializer$MutableActiveSerializer.serialize(KryoFeatureSerializer.scala:78) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.index.api.WritableFeature$FeatureLevelWritableFeature$$anonfun$values$1$$anonfun$apply$1.apply(WritableFeature.scala:154) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
at org.locationtech.geomesa.index.api.WritableFeature$FeatureLevelWritableFeature$$anonfun$values$1$$anonfun$apply$1.apply(WritableFeature.scala:154) ~[geomesa-hbase-spark-runtime_custom_2.11-2.4.1-20200904.jar:?]
...
Root Cause
With some debugging, I found the issue is in KryoFeatureSerialization at lines 89 to 91:
buffer(i + shift) = buffer(i)
where the content of the Kryo Output buffer is moved by <shift> bytes, starting from the end position of the current feature content, which is obtained at line 78:
val end = output.position()
output.position() here returns the number of bytes that have not yet been flushed, so in the output buffer the zero-based index of the last written byte is actually (end - 1).
This means the code copies one more byte than needed, which becomes a problem when
output.getBuffer.length == end + shift
In that case the buffer expansion check at line 82,
if (output.getBuffer.length < end + shift) {
evaluates to false, so the buffer is not expanded, and accessing buffer(i + shift) with i = end throws the ArrayIndexOutOfBoundsException.
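For clarity, here is a simplified sketch of how I read the surrounding logic. This is my own paraphrase, not the actual source: the helper name shiftRight, the parameter from, and the loop shape are reconstructed from the quoted lines.

import com.esotericsoftware.kryo.io.Output

// Paraphrased sketch of the shifting logic; not the exact GeoMesa source.
def shiftRight(output: Output, from: Int, shift: Int): Unit = {
  val end = output.position()        // bytes written so far; the last byte sits at index end - 1
  if (output.getBuffer.length < end + shift) {
    // buffer expansion happens here (details omitted);
    // it is skipped when length == end + shift
  }
  val buffer = output.getBuffer
  var i = end                        // bug: one index past the last written byte
  while (i >= from) {
    buffer(i + shift) = buffer(i)    // when length == end + shift, i + shift == buffer.length -> AIOOBE
    i -= 1
  }
}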
As an example from my use case: the feature content length is 131057 (end = 131058), shift is 14, and output.getBuffer.length is 131072. When serializing this feature, the buffer expansion was skipped (131072 < 131058 + 14 is false) and the code tried to access buffer(131058 + 14), i.e. buffer(131072), which caused the ArrayIndexOutOfBoundsException.
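To make the arithmetic concrete, the boundary condition can be reproduced with a plain array and the numbers above (no GeoMesa or Kryo involved):

import scala.util.Try

val end = 131058                      // output.position() in my case
val shift = 14
val buffer = new Array[Byte](131072)  // output.getBuffer.length in my case

// Expansion check from line 82: 131072 < 131058 + 14 is false, so no expansion happens.
println(buffer.length < end + shift)                     // false

// First copy of the shift loop with i = end:
println(Try(buffer(end + shift) = buffer(end)))          // Failure(java.lang.ArrayIndexOutOfBoundsException: 131072)

// Starting one index earlier (i = end - 1) keeps the highest index at 131071, within bounds:
println(Try(buffer(end - 1 + shift) = buffer(end - 1)))  // Success(())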
Proposed Fix
Change line 89 from
var i = end
to
var i = end - 1
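In terms of the sketch above (again my paraphrase, not the exact source), the loop then becomes:

var i = end - 1                    // index of the last written byte
while (i >= from) {
  buffer(i + shift) = buffer(i)    // highest index written is end - 1 + shift, i.e. at most buffer.length - 1
  i -= 1
}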
Based on my understanding of the code and local testing, this change shouldn't cause any compatibility issues in feature serialization/deserialization, but I would like to have it reviewed by more folks.
Regards,
Jun Cai