I have a Hadoop environment set up with Ambari: Hadoop 3.1.0, YARN, Spark 2.4.3 and Accumulo 1.9.3. The cluster is Kerberos-enabled with standard Ambari settings. I can successfully access GeoMesa data in Accumulo from PySpark (running on YARN) using a keytab file: the DataFrame gets populated with data, and I can perform operations on it, show it, etc. To read the data I use the code below:
params = {
    "accumulo.instance.id": "hdp-accumulo-instance",
    "accumulo.zookeepers": "cenagis-mesos-slave2.cenagis.pl:2181,cenagis-mesos-slave1.cenagis.pl:2181,cenagis-mesos-slave3.cenagis.pl:2181,cenagis-mesos-slave5.cenagis.pl:2181,cenagis-mesos-master.cenagis.pl:2181",
    "accumulo.catalog": "cenagis.geonames",
    "accumulo.user": "administrator@xxxxxxxxxx",
    "accumulo.keytab.path": "/home/administrator/admiministrator.keytab"
}
feature = "geonames"
df = (spark
    .read
    .format("geomesa")
    .options(**params)
    .option("geomesa.feature", feature)
    .load()
)
df.show()
However, if I try to write data back to Accumulo using the code below:
feature = "geonames2"
df.write.format("geomesa").options(**params).option("geomesa.feature", feature).save()
it does not work: the process just hangs, and I can see authentication errors in the Spark executor logs (attached). I have also attached logs from a successful run of the read operation described above, executed in the same Spark context.
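For reference, on a Kerberized YARN cluster Spark can also be given a principal and keytab at submit time, so that executors and the application master can obtain and renew tickets independently of the GeoMesa options above. A minimal sketch of such a launch, assuming `spark-submit` in cluster mode (the `--principal`/`--keytab` flags are standard Spark options; the principal, paths and script name below are illustrative placeholders, not my actual values):

```shell
# Illustrative only: standard Spark Kerberos flags for YARN.
# Principal, keytab path and script name are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal administrator@EXAMPLE.REALM \
  --keytab /path/to/administrator.keytab \
  geomesa_write_job.py
```

I am not sure whether the write path requires this submit-time keytab in addition to the `accumulo.keytab.path` option, which is part of what I am trying to establish.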