I have a Hadoop environment set up with Ambari: Hadoop 3.1.0, YARN, Spark 2.4.3 and Accumulo 1.9.3. The cluster is Kerberos-enabled with standard Ambari settings. I can successfully access GeoMesa data in Accumulo from PySpark (running on YARN) using a keytab file: the DataFrame gets populated with data, and I can perform operations on it, show it, etc. To read the data I use the code below:
params = {
    "accumulo.instance.id": "hdp-accumulo-instance",
    "accumulo.zookeepers": "cenagis-mesos-slave2.cenagis.pl:2181,cenagis-mesos-slave1.cenagis.pl:2181,cenagis-mesos-slave3.cenagis.pl:2181,cenagis-mesos-slave5.cenagis.pl:2181,cenagis-mesos-master.cenagis.pl:2181",
    "accumulo.catalog": "cenagis.geonames",
    "accumulo.user": "administrator@xxxxxxxxxx",
    "accumulo.keytab.path": "/home/administrator/admiministrator.keytab"
}
feature = "geonames"
df = (spark
    .read
    .format("geomesa")
    .options(**params)
    .option("geomesa.feature", feature)
    .load()
)
df.show()
However, if I try to write data back to Accumulo using the code below:
feature = "geonames2"
df.write.format("geomesa").options(**params).option("geomesa.feature", feature).save()
it does not work: the process just hangs, and I can see authentication errors in the Spark executor logs (attached). I have also attached logs from a successful run of the read operation described above, executed in the same Spark context.
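For reference, on a Kerberized YARN cluster Spark can also be given a principal and keytab at submit time, so that executors and the application master can obtain and renew tickets independently of the GeoMesa options above. A minimal sketch of such a launch, assuming `spark-submit` in cluster mode (the `--principal`/`--keytab` flags are standard Spark options; the principal, paths and script name below are illustrative placeholders, not my actual values):

```shell
# Illustrative only: standard Spark Kerberos flags for YARN.
# Principal, keytab path and script name are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal administrator@EXAMPLE.REALM \
  --keytab /path/to/administrator.keytab \
  geomesa_write_job.py
```

I am not sure whether the write path requires this submit-time keytab in addition to the `accumulo.keytab.path` option, which is part of what I am trying to establish.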