Re: [geomesa-users] Questions on filter method of dataframe

Hello,

1. The Lambda data store isn't currently implemented for HBase. The underlying store is back-end agnostic, but you would need to create a data store factory that either a) is aware of the HBase data store parameters, or b) accepts arbitrary data store parameters, and you would then need to bundle the classpath correctly for HBase + Kafka. Because of all the possible permutations, we haven't implemented that yet, but if you're interested it would be a good contribution.

2. The Kafka data store doesn't use a z-index. The data is stored as Kafka messages, one per feature. When you start up a data store, it reads from Kafka and populates an in-memory index with the features, and that index is used for queries. If you read the Kafka data into a data frame, you would have to read the whole topic, filter it, and then create in-memory indices on your data frame, similar to what I mentioned below.
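
To make the second point concrete, here is a minimal sketch in plain Python of replaying a topic into an in-memory index and then querying that index. It is not GeoMesa code and does not reflect how the Kafka data store is actually implemented: the message shape `(feature_id, x, y)`, the rule that a later message for the same ID replaces the earlier one, the fixed-size grid, and all names (`InMemoryFeatureIndex`, `cell_of`, `CELL`) are assumptions made for illustration.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size in degrees (arbitrary value chosen for this sketch)


def cell_of(x, y):
    """Map a point to the grid cell that contains it."""
    return (int(x // CELL), int(y // CELL))


class InMemoryFeatureIndex:
    """Toy stand-in for an index that is rebuilt by reading a whole topic."""

    def __init__(self):
        self.features = {}             # feature id -> (x, y)
        self.cells = defaultdict(set)  # grid cell -> ids of features in that cell

    def apply(self, message):
        """Apply one message; assumes a later message for an id replaces the earlier one."""
        fid, x, y = message
        old = self.features.get(fid)
        if old is not None:
            self.cells[cell_of(*old)].discard(fid)
        self.features[fid] = (x, y)
        self.cells[cell_of(x, y)].add(fid)

    def query_bbox(self, xmin, ymin, xmax, ymax):
        """Return ids inside the box, looking only at grid cells the box touches."""
        cx0, cy0 = cell_of(xmin, ymin)
        cx1, cy1 = cell_of(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for fid in self.cells.get((cx, cy), ()):
                    x, y = self.features[fid]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(fid)
        return sorted(hits)


# Hypothetical topic contents, one message per feature write.
topic = [("a", 1.0, 1.0), ("b", 25.0, 25.0), ("a", 12.0, 3.0), ("c", -5.0, 2.0)]

index = InMemoryFeatureIndex()
for msg in topic:  # the whole topic has to be read before queries are meaningful
    index.apply(msg)
```

Calling `index.query_bbox(0, 0, 20, 10)` on this toy data only inspects the cells overlapping the box. The point of the sketch is that the index lives in the reader's memory and is built from the messages, rather than being stored in Kafka.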

Thanks,

Emilio

On 10/22/19 2:44 AM, Yifan Wang wrote:
Hi,

Thank you for your help! I have two more questions about GeoMesa data stores.
1. How can I store persistent data in HBase via the GeoMesa Lambda DataStore? (I noticed that when creating a Lambda DataStore I have to set configurations for Accumulo. Does that mean I can only persist data to Accumulo via the Lambda DataStore?) And can I read data from the Kafka side of the Lambda DataStore via Spark?
2. If I store data in the GeoMesa Kafka DataStore via Spark with a Z-index, will the index still work when I read the data via Structured Streaming and deserialize the SimpleFeatures into a DataFrame?

Best Regards,
Evan


On Mon, Oct 21, 2019 at 8:51 PM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Once you load the dataframe from HBase, it will no longer use the z3 index. There are various options to create in-memory indices on an existing data frame - see https://www.geomesa.org/documentation/user/spark/sparksql.html#in-memory-indexing and https://www.geomesa.org/documentation/user/spark/sparksql.html#spatial-partitioning-and-faster-joins
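
As a rough illustration of why this matters, the sketch below contrasts a filter over already-loaded rows (a full scan) with the same filter after the rows have been grouped into spatial partitions. It is plain Python, not Spark or GeoMesa code, and every name in it (`full_scan`, `partition_by_grid`, `partitioned_filter`, the `geom` key, the grid size) is hypothetical. The options described at the links above are the real mechanism; this only shows the idea behind them.

```python
from collections import defaultdict


def within(point, bbox):
    """True if the point lies inside the closed box (xmin, ymin, xmax, ymax)."""
    x, y = point
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


def full_scan(rows, bbox):
    """Filtering loaded, unindexed rows: every row is tested."""
    return [r for r in rows if within(r["geom"], bbox)]


def partition_by_grid(rows, size):
    """Group rows by the grid cell [cx*size, (cx+1)*size) that contains them."""
    parts = defaultdict(list)
    for r in rows:
        x, y = r["geom"]
        parts[(int(x // size), int(y // size))].append(r)
    return dict(parts)


def partitioned_filter(parts, size, bbox):
    """Test only rows in partitions whose cell can overlap the box."""
    xmin, ymin, xmax, ymax = bbox
    hits, examined = [], 0
    for (cx, cy), rows in parts.items():
        outside_x = (cx + 1) * size <= xmin or cx * size > xmax
        outside_y = (cy + 1) * size <= ymin or cy * size > ymax
        if outside_x or outside_y:
            continue  # the whole partition is skipped without touching its rows
        examined += len(rows)
        hits.extend(r for r in rows if within(r["geom"], bbox))
    return hits, examined


# Hypothetical rows standing in for a loaded data frame.
rows = [
    {"id": 1, "geom": (1.0, 1.0)},
    {"id": 2, "geom": (15.0, 15.0)},
    {"id": 3, "geom": (35.0, 5.0)},
    {"id": 4, "geom": (2.0, 8.0)},
]
bbox = (0.0, 0.0, 5.0, 10.0)

scan_hits = full_scan(rows, bbox)
parts = partition_by_grid(rows, 10.0)
part_hits, examined = partitioned_filter(parts, 10.0, bbox)
```

Both approaches return the same rows. The difference is how many rows have to be examined, which is tracked here by `examined`.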

Thanks,

Emilio

On 10/16/19 10:29 PM, Yifan Wang wrote:
Hi, 

I recently ran into a question about the DataFrame filter method.
If I read a DataFrame from HBase via a Spark session, cache it,
and then apply a filter such as st_within to it,
will the z3 index still be used? Thank you!

Best regards,
Evan


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users