Re: [geomesa-users] NULL Values in Indexed Attributes


From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Fri, 11 Jan 2019 15:36:18 -0500

Hello,

If you want to update a feature through the GeoTools API, you need to use a modifying feature writer. There are examples of how to do so in the unit tests, for example:

https://github.com/locationtech/geomesa/blob/geomesa_2.11-2.2.0/geomesa-accumulo/geomesa-accumulo-datastore/src/test/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStoreTest.scala#L571-L580

As an implementation detail, when using GeoMesa you can sometimes update features through the standard appending writer. This works as long as none of the indexed values (geom, dtg, plus attributes) change, as the same index key will be used, and then the old values will be overwritten by the new values. However, as you have experienced, it doesn't work if any of the indexed values change - then you end up with multiple keys and duplicate features.
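
To make the duplicate-key behavior concrete, here is a minimal toy model. It assumes a record table keyed by FID and an attribute index keyed by (value, FID); the class and method names are hypothetical and this is not GeoMesa's actual key layout, only a sketch of why a blind append leaves a stale index entry.

```python
# Toy model only: two dicts stand in for a record table and an attribute index.
# Names are hypothetical; this is not GeoMesa's real storage format.

class ToyStore:
    def __init__(self):
        self.records = {}      # fid -> feature dict
        self.attr_index = {}   # (indexed value, fid) -> fid

    def append(self, fid, feature):
        """Blind write: put the new keys, never look up or delete old ones."""
        self.records[fid] = dict(feature)
        self.attr_index[(feature["index_val"], fid)] = fid

    def query_by_fid(self, fid):
        return self.records.get(fid)

    def query_by_index_val(self, value):
        return sorted(fid for (v, fid) in self.attr_index if v == value)


store = ToyStore()
store.append("f1", {"index_val": "ABC", "dtg": "2019-01-11"})

# Re-append with the same indexed value: the same index key is overwritten.
store.append("f1", {"index_val": "ABC", "dtg": "2019-01-11", "note": "edited"})
keys_after_safe_update = len(store.attr_index)

# Re-append with a changed indexed value: a second index key appears.
store.append("f1", {"index_val": "XYZ", "dtg": "2019-01-11"})

print(keys_after_safe_update)            # 1
print(store.query_by_index_val("XYZ"))   # ['f1']
print(store.query_by_index_val("ABC"))   # ['f1'] (stale entry still matches)
```

In this sketch the FID lookup sees only the latest record, while the index still answers for both 'ABC' and 'XYZ', which matches the symptom described in the quoted message.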

Note that a modifying write is usually multiple times slower than an appending write, as we have to 1) query the old value, 2) write a delete marker, and 3) write a new key/value for the update. So if you know that none of your indexed keys have changed, it can be advantageous to append instead of modify. If you are using Accumulo and making frequent updates, you may want to consider using the lambda data store, which keeps recent features in Kafka for fast modifications, and then persists to Accumulo for long-term storage: https://www.geomesa.org/documentation/user/lambda/index.html
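
As a rough illustration of the three steps above, the following toy model counts the operations an append and a modify each perform. All names are hypothetical, and the delete is modeled as simply removing the old key, whereas a real store would write a delete marker.

```python
# Toy model only; hypothetical names, not GeoMesa's real storage format.
records = {}        # fid -> feature dict
attr_index = set()  # (indexed value, fid) pairs


def append(fid, feature):
    """One step: write the new keys blindly. Returns operations performed."""
    records[fid] = dict(feature)
    attr_index.add((feature["index_val"], fid))
    return 1


def modify(fid, feature):
    """Three steps: read the old value, delete its key, write the new keys."""
    ops = 1
    old = records.get(fid)                            # 1) query the old value
    if old is not None:
        attr_index.discard((old["index_val"], fid))   # 2) stand-in for a delete marker
        ops += 1
    ops += append(fid, feature)                       # 3) write the new key/value
    return ops


def query(value):
    return sorted(fid for (v, fid) in attr_index if v == value)


append_ops = append("f1", {"index_val": "ABC"})
modify_ops = modify("f1", {"index_val": "XYZ"})

print(append_ops, modify_ops)        # 1 3
print(query("ABC"), query("XYZ"))    # [] ['f1']
```

The extra read and delete are what make the modify path slower, and they are also what keep the old index entry from lingering.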

Thanks,

Emilio

On 1/11/19 2:10 PM, BKRuns26.2 wrote:

As a follow-on to this, we have switched to using a placeholder value, but now we are running into a new issue.

In one of our scenarios, we write data that we later update (same FID; only certain attributes change). When we update an indexed attribute, we can still filter on both the new and the old value, as if the old attribute index entry was not removed when the value changed.

As a simple example:

- I worked up a feature with 3 attributes: geom, date and index_val. The index_val attribute has a full index on it.

- I write a new record and put 'ABC' into index_val.

- I then grab that record via the FID and re-write it with index_val 'XYZ'. Nothing else about the data changes (same FID, date and geom).

- If I run a filter on the FID, I get only 1 record, with 'XYZ' as the index_val.

- If I query index_val = 'XYZ', it returns data as expected.

- But if I query index_val = 'ABC', it also still returns data, which I would not have expected now that the record has been updated.

Any thoughts on how to handle this scenario, or is there another step I need to take to remove the old attribute index entries?

Thanks,

Brad
On Fri, Jan 4, 2019 at 12:32 PM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Yes, using a placeholder value should work for now. We could potentially support null values in the attribute index - if you're interested in contributing to that let us know. Up until now, people have only been interested in 'not null' queries.
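
A minimal sketch of the placeholder workaround, assuming a sentinel string that never occurs in real data. The sentinel value and helper names are hypothetical, and the dict is only a stand-in for an attribute index.

```python
# Toy model only; PLACEHOLDER and the helper names are hypothetical.
PLACEHOLDER = "__NONE__"  # must never collide with a real attribute value

index = {}  # stored value -> set of fids


def encode(value):
    """Replace a missing value with the sentinel before writing."""
    return PLACEHOLDER if value is None else value


def decode(value):
    """Map the sentinel back to None when reading."""
    return None if value == PLACEHOLDER else value


def write(fid, value):
    index.setdefault(encode(value), set()).add(fid)


def query_missing():
    """An equality lookup on the sentinel replaces the 'is null' filter."""
    return sorted(index.get(PLACEHOLDER, set()))


write("f1", "alpha")
write("f2", None)
write("f3", None)

print(query_missing())        # ['f2', 'f3']
print(decode(PLACEHOLDER))    # None
```

The trade-off is that every reader and writer has to agree on the sentinel, and 'is not null' filters need to exclude it explicitly.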

From a brief search, it appears that we don't even have any unit tests for 'is null' queries, so it is definitely possible that they aren't working correctly.

Thanks,

Emilio

On 1/4/19 12:16 PM, BKRuns26.2 wrote:
Emilio,

After adding the explain logging, it does appear that it is attempting to do a full table scan, but it is still not returning any results. Unfortunately I cannot send out the results as it is running on a closed system. I am going to work on setting up a similar environment on our test system and will get you the results.

I assume the best approach (at least for Strings) would be to avoid NULL and use empty strings for values with no data, or possibly the literal string "NULL"?

Thanks,

Brad
On Thu, Jan 3, 2019 at 4:07 PM Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Our attribute indices don't store null values, so if you query for "is null" it's going to end up doing a full table scan, which can take a while. That said, it should return eventually. If you have enabled query timeouts or blocked full table scans, that may be preventing a result.
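
The following toy model sketches why "is null" cannot use the index. It assumes an index that simply skips null values when it is built; the data and function names are made up for illustration, and GeoMesa's real query planner is more involved.

```python
# Toy model only; hypothetical data and names.
records = {
    "f1": {"name": "alpha"},
    "f2": {"name": None},
    "f3": {"name": "beta"},
}

# Build the attribute index, skipping null values entirely.
name_index = {}
for fid, feature in records.items():
    if feature["name"] is not None:
        name_index.setdefault(feature["name"], []).append(fid)


def query_equals(value):
    """Answered from the index: only matching entries are touched."""
    return name_index.get(value, []), "index"


def query_is_not_null():
    """Every index entry is non-null by construction, so the index suffices."""
    return sorted(fid for fids in name_index.values() for fid in fids), "index"


def query_is_null():
    """Nulls have no index entries, so every record must be scanned."""
    return [fid for fid, f in records.items() if f["name"] is None], "full scan"


print(query_equals("alpha"))   # (['f1'], 'index')
print(query_is_not_null())     # (['f1', 'f3'], 'index')
print(query_is_null())         # (['f2'], 'full scan')
```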

Can you enable explain logging, and reply back with the results?

https://www.geomesa.org/documentation/user/geoserver.html#logging-explain-query-planning

Thanks,

Emilio

On 1/3/19 3:06 PM, BKRuns26.2 wrote:
We are running into an issue where we cannot filter for null values on indexed attributes.

We added a set of data that has several indexed attributes (full indexes) for improved query performance, and these attributes sometimes contain null values. Filtering on the attributes via GeoServer works fine unless we are trying to filter for all values that are null. The standard GeoServer syntax of "is null" always returns an empty feature collection, yet we know null values exist. No errors appear in the GeoServer log. If we filter "is not null", it returns only values that aren't null, so that seems to work. Only the "is null" filter does not work.

Thanks,

Brad
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
