Re: [geomesa-dev] Kafka datastore contribution request

Hi Chad,

All of that sounds great.  I agree about finding a particular use case first.  (We've all made it this far without a record- or time-based read-behind capability.  I feel like it comes up more often for developers than in production.)

I've invited you to GeoMesa's JIRA, and created a ticket here: https://geomesa.atlassian.net/browse/GEOMESA-1728.  As a note, you'll need to sign an Eclipse Contributor Agreement with the email address you use to create the PR (http://www.eclipse.org/legal/ECA.php).

Thanks for contributing!

Jim

On 03/15/2017 04:22 PM, Chad Phillips wrote:
I hadn't considered the case for specifying a read-behind.  I can see how it'd be useful when, say, the retention period on the topic itself is a couple of days but you only want to read behind an hour or so to prime the cache and then get anything new that comes in after that.  I can also see going back a specific number of messages being convenient when either the timespan of the data is long (e.g. for low-volume data) or the user just wants to prime the cache from the last x messages.  Allowing a combination of those settings is another route to go down: the read-behind would start at the offset found at the point where either of the criteria is first met.
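
A minimal sketch of the combined rule described above, under stated assumptions: an in-memory list of timestamps stands in for the topic, timestamps never decrease as offsets increase, and the names `read_behind_start`, `max_age_ms` and `max_count` are hypothetical rather than anything in GeoMesa or Kafka. Walking backwards from the end of the log, the first point where either limit is hit is the larger of the two candidate offsets.

```python
from bisect import bisect_left


def read_behind_start(timestamps, earliest_offset, now_ms,
                      max_age_ms=None, max_count=None):
    """Return the offset to start reading from when either limit may apply.

    timestamps[i] is the timestamp (ms) of the message at offset
    earliest_offset + i. With neither limit set, the whole retained log is read.
    """
    end_offset = earliest_offset + len(timestamps)
    candidates = [earliest_offset]
    if max_age_ms is not None:
        # First retained message that is no older than the cutoff.
        cutoff = now_ms - max_age_ms
        candidates.append(earliest_offset + bisect_left(timestamps, cutoff))
    if max_count is not None:
        # The last max_count messages before the end of the log.
        candidates.append(end_offset - max_count)
    # Whichever limit is reached first while walking backwards wins.
    return min(max(candidates), end_offset)


# Hypothetical topic: six messages, ten seconds apart, offsets 0..5.
timestamps = [0, 10_000, 20_000, 30_000, 40_000, 50_000]
by_age = read_behind_start(timestamps, 0, 50_000, max_age_ms=30_000)
by_both = read_behind_start(timestamps, 0, 50_000, max_age_ms=30_000, max_count=2)
```

Here the age limit alone would start at offset 2, while adding a two-message limit moves the start forward to offset 4, because the count limit is met first.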

If you create a JIRA ticket for the auto.offset.reset task, I can work on that feature initially and create a pull request.  For the time- and/or count-based read-behind feature you described, I'd need to find some more concrete use cases that require a specific look-behind config before going down that route, but I definitely agree that it'd be a better long-term solution to allow for that level of flexibility.

On Wed, Mar 15, 2017 at 5:43 AM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Chad,

This sounds like a great feature.  And thanks for starting a discussion first!  For general contributing requirements, check out https://github.com/locationtech/geomesa/blob/master/CONTRIBUTING.md

Ideally, we want a little more control than 'start now' and 'start at the beginning'.  From a quick read, there are two separate configurations which could be exposed.  Your suggestion for exposing 'auto.offset.reset' is one of them, and that's a concrete piece of work. 

It'd also be great to specify a 'read-behind' number or time period.  This is slightly separate from the Kafka config mentioned above, and would allow for fine-grained control.  For instance, a user could specify a read-behind of 1000 messages.  If they specify a time window, it is easy enough to use a binary search through the Kafka WAL to find an appropriate offset.  (This is already written in the Kafka ReplayDataStore bits.)
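
To make the two read-behind options concrete, here is a rough sketch rather than the ReplayDataStore code: a sorted in-memory list of timestamps stands in for the Kafka log, timestamps are assumed never to decrease as offsets increase, and the function names are hypothetical.

```python
from bisect import bisect_left


def offset_for_count(end_offset, earliest_offset, count):
    """Start `count` messages behind the end of the log, clamped to what is retained."""
    return max(earliest_offset, end_offset - count)


def offset_for_time(timestamps, earliest_offset, now_ms, window_ms):
    """Binary-search for the first retained message no older than now_ms - window_ms.

    timestamps[i] is the timestamp (ms) of the message at offset
    earliest_offset + i.
    """
    cutoff = now_ms - window_ms
    return earliest_offset + bisect_left(timestamps, cutoff)


# Hypothetical topic: offsets 100..109, one message per minute.
timestamps = [1_000_000 + 60_000 * i for i in range(10)]
now_ms = timestamps[-1]

start_by_time = offset_for_time(timestamps, 100, now_ms, 3 * 60_000)
start_by_count = offset_for_count(110, 100, 1000)
```

With a three-minute window the search lands on offset 106. Asking for 1000 messages when only ten are retained falls back to the earliest retained offset, 100.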

The two ideas are related, but can be worked out separately.  (That is, I'm not trying to suggest that any contribution would have to solve both.)  Thoughts?

Cheers,

Jim

On 3/15/2017 2:31 AM, Chad Phillips wrote:
I'd like to contribute a feature for the GeoMesa Kafka datastore libraries that exposes the auto offset reset configuration of the Kafka consumer when using the live feature source.  To preserve the existing behaviour, this would default to "largest", but a user could also set it to "smallest" in order to start reading from the beginning of the topic, re-populating the cache with any existing features upon initialization of the feature source.  This is useful when a user is expecting low-volume data, or highly volatile data where the Kafka topic backing the feature source is only retained for a short period of time.  In both of those situations, having the cache repopulate upon initialization (and continuously receive updates as they come in after that) makes it easy for a user to continue consuming data after restarting their system, or restarting GeoServer (for example), without having to use the replay feature source in addition to the live feature source.
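
As an illustration only, the proposal amounts to passing one more property through to the consumer. The helper `consumer_properties` and its `offset_reset` parameter below are hypothetical names, not GeoMesa API. The property keys are those of the older Kafka consumer, whose values "largest" and "smallest" match the wording above; the newer Java consumer spells them "latest" and "earliest".

```python
# Values accepted for auto.offset.reset by the older Kafka consumer.
VALID_RESETS = {"largest", "smallest"}


def consumer_properties(zookeepers, group_id, offset_reset="largest"):
    """Build consumer properties, defaulting to the existing 'largest' behaviour."""
    if offset_reset not in VALID_RESETS:
        raise ValueError(
            f"auto.offset.reset must be one of {sorted(VALID_RESETS)}, got {offset_reset!r}"
        )
    return {
        "zookeeper.connect": zookeepers,
        "group.id": group_id,
        "auto.offset.reset": offset_reset,
    }


default_props = consumer_properties("zoo1:2181", "live-cache")
replay_props = consumer_properties("zoo1:2181", "live-cache", offset_reset="smallest")
```

Note that Kafka only consults auto.offset.reset when the consumer group has no valid committed offset, so whether "smallest" actually re-reads the topic on restart depends on how the feature source handles its group id and commits.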


_______________________________________________
geomesa-dev mailing list
geomesa-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-dev
