Re: [geomesa-users] What are best practices for rollback and possible update?

Hi Ben,

Just to follow up on the idea of helping out (for you or others out there!): check out our Contributing guide here: https://github.com/locationtech/geomesa/blob/master/CONTRIBUTING.md. If you have questions about the Eclipse Contributor Agreement, I can help with that.

Notably, we use JIRA to manage tickets and work here: https://geomesa.atlassian.net/. Let me know if you run into trouble creating an account.

Cheers,

Jim

On 12/05/2016 05:10 AM, Benjamin Weaver wrote:

Thanks, Anthony, et al., for your very useful info.


You mentioned using FeatureWriter to delete records. We cannot yet upgrade our GeoMesa installation past 1.2.1.


(1) If we delete records using FeatureWriter, will the deletions update the indexes properly?



P.S. Thanks for mentioning possible opportunities to help with GeoMesa tools. I would like to help sometime, if I can get out from under loads of work(!)



From: Benjamin Weaver
Sent: 25 October 2016 23:40
To: Geomesa User discussions
Subject: Re: [geomesa-users] What are best practices for rollback and possible update?
 

Thanks, Anthony, for this useful information.


For instance, a failed ingest might be caused by a container failure in the MapReduce job, resulting in duplicate rows. Or perhaps a certain set of data was loaded with improper (for us), non-UTC timestamp fields, a risk in GeoMesa 1.2.1, which has not been upgraded to use Joda DateTime. We would want to roll back and re-ingest this lot with the duplicates removed or the timestamps fixed.


Thanks for your suggestion of a staging table. How do we flush data from the staging table to the main (source) table?


A basic question for me concerns the 5-table GeoMesa table suite**: how does one delete by range, clone, flush, or export? With Accumulo or GeoMesa command-line commands? When deleting ranges, for example, which GeoMesa table(s) would one delete from? _records? But then how are the index tables and the metadata table updated?


**GeoMesa table suite:

GeoMesa.tablename
GeoMesa.tablename_records
GeoMesa.tablename_st_idx
GeoMesa.tablename_attr_idx (we use this)
GeoMesa.tablename_<featureName>_z3


It would be great simply to be able to roll back by deleting lots of rows through Accumulo row ranges. Is this possible on the GeoMesa table suite?


Thanks loads for your help,


Ben


From: geomesa-users-bounces@xxxxxxxxxxxxxxxx <geomesa-users-bounces@xxxxxxxxxxxxxxxx> on behalf of Anthony Fox <anthony.fox@xxxxxxxx>
Sent: 25 October 2016 14:11
To: Geomesa User discussions
Subject: Re: [geomesa-users] What are best practices for rollback and possible update?
 

Ben,

Interesting question.  Can you give more info on what a failed ingest
looks like?

We've thought a lot about a write-ahead log based approach to ingest for
performance reasons but I think it could apply in your scenario as well.
It's basically the idea of a staging table: load data into the
staging table, then flush it to the main table.  Depending on how
up-to-date your queries need to be, you'll need to query both the main
table and the staging table.
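
As a rough illustration of the staging idea, here is a toy in-memory
model. It is not GeoMesa or Accumulo code; the class and method names
(StagedStore, ingest, flush, rollback) are made up for this sketch, and
plain dictionaries stand in for the tables.

```python
class StagedStore:
    """Toy model of a main table plus a staging table (hypothetical names)."""

    def __init__(self):
        self.main = {}      # feature id -> attributes, the committed data
        self.staging = {}   # feature id -> attributes, the in-flight batch

    def ingest(self, fid, attrs):
        # New data only ever lands in staging. Writing the same id twice
        # overwrites, so a retried task does not create duplicates here.
        self.staging[fid] = attrs

    def query(self, predicate):
        # Up-to-date reads consult both tables; staging wins on conflict.
        merged = {**self.main, **self.staging}
        return {fid: a for fid, a in merged.items() if predicate(a)}

    def flush(self):
        # Ingest succeeded: promote the batch to the main table.
        self.main.update(self.staging)
        self.staging.clear()

    def rollback(self):
        # Ingest failed: drop the batch. The main table was never touched.
        self.staging.clear()


store = StagedStore()
store.ingest("f1", {"dtg": "2016-10-25T14:11:00Z"})
store.flush()                                      # good batch is promoted
store.ingest("f2", {"dtg": "2016-10-25 23:40"})    # bad, non-UTC timestamp
store.rollback()                                   # bad batch is discarded
```

The point of the design is that a rollback never has to find and delete
rows in the main table; it only discards the staging data.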

There may be some clever things you can do with Accumulo's facility for
cloning tables to support this.

http://accumulo.apache.org/1.7/accumulo_user_manual#_cloning_tables

In terms of update, you can use a regular FeatureWriter to update
records.  It is much slower than the appending FeatureWriter because it
needs to check the original record and delete any index entries.
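
To show why the modifying path costs more than the appending one, here
is a toy model with one record table and one attribute index. It is
only a sketch under simplifying assumptions, not how GeoMesa stores or
indexes data, and the names (IndexedStore, append, update, delete) are
hypothetical.

```python
class IndexedStore:
    """Toy record table with a single attribute index (hypothetical names)."""

    def __init__(self):
        self.records = {}    # feature id -> attribute value
        self.attr_idx = {}   # attribute value -> set of feature ids

    def append(self, fid, value):
        # Append path: assumes fid is new, so nothing has to be read first.
        self.records[fid] = value
        self.attr_idx.setdefault(value, set()).add(fid)

    def _remove_index_entry(self, fid):
        # Read the original record to learn which index entry points at it.
        old = self.records[fid]
        ids = self.attr_idx[old]
        ids.discard(fid)
        if not ids:
            del self.attr_idx[old]

    def update(self, fid, value):
        # Update path: extra read plus index delete before the new write.
        self._remove_index_entry(fid)
        self.records[fid] = value
        self.attr_idx.setdefault(value, set()).add(fid)

    def delete(self, fid):
        # Delete path: remove the index entry, then the record itself.
        self._remove_index_entry(fid)
        del self.records[fid]


idx_store = IndexedStore()
idx_store.append("f1", "red")
idx_store.append("f2", "red")
idx_store.update("f1", "blue")   # stale "red" entry for f1 is removed
idx_store.delete("f2")           # record and its index entry both go
```

In this model, deleting a row directly from the records dictionary
would leave a dangling index entry, which is the concern with deleting
from only one table of a multi-table suite.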

Thanks,
Anthony


Benjamin Weaver <Benjamin.Weaver@xxxxxxxxxxxxxxxxxx> writes:

> Hi all,
>
>
> We are using GeoMesa 1.2.1 on Accumulo 1.7.2. We are seeking to implement a rollback procedure to use during ingest. This is our first priority; a second would be the ability to update data already in the database. We lack the space to maintain a backup copy of our large tables. We have one large Accumulo table containing one GeoMesa SimpleFeature.
>
>
> Loading a staging table, exporting it with the GeoMesa command-line tools, and merging the export into our main table shows some promise as a hedge during ingest. Another approach would be the Accumulo command line, which enables row deletions, but I do not know whether the Accumulo shell can gracefully handle the GeoMesa table suite.
>
>
> But what is the best way to roll back when an ingest fails midway? Are examples available? I saw some Scala classes that include calls to trans.rollback(), etc. Are these or other classes meant to be used for batch, bulk, or total rollback? Or are there other rollback tools and techniques I have overlooked?
>
>
> Any perspectives are appreciated and welcome!
>
>
> Ben Weaver
>
> This email (and any attachments) may contain confidential information and is intended solely for the recipient(s) to whom the email is addressed. If you received this email in error, please inform us immediately and delete the email and all attachments without further using, copying or disclosing the information. This email and any attachments are believed to be, but cannot be guaranteed to be, secure or virus-free. Satellite Applications Catapult Limited is registered in England & Wales. Company Number: 7964746. Registered office: Electron Building, Fermi Avenue, Harwell Oxford, Didcot, Oxfordshire OX11 0QR.
> _______________________________________________
> geomesa-users mailing list
> geomesa-users@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://www.locationtech.org/mailman/listinfo/geomesa-users


