Hi all,
If we ingest, say, the same line of text data twice (by mistake)
in GeoMesa 1.2.1, we end up with duplicate data in our Accumulo
(1.7.2) database. We are ingesting with GeoMesa-generated feature
IDs (we set featureBuilder.setFeatureID to NULL and do not use
any Hints).
A colleague asked me why duplicates are generated in this case,
and I realized I did not know.
1. How, exactly, is a GeoMesa row (record) made unique in our
GeoMesa + Accumulo configuration? I know how important Accumulo
logical rows are, but for identical data like this we would want
to ensure that only one GeoMesa record, namely one instance of
our GeoMesa SimpleFeature, is inserted.
1a. Are duplicate GeoMesa rows added because the insertion time
differs, or because a different featureID is randomly generated
on each insertion?
Potentially related questions:
2. How does GeoMesa generate featureIDs? I thought they were
random, but I read a comment somewhere suggesting that feature
IDs are created from an MD5 hash of all the values in the
feature. A colleague points out that even if this is so, a
featureID does not resemble an MD5 hash, so it must be composed,
at least partly, by other means.
3. Can we create a Z3 index that uses a data-derived timestamp,
rather than the insertion timestamp, as the time dimension?
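To make 1a and 2 concrete, here is a small, purely hypothetical
Java sketch. It is not GeoMesa's ID generator (we haven't looked
at how that works); the class FidSketch and its methods are
invented for illustration. md5Fid derives an ID from an MD5
digest of the attribute values, and randomFid uses a random UUID.
If IDs were content-derived, ingesting the same line twice would
give the same ID both times, so presumably the second write would
map onto the same record. With random IDs, every ingest gets a new
ID and therefore becomes a new feature, which would explain our
duplicates. It also shows my colleague's point: a hex MD5 digest
is 32 hex characters, and our feature IDs don't look like that.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical illustration only; not GeoMesa's actual feature ID logic.
public class FidSketch {

    // Content-derived ID: MD5 over the attribute values, joined with a NUL
    // separator so that ("ab", "c") and ("a", "bc") hash differently.
    static String md5Fid(String... values) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    String.join("\0", values).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex chars per byte
            }
            return hex.toString(); // always 32 hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Random ID: a fresh UUID every call, regardless of the feature's content.
    static String randomFid() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // The same (made-up) line of input, "ingested" twice.
        String[] line = {"42", "2016-07-01T12:00:00Z", "POINT (1 51)"};

        System.out.println(md5Fid(line).equals(md5Fid(line))); // true: same ID both times
        System.out.println(randomFid().equals(randomFid()));   // false: two distinct IDs
        System.out.println(md5Fid(line).length());             // 32
    }
}
```

If content-derived IDs are what we want, a scheme like md5Fid would
have to be supplied by us when building each feature. Whether and
how GeoMesa honours a provided ID is exactly what question 1 asks.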
All comments and perspectives are appreciated and welcome!
Ben Weaver
Satellite Applications Catapult Limited is registered in England &
Wales. Company Number: 7964746. Registered office: Electron
Building, Fermi Avenue, Harwell Oxford, Didcot, Oxfordshire OX11
0QR.