Emilio, Anthony, Jim,

Thanks for your suggestions. With regards to overwriting a GeoMesa
feature with the same ID:
With my use case I would conceptually be using the same feature ID,
location, and time; however, the location portion has
double-precision data in it and thus may not be exactly equivalent
to a previously indexed entry. Because of the double-precision data
within my default geometry attribute, I think it is best not to
assume equality between my previously indexed {id, location, time}
entry and my new {id, location, time} entry.
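
To illustrate the floating-point concern, here is a minimal sketch of padding a known point by a small tolerance before building the BBOX clause, instead of relying on exact coordinate equality. The class name, the helper names, and the tolerance value are hypothetical and not part of GeoMesa or GeoTools; the clause follows the BBOX(attribute, minX, minY, maxX, maxY) form used in the query string further down.

```java
import java.util.Locale;

public class ToleranceBox {

    /** Builds a BBOX clause centred on (lon, lat), padded by eps degrees on every side. */
    static String bboxAround(String geomAttr, double lon, double lat, double eps) {
        return String.format(Locale.ROOT, "BBOX(%s,%.6f,%.6f,%.6f,%.6f)",
                geomAttr, lon - eps, lat - eps, lon + eps, lat + eps);
    }

    /** True when two coordinates differ by no more than eps. */
    static boolean nearlyEqual(double a, double b, double eps) {
        return Math.abs(a - b) <= eps;
    }

    public static void main(String[] args) {
        // Exact comparison of doubles can fail even for "the same" value.
        System.out.println((0.1 + 0.2) == 0.3);
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-9));
        // Hypothetical tolerance of 0.001 degrees around a known point.
        System.out.println(bboxAround("myGeometry", 123.4, 45.0, 0.001));
    }
}
```

How large the tolerance should be depends on the precision of the data being indexed; the value above is only a placeholder.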

Jim suggested that doing a pure lookup by feature ID is slow. To get
around this slowness (since I also know the geometry and time), if I
add a geospatial-temporal component to the Filter, will that ease
the slowness concerns?

Below is some Java code for modification. The code seems to work
functionally, but:

A) Does combining the geospatialtemporalfilter with the idfilter
offer speed improvements over a simple idfilter for larger data
sets?

B) Is combining the geospatialtemporalfilter with the idfilter safe?
According to
http://docs.geotools.org/stable/userguide/library/opengis/filter.html,
“Formally this style of Id matching is not supposed to be mixed with
the traditional attribute based evaluation (such as a bounding box
filter).” This statement gives me cause for concern because I am
mixing the id matching with a bounding box and time filter.

public boolean modifyFeatureByID(String featureName, String featureID,
        List<String> attributes, List<Object> values,
        String timeAttribute, Date startTime, Date endTime,
        String geometryAttribute, double startLat, double startLon,
        double endLat, double endLon)
{
    boolean wasSuccessful = true;

    // queryString is of the form:
    // "( ( NOT (myTime AFTER 2014-06-20T15:23:03.952Z)) AND
    //    ( NOT (myTime BEFORE 2012-06-20T15:23:03.952Z)) ) AND
    //    BBOX(myGeometry,123.2,40.1,123.6,49.9)"
    String queryString = constructBoundedBoxAndTimeFrameQueryString(
            timeAttribute, startTime, endTime,
            geometryAttribute, startLat, startLon, endLat, endLon);
    Filter geospatialtemporalfilter = queryStringToFilter(queryString);

    logger.debug("Modifying feature with id: '" + featureID
            + "' from the '" + featureName + "' feature store");

    if (attributes == null && values != null)
    {
        throw new IllegalArgumentException(
                "if attributes list is null, values list must also be null");
    }
    if (attributes != null && values == null)
    {
        throw new IllegalArgumentException(
                "if values list is null, attributes list must also be null");
    }

    if (attributes != null && values != null)
    {
        int numAttributes = attributes.size();
        if (numAttributes != values.size())
        {
            throw new IllegalArgumentException(
                    "attributes and values lists must be the same size");
        }

        if (numAttributes > 0)
        {
            Name[] attributeNames = new NameImpl[numAttributes];
            Object[] attributeValues = new Object[numAttributes];
            for (int k = 0; k < numAttributes; k++)
            {
                attributeNames[k] = new NameImpl(attributes.get(k));
                attributeValues[k] = values.get(k);
            }

            DataStore dataStore = createDataStore();
            FeatureStore<SimpleFeatureType, SimpleFeature> featureStore =
                    createFeatureStore(dataStore, featureName);

            // Combine the ID filter with the bounding box and time filter.
            FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
            Filter idfilter =
                    ff.id(Collections.singleton(ff.featureId(featureID)));
            Filter filter =
                    ff.and(Arrays.asList(geospatialtemporalfilter, idfilter));

            try
            {
                featureStore.modifyFeatures(attributeNames, attributeValues,
                        filter);
                logger.debug("Feature with id: '" + featureID
                        + "' has been successfully modified within the '"
                        + featureName + "' feature store");
            }
            catch (Exception e)
            {
                wasSuccessful = false;
                logger.error("Problem modifying feature with id: '"
                        + featureID + "' within the '" + featureName
                        + "' feature store", e);
            }

            dataStore.dispose();
        }
        else
        {
            logger.debug("modifyFeatureByID() invoked with empty "
                    + "attributes/values inputs, no features were modified "
                    + "for feature with id: " + featureID);
        }
    }
    else
    {
        logger.debug("modifyFeatureByID() invoked with null "
                + "attributes/values inputs, no features were modified "
                + "for feature with id: " + featureID);
    }

    return wasSuccessful;
}

Hi all,
A couple caveats:
In GeoMesa, the row key contains the temporal and spatial data
for the feature. So if you try to overwrite a feature by
keeping the same feature ID, but you change the location
and/or time, it will create a new row and not overwrite the
existing entry.
If you do overwrite the exact same row, Accumulo by default
will apply a VersioningIterator so that you only see the
latest entry. You can change this behavior if you want:
http://accumulo.apache.org/1.5/accumulo_user_manual.html#_versioning_iterators_and_timestamps
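
The two caveats above can be pictured with a toy model in plain Java. This is only a sketch: the key layout (coarse cell, day, feature ID) is a hypothetical stand-in and not GeoMesa's real row key format, and a sorted map that keeps only the most recent value per key merely mimics what a reader sees when only the latest version of a row is returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowKeyToy {
    // Sorted map standing in for a table: row key -> latest value only.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Hypothetical key layout: coarse cell ~ day ~ feature ID.
    static String rowKey(String id, double lon, double lat, String day) {
        long cellX = (long) Math.floor(lon * 100);
        long cellY = (long) Math.floor(lat * 100);
        return cellX + ":" + cellY + "~" + day + "~" + id;
    }

    // Writing to an existing key replaces the visible value; a new key adds a row.
    void write(String id, double lon, double lat, String day, String value) {
        rows.put(rowKey(id, lon, lat, day), value);
    }

    // Collects every visible value whose row key ends with the given feature ID.
    List<String> valuesForId(String id) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            if (e.getKey().endsWith("~" + id)) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        RowKeyToy table = new RowKeyToy();
        table.write("f1", 123.5, 45.0, "20140620", "v1");
        table.write("f1", 123.5, 45.0, "20140620", "v2");  // same row: replaces v1
        table.write("f1", 123.75, 45.0, "20140620", "v3"); // moved: new row, same ID
        System.out.println(table.valuesForId("f1"));
    }
}
```

In this model the second write hides the first because the key is identical, while the third write leaves two rows visible under the same feature ID.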
I would encourage people to check out the GeoMesa quick start
tutorial:
http://geomesa.github.io/2014/05/28/geomesa-quickstart/
It lets you write and read just a few features at a time
through the API. You can test out your exact scenario and
determine what works for you.
Thanks,
- Emilio

On 06/20/2014 08:13 AM, Anthony Fox wrote:

If you overwrite records with the same id, then you'll observe
multiple records until Accumulo performs a table compaction or you
set a versioning iterator. Since this sounds like a valid use case,
we can by default set a versioning iterator so you only see the most
recent version of your record. Table compactions will periodically
remove stale data.

-Anthony

On Thu, Jun 19, 2014 at 6:27 PM, Jim Hughes <jnh5y@xxxxxxxx> wrote:

Hi Beau,
Initially, I wanted to say that #1 is the intended
behavior. I wanted to check things out before
responding, and unfortunately, it has taken me a
bit longer than I expected.
The first thing to point out is how GeoTools
handles feature IDs. Since several folks could be
writing to a FeatureStore, you have to state that
your feature ids should be used for a given feature
being written to the database. For example...
feature.getUserData().put(Hints.USE_PROVIDED_FID, Boolean.TRUE);
(I might have the exact syntax off since I'm
bouncing between Scala and Java.) Without that
hint, GeoMesa will pick a random UUID for the
feature id.
Just to make sure that #1 isn't (currently)
possible, I tried writing two distinct features
with the same id, and I ended up with two records
with the same FID.
As for the other two approaches, in the current
implementation, looking up features by ID involves
a table scan and hence generally is a bad idea.
We do have some work in progress which will make
such queries faster/sane. The last note along
these lines is that to support this fully, we'll
likely need to implement/override the DataStore's
getFeatureWriter function.
I mention that because this is the GeoTools way of
doing #2. At the minute, we are using an abstract
implementation of this, and it should work
correctly. The filtering is done entirely on the
client side, so it'll be slow. If your data is
small (say, a few thousand records), this sort of
thing might be tenable.
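
To make the cost difference concrete, here is a toy comparison in plain Java of an ID-only lookup against a lookup that also knows the spatial cell and day. It assumes a hypothetical sorted key layout (cell~day~id) and is not GeoMesa's real index; whether a combined ID and spatio-temporal filter is actually planned as a range scan in GeoMesa is something the developers would need to confirm.

```java
import java.util.Map;
import java.util.TreeMap;

public class ScanToy {
    final TreeMap<String, String> rows = new TreeMap<>();
    int examined = 0; // rows touched by the most recent lookup

    // Hypothetical key layout: cell ~ day ~ feature ID.
    static String key(String cell, String day, String id) {
        return cell + "~" + day + "~" + id;
    }

    // ID-only lookup: the ID is at the end of the key, so every row is inspected.
    String findById(String id) {
        examined = 0;
        String hit = null;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            examined++;
            if (e.getKey().endsWith("~" + id)) {
                hit = e.getValue();
            }
        }
        return hit;
    }

    // Lookup that also knows cell and day: only the matching key range is inspected.
    String findByCellDayAndId(String cell, String day, String id) {
        examined = 0;
        String prefix = cell + "~" + day + "~";
        String hit = null;
        for (Map.Entry<String, String> e
                : rows.subMap(prefix, prefix + Character.MAX_VALUE).entrySet()) {
            examined++;
            if (e.getKey().equals(prefix + id)) {
                hit = e.getValue();
            }
        }
        return hit;
    }

    // Builds 100 cells with 10 features each (1000 rows) for the comparison.
    static ScanToy sample() {
        ScanToy t = new ScanToy();
        for (int c = 0; c < 100; c++) {
            String cell = String.format("c%03d", c);
            for (int i = 0; i < 10; i++) {
                t.rows.put(key(cell, "20140620", "f" + c + "-" + i), "value-" + c + "-" + i);
            }
        }
        return t;
    }

    public static void main(String[] args) {
        ScanToy t = sample();
        t.findById("f42-7");
        System.out.println("id-only lookup examined " + t.examined + " rows");
        t.findByCellDayAndId("c042", "20140620", "f42-7");
        System.out.println("cell+day+id lookup examined " + t.examined + " rows");
    }
}
```

In the toy, the ID-only lookup touches all 1000 rows, while the lookup that supplies the cell and day touches only the 10 rows sharing that key prefix.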
I hope that helps clarify the matter; let me know
if you have other questions.
Jim
On 06/18/2014 04:37 PM, Beau Lalonde wrote:
Jim, Others,

If we do use the same ID, can we count on the previous value getting
“overwritten”/replaced?

In other words, if I actually intend to overwrite/replace a feature
with a specific ID (if it exists, otherwise create a new feature),
which of the following is the best option:

1. Act as if I am adding the feature, counting on any existing
   feature with the same ID to be overwritten/replaced
2. Query GeoMesa for the existence of a feature with the specific
   ID, modify feature if it exists, add feature if it doesn’t exist
3. Blindly attempt to remove the feature with the specific ID, add a
   new feature with the same ID

Any suggestions for a recommended approach would be helpful.

Thanks,
Beau

Hi Adnan,

Great question! GeoMesa uses the feature id as a unique identifier.
It sounds like you might be using the id field to identify/name a
thing which is moving/changing shape/varying attributes through
time. If that's the use case, I'd suggest putting the information
which identifies the object into a different field like 'name' or
'identifier'.

As for documentation, I'd suggest checking out
http://geomesa.github.io/ and looking through the tutorials. We've
integrated with GeoTools, so I'd also point to their documentation
about DataStores/FeatureStores
(http://docs.geotools.org/latest/userguide/library/api/datastore.html
and
http://docs.geotools.org/stable/userguide/library/data/featuresource.html).*

Let us know what other questions we can help with,
Jim

* In particular, I believe that you would see the same behavior with
(most) other GeoTools FeatureStores.

On 06/18/2014 06:14 AM, Adnan Yaqoob wrote:

Hello Everybody,

I am new to GeoMesa and am trying its API. I have a question: how
can I store features with the same id and geometry but with
different time stamps and attribute values? I tried to write a
feature with the same id and different attributes, and it overwrote
the previous feature. I am stuck on this point; please help me
understand.

Is there any documentation for the GeoMesa API?

_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
http://www.locationtech.org/mailman/listinfo/geomesa-users