Re: [geomesa-users] Question about min and max times in indexing
Emilio:
There are three feature types defined.
ActorRecordset~attributes : []
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,Name:String:cardinality=high:index=full,Type:String:cardinality=high:index=full,NameMetaphone:String:cardinality=high:index=full,Country:String:cardinality=high:index=full,AffiliationTo:String:cardinality=high:index=full,AffiliationStart:Date:cardinality=high:index=full,AffiliationEnd:Date:cardinality=high:index=full,Aliases:String:cardinality=high:index=full,GeoCountryCode:String:cardinality=high:index=full,*GeoLocation:Point;geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0002'
ActorRecordset~id : [] \x02
ActorRecordset~stats-date : [] 2017-04-21T20:17:01.572Z
ActorRecordset~table.attr.v4 : [] CoalesceSearch_attr_v4
ActorRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ActorRecordset~table.z2.v3 : [] CoalesceSearch_z2_v3
ActorRecordset~table.z3.v4 : [] CoalesceSearch_z3_v4
ICEWSArtifactRecordset~attributes : []
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,SourceFileName:String:cardinality=high:index=full,RawText:String:cardinality=high:index=full,Md5Sum:String:cardinality=high:index=full,DateIngested:Date:cardinality=high:index=full,ArtifactDate:Date:cardinality=high:index=full,*theWorld:Polygon;geomesa.index.dtg='DateIngested',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0003'
ICEWSArtifactRecordset~id : [] \x03
ICEWSArtifactRecordset~stats-date : [] 2017-04-21T20:20:58.054Z
ICEWSArtifactRecordset~table.attr.v4 : [] CoalesceSearch_attr_v4
ICEWSArtifactRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ICEWSArtifactRecordset~table.xz2.v1 : [] CoalesceSearch_xz2
ICEWSArtifactRecordset~table.xz3.v1 : [] CoalesceSearch_xz3
Linkages~attributes : []
objectKey:String:cardinality=high:index=full,entity1Key:String,entity1Name:String,entity1Source:String,entity1Version:String,entity1Key:String:cardinality=high:index=full,entity1Name:String,entity1Source:String,entity1Version:String,lastModified:Date:cardinality=high:index=full,label:String:cardinality=low:index=full,linkType:String:cardinality=low:index=full,*theWorld:Polygon;geomesa.index.dtg='lastModified',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0001'
Linkages~id : [] \x01
Linkages~stats-date : [] 2017-04-21T20:16:02.269Z
Linkages~table.attr.v4 : [] CoalesceSearch_attr_v4
Linkages~table.records.v2 : [] CoalesceSearch_records_v2
Linkages~table.xz2.v1 : [] CoalesceSearch_xz2
Linkages~table.xz3.v1 : [] CoalesceSearch_xz3
On 4/21/17 4:28 PM, Emilio Lahr-Vivaz wrote:
We will always set a default date field for indexing, so that is why
you see the date validation message. However, it seems like
you are setting the hints correctly. It is odd though, because there
shouldn't ever be a situation where we create both the XZ3 and Z3
index for a single feature type. Do you have other feature types in
the same catalog table? Can you scan the catalog table and reply with
the result of the 'attributes' row?
Thanks,
Emilio
On 04/21/2017 04:20 PM, David Boyd wrote:
Emilio:
Some more information. I am getting this message:
2017-04-21 16:17:01,484 | WARN | [main] |
(GeoMesaSchemaValidator.scala:90) - geomesa.index.dtg is not valid
or defined for simple feature type SimpleFeatureTypeImpl
http://www.opengis.net/gml:ActorRecordset identified extends
Feature(objectKey:objectKey,entityName:entityName,entitySource:entitySource,entityTitle:entityTitle,recordKey:recordKey,Name:Name,Type:Type,NameMetaphone:NameMetaphone,Country:Country,AffiliationTo:AffiliationTo,AffiliationStart:AffiliationStart,AffiliationEnd:AffiliationEnd,Aliases:Aliases,GeoCountryCode:GeoCountryCode,GeoLocation:GeoLocation).
However, the following attribute(s) can be used in GeoMesa's
temporal index: AffiliationStart, AffiliationEnd. GeoMesa will now
point geomesa.index.dtg to the first temporal attribute found:
AffiliationStart
I now get this when I create my schemas, despite specifically disabling those
indexes and not specifying a time field for geomesa.index.dtg.
I have also tried adding:
feature.getUserData().put("geomesa.index.dtg",null);
to my code, with the same result.
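The fallback described in the warning above, which points `geomesa.index.dtg` at the first temporal attribute, can be illustrated with a short Java sketch. It is not GeoMesa's validator. It only assumes that attributes are scanned in spec order and that the first one typed `Date` is chosen, which matches the `AffiliationStart` choice in the log. `firstDateAttribute` is a hypothetical helper, and the spec string is shortened from the ActorRecordset catalog row earlier in this thread.

```java
import java.util.Optional;

/**
 * Sketch (not GeoMesa's code) of the default-date fallback: when
 * geomesa.index.dtg is not set, pick the first attribute declared as Date.
 */
public class DefaultDtgSketch {

    /** Returns the name of the first attribute whose type is Date, if any. */
    static Optional<String> firstDateAttribute(String spec) {
        // Attribute definitions come before the first ';', user data after it.
        int semi = spec.indexOf(';');
        String attrs = semi >= 0 ? spec.substring(0, semi) : spec;
        for (String def : attrs.split(",")) {
            // A leading '*' marks the default geometry; strip it before parsing.
            String[] parts = def.replaceFirst("^\\*", "").split(":");
            if (parts.length >= 2 && parts[1].equals("Date")) {
                return Optional.of(parts[0]);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened from the ActorRecordset spec in this thread.
        String spec = "recordKey:String:index=full,Name:String,"
                + "AffiliationStart:Date:index=full,AffiliationEnd:Date:index=full,"
                + "*GeoLocation:Point;geomesa.table.sharing='true'";
        System.out.println(firstDateAttribute(spec).orElse("<none>")); // AffiliationStart
    }
}
```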
On 4/21/17 4:04 PM, David Boyd wrote:
Emilio:
Thanks for the detailed explanation.
I am trying to disable the Z3 index. I have added the following to
my code:
final String indexes = "z2,records,id,attr";
SimpleFeatureType feature = tb.buildFeatureType();
// index recordkey, cardinality is high because there is
// only one record per key.
feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("index", "full");
feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("cardinality", "high");
feature.getUserData().put("geomeas.indexes.enabled",indexes);
I then create other attribute indexes and call createSchema with the
feature.
I am still getting the exception:
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -432000.0
at scala.Predef$.require(Predef.scala:224)
at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
at org.locationtech.geomesa.curve.NormalizedTime.normalize(NormalizedDimension.scala:33)
When I look at my accumulo tables I still have:
CoalesceSearch_xz3
CoalesceSearch_z3_v4
I dropped all my tables before this was run.
What am I missing?
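One detail worth checking in the snippet above: the user-data key is spelled `geomeas.indexes.enabled`, with `geomeas` rather than `geomesa`. If GeoMesa does not recognize a key, it presumably ignores it (an assumption; the thread does not confirm this), which would be consistent with the ActorRecordset catalog row still listing a `z3` index. The hypothetical Java check below flags user-data keys whose first segment is within two edits of `geomesa` without matching it exactly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sanity check for misspelled "geomesa." user-data keys. */
public class UserDataKeyCheck {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    /** Keys whose first dotted segment is close to, but not equal to, "geomesa". */
    static List<String> suspiciousKeys(Iterable<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            int dot = key.indexOf('.');
            String prefix = dot >= 0 ? key.substring(0, dot) : key;
            int dist = editDistance(prefix, "geomesa");
            if (dist > 0 && dist <= 2) out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("geomeas.indexes.enabled", "geomesa.index.dtg");
        System.out.println(suspiciousKeys(keys)); // [geomeas.indexes.enabled]
    }
}
```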
On 4/21/17 10:02 AM, Emilio Lahr-Vivaz wrote:
Yeah, that error is a bit obtuse but it's coming from converting
the date into an index value. I believe that currently if a feature
fails to validate for any index, it will not be stored at all. This
is to prevent partial indexing, where your query results might
differ based on which index it uses. Previously we allowed partial
indexing, and I think at this point we'd like to support both based
on a configuration property, but haven't implemented it yet.
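The all-or-nothing behavior described above can be sketched as validating a feature against every index before writing to any of them. The `IndexSketch` interface and both index classes are hypothetical stand-ins, not GeoMesa types; the only rule modelled is that the Z3-like index rejects dates before the epoch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "validate against all indices, then write", with hypothetical types. */
public class AllOrNothingWriteSketch {

    /** Hypothetical stand-in for an index keyed on a timestamp. */
    interface IndexSketch {
        void validate(long epochMillis); // throws IllegalArgumentException if unindexable
        void write(long epochMillis);
    }

    /** Z3-like index that, as in this thread, rejects dates before the Unix epoch. */
    static class TimeIndex implements IndexSketch {
        final List<Long> rows = new ArrayList<>();
        public void validate(long t) {
            if (t < 0) throw new IllegalArgumentException("date before epoch: " + t);
        }
        public void write(long t) { rows.add(t); }
    }

    /** Index with no date restriction, e.g. a record/id-style index. */
    static class NoTimeIndex implements IndexSketch {
        final List<Long> rows = new ArrayList<>();
        public void validate(long t) { }
        public void write(long t) { rows.add(t); }
    }

    /** Writes to every index only if all of them accept the value. */
    static boolean writeAll(List<IndexSketch> indices, long t) {
        try {
            for (IndexSketch idx : indices) idx.validate(t);
        } catch (IllegalArgumentException e) {
            return false; // nothing written anywhere: no partial indexing
        }
        for (IndexSketch idx : indices) idx.write(t);
        return true;
    }

    public static void main(String[] args) {
        TimeIndex z3 = new TimeIndex();
        NoTimeIndex records = new NoTimeIndex();
        List<IndexSketch> indices = List.of(z3, records);
        System.out.println(writeAll(indices, -63140400000L)); // 1968 date: false
        System.out.println(records.rows.size());               // 0
    }
}
```

For the 1968 date the write returns `false` and the record-style index stays empty, mirroring the statement that such a feature is not stored at all.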
We haven't really had any use-cases so far for storing data that
old, so we don't currently support it. However, there are a couple
things you could do (off the top of my head):
* Add another date field for indexing, or disable the z3 index. If
the date isn't part of the primary z index, then it won't cause any
problems. You can still filter on it as normal, it just won't use
the date in the primary range planning so queries will be slower.
To alleviate that, you could add an attribute index on the date
field - that does not have the same restrictions on date range, but
it is not a composite index so query planning will use either date
*or* geometry but not both.
* Offset dates by some fixed amount to bring them into an indexable
range, and add some logic in your client to transform queries and
results. This may be fairly complicated...
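The second option, offsetting dates, amounts to a pair of inverse transforms applied at the edges of the client. This is a minimal sketch assuming a single fixed shift; the 100-year-ish `OFFSET` is a placeholder, and a real value has to move the earliest date past the epoch while keeping the latest date under the index's maximum. Date literals in query filters would go through `toIndexed`, and dates in results through `toActual`.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch of shifting dates into an indexable range and back (hypothetical offset). */
public class DateOffsetSketch {

    // Placeholder shift; must be chosen so every stored date lands inside
    // the range the time index accepts.
    static final Duration OFFSET = Duration.ofDays(365L * 100);

    /** Applied before writing a feature and to date literals in query filters. */
    static Instant toIndexed(Instant actual) { return actual.plus(OFFSET); }

    /** Applied to dates in query results before returning them to callers. */
    static Instant toActual(Instant indexed) { return indexed.minus(OFFSET); }

    public static void main(String[] args) {
        Instant start = Instant.parse("1968-01-01T05:00:00Z");
        Instant shifted = toIndexed(start);
        System.out.println(shifted.isAfter(Instant.EPOCH));   // now after the epoch
        System.out.println(toActual(shifted).equals(start));  // round-trips exactly
    }
}
```

Keeping both directions in one class makes it harder for the write path and the query path to drift apart, which is where most of the complexity mentioned above would come from.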
From a technical perspective I don't think there is any reason we
couldn't store dates before the epoch, it just hasn't been
implemented.
Thanks,
Emilio
On 04/20/2017 10:13 PM, David Boyd wrote:
Emilio:
Thanks. I puzzled it out in the end.
How would one index historical data by date? The data I have has
numerous dates before the epoch. The exception I am
getting is below. Does this mean my feature did not get stored, or
just that the date was not indexed? If the latter, how would
this data behave in a query that includes the date?
2017-04-20 17:11:12,306 | WARN | [Thread-7] |
(ICEWS_EntityExtractor.java:240) - StartDateString: 1968-01-01
StartDate: 1968-01-01T00:00:00.000-05:00 EndDateString:
1996-08-31 EndDate: 1996-08-31T00:00:00.000-04:00
2017-04-20 17:11:12,306 | INFO | [Thread-7] |
(ICEWS_EntityExtractor.java:300) - Persisting 2 ICEWS records.
2017-04-20 17:11:12,556 | ERROR | [Thread-7] |
(AccumuloPersistor.java:1073) - requirement failed: Value out of
bounds ([0.0 604800.0]): -241200.0
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -241200.0
at scala.Predef$.require(Predef.scala:224)
at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
On 4/20/17 6:07 PM, Emilio Lahr-Vivaz wrote:
Hi David,
I don't believe that this is in our documentation, but it's
commented in our source code. The min date will always be the
unix epoch, and the max date depends on the indexing interval of
your z-curve (the default interval is one week):
https://github.com/locationtech/geomesa/blob/master/geomesa-z3/src/main/scala/org/locationtech/geomesa/curve/BinnedTime.scala#L15-L39
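The bounds in the error messages can be reproduced with a little arithmetic. Assuming week bins counted from the Unix epoch and an in-bin offset computed with truncating division (an assumption about the approach, not a copy of `BinnedTime`), a pre-epoch date produces a negative offset. For the 1968-01-01T00:00-05:00 start date logged in this thread, this sketch yields exactly the -241200.0 reported in the exception.

```java
import java.time.Instant;

/** Sketch of week-binned time relative to the epoch (not GeoMesa's BinnedTime). */
public class BinnedWeekSketch {

    static final long WEEK_SECONDS = 7L * 24 * 60 * 60; // 604800

    /** Week bin relative to the epoch; Java '/' truncates toward zero. */
    static long bin(Instant t) { return t.getEpochSecond() / WEEK_SECONDS; }

    /** Seconds into the bin; the curve expects this to lie in [0, 604800]. */
    static long offsetSeconds(Instant t) {
        return t.getEpochSecond() - bin(t) * WEEK_SECONDS;
    }

    public static void main(String[] args) {
        // 1968-01-01T00:00-05:00 from the log in this thread, expressed in UTC.
        Instant start = Instant.parse("1968-01-01T05:00:00Z");
        System.out.println(bin(start) + " " + offsetSeconds(start)); // -104 -241200
    }
}
```

Using `Math.floorDiv` instead would keep the offset inside [0, 604800] but produce a negative bin, and the reply above states that the minimum supported date is the epoch.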
Thanks,
Emilio
On 04/20/2017 04:45 PM, David Boyd wrote:
All:
Haven't found this in the documents yet so I thought I would
ask.
I have two fields in my data representing a startTime and an
endTime. Values for those string fields are normally dates but can
also be "beginning of time" and "end of time", respectively.
I originally tried setting beginning of time to be 01/01/1111,
but I would get an index out of range error (I assume because this
was before the standard Unix epoch).
That error was down in the XZ3 index creation.
I then tried using new DateTime(Long.MIN_VALUE) and new
DateTime(Long.MAX_VALUE), but the max now throws errors in Joda-Time.
So what are the min and max times supported by GeoMesa in the
indexes?
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
--
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd
VP, Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office: +1-703-552-2862
cell: +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org
The information contained in this message may be privileged
and/or confidential and protected from disclosure.
If the reader of this message is not the intended recipient
or an employee or agent responsible for delivering this message
to the intended recipient, you are hereby notified that any
dissemination, distribution or copying of this communication
is strictly prohibited. If you have received this communication
in error, please notify the sender immediately by replying to
this message and deleting the material from any computer.