One other thing: if you enable 'trace' level logging for our query
planner class, we will print out the explain info for every query
that gets run.
For log4j, it would be:
log4j.category.org.locationtech.geomesa.accumulo.index.QueryPlanner=TRACE
Thanks,
Emilio
On 09/03/2015 02:10 PM, Emilio Lahr-Vivaz wrote:
Hi Ben,
Yes, the interop package is a java class that makes it easier to
call our underlying scala classes from java - but the effect is
the same.
We do have a method that will show you the ranges for a given
query. Construct your query object as usual, and then call:
org.locationtech.geomesa.accumulo.data.AccumuloDataStore#explainQuery
By default, this method will print various info to the console,
including the number of ranges.
If you want a more detailed look, you can call:
org.locationtech.geomesa.accumulo.data.AccumuloDataStore#getQueryPlan
which returns the full objects that we use to create the Accumulo
scan.
Thanks,
Emilio
On 09/03/2015 02:00 PM, Ben Southall wrote:
Hello,
Given that I've seen timeout/out-of-memory issues too, I just
wanted to check a couple of things:
Is org.locationtech.geomesa.utils.interop.SimpleFeatureTypes
equivalent to
org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes in
1.1.0 rc.2?
Is there a method that will return the underlying accumulo
ranges for a given query, or is a breakpoint my best bet?
Thanks!
Ben
-----Original Message-----
From: geomesa-users-bounces@xxxxxxxxxxxxxxxx
[mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx] On Behalf Of
Emilio Lahr-Vivaz
Sent: Wednesday, September 02, 2015 7:07 PM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Date Indexing, Stucked queries
Hi Marcel,
Your date is not being indexed because you are using the
DataUtilities class to create your simple feature type. The index
hints are a GeoMesa-specific feature, so to trigger them you have
to use the following method instead:
org.locationtech.geomesa.utils.interop.SimpleFeatureTypes#createType
I think the issue with your memory is that your query covers a
fairly large time span (5+ years), which means that we end up
creating a lot of ranges for Accumulo to scan. I used a small
polygon of 4 by 5 degrees with that date range, and the query
resulted in 660261 ranges. To alleviate the problem, you may want
to split your query up into smaller chunks (maybe 6 months at a
time).
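As a rough illustration of the chunking suggested above, the sketch below splits a date range into consecutive windows of at most six months. It uses only java.time; the class name DateRangeChunker and its split method are hypothetical helpers, not part of GeoMesa, and each window would still have to be turned into its own temporal filter and query.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GeoMesa): splits an inclusive date range
// into consecutive, non-overlapping windows of at most `months` months.
class DateRangeChunker {

    static List<LocalDate[]> split(LocalDate start, LocalDate end, int months) {
        if (months <= 0) {
            throw new IllegalArgumentException("months must be positive");
        }
        List<LocalDate[]> chunks = new ArrayList<>();
        for (int i = 0; ; i++) {
            // Offsets are computed from the original start to avoid day-of-month drift.
            LocalDate chunkStart = start.plusMonths((long) i * months);
            if (chunkStart.isAfter(end)) {
                break;
            }
            LocalDate chunkEnd = start.plusMonths((long) (i + 1) * months).minusDays(1);
            if (chunkEnd.isAfter(end)) {
                chunkEnd = end; // clamp the last window to the requested end date
            }
            chunks.add(new LocalDate[] { chunkStart, chunkEnd });
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Date range taken from the query in this thread.
        for (LocalDate[] c : split(LocalDate.of(2010, 1, 1), LocalDate.of(2015, 6, 30), 6)) {
            System.out.println(c[0] + " to " + c[1]);
        }
    }
}
```

Each window can then be queried on its own and the results concatenated. This should keep the number of ranges per scan smaller, though the exact effect depends on the index and the data.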
I've created a ticket here to track the issue:
https://geomesa.atlassian.net/browse/GEOMESA-905
Thanks,
Emilio
On 09/02/2015 02:20 PM, Marcel wrote:
This is how I build my simple feature type (a slightly adapted
version from the geomesa-gdelt project).
private static SimpleFeatureType buildGDELTFeatureType(String featureName)
        throws SchemaException {
    String spec = Joiner.on(",").join(attributes);
    SimpleFeatureType featureType = DataUtilities.createType(featureName, spec);
    // This tells GeoMesa to use this attribute as the start time index
    featureType.getUserData().put(Constants.SF_PROPERTY_START_TIME, "SQLDATE");
    return featureType;
}
/**
 * List of GDELT attributes with their datatypes. *geom indicates
 * that this attribute will be the default geometry.
 */
private static List<String> attributes =
Lists.newArrayList("GLOBALEVENTID:Integer",
"SQLDATE:Date:index=full",
"MonthYear:Integer",
"Year:Integer", "FractionDate:Float",
"Actor1Code:String",
"Actor1Name:String", "Actor1CountryCode:String",
"Actor1KnownGroupCode:String",
"Actor1EthnicCode:String",
"Actor1Religion1Code:String",
"Actor1Religion2Code:String",
"Actor1Type1Code:String",
"Actor1Type2Code:String", "Actor1Type3Code:String",
"Actor2Code:String", "Actor2Name:String",
"Actor2CountryCode:String", "Actor2KnownGroupCode:String",
"Actor2EthnicCode:String",
"Actor2Religion1Code:String",
"Actor2Religion2Code:String",
"Actor2Type1Code:String",
"Actor2Type2Code:String",
"Actor2Type3Code:String", "IsRootEvent:Integer",
"EventCode:String", "EventBaseCode:String",
"EventRootCode:String", "QuadClass:Integer",
"GoldsteinScale:Float", "NumMentions:Integer",
"NumSources:Integer", "NumArticles:Integer", "AvgTone:Float",
"Actor1Geo_Type:Integer",
"Actor1Geo_FullName:String",
"Actor1Geo_CountryCode:String",
"Actor1Geo_ADM1Code:String",
"Actor1Geo_Lat:Float",
"Actor1Geo_Long:Float", "Actor1Geo_FeatureID:String",
"Actor2Geo_Type:Integer",
"Actor2Geo_FullName:String",
"Actor2Geo_CountryCode:String",
"Actor2Geo_ADM1Code:String",
"Actor2Geo_Lat:Float",
"Actor2Geo_Long:Float", "Actor2Geo_FeatureID:String",
"ActionGeo_Type:Integer",
"ActionGeo_FullName:String",
"ActionGeo_CountryCode:String",
"ActionGeo_ADM1Code:String",
"ActionGeo_Lat:Float",
"ActionGeo_Long:Float", "ActionGeo_FeatureID:String",
"DATEADDED:Integer", "SourceUrl:String",
"*geom:Point:srid=4326");
I'm using geomesa 1.1.0-rc.4. Yes, I dropped all of my geomesa
tables before re-ingesting the data.
These stuck queries and heap space errors only occur when executing
geotemporal queries like this one. I ingested a 1 GiB GDELT test
file.
/**
 * Find all events in Ukraine since 2010 (until 2015-06-30) in connection
 * with protests (eventrootcode = 14).
 */
private static SimpleFeatureIterator getResultsForQuery13(Map<String, String> dsConf) {
    SimpleFeatureSource featureSource =
            SimpleFeatureSourceFactory.getSimpleFeatureSource(dsConf);
    FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    Date start = null;
    Date end = null;
    try {
        start = df.parse("2010-01-01");
        end = df.parse("2015-06-30");
    } catch (java.text.ParseException e) {
        e.printStackTrace();
    }
    Filter timeFilter = ff.between(ff.property(GDELTConstants.DATE),
            ff.literal(start), ff.literal(end));
    // bound query spatially to Ukraine
    Filter spatialFilter = null;
    try {
        spatialFilter = ECQL.toFilter(
                "Contains(Polygon((34.01626 44.00715, ... ,34.01626 44.00715)), "
                        + GDELTConstants.GEOM + ")");
    } catch (CQLException e) {
        e.printStackTrace();
    }
    // Now we can combine our time filter and our spatial filter using a
    // boolean and operator
    Filter timeSpatialFilter = ff.and(timeFilter, spatialFilter);
    Filter attributeFilter =
            ff.like(ff.property(GDELTConstants.EVENT_ROOT_CODE), "14");
    Filter completeFilter = ff.and(timeSpatialFilter, attributeFilter);
    Query query = new Query(
            dsConf.get(AccumuloDataStoreConfiguration.FEATURE_NAME),
            completeFilter,
            new String[] { GDELTConstants.GLOBAL_EVENTID, GDELTConstants.DATE });
    SimpleFeatureCollection sfCollection = null;
    try {
        sfCollection = featureSource.getFeatures(query);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return sfCollection.features();
}
Thanks,
Marcel Jacob.
On 01.09.2015 21:34, Emilio Lahr-Vivaz wrote:
Hi Marcel,
Could you provide your full simple feature type string? I'll try to
reproduce the error you're seeing with the full table scan. Also,
what version of geomesa are you currently using? Did you re-ingest
your data using the new version? If not, what was the old version
that you ingested the data with?
With regards to the queries not finishing - we try to optimize
queries so that they only scan records that are likely to match.
However, depending on the query, we can't always do that. If you're
seeing the 'full table scan' warning, then the query won't
completely return until it has scanned your entire dataset, even if
none of the features actually match. In all cases, the scan should
eventually return, but if you're getting memory errors you might
need to bump up some settings somewhere. If java gets low on memory
and starts swapping to disk, it can slow things to a crawl. Where
are you seeing the heapspace errors?
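One quick way to see where the limit sits on the client side is to print the heap settings of the JVM that runs the query. The sketch below uses only the standard Runtime API; the class name HeapReport is hypothetical, and it only describes the client JVM, not the Accumulo tablet servers, which have their own memory settings.

```java
// Hypothetical helper: reports the heap limits of the JVM it runs in.
class HeapReport {

    // Converts a byte count to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Formats the maximum heap, the currently allocated heap and the used part of it.
    static String describe(long maxBytes, long totalBytes, long freeBytes) {
        long usedBytes = totalBytes - freeBytes;
        return "max=" + toMiB(maxBytes) + " MiB, allocated=" + toMiB(totalBytes)
                + " MiB, used=" + toMiB(usedBytes) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(describe(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

If the reported maximum is small, starting the client with a larger -Xmx value is one setting to try.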
Thanks,
Emilio
On 09/01/2015 11:58 AM, Marcel wrote:
Hello,
After some weeks of absence I continued working with GeoMesa.
First of all, I updated to the new GeoMesa version and some of my
problems got solved.
Unfortunately, others were not. My data imported successfully on
the cluster, but it seems that my Date attribute was not indexed. I
used "SQLDATE:Date:index=full" for this attribute. But when
executing a query using a temporal filter the logger says: "Running
full table scan for schema event with filter SQLDATE AFTER
1991-04-28T22:00:00+00:00". Is this the correct way to define that
my attribute should be indexed?
Another problem seems to appear when there are 0 results for my
query. These queries often don't finish. Sometimes even a HeapSpace
error occurs. Maybe this is connected to my date attribute not
being indexed, so that all records are scanned.
Best regards,
Marcel Jacob.
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
http://www.locationtech.org/mailman/listinfo/geomesa-users