Re: [geomesa-users] Strange Query Performance Issue
Have you tried using the Accumulo tracing functionality? That is
probably the best way to tell what's going on.
Idly speculating: the increase in tablet servers (and presumably
tablets?) may be making the location lookup slow. In a join query, each
result coming back from the primary table results in an additional
single-row scan, and locating the tablet for each of those rows may be
taking longer.
One geomesa knob that you can play around with is
'accumulo.query.record-threads', which controls the parallelism for the
join scan.
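To make the cost of that second phase concrete, here is a toy back-of-the-envelope model in Python. It is not GeoMesa or Accumulo code: the function name, the "waves" simplification and every number passed to it are hypothetical, and a real batch scan will not behave this neatly. It only illustrates why per-row lookup latency and join parallelism both matter.

```python
import math


def estimated_join_seconds(index_hits, lookup_latency_s, record_threads):
    """Rough wall-clock estimate for the record-lookup phase of a join scan.

    Hypothetical model: every hit from the attribute index needs one
    single-row lookup, and lookups run `record_threads` at a time.
    """
    if record_threads < 1:
        raise ValueError("record_threads must be at least 1")
    # Number of sequential "waves" of parallel lookups.
    waves = math.ceil(index_hits / record_threads)
    return waves * lookup_latency_s


if __name__ == "__main__":
    # Made-up inputs: 1000 index hits, 5 ms per lookup.
    for threads in (10, 20):
        print(threads, estimated_join_seconds(1000, 0.005, threads))
```

Under this model, slower tablet location raises the per-lookup latency, while a higher record-thread count reduces the number of waves.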
You could also possibly bypass the problem by using a 'full' index
instead of a join index - that should be much faster to query in general.
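As a sketch of what that change looks like at the schema level: the feature type spec in the quoted explain output marks the attribute as `agentgeofullname:String:cardinality=high:index=join`, while `objectkey` uses `index=full`. The small Python helper below (hypothetical, not part of GeoMesa) rewrites the index option for one attribute in the attribute portion of such a spec string. It does not cover how to apply the new spec; whether an existing schema can be switched in place or needs to be re-ingested is outside this sketch.

```python
def set_attribute_index(spec, attribute, index_type):
    """Return a copy of an attribute spec with `attribute`'s index option set.

    `spec` is only the comma-separated attribute list (the part before any
    ';' user-data section), e.g. "a:String,b:String:index=join".
    """
    rewritten = []
    found = False
    for field in spec.split(","):
        parts = field.split(":")
        # A leading '*' marks the default geometry; ignore it when matching.
        if parts[0].lstrip("*") == attribute:
            found = True
            # Keep name, type and other options; drop any existing index option.
            parts = [p for p in parts if not p.startswith("index=")]
            parts.append(f"index={index_type}")
        rewritten.append(":".join(parts))
    if not found:
        raise KeyError(attribute)
    return ",".join(rewritten)


if __name__ == "__main__":
    spec = ("agentcode:String,"
            "agentgeofullname:String:cardinality=high:index=join,"
            "*agentgeolocation:Point")
    print(set_attribute_index(spec, "agentgeofullname", "full"))
```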
If you figure anything out, let us know!
Thanks,
Emilio
On 03/01/2018 12:26 PM, David Boyd wrote:
All:
I am running geomesa 1.3.4 on two different clusters with Accumulo
(1.7.2). One cluster has 5 nodes, the other 25 nodes. Both are on
similar ec2 instance types on AWS and the baseline software on both is
the same.
The problem is that the 25 node cluster is significantly slower on this
particular query. On the 5 node cluster the query generally takes less
than a second. On the 25 node cluster the query can take over 6 seconds.
We are ingesting data into both clusters based on GDELT. They have
similar amounts of data.
One of my featuresets contains information about agents in GDELT and is
called agentrecordset. As part of the ingest we do a de-duplication of
these agents based on 4 fields providing a unique key. To do this we
query as follows to see if a record exists:
agentname = 'FRANCE' AND agentcode = 'FRAGOV' AND agentcountrycode = 'FRA' AND agentgeofullname = 'France'
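For illustration, a de-duplication filter of this shape could be assembled as in the Python sketch below. The helper names are hypothetical and this is not the actual ingest code. It assumes single quotes inside values are escaped by doubling them, and that an empty or missing value is matched with the `= '' OR ... IS NULL` form that appears in the explain query further down.

```python
KEY_FIELDS = ("agentname", "agentcode", "agentcountrycode", "agentgeofullname")


def ecql_term(field, value):
    """Build one equality term, treating empty and missing values alike."""
    if value is None or value == "":
        return f"({field} = '' OR {field} IS NULL)"
    # Escape embedded single quotes by doubling them.
    escaped = value.replace("'", "''")
    return f"{field} = '{escaped}'"


def dedup_filter(agent):
    """Join the four key-field terms into a single AND filter string."""
    return " AND ".join(ecql_term(f, agent.get(f)) for f in KEY_FIELDS)


if __name__ == "__main__":
    print(dedup_filter({
        "agentname": "FRANCE",
        "agentcode": "FRAGOV",
        "agentcountrycode": "FRA",
        "agentgeofullname": "France",
    }))
```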
The most selective field is agentgeofullname, and there is a join
attribute index on that field.
The explain plan for the query reads the same on both clusters, and
the schema description is the same on both clusters (see below). The
information in the _queries table is also the same, except that the
times are much longer.
I am stumped. Any thoughts?
Schema:
[root@node26 geomesa]# bin/geomesa describe-schema -u oe_user -p ANArmy0f1Trains12 -i oerepo -c CoalesceSearch -f agentrecordset
INFO Describing attributes of feature 'agentrecordset'
agentgeocountrycode | String
agentcountrycode | String
agentgeoadm1code | String
pmesiiptsocial | Float
source | String
title | String
pmesiiptpolitical | Float
pmesiipttime | Float
agentcode | String
ontologyreference | String
agentgeoadm2code | String
agenttype2code | String
agentgeofeatureid | String
agentreligion1code | String
agentgeofullname | String (Attribute indexed - join)
pmesiiptmilitary | Float
pmesiiptphysicalenvironment | Float
agenttype3code | String
datecreated | Date (Spatio-temporally indexed)
issimulation | Boolean
pmesiiptinfrastructure | Float
agentknowngroupcode | String
pmesiiptinformation | Float
version | String
agentgeotype | Integer
tags | String
agentreligion2code | String
agentgeolocation | Point (Spatially indexed)
objectkey | String (Attribute indexed)
lastmodified | Date
datasource | String
pmesiipteconomic | Float
agentname | String
agenttype1code | String
name | String
agentethniccode | String
namemetaphone | String
status | Integer
User data:
geomesa.index.dtg | datecreated
geomesa.indices | z3:4:3,z2:3:3,records:2:3,attr:5:3
geomesa.table.sharing | true
geomesa.table.sharing.prefix |
Explain:
bin/geomesa explain -u oe_user -f agentrecordset -c CoalesceSearch -p ANArmy0f1Trains12 -q "agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS' AND (agentcountrycode = '' OR agentcountrycode IS NULL) AND agentgeofullname = 'United States'" -i oerepo
Planning 'agentrecordset' ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
Original filter: ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
Hints: bin[false] arrow[false] density[false] stats[false] map-aggregate[false] sampling[none]
Sort: none
Transforms: None
Strategy selection:
Query processing took 158ms and produced 1 options
Filter plan: FilterPlan[AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']]
Strategy selection took 3ms for 1 options
Strategy 1 of 1: AttributeIndex
Strategy filter: AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']
Geometries: FilterValues(List(POLYGON ((-180 -90, 0 -90, 180 -90, 180 90, 0 90, -180 90, -180 -90))),true,false)
Intervals: FilterValues(List(),true,false)
Plan: org.locationtech.geomesa.accumulo.index.JoinPlan
Table: CoalesceSearch_attr_v5
Deduplicate: false
Column Families (1): List(I)
Ranges (4): [%01;%00;%0e;%00;United States%00;::%01;%00;%0e;%00;United States%01;), [%01;%00;%0e;%01;United States%00;::%01;%00;%0e;%01;United States%01;), [%01;%00;%0e;%02;United States%00;::%01;%00;%0e;%02;United States%01;), [%01;%00;%0e;%03;United States%00;::%01;%00;%0e;%03;United States%01;)
Iterators (0):
Join Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
Table: CoalesceSearch_records_v2
Deduplicate: false
Column Families (1): List(F)
Ranges (0):
Iterators (1):
name:filter-transform-iter, priority:25,
class:org.locationtech.geomesa.accumulo.iterators.KryoLazyFilterTransformIterator,
properties:{sft=agentgeocountrycode:String,agentcountrycode:String,agentgeoadm1code:String,pmesiiptsocial:Float,source:String,title:String,pmesiiptpolitical:Float,pmesiipttime:Float,agentcode:String,ontologyreference:String,agentgeoadm2code:String,agenttype2code:String,agentgeofeatureid:String,agentreligion1code:String,agentgeofullname:String:cardinality=high:index=join,pmesiiptmilitary:Float,pmesiiptphysicalenvironment:Float,agenttype3code:String,datecreated:Date,issimulation:Boolean,pmesiiptinfrastructure:Float,agentknowngroupcode:String,pmesiiptinformation:Float,version:String,agentgeotype:Integer,tags:String,agentreligion2code:String,*agentgeolocation:Point,objectkey:String:cardinality=high:index=full,lastmodified:Date,datasource:String,pmesiipteconomic:Float,agentname:String,agenttype1code:String,name:String,agentethniccode:String,namemetaphone:String,status:Integer;geomesa.index.dtg='datecreated',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:5:3',geomesa.table.sharing.prefix='\u0001',
index=records:2, cql=(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS'}
Plan creation took 254ms
Query planning took 481ms
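As a reading aid for the "Ranges (4)" line in the explain output above, the Python sketch below decodes the `%xx;` escapes in a range start key into raw bytes. The interpretation of the individual bytes is a guess from this output alone, not taken from GeoMesa documentation: the first byte matches the table sharing prefix `\u0001`, the next two bytes (0x000e, i.e. 14) match the zero-based position of agentgeofullname in the schema listing, and the fourth byte is the only one that differs between the four ranges.

```python
import re

ESCAPE = re.compile(r"%([0-9a-fA-F]{2});")


def decode_escaped(text):
    """Decode '%xx;' escapes into raw bytes, keeping other text as UTF-8."""
    out = bytearray()
    pos = 0
    for match in ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


if __name__ == "__main__":
    # Start key of the third range from the explain output.
    start = decode_escaped("%01;%00;%0e;%02;United States%00;")
    print("prefix byte:    ", start[0])
    print("two-byte field: ", int.from_bytes(start[1:3], "big"))
    print("varying byte:   ", start[3])
    print("value:          ", start[4:-1].decode("utf-8"))
```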