[geomesa-users] Strange Query Performance Issue
All:
I am running GeoMesa 1.3.4 with Accumulo 1.7.2 on two different clusters.
One cluster has 5 nodes, the other 25 nodes. Both run on similar EC2
instance types on AWS, and the baseline software on both is the same.
The problem is that the 25-node cluster is significantly slower on one
particular query. On the 5-node cluster the query generally takes less
than a second. On the 25-node cluster it can take over 6 seconds.
We are ingesting GDELT-based data into both clusters, and they hold
similar amounts of data.
One of my feature sets contains information about agents in GDELT and is
called agentrecordset.
As part of the ingest we de-duplicate these agents based on 4 fields
that together form a unique key. To do this we query as follows to see
whether a record already exists:
agentname = 'FRANCE' AND agentcode = 'FRAGOV' AND agentcountrycode =
'FRA' AND agentgeofullname = 'France'
The most selective field is agentgeofullname, and there is a join
attribute index on that field.
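
The existence check above could be assembled along these lines. This is
only a minimal sketch in Python, not the actual ingest code: the function
names are hypothetical, and treating an empty key field as
"= '' OR IS NULL" is an assumption taken from the shape of the explain
query shown later in this message. Embedded single quotes are doubled,
which is how ECQL string literals escape them.

```python
def ecql_literal(value):
    # ECQL string literals are single-quoted; an embedded quote is doubled.
    return "'" + value.replace("'", "''") + "'"


def exists_filter(key_fields):
    # key_fields: dict of attribute name -> value, in the order the
    # predicates should appear. Empty or missing values are matched as
    # "empty string or null" (an assumption, see the lead-in).
    clauses = []
    for name, value in key_fields.items():
        if value is None or value == "":
            clauses.append(f"({name} = '' OR {name} IS NULL)")
        else:
            clauses.append(f"{name} = {ecql_literal(value)}")
    return " AND ".join(clauses)


query = exists_filter({
    "agentname": "FRANCE",
    "agentcode": "FRAGOV",
    "agentcountrycode": "FRA",
    "agentgeofullname": "France",
})
print(query)
```

Passing an empty agentcountrycode produces the parenthesised
"= '' OR ... IS NULL" clause instead of a plain equality.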
The explain plan for the query reads the same on both clusters, and the
schema description is the same on both (see below). The information in
the _queries table is also the same, except that the recorded times are
much longer on the 25-node cluster.
I am stumped. Any thoughts?
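
To compare the two clusters side by side, the per-stage timings that
explain prints (lines of the form "... took 254ms") could be pulled out
with a small script. A minimal sketch in Python; the function name is
hypothetical and the sample lines are copied from the explain output
further down. Note that these figures cover query planning only, not the
scan itself.

```python
import re

# Matches lines such as "Plan creation took 254ms" and captures the
# stage name and the duration in milliseconds.
TIMING = re.compile(r"^\s*(?P<stage>.+?) took (?P<ms>\d+)ms")


def stage_timings(explain_text):
    timings = {}
    for line in explain_text.splitlines():
        match = TIMING.match(line)
        if match:
            timings[match.group("stage")] = int(match.group("ms"))
    return timings


sample = """\
Query processing took 158ms and produced 1 options
Strategy selection took 3ms for 1 options
Plan creation took 254ms
Query planning took 481ms
"""

timings = stage_timings(sample)
for stage, ms in timings.items():
    print(f"{stage}: {ms} ms")
```

Running it on the explain output captured from each cluster gives two
dictionaries that can be compared stage by stage.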
Schema:
[root@node26 geomesa]# bin/geomesa describe-schema -u oe_user -p
ANArmy0f1Trains12 -i oerepo -c CoalesceSearch -f agentrecordset
INFO Describing attributes of feature 'agentrecordset'
agentgeocountrycode | String
agentcountrycode | String
agentgeoadm1code | String
pmesiiptsocial | Float
source | String
title | String
pmesiiptpolitical | Float
pmesiipttime | Float
agentcode | String
ontologyreference | String
agentgeoadm2code | String
agenttype2code | String
agentgeofeatureid | String
agentreligion1code | String
agentgeofullname | String (Attribute indexed - join)
pmesiiptmilitary | Float
pmesiiptphysicalenvironment | Float
agenttype3code | String
datecreated | Date (Spatio-temporally indexed)
issimulation | Boolean
pmesiiptinfrastructure | Float
agentknowngroupcode | String
pmesiiptinformation | Float
version | String
agentgeotype | Integer
tags | String
agentreligion2code | String
agentgeolocation | Point (Spatially indexed)
objectkey | String (Attribute indexed)
lastmodified | Date
datasource | String
pmesiipteconomic | Float
agentname | String
agenttype1code | String
name | String
agentethniccode | String
namemetaphone | String
status | Integer
User data:
geomesa.index.dtg | datecreated
geomesa.indices | z3:4:3,z2:3:3,records:2:3,attr:5:3
geomesa.table.sharing | true
geomesa.table.sharing.prefix |
Explain:
bin/geomesa explain -u oe_user -f agentrecordset -c CoalesceSearch -p
ANArmy0f1Trains12 -q "agentname = 'NATIONAL BANK' AND agentcode =
'GOVBUS' AND (agentcountrycode = '' OR agentcountrycode IS NULL) AND
agentgeofullname = 'United States'" -i oerepo
Planning 'agentrecordset' ((agentname = 'NATIONAL BANK' AND agentcode
= 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL))
AND agentgeofullname = 'United States'
Original filter: ((agentname = 'NATIONAL BANK' AND agentcode =
'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND
agentgeofullname = 'United States'
Hints: bin[false] arrow[false] density[false] stats[false]
map-aggregate[false] sampling[none]
Sort: none
Transforms: None
Strategy selection:
Query processing took 158ms and produced 1 options
Filter plan: FilterPlan[AttributeIndex[agentgeofullname = 'United
States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND
agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']]
Strategy selection took 3ms for 1 options
Strategy 1 of 1: AttributeIndex
Strategy filter: AttributeIndex[agentgeofullname = 'United
States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND
agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']
Geometries: FilterValues(List(POLYGON ((-180 -90, 0 -90, 180 -90,
180 90, 0 90, -180 90, -180 -90))),true,false)
Intervals: FilterValues(List(),true,false)
Plan: org.locationtech.geomesa.accumulo.index.JoinPlan
Table: CoalesceSearch_attr_v5
Deduplicate: false
Column Families (1): List(I)
Ranges (4): [%01;%00;%0e;%00;United
States%00;::%01;%00;%0e;%00;United States%01;),
[%01;%00;%0e;%01;United States%00;::%01;%00;%0e;%01;United
States%01;), [%01;%00;%0e;%02;United
States%00;::%01;%00;%0e;%02;United States%01;),
[%01;%00;%0e;%03;United States%00;::%01;%00;%0e;%03;United States%01;)
Iterators (0):
Join Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
Table: CoalesceSearch_records_v2
Deduplicate: false
Column Families (1): List(F)
Ranges (0):
Iterators (1):
name:filter-transform-iter, priority:25,
class:org.locationtech.geomesa.accumulo.iterators.KryoLazyFilterTransformIterator,
properties:{sft=agentgeocountrycode:String,agentcountrycode:String,agentgeoadm1code:String,pmesiiptsocial:Float,source:String,title:String,pmesiiptpolitical:Float,pmesiipttime:Float,agentcode:String,ontologyreference:String,agentgeoadm2code:String,agenttype2code:String,agentgeofeatureid:String,agentreligion1code:String,agentgeofullname:String:cardinality=high:index=join,pmesiiptmilitary:Float,pmesiiptphysicalenvironment:Float,agenttype3code:String,datecreated:Date,issimulation:Boolean,pmesiiptinfrastructure:Float,agentknowngroupcode:String,pmesiiptinformation:Float,version:String,agentgeotype:Integer,tags:String,agentreligion2code:String,*agentgeolocation:Point,objectkey:String:cardinality=high:index=full,lastmodified:Date,datasource:String,pmesiipteconomic:Float,agentname:String,agenttype1code:String,name:String,agentethniccode:String,namemetaphone:String,status:Integer;geomesa.index.dtg='datecreated',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:5:3',geomesa.table.sharing.prefix='\u0001',
index=records:2, cql=(agentcountrycode = '' OR agentcountrycode IS
NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS'}
Plan creation took 254ms
Query planning took 481ms
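
For anyone reading the four ranges in the plan above, the following
Python sketch reproduces them. The byte layout is my reading of the
output, not something confirmed against the GeoMesa source: the leading
%01 matches the table sharing prefix '\u0001' shown in the iterator
properties, %00;%0e (14) matches the zero-based position of
agentgeofullname in the schema listing, and the next byte, which runs
from %00 to %03, looks like a shard number. The function name and its
parameters are hypothetical.

```python
def shard_ranges(value, attr_index=0x0E, shards=4, sharing_prefix=0x01):
    # Build one [start::end) range per shard, printed in the same
    # %xx; escaped form that the explain output uses.
    ranges = []
    for shard in range(shards):
        prefix = (
            f"%{sharing_prefix:02x};"      # table sharing prefix (assumed)
            f"%{attr_index >> 8:02x};"     # attribute index, high byte (assumed)
            f"%{attr_index & 0xFF:02x};"   # attribute index, low byte (assumed)
            f"%{shard:02x};"               # shard byte (assumed)
            f"{value}"
        )
        # The start key ends in %00 and the exclusive end key in %01,
        # as in the printed ranges.
        ranges.append(f"[{prefix}%00;::{prefix}%01;)")
    return ranges


ranges = shard_ranges("United States")
for r in ranges:
    print(r)
```

If that reading is right, the equality on agentgeofullname becomes four
small ranges on CoalesceSearch_attr_v5, and every matching entry is then
looked up in CoalesceSearch_records_v2 by the join plan.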
--
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd
VP, Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office: +1-703-552-2862
cell: +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org