[geomesa-users] Strange Query Performance Issue

All:

   I am running GeoMesa 1.3.4 on two different clusters with Accumulo 1.7.2. One cluster has 5 nodes, the other 25 nodes. Both run on similar EC2 instance types
on AWS, and the baseline software on both is the same.

The problem is that the 25-node cluster is significantly slower on one particular query. On the 5-node cluster the query generally takes less than a second; on the
25-node cluster it can take over 6 seconds.

We are ingesting GDELT-based data into both clusters, and they hold similar amounts of data. One of my feature types, called agentrecordset, contains information about agents in GDELT.

As part of the ingest we de-duplicate these agents using 4 fields that together form a unique
key. To do this, we run a query like the following to check whether a record already exists:

agentname = 'FRANCE' AND agentcode = 'FRAGOV' AND agentcountrycode = 'FRA' AND agentgeofullname = 'France'
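
For illustration only, here is a minimal Python sketch of how a filter like this could be assembled from the four key fields. It is not our ingest code: the helper names (cql_literal, equals_or_missing, dedup_filter) are hypothetical, and the handling of empty values simply mirrors the "= '' OR IS NULL" form used in the explain query further down.

```python
# Hypothetical helpers; only the field names come from the schema below.
KEY_FIELDS = ("agentname", "agentcode", "agentcountrycode", "agentgeofullname")


def cql_literal(value):
    """Quote a string as a CQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def equals_or_missing(field, value):
    """Equality clause; an empty or missing value matches both '' and NULL."""
    if value is None or value == "":
        return "({0} = '' OR {0} IS NULL)".format(field)
    return "{0} = {1}".format(field, cql_literal(value))


def dedup_filter(record):
    """AND together one clause per key field, in KEY_FIELDS order."""
    return " AND ".join(equals_or_missing(f, record.get(f)) for f in KEY_FIELDS)


if __name__ == "__main__":
    print(dedup_filter({
        "agentname": "FRANCE",
        "agentcode": "FRAGOV",
        "agentcountrycode": "FRA",
        "agentgeofullname": "France",
    }))
```

Only the agentgeofullname clause can be answered from the attribute index; the other three clauses are evaluated as a secondary filter on the joined records, as the explain output below shows.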

The most selective field is agentgeofullname, and there is a join attribute index on that field.

The explain plan for the query reads the same on both clusters, and the schema description is the same on both (see below). The information in the _queries table is also the same, except that the recorded times are much longer on the 25-node cluster.

I am stumped. Any thoughts?

Schema:

[root@node26 geomesa]# bin/geomesa describe-schema -u oe_user -p ANArmy0f1Trains12 -i oerepo -c CoalesceSearch -f agentrecordset
INFO  Describing attributes of feature 'agentrecordset'
agentgeocountrycode         | String
agentcountrycode            | String
agentgeoadm1code            | String
pmesiiptsocial              | Float
source                      | String
title                       | String
pmesiiptpolitical           | Float
pmesiipttime                | Float
agentcode                   | String
ontologyreference           | String
agentgeoadm2code            | String
agenttype2code              | String
agentgeofeatureid           | String
agentreligion1code          | String
agentgeofullname            | String  (Attribute indexed - join)
pmesiiptmilitary            | Float
pmesiiptphysicalenvironment | Float
agenttype3code              | String
datecreated                 | Date    (Spatio-temporally indexed)
issimulation                | Boolean
pmesiiptinfrastructure      | Float
agentknowngroupcode         | String
pmesiiptinformation         | Float
version                     | String
agentgeotype                | Integer
tags                        | String
agentreligion2code          | String
agentgeolocation            | Point   (Spatially indexed)
objectkey                   | String  (Attribute indexed)
lastmodified                | Date
datasource                  | String
pmesiipteconomic            | Float
agentname                   | String
agenttype1code              | String
name                        | String
agentethniccode             | String
namemetaphone               | String
status                      | Integer

User data:
  geomesa.index.dtg            | datecreated
  geomesa.indices              | z3:4:3,z2:3:3,records:2:3,attr:5:3
  geomesa.table.sharing        | true
  geomesa.table.sharing.prefix |

Explain:

bin/geomesa explain -u oe_user -f agentrecordset -c CoalesceSearch -p ANArmy0f1Trains12 -q "agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS' AND (agentcountrycode = '' OR agentcountrycode IS NULL) AND agentgeofullname = 'United States'" -i oerepo
Planning 'agentrecordset' ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Original filter: ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Hints: bin[false] arrow[false] density[false] stats[false] map-aggregate[false] sampling[none]
  Sort: none
  Transforms: None
  Strategy selection:
    Query processing took 158ms and produced 1 options
    Filter plan: FilterPlan[AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']]
    Strategy selection took 3ms for 1 options
  Strategy 1 of 1: AttributeIndex
    Strategy filter: AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']
    Geometries: FilterValues(List(POLYGON ((-180 -90, 0 -90, 180 -90, 180 90, 0 90, -180 90, -180 -90))),true,false)
    Intervals: FilterValues(List(),true,false)
    Plan: org.locationtech.geomesa.accumulo.index.JoinPlan
      Table: CoalesceSearch_attr_v5
      Deduplicate: false
      Column Families (1): List(I)
      Ranges (4): [%01;%00;%0e;%00;United States%00;::%01;%00;%0e;%00;United States%01;), [%01;%00;%0e;%01;United States%00;::%01;%00;%0e;%01;United States%01;), [%01;%00;%0e;%02;United States%00;::%01;%00;%0e;%02;United States%01;), [%01;%00;%0e;%03;United States%00;::%01;%00;%0e;%03;United States%01;)
      Iterators (0):
      Join Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
        Table: CoalesceSearch_records_v2
        Deduplicate: false
        Column Families (1): List(F)
        Ranges (0):
        Iterators (1):
          name:filter-transform-iter, priority:25, class:org.locationtech.geomesa.accumulo.iterators.KryoLazyFilterTransformIterator, properties:{sft=agentgeocountrycode:String,agentcountrycode:String,agentgeoadm1code:String,pmesiiptsocial:Float,source:String,title:String,pmesiiptpolitical:Float,pmesiipttime:Float,agentcode:String,ontologyreference:String,agentgeoadm2code:String,agenttype2code:String,agentgeofeatureid:String,agentreligion1code:String,agentgeofullname:String:cardinality=high:index=join,pmesiiptmilitary:Float,pmesiiptphysicalenvironment:Float,agenttype3code:String,datecreated:Date,issimulation:Boolean,pmesiiptinfrastructure:Float,agentknowngroupcode:String,pmesiiptinformation:Float,version:String,agentgeotype:Integer,tags:String,agentreligion2code:String,*agentgeolocation:Point,objectkey:String:cardinality=high:index=full,lastmodified:Date,datasource:String,pmesiipteconomic:Float,agentname:String,agenttype1code:String,name:String,agentethniccode:String,namemetaphone:String,status:Integer;geomesa.index.dtg='datecreated',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:5:3',geomesa.table.sharing.prefix='\u0001', index=records:2, cql=(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS'}
    Plan creation took 254ms
  Query planning took 481ms
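
To compare the two clusters line by line, the "took Nms" figures can be pulled out of saved explain output. The following is a rough Python sketch under my own assumptions: the function name explain_timings is hypothetical, and it only recognizes lines shaped like the ones above. As far as I understand, these figures cover client-side planning only, not the scan itself, so they would explain at most part of the difference.

```python
import re

# Matches lines such as "    Plan creation took 254ms".
TIMING = re.compile(r"^\s*(?P<step>.+?) took (?P<ms>\d+)ms")


def explain_timings(text):
    """Return {step name: milliseconds} for every 'took Nms' line in text."""
    timings = {}
    for line in text.splitlines():
        match = TIMING.match(line)
        if match:
            timings[match.group("step")] = int(match.group("ms"))
    return timings


if __name__ == "__main__":
    # Timing lines copied from the explain output above.
    sample = (
        "    Query processing took 158ms and produced 1 options\n"
        "    Strategy selection took 3ms for 1 options\n"
        "    Plan creation took 254ms\n"
        "  Query planning took 481ms\n"
    )
    for step, ms in explain_timings(sample).items():
        print("{0}: {1} ms".format(step, ms))
```

Running this on the output captured from each cluster and comparing the two dictionaries would show whether the planning steps themselves differ.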

--
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd
VP,  Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office:   +1-703-552-2862
cell:     +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org


