Re: [geomesa-users] Strange Query Performance Issue

Sorry, the data store parameter is 'recordThreads' in 1.3.4 - it's been updated in 2.0.0-m.1.

Thanks,

Emilio

On 03/01/2018 02:40 PM, Emilio Lahr-Vivaz wrote:
Have you tried using the Accumulo tracing functionality? That is probably the best way to tell what's going on.

Idly speculating, possibly the increase in tablet servers (and presumably tablets?) is causing the location lookup to be slow - in a join query, each result coming back from the primary table will result in an additional single row scan. Possibly locating the tablet for each row is taking longer.
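To make the join behavior described above concrete, here is a toy model in Python. It is only a sketch: plain dicts stand in for the attribute index table and the record table, and every name in it (join_query, attr_index, records) is hypothetical rather than GeoMesa code. The point it illustrates is that each id returned by the index costs one additional single-row lookup in the record table.

```python
# Toy model of a join-index query. Plain dicts stand in for Accumulo tables.
# All names are hypothetical; none of this is GeoMesa code.

def join_query(attr_index, records, value, predicate):
    """Return (matching records, number of record-table lookups)."""
    lookups = 0
    matches = []
    # Step 1: scan the attribute index for feature ids with the given value.
    for feature_id in attr_index.get(value, []):
        # Step 2: one single-row lookup in the record table per id.
        lookups += 1
        record = records[feature_id]
        # Step 3: apply the rest of the filter to the full record.
        if predicate(record):
            matches.append(record)
    return matches, lookups


records = {
    "a1": {"agentname": "FRANCE", "agentcode": "FRAGOV", "agentgeofullname": "France"},
    "a2": {"agentname": "PARIS", "agentcode": "FRA", "agentgeofullname": "France"},
    "a3": {"agentname": "NATIONAL BANK", "agentcode": "GOVBUS", "agentgeofullname": "United States"},
}
attr_index = {"France": ["a1", "a2"], "United States": ["a3"]}

matches, lookups = join_query(
    attr_index,
    records,
    "France",
    lambda r: r["agentname"] == "FRANCE" and r["agentcode"] == "FRAGOV",
)
print(len(matches), lookups)
```

In this model the number of record lookups equals the number of index hits for the attribute value, regardless of how many records survive the remaining filter.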

One geomesa knob that you can play around with is 'accumulo.query.record-threads', which controls the parallelism for the join scan.
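As a rough illustration of why the parallelism of the join scan can matter, the sketch below simulates record lookups that each carry a fixed latency and runs them through a thread pool of configurable size. This is a toy under stated assumptions: lookup_record, join_scan, and the 10 ms delay are all hypothetical stand-ins, and nothing here calls GeoMesa or Accumulo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a single-row scan of the record table.
def lookup_record(feature_id):
    time.sleep(0.01)  # assumed per-lookup latency, chosen only for illustration
    return {"id": feature_id}

# Hypothetical join scan: fetch every record using `record_threads` workers.
def join_scan(feature_ids, record_threads):
    with ThreadPoolExecutor(max_workers=record_threads) as pool:
        # pool.map returns results in the same order as the input ids
        return list(pool.map(lookup_record, feature_ids))

ids = [f"a{i}" for i in range(20)]
for threads in (1, 10):
    start = time.perf_counter()
    results = join_scan(ids, threads)
    elapsed = time.perf_counter() - start
    print(f"record_threads={threads}: {len(results)} records in {elapsed:.3f}s")
```

With more workers the simulated lookups overlap, so the total time drops while the results stay the same. Whether a real cluster behaves this way depends on where the time is actually going, which is what the tracing would show.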

You could also possibly bypass the problem by using a 'full' index instead of a join index - that should be much faster to query in general.

If you figure anything out, let us know!

Thanks,

Emilio

On 03/01/2018 12:26 PM, David Boyd wrote:
All:

I am running GeoMesa 1.3.4 on two different clusters with Accumulo 1.7.2. One cluster has 5 nodes, the other 25 nodes. Both run on similar EC2 instance types on AWS, and the baseline software on both is the same.

The problem is that the 25-node cluster is significantly slower on one particular query. On the 5-node cluster the query generally takes less than a second. On the 25-node cluster it can take over 6 seconds.

We are ingesting data into both clusters based on GDELT. They have similar amounts of data. One of my featuresets contains information about agents in GDELT and is called agentrecordset.

As part of the ingest we de-duplicate these agents based on 4 fields that together form a unique key. To do this, we run a query like the following to see whether a record already exists:

agentname = 'FRANCE' AND agentcode = 'FRAGOV' AND agentcountrycode = 'FRA' AND agentgeofullname = 'France'
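For illustration, a filter like the one above could be assembled from the four key fields as in the sketch below. The helper names (ecql_literal, dedup_filter) are hypothetical and not part of GeoMesa. The only behavior assumed is the usual CQL convention of doubling a single quote inside a string literal.

```python
# Hypothetical helpers for building the de-duplication filter string.

def ecql_literal(value):
    """Quote a string for CQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def dedup_filter(agentname, agentcode, agentcountrycode, agentgeofullname):
    """Join equality tests on the four key fields with AND."""
    fields = [
        ("agentname", agentname),
        ("agentcode", agentcode),
        ("agentcountrycode", agentcountrycode),
        ("agentgeofullname", agentgeofullname),
    ]
    return " AND ".join(f"{name} = {ecql_literal(value)}" for name, value in fields)

print(dedup_filter("FRANCE", "FRAGOV", "FRA", "France"))
```

This sketch handles only plain equality. It does not cover the empty-or-null case that appears in the explain output further down.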

The most selective field is agentgeofullname, and there is a join attribute index on that field.

The explain plan for the query reads the same on both clusters, and the schema description is the same (see below). The information in the _queries table is also the same, except that the times are much longer.

I am stumped. Any thoughts?

Schema:

[root@node26 geomesa]# bin/geomesa describe-schema -u oe_user -p ANArmy0f1Trains12 -i oerepo -c CoalesceSearch -f agentrecordset
INFO  Describing attributes of feature 'agentrecordset'
agentgeocountrycode         | String
agentcountrycode            | String
agentgeoadm1code            | String
pmesiiptsocial              | Float
source                      | String
title                       | String
pmesiiptpolitical           | Float
pmesiipttime                | Float
agentcode                   | String
ontologyreference           | String
agentgeoadm2code            | String
agenttype2code              | String
agentgeofeatureid           | String
agentreligion1code          | String
agentgeofullname            | String  (Attribute indexed - join)
pmesiiptmilitary            | Float
pmesiiptphysicalenvironment | Float
agenttype3code              | String
datecreated                 | Date    (Spatio-temporally indexed)
issimulation                | Boolean
pmesiiptinfrastructure      | Float
agentknowngroupcode         | String
pmesiiptinformation         | Float
version                     | String
agentgeotype                | Integer
tags                        | String
agentreligion2code          | String
agentgeolocation            | Point   (Spatially indexed)
objectkey                   | String  (Attribute indexed)
lastmodified                | Date
datasource                  | String
pmesiipteconomic            | Float
agentname                   | String
agenttype1code              | String
name                        | String
agentethniccode             | String
namemetaphone               | String
status                      | Integer

User data:
  geomesa.index.dtg            | datecreated
  geomesa.indices              | z3:4:3,z2:3:3,records:2:3,attr:5:3
  geomesa.table.sharing        | true
  geomesa.table.sharing.prefix |

Explain:

bin/geomesa explain -u oe_user -f agentrecordset -c CoalesceSearch -p ANArmy0f1Trains12 -q "agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS' AND (agentcountrycode = '' OR agentcountrycode IS NULL) AND agentgeofullname = 'United States'" -i oerepo
Planning 'agentrecordset' ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Original filter: ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Hints: bin[false] arrow[false] density[false] stats[false] map-aggregate[false] sampling[none]
  Sort: none
  Transforms: None
  Strategy selection:
    Query processing took 158ms and produced 1 options
    Filter plan: FilterPlan[AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']]
    Strategy selection took 3ms for 1 options
  Strategy 1 of 1: AttributeIndex
    Strategy filter: AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']
    Geometries: FilterValues(List(POLYGON ((-180 -90, 0 -90, 180 -90, 180 90, 0 90, -180 90, -180 -90))),true,false)
    Intervals: FilterValues(List(),true,false)
    Plan: org.locationtech.geomesa.accumulo.index.JoinPlan
      Table: CoalesceSearch_attr_v5
      Deduplicate: false
      Column Families (1): List(I)
      Ranges (4): [%01;%00;%0e;%00;United States%00;::%01;%00;%0e;%00;United States%01;), [%01;%00;%0e;%01;United States%00;::%01;%00;%0e;%01;United States%01;), [%01;%00;%0e;%02;United States%00;::%01;%00;%0e;%02;United States%01;), [%01;%00;%0e;%03;United States%00;::%01;%00;%0e;%03;United States%01;)
      Iterators (0):
      Join Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
        Table: CoalesceSearch_records_v2
        Deduplicate: false
        Column Families (1): List(F)
        Ranges (0):
        Iterators (1):
          name:filter-transform-iter, priority:25, class:org.locationtech.geomesa.accumulo.iterators.KryoLazyFilterTransformIterator, properties:{sft=agentgeocountrycode:String,agentcountrycode:String,agentgeoadm1code:String,pmesiiptsocial:Float,source:String,title:String,pmesiiptpolitical:Float,pmesiipttime:Float,agentcode:String,ontologyreference:String,agentgeoadm2code:String,agenttype2code:String,agentgeofeatureid:String,agentreligion1code:String,agentgeofullname:String:cardinality=high:index=join,pmesiiptmilitary:Float,pmesiiptphysicalenvironment:Float,agenttype3code:String,datecreated:Date,issimulation:Boolean,pmesiiptinfrastructure:Float,agentknowngroupcode:String,pmesiiptinformation:Float,version:String,agentgeotype:Integer,tags:String,agentreligion2code:String,*agentgeolocation:Point,objectkey:String:cardinality=high:index=full,lastmodified:Date,datasource:String,pmesiipteconomic:Float,agentname:String,agenttype1code:String,name:String,agentethniccode:String,namemetaphone:String,status:Integer;geomesa.index.dtg='datecreated',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:5:3',geomesa.table.sharing.prefix='\u0001', index=records:2, cql=(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS'}
    Plan creation took 254ms
  Query planning took 481ms



