Re: [geomesa-users] Strange Query Performance Issue


From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Thu, 1 Mar 2018 14:42:41 -0500
Delivered-to: geomesa-users@xxxxxxxxxxxxxxxx
List-archive: <https://dev.locationtech.org/mhonarc/lists/geomesa-users>

Sorry, the data store parameter is 'recordThreads' in 1.3.4 - it's been updated in 2.0.0-m.1.


Thanks,

Emilio

On 03/01/2018 02:40 PM, Emilio Lahr-Vivaz wrote:

Have you tried using the Accumulo tracing functionality? That is probably the best way to tell what's going on.
Idly speculating, possibly the increase in tablet servers (and presumably tablets?) is causing the location lookup to be slow - in a join query, each result coming back from the primary table will result in an additional single row scan. Possibly locating the tablet for each row is taking longer.
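
To make the speculation above concrete, here is a toy model of a join scan. It is not GeoMesa code, and every number in it is hypothetical. It assumes each hit from the attribute index triggers one single-row lookup on the records table, that a lookup costs a tablet-location step plus the scan itself, and that lookups run in waves of `threads` at a time.

```python
import math


def join_scan_latency_ms(index_hits, locate_ms, scan_ms, threads):
    """Toy estimate: every index hit triggers one single-row record lookup."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    per_lookup_ms = locate_ms + scan_ms
    # Lookups are assumed to run in waves of `threads` concurrent scans.
    waves = math.ceil(index_hits / threads)
    return waves * per_lookup_ms


# Hypothetical numbers, chosen only to show the shape of the effect:
# same hit count and scan cost, but a slower tablet-location step.
small_cluster = join_scan_latency_ms(index_hits=500, locate_ms=1.0, scan_ms=2.0, threads=10)
large_cluster = join_scan_latency_ms(index_hits=500, locate_ms=20.0, scan_ms=2.0, threads=10)
print(small_cluster, large_cluster)
```

In this model the per-row location cost is multiplied by the number of index hits, so a modest slowdown per lookup can dominate the query time. Accumulo tracing would be the way to check whether that is what is actually happening.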
One geomesa knob that you can play around with is 'accumulo.query.record-threads', which controls the parallelism for the join scan.
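
Since the parameter name differs between versions, a small helper can pick the right key. This is only a sketch: the two key names come from this thread, while the function name and the rule "major version 2 or later uses the new name" are assumptions for illustration. The resulting entry would be merged into whatever parameter map you already pass when creating the data store.

```python
def record_threads_param(geomesa_version, threads):
    """Return the data store parameter entry for join-scan parallelism."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Assumption: the rename applies from major version 2 onward.
    major = int(geomesa_version.split(".")[0])
    key = "recordThreads" if major < 2 else "accumulo.query.record-threads"
    return {key: threads}


print(record_threads_param("1.3.4", 20))
print(record_threads_param("2.0.0-m.1", 20))
```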
You could also possibly bypass the problem by using a 'full' index instead of a join index - that should be much faster to query in general.
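
As a sketch of what that change looks like in a feature type spec, the helper below rewrites the `index=` option of one attribute in a spec string of the form shown in the explain output (`name:Type:option=value,...`). The helper name is hypothetical, and it only edits the string; an existing schema would presumably need to be re-created or re-indexed for the change to take effect, which is not covered here.

```python
def set_attribute_index(spec, attribute, index_type):
    """Set the index option (e.g. 'join' or 'full') for one attribute in a spec string."""
    fields = []
    for field in spec.split(","):
        parts = field.split(":")
        # A leading '*' marks the default geometry; ignore it when matching names.
        if parts[0].lstrip("*") == attribute:
            parts = [p for p in parts if not p.startswith("index=")]
            parts.append("index=" + index_type)
        fields.append(":".join(parts))
    return ",".join(fields)


spec = "agentgeofullname:String:cardinality=high:index=join,agentname:String"
print(set_attribute_index(spec, "agentgeofullname", "full"))
```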
If you figure anything out, let us know!

Thanks,

Emilio

On 03/01/2018 12:26 PM, David Boyd wrote:
All:
I am running geomesa 1.3.4 on two different clusters with Accumulo (1.7.2). One cluster has 5 nodes, the other 25 nodes. Both are on similar ec2 instance types on AWS and the baseline software on both is the same.
The problem is that the 25 node cluster is significantly slower on this particular query. On the 5 node cluster the query generally takes less than a second. On the 25 node cluster the query can take over 6 seconds.
We are ingesting data into both clusters based on GDELT. They have similar amounts of data. One of my featuresets contains information about agents in GDELT and is called agentrecordset.
As part of the ingest we do a de-duplication of these agents based on 4 fields providing a unique key. To do this we query as follows to see if a record exists:
agentname = 'FRANCE' AND agentcode = 'FRAGOV' AND agentcountrycode = 'FRA' AND agentgeofullname = 'France'
The most unique field is agentgeofullname and there is a join attribute index on that field.
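
For illustration, the de-duplication filter described above could be assembled like this. It is a sketch, not the actual ingest code: the function names are hypothetical, and the handling of an empty value follows the form that appears in the explain output in this message. Embedded single quotes are doubled, which is how ECQL string literals escape them.

```python
DEDUP_FIELDS = ["agentname", "agentcode", "agentcountrycode", "agentgeofullname"]


def ecql_equals(attribute, value):
    """Build one equality predicate, treating empty and missing values alike."""
    if value is None or value == "":
        return f"({attribute} = '' OR {attribute} IS NULL)"
    # ECQL string literals are single-quoted; an embedded quote is doubled.
    escaped = value.replace("'", "''")
    return f"{attribute} = '{escaped}'"


def dedup_filter(agent):
    """Join the four key fields into a single AND filter."""
    return " AND ".join(ecql_equals(f, agent.get(f)) for f in DEDUP_FIELDS)


agent = {
    "agentname": "FRANCE",
    "agentcode": "FRAGOV",
    "agentcountrycode": "FRA",
    "agentgeofullname": "France",
}
print(dedup_filter(agent))
```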
The explain plan for the query on both clusters reads the same, and the schema description for the clusters is the same (see below). The information in the _queries table is also the same, except the times are much longer.
I am stumped. Any thoughts?

Schema:
[root@node26 geomesa]# bin/geomesa describe-schema -u oe_user -p ANArmy0f1Trains12 -i oerepo -c CoalesceSearch -f agentrecordset
INFO  Describing attributes of feature 'agentrecordset'
agentgeocountrycode         | String
agentcountrycode            | String
agentgeoadm1code            | String
pmesiiptsocial              | Float
source                      | String
title                       | String
pmesiiptpolitical           | Float
pmesiipttime                | Float
agentcode                   | String
ontologyreference           | String
agentgeoadm2code            | String
agenttype2code              | String
agentgeofeatureid           | String
agentreligion1code          | String
agentgeofullname            | String  (Attribute indexed - join)
pmesiiptmilitary            | Float
pmesiiptphysicalenvironment | Float
agenttype3code              | String
datecreated                 | Date    (Spatio-temporally indexed)
issimulation                | Boolean
pmesiiptinfrastructure      | Float
agentknowngroupcode         | String
pmesiiptinformation         | Float
version                     | String
agentgeotype                | Integer
tags                        | String
agentreligion2code          | String
agentgeolocation            | Point   (Spatially indexed)
objectkey                   | String  (Attribute indexed)
lastmodified                | Date
datasource                  | String
pmesiipteconomic            | Float
agentname                   | String
agenttype1code              | String
name                        | String
agentethniccode             | String
namemetaphone               | String
status                      | Integer

User data:
  geomesa.index.dtg            | datecreated
  geomesa.indices              | z3:4:3,z2:3:3,records:2:3,attr:5:3
  geomesa.table.sharing        | true
  geomesa.table.sharing.prefix |
Explain:
bin/geomesa explain -u oe_user -f agentrecordset -c CoalesceSearch -p ANArmy0f1Trains12 -q "agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS' AND (agentcountrycode = '' OR agentcountrycode IS NULL) AND agentgeofullname = 'United States'" -i oerepo
Planning 'agentrecordset' ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Original filter: ((agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS') AND (agentcountrycode = '' OR agentcountrycode IS NULL)) AND agentgeofullname = 'United States'
  Hints: bin[false] arrow[false] density[false] stats[false] map-aggregate[false] sampling[none]
  Sort: none
  Transforms: None
  Strategy selection:
    Query processing took 158ms and produced 1 options
    Filter plan: FilterPlan[AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']]
    Strategy selection took 3ms for 1 options
  Strategy 1 of 1: AttributeIndex
    Strategy filter: AttributeIndex[agentgeofullname = 'United States'][(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS']
    Geometries: FilterValues(List(POLYGON ((-180 -90, 0 -90, 180 -90, 180 90, 0 90, -180 90, -180 -90))),true,false)
    Intervals: FilterValues(List(),true,false)
    Plan: org.locationtech.geomesa.accumulo.index.JoinPlan
      Table: CoalesceSearch_attr_v5
      Deduplicate: false
      Column Families (1): List(I)
      Ranges (4): [%01;%00;%0e;%00;United States%00;::%01;%00;%0e;%00;United States%01;), [%01;%00;%0e;%01;United States%00;::%01;%00;%0e;%01;United States%01;), [%01;%00;%0e;%02;United States%00;::%01;%00;%0e;%02;United States%01;), [%01;%00;%0e;%03;United States%00;::%01;%00;%0e;%03;United States%01;)
      Iterators (0):
      Join Plan: org.locationtech.geomesa.accumulo.index.BatchScanPlan
        Table: CoalesceSearch_records_v2
        Deduplicate: false
        Column Families (1): List(F)
        Ranges (0):
        Iterators (1):
          name:filter-transform-iter, priority:25, class:org.locationtech.geomesa.accumulo.iterators.KryoLazyFilterTransformIterator, properties:{sft=agentgeocountrycode:String,agentcountrycode:String,agentgeoadm1code:String,pmesiiptsocial:Float,source:String,title:String,pmesiiptpolitical:Float,pmesiipttime:Float,agentcode:String,ontologyreference:String,agentgeoadm2code:String,agenttype2code:String,agentgeofeatureid:String,agentreligion1code:String,agentgeofullname:String:cardinality=high:index=join,pmesiiptmilitary:Float,pmesiiptphysicalenvironment:Float,agenttype3code:String,datecreated:Date,issimulation:Boolean,pmesiiptinfrastructure:Float,agentknowngroupcode:String,pmesiiptinformation:Float,version:String,agentgeotype:Integer,tags:String,agentreligion2code:String,*agentgeolocation:Point,objectkey:String:cardinality=high:index=full,lastmodified:Date,datasource:String,pmesiipteconomic:Float,agentname:String,agenttype1code:String,name:String,agentethniccode:String,namemetaphone:String,status:Integer;geomesa.index.dtg='datecreated',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:5:3',geomesa.table.sharing.prefix='\u0001',index=records:2, cql=(agentcountrycode = '' OR agentcountrycode IS NULL) AND agentname = 'NATIONAL BANK' AND agentcode = 'GOVBUS'}
    Plan creation took 254ms
  Query planning took 481ms