Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

Hi Takashi,

One way to possibly improve performance is to use an optimized delete. The FeatureStore delete method queries each feature and then deletes it - instead, you could just do a batch delete on the whole table (note: this assumes that you do not have other schemas sharing the table).

In Scala:

val ds = dataStore.asInstanceOf[org.locationtech.geomesa.accumulo.data.AccumuloDataStore]
val tables = org.locationtech.geomesa.accumulo.data.tables.GeoMesaTable.getTables(sft)
val tableNames = tables.map(table => ds.getTableName(typeName, table))
// delete from each index table in parallel
tableNames.par.foreach { table =>
  val deleter = ds.connector.createBatchDeleter(table, ...)
  // an empty Accumulo Range covers every row in the table
  deleter.setRanges(java.util.Collections.singletonList(new org.apache.accumulo.core.data.Range()))
  deleter.delete()
  deleter.close()
}

Since you're just using the Z2 index, you may be able to simplify this to:

val tableName = ds.getTableName(typeName, org.locationtech.geomesa.accumulo.data.tables.Z2Table)
// delete the single table

I've also created a ticket so that we can detect this case and handle it automatically.

Thanks,

Emilio

On 12/14/2016 10:53 PM, Takashi Sasaki wrote:
Hi Emilio,

I now understand the details of generateStats and collectQueryStats, and
I have decided to disable them.

When you delete your data, how are you doing it?
I do it the easy way; the code is below.
--
private void deleteFeatures(String simpleFeatureTypeName, DataStore dataStore) throws IOException {
  FeatureStore featureStore = (SimpleFeatureStore) dataStore.getFeatureSource(simpleFeatureTypeName);
  featureStore.removeFeatures(Filter.INCLUDE);
  dataStore.dispose();
}
--

I wanted to choose Kafka, but abandoned it because of some application
requirements...
(I cannot go into detail because of internal confidentiality.)


I tried using only the z2 index, and I felt that performance improved somewhat.


Thank you for your reply,

Takashi

Hi Takashi,

In geomesa-1.2.3 there are two different parameters you might be referencing: generateStats and collectQueryStats. collectQueryStats was named collectStats in older versions, but was renamed to make its purpose clearer. generateStats will store summary statistics for your data, which are then used for query planning. Since you are deleting the data every few minutes, you probably want to disable this, as it will introduce needless overhead. collectQueryStats is a simple form of auditing that will log every query to Accumulo, in the '<catalog>_queries' table.
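
As a rough sketch of what turning both options off could look like when building the data store parameters in Java: only the generateStats and collectQueryStats names come from this thread. The connection keys and every value below are placeholders assumed for illustration, so check them against the documentation for your GeoMesa version.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

public class StatsDisabledParams {

    // Builds data store parameters with both statistics options switched off.
    public static Map<String, Serializable> build(String catalog) {
        Map<String, Serializable> params = new LinkedHashMap<>();
        // Placeholder connection settings (assumed key names, made-up values)
        params.put("instanceId", "myInstance");
        params.put("zookeepers", "zoo1:2181");
        params.put("user", "myUser");
        params.put("password", "myPassword");
        params.put("tableName", catalog);
        // The two options discussed above
        params.put("generateStats", "false");
        params.put("collectQueryStats", "false");
        return params;
    }

    public static void main(String[] args) {
        // With GeoTools and GeoMesa on the classpath, this map would be
        // passed to DataStoreFinder.getDataStore(params).
        System.out.println(build("myCatalog"));
    }
}
```

The values are passed as strings here on the assumption that the data store factory parses them into booleans; passing Boolean.FALSE should be equivalent if it does not.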

When you delete your data, how are you doing it? It might be the rapid deleting and re-creating that is causing Accumulo problems - table state is managed by a single master process, which often seems to cause contention. Depending on your use case, maybe you should consider using the Kafka data store instead - it excels at real-time data, and hardware costs are considerably lower. It doesn't provide some of the more advanced features of the Accumulo data store, but that might not be a problem for you.

Another tip to reduce your hardware requirements is to disable any indices that you aren't using. It sounds like all your queries are against the z2 index (i.e. they have a spatial component) - if so you could disable the other indices. See here for instructions: http://www.geomesa.org/documentation/1.2.3/user/data_management.html#customizing-index-creation
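
A minimal sketch of restricting a schema to the z2 index, in Java. It assumes that the 1.2.x hint key is "table.indexes.enabled" and that the hint is read from the feature type's user data when createSchema is called; both assumptions should be verified against the page linked above. A plain map stands in for sft.getUserData() so that the example runs without GeoTools.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnabledIndicesHint {

    // Assumed hint key for GeoMesa 1.2.x; verify against the linked documentation.
    static final String ENABLED_INDEXES_KEY = "table.indexes.enabled";

    // Stores a comma-delimited list of index names under the hint key.
    public static void enableOnly(Map<Object, Object> userData, String... indices) {
        userData.put(ENABLED_INDEXES_KEY, String.join(",", indices));
    }

    public static void main(String[] args) {
        // Stand-in for sft.getUserData(); with a real SimpleFeatureType you
        // would set the hint before calling dataStore.createSchema(sft).
        Map<Object, Object> userData = new LinkedHashMap<>();
        enableOnly(userData, "z2");
        System.out.println(userData);
    }
}
```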

Thanks,

Emilio

On 12/14/2016 01:47 AM, Takashi Sasaki wrote:
Oops, I forgot to mention something important.

I'm actually not ingesting the data with MapReduce; I'm using
multithreaded code instead.
So I delete and re-ingest seven tables "in parallel".


