Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost


Re: [geomesa-users] How to avoid Tablet Server crashing, and server cost problem...

From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Thu, 15 Dec 2016 08:59:04 -0500

Hi Takashi,

One way to possibly improve performance is to use an optimized delete. The FeatureStore delete method queries each feature and then deletes it; instead, you could just do a batch delete on the whole table (note: this assumes that you do not have other schemas sharing the table).

In Scala:

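import scala.collection.JavaConverters._
import org.apache.accumulo.core.data.Range
import org.locationtech.geomesa.accumulo.data.AccumuloDataStore
import org.locationtech.geomesa.accumulo.data.tables.GeoMesaTable

val ds = dataStore.asInstanceOf[AccumuloDataStore]
val tables = GeoMesaTable.getTables(sft)

val tableNames = tables.map(table => ds.getTableName(typeName, table))
tableNames.par.foreach { table =>
  val deleter = ds.connector.createBatchDeleter(table, ...)
  try {
    // a single unbounded Range covers every row in the table
    deleter.setRanges(List(new Range()).asJava)
    deleter.delete()
  } finally {
    deleter.close()
  }
}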

Since you're only using the Z2 index, you may be able to simplify this to:

val tableName = ds.getTableName(typeName, org.locationtech.geomesa.accumulo.data.tables.Z2Table)

// ...and then batch-delete just that single table, as above
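For a Java codebase like the one in the quoted message below, the same pattern as the Scala `tableNames.par.foreach` loop (wipe each table concurrently, then wait for all of them) might look roughly like this. It is only a sketch: `TableWiper` and `ParallelTableWipe` are hypothetical names, and the thread count is an arbitrary choice. In real use the wipe callback would be expected to create an Accumulo BatchDeleter for the table, set a single unbounded Range, call delete() and then close(), as in the Scala snippet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical callback that wipes one table, e.g. by wrapping an Accumulo BatchDeleter. */
interface TableWiper {
    void wipe(String tableName) throws Exception;
}

final class ParallelTableWipe {

    /**
     * Wipes every table concurrently and waits for all of them. Results are checked
     * in submission order; the first failure found is thrown as an ExecutionException.
     */
    static void wipeAll(List<String> tableNames, TableWiper wiper, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String table : tableNames) {
                // Callable form, so checked exceptions from wipe() surface via Future.get()
                pending.add(pool.submit(() -> {
                    wiper.wipe(table);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // blocks until this table is done; wraps any failure
            }
        } finally {
            pool.shutdown(); // let already-submitted work finish, accept no new tasks
        }
    }
}
```

Keeping the per-table delete behind a small interface makes it possible to swap in the Accumulo-backed implementation without touching the concurrency code.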

I've also created a ticket so that we can detect this case and handle it automatically.


Thanks,

Emilio

On 12/14/2016 10:53 PM, Takashi Sasaki wrote:

Hi Emilio,

I now understand the details of generateStats and collectQueryStats, and
have decided to disable them.

When you delete your data, how are you doing it?

I do it the simple way; the code is below.
--
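import java.io.IOException;
import org.geotools.data.DataStore;
import org.geotools.data.simple.SimpleFeatureStore;
import org.opengis.filter.Filter;

private void deleteFeatures(String simpleFeatureTypeName, DataStore dataStore) throws IOException {
  SimpleFeatureStore featureStore = (SimpleFeatureStore) dataStore.getFeatureSource(simpleFeatureTypeName);
  featureStore.removeFeatures(Filter.INCLUDE);
  dataStore.dispose();
}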
--

I wanted to choose Kafka, but abandoned it due to some application
requirements...
(I cannot go into detail because of internal confidentiality.)


I tried using only the z2 index, and it seemed to improve things somewhat.


Thank you for your reply,

Takashi

Hi Takashi,

In geomesa-1.2.3 there are two different parameters you might be referring to: generateStats and collectQueryStats. collectQueryStats was named collectStats in older versions, but was renamed to make it clearer. generateStats stores summary statistics for your data, which are then used for query planning. Since you are deleting the data every few minutes, you probably want to disable this, as it will introduce needless overhead. collectQueryStats is a simple form of auditing that logs every query to Accumulo, in the '<catalog>_queries' table.
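A minimal Java sketch of switching both features off when the data store is created, assuming they are passed as ordinary data store connection parameters. Only generateStats and collectQueryStats come from this thread; the other key names (instanceId, zookeepers, user, password, tableName) and the `StatsDisabledParams` helper are assumptions for illustration, so check them against the GeoMesa 1.2.3 documentation before relying on them.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: Accumulo data store parameters with both stats features disabled. */
final class StatsDisabledParams {

    static Map<String, Serializable> build(String instanceId, String zookeepers,
                                           String user, String password,
                                           String catalogTable) {
        Map<String, Serializable> params = new HashMap<>();
        params.put("instanceId", instanceId);   // assumed key name
        params.put("zookeepers", zookeepers);   // assumed key name
        params.put("user", user);               // assumed key name
        params.put("password", password);       // assumed key name
        params.put("tableName", catalogTable);  // assumed key name for the catalog table
        // Do not compute summary statistics on write; the data is wiped every few minutes.
        params.put("generateStats", Boolean.FALSE);
        // Do not audit every query into the <catalog>_queries table.
        params.put("collectQueryStats", Boolean.FALSE);
        return params;
    }
}
```

The resulting map would then be handed to GeoTools, e.g. `DataStoreFinder.getDataStore(params)`, in place of the parameters currently used to open the store.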

When you delete your data, how are you doing it? It might be the rapid deleting and re-creating that is causing Accumulo problems: table state is managed by a single master process, and that often seems to cause some contention. Depending on your use case, maybe you should consider using the Kafka data store instead; it excels at real-time data, and hardware costs are considerably lower. It doesn't provide some of the more advanced features of the Accumulo data store, but that might not be a problem for you.

Another tip to reduce your hardware requirements is to disable any indices that you aren't using. It sounds like all your queries are against the z2 index (i.e. they have a spatial component); if so, you could disable the other indices. See here for instructions: http://www.geomesa.org/documentation/1.2.3/user/data_management.html#customizing-index-creation

Thanks,

Emilio

On 12/14/2016 01:47 AM, Takashi Sasaki wrote:
Oops, I forgot to mention something important.

I'm actually not ingesting the data with MapReduce; I'm using
multithreaded programming.
So I delete and re-ingest seven tables "in parallel".

