Hi GeoMesa Users,
We are using GeoMesa with an S3 file system datastore and are experiencing extremely slow response times when we access our data - even with a "moderate" number of files stored in it (say, 10,000).
Our setup:
* GeoMesa 2.3.0
* Filesystem datastore pointing to an S3 URL
** encoding: orc
** partition scheme: daily,xz2-8bits
** leaf-storage: true
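For reference, this is roughly how our microservice connects to the store - a minimal sketch only, assuming the FSDS factory parameter keys "fs.path" and "fs.encoding"; the bucket path here is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class FsdsParams {
    // Builds the connection parameter map for the file system datastore.
    public static Map<String, String> params() {
        Map<String, String> params = new HashMap<>();
        // Root of the datastore; "s3a://" selects Hadoop's S3A filesystem
        // (bucket name is a placeholder, not our real bucket)
        params.put("fs.path", "s3a://example-bucket/geomesa/");
        // ORC encoding, matching the existing store
        params.put("fs.encoding", "orc");
        return params;
    }
    // In the actual service this map is passed to
    // org.geotools.data.DataStoreFinder.getDataStore(params),
    // which is where we then call getTypeNames() etc.
}
```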
We’re accessing that data store using different “clients”:
* a Java microservice which uses the GeoTools GeoMesa API and is running in the same AWS region as the S3 bucket
* GeoServer (2.14) running in the same AWS region as the S3 bucket
* geomesa-fs CLI running in the same AWS region as the S3 bucket
All of them are really slow (responses take minutes, sometimes hours). While debugging our microservice, we found that even operations like org.geotools.data.DataStore.getTypeNames() take a very long time, because all of the metadata files appear to be scanned. That does not seem necessary, since reading the top-level per-feature storage.json files should be sufficient. Is that works-as-designed, or might it be a bug in the GeoMesa FSDS implementation?
Is there anything (besides switching the actual data store) we can do to improve the performance?
We run "geomesa-fs compact …" from time to time, which restores fairly acceptable query performance - but the compaction itself takes hours, sometimes even days, to complete.
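For completeness, the compaction we run looks roughly like this - an illustrative command only; the path is a placeholder, and the exact flag names may differ by GeoMesa version:

```
# Compact small data files per partition (path and type name are placeholders)
geomesa-fs compact \
  --path s3a://example-bucket/geomesa/ \
  --feature-name our-feature-type
```

Even for our ~10,000 files this regularly runs for hours, so it is not something we can run frequently enough to keep query latency down.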
Thanks,
Christian
Kind regards
Christian Sickert
Crowd Data & Analytics for Automated Driving
Daimler AG - Mercedes-Benz Cars Development - RD/AFC
+49 176 309 71612
christian.sickert@xxxxxxxxxxx