Re: [geomesa-users] "explain" command line tool


From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Mon, 17 Dec 2018 11:08:04 -0500

Hello,

The filesystem data store is more rudimentary than most of our data stores, and it doesn't support explain planning yet. In general, the planning is not very interesting, because the partitions are typically broad (e.g. with a scheme like 'daily,z2:2-bits', there are only 4 spatial quadrants per day). If you are using multi-threading (the default, controllable via the data store parameter "fs.read-threads"), then you can see which partitions are being hit for a given query by enabling debug logging on 'org.locationtech.geomesa.fs'.
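To make the "4 spatial quadrants per day" point concrete, here is a rough sketch of how a daily plus 2-bit Z2 scheme could bucket a record. This is not GeoMesa's actual code: the function names, the bit ordering, and the "yyyy/MM/dd/cell" partition layout are all assumptions made for illustration only.

```python
from datetime import datetime


def z2_cell(lon: float, lat: float, bits: int = 2) -> int:
    """Map a lon/lat to a Z-order cell using `bits` total bits.

    Illustrative only: the bit ordering here is an assumption and may
    not match the encoding GeoMesa uses internally.
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    per_dim = bits // 2
    n = 1 << per_dim  # number of cells along each axis
    # Scale each coordinate into [0, n) and clamp to the valid range.
    x = max(0, min(int((lon + 180.0) / 360.0 * n), n - 1))
    y = max(0, min(int((lat + 90.0) / 180.0 * n), n - 1))
    cell = 0
    for i in range(per_dim):
        cell |= ((x >> i) & 1) << (2 * i)      # longitude bits in even positions
        cell |= ((y >> i) & 1) << (2 * i + 1)  # latitude bits in odd positions
    return cell


def partition(ts: datetime, lon: float, lat: float, bits: int = 2) -> str:
    """Hypothetical partition name: one directory per day, then the Z2 cell."""
    return f"{ts:%Y/%m/%d}/{z2_cell(lon, lat, bits)}"


if __name__ == "__main__":
    print(partition(datetime(2018, 12, 17), -77.0, 38.9))
```

With only 2 bits, every point in a given quarter of the globe on a given day lands in the same partition, so a query plan would say little more than "scan these few partitions".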

Which data store implementation to pick depends on your use case. The filesystem data store is very easy to get started with, and provides extremely low operating costs if you run it on something like S3. Using a full-fledged database like HBase or Accumulo can increase your setup complexity and operating costs considerably, but gives you much more fine-grained query capabilities, as well as distributed push-down processing.

If you generally code to the GeoTools data store API (and/or the Spark API), then it should be fairly trivial (from a client perspective) to switch between data store implementations. Roughly, they all offer the same capabilities, but some operations may be (much) slower depending on the implementation, and a few things may not be implemented everywhere (e.g. the explain command). Since you are in the exploratory phase, the filesystem data store should give you a decent idea of the *kind* of things that you can do, with the understanding that it may not be the *fastest* you could do them.

Thanks,

Emilio

On 12/16/18 1:49 PM, Andrew Ames wrote:

I have been following the quickstart guide using the geomesa-fs command line tools.

It seems that there is no "explain" command when using bin/geomesa-fs. However, I see the "explain" command throughout parts of the documentation. (I was curious to find out why some spatio-temporal queries were slower than others.)

Is this because I chose to use geomesa-fs and not some other command line tools package? (I have been using HDFS and installed the hadoop deps in order to work through tutorials and work with some big data I have.)

Is there another package I should be using if I am just getting started with indexing "big" spatio-temporal data? Like Accumulo? (Ultimately, I want to use this stack to experiment with heatmap generation and flow analysis of moving entities and so on. Balancing transformations across nodes is also something I want to play with.)

--

Andrew Ames
aames@xxxxxxxxxx
