Ah, I would not use the command-line tools for performance testing.
There is a substantial overhead involved in spinning up the JVM for
each query, which is likely dominating the time for smaller queries.
You can use the Accumulo monitor page to look at the index tables
associated with your data and see how many splits there are, and
where they are located. It is usually available on port 9995.
Thanks,
Emilio
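[Editor's sketch] One way to keep start-up cost out of the measurements is to issue all queries from a single long-running process and discard a few warm-up runs. The harness below is a minimal illustration in Python, not GeoMesa code: `run_query` is a hypothetical callable standing in for whatever client call actually executes a filter, and `fake_query` exists only so the sketch runs on its own.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_queries(run_query: Callable[[str], int],
                 filters: Iterable[str],
                 warmup: int = 3) -> List[float]:
    """Time each filter inside one process, after a few untimed warm-up runs."""
    filters = list(filters)
    # Warm-up runs absorb one-time costs (connections, class loading, caches)
    # so they do not inflate the first timed queries.
    for f in filters[:warmup]:
        run_query(f)
    timings = []
    for f in filters:
        start = time.perf_counter()
        run_query(f)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Reduce per-query timings (seconds) to a few headline numbers."""
    return {
        "n": len(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


def fake_query(cql: str) -> int:
    # Hypothetical stand-in for a real query call; it only sleeps briefly.
    time.sleep(0.001)
    return 0


if __name__ == "__main__":
    print(summarize(time_queries(fake_query, ["bbox(geom,0,0,1,1)"] * 10)))
```

Comparing these per-query timings with the wall-clock time of one command-line invocation per query gives a rough idea of how much of the latter is process start-up.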
On 5/21/19 12:49 PM, Tin Vu wrote:
Hi Emilio,
Thanks for your enthusiasm. I did not use the GeoTools API
programmatically. Instead, I used the GeoMesa-Accumulo
command-line tools to submit queries. In particular, a query
looks like this:
geomesa-accumulo export -u root -p *password* -c *dataset*
-f *data_model* -q "bbox(geom,x1,y1,x2,y2)" -F csv
How can I check that my data is distributed across the
cluster? I store it in Accumulo, with HDFS as the file
system.
Thanks,
Tin
Are you using the geotools API
programmatically then? There are a lot of things that can
affect the query performance, a few things I would look at:
* Check if your data is distributed across the cluster. By
default, GeoMesa will create 4 splits on ingestion. If your
data doesn't reach the split threshold, then you will only
be querying 4 regions on at most 4 servers.
* Check that your client can handle the number of threads
being used. GeoMesa spawns multiple client threads per query
(based on the data store configuration), so by default you'd
be running 8 threads per query.
* Try to determine the bottleneck - you may be saturating
your network, or your client may not be reading results as
fast as they are being delivered.
I'm not familiar with how SpatialHadoop works, so those
things may or may not be affecting it as well.
At any rate, I don't think anyone has compared the two
before. I'd be interested to see some more detailed results
(code samples, timings, etc), if you'd share them.
Thanks,
Emilio
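[Editor's sketch] The thread arithmetic and the bottleneck check above can be illustrated with a small client-side harness. This is a sketch under assumptions, not GeoMesa code: the figure of 8 threads per query is the default quoted in the message and depends on the data store configuration, and `run_query` is a hypothetical callable standing in for the real client call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Default figure quoted in the message above; the real value depends on
# the data store configuration.
THREADS_PER_QUERY = 8


def client_threads(concurrent_queries: int,
                   threads_per_query: int = THREADS_PER_QUERY) -> int:
    """Rough number of client threads active when queries run in parallel."""
    return concurrent_queries * threads_per_query


def run_batch(run_query: Callable[[str], object],
              filters: Sequence[str],
              max_concurrent: int) -> Tuple[float, List[float]]:
    """Run the filters with bounded concurrency.

    Returns the wall-clock time for the whole batch and the latency of
    each individual query, both in seconds.
    """
    def timed(f: str) -> float:
        start = time.perf_counter()
        run_query(f)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        latencies = list(pool.map(timed, filters))
    return time.perf_counter() - wall_start, latencies


if __name__ == "__main__":
    # 100 queries submitted at once would mean roughly 800 client threads.
    print(client_threads(100))
    wall, lat = run_batch(lambda f: time.sleep(0.01), ["q"] * 20, max_concurrent=4)
    print(f"wall={wall:.3f}s mean latency={sum(lat) / len(lat):.3f}s")
```

Repeating a batch with increasing `max_concurrent` and watching where the wall-clock time stops improving is one rough way to tell whether the client or the network, rather than the servers, is the limit.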
On 5/20/19 9:10 AM, Tin Vu wrote:
I used concurrent threads. 1 thread for 1
query.
Hello,
How are you submitting queries to GeoMesa?
Thanks,
Emilio
On 5/19/19 3:25 PM, Tin Vu wrote:
Hi Emilio,
Thanks for your response. I executed my
experiments as follows:
1. Cluster: 1 master node, 12 slave
nodes, 64 GB memory in each node.
2. Dataset: OpenStreetMap All Nodes
(size 96 GB, 2.7 billion records).
3. Queries: I created 10 batches of
queries with different sizes (for example,
query area / whole space area = 10^-12,
10^-11, ..., 10^-2). Each batch contains 100
square queries of the same size. The queries
are randomly distributed over the whole space.
4. I submit those batches of queries to
SpatialHadoop and GeoMesa, wait until they
finish, then measure the running time.
Thanks,
Tin
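[Editor's sketch] The query workload described above can be reproduced roughly as follows. The sketch assumes the "whole space" is the full WGS84 longitude/latitude extent and that "square" means equal side lengths in degrees; neither is stated in the message. All function names are illustrative, and because the message does not say exactly which exponents make up the ten batches, the exponent range is a parameter.

```python
import math
import random
from typing import Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]

# Assumed extent of the "whole space": the full WGS84 lon/lat range.
MIN_X, MIN_Y, MAX_X, MAX_Y = -180.0, -90.0, 180.0, 90.0


def random_square(selectivity: float, rng: random.Random) -> Box:
    """Return (x1, y1, x2, y2) for a square whose area is
    `selectivity` times the area of the assumed extent."""
    total_area = (MAX_X - MIN_X) * (MAX_Y - MIN_Y)
    side = math.sqrt(selectivity * total_area)
    if side > min(MAX_X - MIN_X, MAX_Y - MIN_Y):
        raise ValueError("square does not fit inside the extent")
    # Pick the lower-left corner so the square stays inside the extent.
    x1 = rng.uniform(MIN_X, MAX_X - side)
    y1 = rng.uniform(MIN_Y, MAX_Y - side)
    return (x1, y1, x1 + side, y1 + side)


def to_cql(box: Box, geom: str = "geom") -> str:
    """Format a box as a bbox filter in the style used in this thread."""
    x1, y1, x2, y2 = box
    return f"bbox({geom},{x1:.9f},{y1:.9f},{x2:.9f},{y2:.9f})"


def make_batches(exponents: Iterable[int] = range(-12, -1),
                 per_batch: int = 100,
                 seed: int = 42) -> Dict[int, List[Box]]:
    """One batch of random squares per exponent e, with selectivity 10**e."""
    rng = random.Random(seed)  # fixed seed so both systems see the same boxes
    return {e: [random_square(10.0 ** e, rng) for _ in range(per_batch)]
            for e in exponents}


if __name__ == "__main__":
    batches = make_batches()
    print(to_cql(batches[-6][0]))
```

Fixing the seed means the same boxes can be submitted to both systems, so timing differences are not an artifact of different random workloads.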
Hello,
Could you say more about how you're
querying? SpatialHadoop uses map/reduce
jobs, if I understand - it seems like there
would be a lot of overhead to spin up the
job. How long are your queries taking? How
big is your cluster?
Thanks,
Emilio
On 5/16/19 3:20 PM, Tin Vu wrote:
Hi all,
I just wanted to ask you a
question about the performance of
GeoMesa range queries. This is my
experimental setup:
3. Query: range queries with
different selectivity: 10^-12, 10^-11,
10^-10, where selectivity is the ratio
of the query range area to the total
area of the dataset space.
I saw that GeoMesa does not perform
better than SpatialHadoop, which is
not what I expected, since I think that
GeoMesa (which organizes data at the
record level) should be better than
SpatialHadoop (which organizes data at
the block level) for highly selective
queries. Could you suggest how to
tune GeoMesa so that it provides
better performance?
Thanks,
Tin
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users