Re: [geomesa-dev] DataStoreFinder too slow on first connection

Hi Maria,

Great question. DataStoreFinder.getDataStore scans the JVM classpath for every DataStoreFactory it can find, instantiates them, and holds them in a registry object (1). It sounds like that classpath scanning and class loading is what takes several seconds on the first call; later calls can reuse the registry, which would explain why they return in milliseconds.

The DataStoreFinder.getDataStore approach for getting a DataStore is a general one, which is great for building general, reusable code. Given the performance concern, you can instead create an AccumuloDataStoreFactory (2) directly and call createDataStore. That may be a little quicker at loading the necessary classes, etc.

If you have data coming in frequently, NiFi may be a fit. GeoMesa has NiFi adapters (3) which would manage the DataStore connections, etc.

If you can cache/share/reuse the DataStore connection in the client app, that might be helpful. DataStore objects tend to be somewhat heavyweight, so creating them frequently has some downsides. As another option, could you set up a small long-running server that the client posts the incoming data to?
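
As a rough illustration of the cache/reuse idea, here is a minimal sketch in plain Java. StoreCache, StoreCacheDemo and the "tableName" key are hypothetical names used only for this example, and the String "store" stands in for a real DataStore. In a real client the opener function would wrap the DataStoreFinder.getDataStore(parameters) call (and deal with its IOException).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: keeps one store per distinct parameter map. */
final class StoreCache<S> {
    private final ConcurrentHashMap<Map<String, String>, S> stores = new ConcurrentHashMap<>();
    private final Function<Map<String, String>, S> opener;

    StoreCache(Function<Map<String, String>, S> opener) {
        this.opener = opener;
    }

    S get(Map<String, String> parameters) {
        // Defensive copy so later changes to the caller's map cannot alter the cache key.
        Map<String, String> key = Collections.unmodifiableMap(new TreeMap<>(parameters));
        // The opener runs only if no store is cached yet for these parameters.
        return stores.computeIfAbsent(key, opener);
    }
}

final class StoreCacheDemo {
    /** Returns how many times the (stand-in) expensive lookup ran for two get() calls. */
    static int demoOpenCount() {
        AtomicInteger opens = new AtomicInteger();
        StoreCache<String> cache = new StoreCache<>(params -> {
            opens.incrementAndGet(); // stands in for the slow first-time lookup
            return "store-for-" + params.get("tableName");
        });
        Map<String, String> params = new HashMap<>();
        params.put("tableName", "example");
        cache.get(params);
        cache.get(params); // served from the cache, opener is not called again
        return opens.get();
    }
}
```

This only helps if the client process stays alive between uploads; a client that starts a fresh JVM for every call would still pay the first-lookup cost each time.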

If none of those suggestions help out with your client app, it is worth noting that DataStore objects should be cleaned up by calling the dispose() method.
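
One way to make sure dispose() always gets called is a try/finally wrapper. The sketch below is plain Java with hypothetical names (Disposable, StoreScope, RecordingStore); Disposable is only a stand-in for the dispose() method on a real DataStore, so treat it as an outline of the pattern rather than GeoTools code.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for the dispose() method of a DataStore. */
interface Disposable {
    void dispose();
}

final class StoreScope {
    /** Opens a store, runs the work, and disposes the store even if the work throws. */
    static <S extends Disposable, R> R withStore(Supplier<S> opener, Function<S, R> work) {
        S store = opener.get();
        try {
            return work.apply(store);
        } finally {
            store.dispose();
        }
    }
}

/** Fake store used only to show that dispose() was reached. */
final class RecordingStore implements Disposable {
    boolean disposed;

    @Override
    public void dispose() {
        disposed = true;
    }
}
```

With a real DataStore the same shape applies: obtain the store, do the upload inside the try block, and call dataStore.dispose() in the finally block.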

I hope that helps; let us know if you have any other questions.

Cheers,

Jim

1. https://github.com/geotools/geotools/blob/master/modules/library/main/src/main/java/org/geotools/data/DataStoreFinder.java#L113-L131

2. https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStoreFactory.scala

3. https://github.com/geomesa/geomesa-nifi/

On 2018-04-13 10:03, Maria Krommyda wrote:
Hello everyone,

I am dealing with a very weird and unexpected problem that I would
like to share with you in case you have any suggestions.

I have set up my system with Zookeeper 3.4.6 on localhost, Hadoop
2.2.0, Accumulo 1.7.3, Geomesa 2.11-1.3.5 and Geotools 15.1.

I have written a very simple script that connects to a GeoMesa
Accumulo DataStore and uploads some data.

I use the line:

DataStore dataStore = DataStoreFinder.getDataStore(parameters); in my
code

I am importing the org.geotools.data.DataStoreFinder accordingly.

The first time that I call the function, with the above line, it takes
around 4 to 5 secs to find the DataStore and less than 300 msecs to
upload the data.

If I create a loop and call this function more than once, even with
some delay between the calls, from the second time onward it takes
less than 20msecs to find the datastore and approximately the same
time (300 msecs) to upload the data. I am not sure if this has
something to do with Java optimization, and the connection is
maintained from the first call or with anything else.

The problem is that I want to call the function from a client app that
will call quite often but only once each time, making the 4 secs a
serious problem.

I have tried searching for any related problems but I couldn't find
anything helpful. So any ideas and thoughts on what might be the
problem are highly appreciated.

Thank you very much for your time.

Best regards,
Maria.
_______________________________________________
geomesa-dev mailing list
geomesa-dev@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-dev

