[geomesa-dev] Re: Re: DataStoreFinder too slow on first connection

Hello, 

I have no control over the client side of the application, so WFS-T is not an option for me.
Also, the client does not have access to my file system.

What is currently set up (as a test, while I figure out how to handle data and queries with GeoMesa) is a JavaServer Page (JSP), which is called with an HTTP request carrying the relevant parameters and returns a JSON object.

My preferred solution would keep the HTTP request as the communication method.

Reading through the NiFi documentation, I thought that something like that is possible, but I am still unsure how to set it up properly so that it reads the parameters, calls GeoMesa with them, and returns the object to the client.

It would be great if you could point me in the right direction.

Best regards,
Maria.



On Friday, 13 April 2018 at 11:48 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:


Hello,

You really have a lot of different options. At a basic level, it sounds like you want some persistent service that will have a data store instance and respond to client requests. How do you plan to communicate between your client and your service? That will probably inform what solution is best, and how you configure NiFi will depend on how you plan to send messages. For example, if your client can just write files to your filesystem, it is fairly easy to configure NiFi to monitor a folder and ingest data written there.
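
As a rough illustration of such a persistent service, the following is a minimal sketch using only the JDK's built-in com.sun.net.httpserver package. SlowStore, QueryService, the /query path and the filter parameter are all hypothetical names made up for this sketch. SlowStore merely stands in for a GeoMesa data store, which a real service would obtain once at startup and reuse for every request.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a GeoMesa data store; not a real GeoMesa or GeoTools class. */
final class SlowStore {
    static final AtomicInteger CREATED = new AtomicInteger();

    SlowStore() {
        CREATED.incrementAndGet(); // the slow data store lookup would happen here
    }

    /** Returns a placeholder JSON document; a real service would run the query and serialize features. */
    String query(String filter) {
        // Escapes backslashes and double quotes only, which is enough for this sketch.
        String escaped = filter.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"filter\":\"" + escaped + "\",\"features\":[]}";
    }

    void dispose() {
        // A real store would release its connections here.
    }
}

/** Long-lived service: one store instance, shared by all requests. */
final class QueryService {
    private final SlowStore store = new SlowStore(); // created once, when the service is built
    private HttpServer server;

    /** Builds the JSON response for a raw query string such as "filter=INCLUDE". */
    String respond(String rawQuery) {
        String prefix = "filter=";
        String filter = (rawQuery != null && rawQuery.startsWith(prefix))
                ? rawQuery.substring(prefix.length())
                : "INCLUDE"; // example default: match everything
        return store.query(filter);
    }

    /** Starts listening on the loopback interface; port 0 picks a free port. Returns the bound port. */
    int start(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext("/query", exchange -> {
            byte[] body = respond(exchange.getRequestURI().getQuery()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        if (server != null) {
            server.stop(0);
        }
        store.dispose();
    }
}
```

Calling new QueryService().start(8080) would serve requests such as /query?filter=INCLUDE on the loopback interface; the port number is likewise only an example. Because the store is created once when the service is constructed, later requests do not pay the lookup cost again.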

In addition to NiFi, another option you might consider is using GeoServer with WFS-T, which might be easier to set up. I believe there are a variety of clients that you could then use to send requests over HTTP, depending on where your client runs.

I would guess that most of the time spent getting the data store is from class loading. But still, 5 seconds seems high. Using the geomesa command-line tools, I'm able to invoke java, get the datastore and retrieve the current feature types in ~2 seconds:

$ time ./geomesa-accumulo get-type-names -c geomesa -u user -p pass -i instance -z zoo
Current feature types:
example-csv

real    0m2.071s
user    0m3.332s
sys    0m0.164s

(The user time is higher because it adds up time spent on multiple CPU cores; my machine has 4 cores.)

Thanks,

Emilio

On 04/13/2018 03:58 PM, Maria Krommyda wrote:
Hello Jim, 

Let me start by thanking you for taking the time to give me such a detailed response.

I am still surprised that it takes 5 seconds to find the only existing data store, and very curious about what would have happened if there were many more, but at least I now understand the reason.
I tried the AccumuloDataStoreFactory but, as you predicted, it didn't improve the time.

Thank you for the dispose() tip; I had misunderstood it to mean destroying the schema (deleting all data), so I was not using it.

NiFi seems to be the only solution for what I am trying to accomplish.
Could I ask you for some examples, as detailed as possible, of how to configure NiFi, if you know of any?
I searched for them, but all the examples I found assume a deep understanding of how NiFi works, which I do not have, and give only an overview of the steps to follow.
What I would like to achieve is to receive a request from the client, either to insert data into the DataStore or to run a query, pass it to GeoMesa, and send back a response: either a confirmation that the data was stored or the properly formatted query results.

Thank you once more for your time!

Best regards, 
Maria.




On Friday, 13 April 2018 at 5:33 PM, Jim Hughes <jhughes@xxxxxxxx> wrote:


Hi Maria,

Great question.  The DataStoreFinder.getDataStore call reads through
the JVM classpath for all the DataStoreFactory implementations it can
find, instantiates them, and holds them in a registry object (1).  It
sounds like the classpath scanning and class loading is what is taking
several seconds.
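
The discovery step described above can be illustrated with the JDK's own java.util.ServiceLoader, which also finds registered implementations of an interface and instantiates them. This is only an analogy under that assumption, not the GeoTools registry code; ProviderScan is a hypothetical name, and CharsetProvider is used simply because it ships with the JDK.

```java
import java.nio.charset.spi.CharsetProvider;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ProviderScan {
    /** Looks up every registered implementation of a service type and instantiates it. */
    static <S> List<String> discover(Class<S> service) {
        List<String> names = new ArrayList<>();
        // ServiceLoader is lazy: each iteration step locates, loads and instantiates one provider.
        for (S provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    /** Runs a task and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long first = timeMillis(() -> discover(CharsetProvider.class));
        long second = timeMillis(() -> discover(CharsetProvider.class));
        System.out.println("first scan: " + first + " ms, second scan: " + second + " ms");
    }
}
```

Running main prints the two timings. The first scan usually takes longer because the provider classes have to be located and loaded, which is the same pattern as the slow first call reported in this thread.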

The DataStoreFinder.getDataStore approach for getting a DataStore is a
general one, which is great for building general, reusable code.
Given the performance concern, you can instead create an
AccumuloDataStoreFactory (2) directly and call createDataStore.  That
may be a little quicker for loading the necessary classes, etc.

If you have data coming in frequently, NiFi may be a fit.  GeoMesa has
NiFi adapters (3) which would manage the DataStore connections, etc.

If you can cache/share/reuse the DataStore connection in the client
app, that might be helpful.  DataStore objects tend to be somewhat
heavyweight, so creating them frequently has some downsides.  As
another option, could you set up a small server to post the incoming
data to?

If none of those suggestions help out with your client app, it is worth
noting that DataStore objects should be cleaned up by calling the
dispose() method.
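
One way to combine these two suggestions, reusing a single DataStore and disposing of it at shutdown, is a small holder that creates the object on first use. The sketch below is generic and uses only the JDK; SharedResource is a hypothetical name. With GeoTools, the factory would presumably wrap DataStoreFinder.getDataStore(parameters) (which can throw IOException) and the disposer would call dispose() on the store.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Creates an expensive resource on first use, then hands back the same instance. */
final class SharedResource<T> {
    private final Supplier<T> factory;   // how to build the resource (assumed to be slow)
    private final Consumer<T> disposer;  // how to release it
    private T instance;

    SharedResource(Supplier<T> factory, Consumer<T> disposer) {
        this.factory = factory;
        this.disposer = disposer;
    }

    /** Returns the shared instance, building it only if none exists yet. */
    synchronized T get() {
        if (instance == null) {
            instance = factory.get();
        }
        return instance;
    }

    /** Releases the current instance, if any; a later get() builds a fresh one. */
    synchronized void dispose() {
        if (instance != null) {
            disposer.accept(instance);
            instance = null;
        }
    }
}
```

Only the first get() pays the creation cost; every later call returns the cached object until dispose() is called. This only helps if the holder lives in a long-running JVM, since a process that starts, calls once, and exits would still pay the startup cost each time.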

I hope that helps; let us know if you have any other questions.

Cheers,

Jim

1. https://github.com/geotools/geotools/blob/master/modules/library/main/src/main/java/org/geotools/data/DataStoreFinder.java#L113-L131

2. https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStoreFactory.scala

3. https://github.com/geomesa/geomesa-nifi/

On 2018-04-13 10:03, Maria Krommyda wrote:
> Hello everyone,
>
> I am dealing with a very weird and unexpected problem that I would
> like to share with you in case you have any suggestions.
>
> I have set up my system with ZooKeeper 3.4.6 on localhost, Hadoop
> 2.2.0, Accumulo 1.7.3, GeoMesa 2.11-1.3.5 and GeoTools 15.1.
>
> I have written a very simple script that connects to a GeoMesa
> Accumulo DataStore and uploads some data.
>
> In my code I use the line:
>
> DataStore dataStore = DataStoreFinder.getDataStore(parameters);
>
> and I import org.geotools.data.DataStoreFinder accordingly.
>
> The first time that I call the function, with the above line, it takes
> around 4 to 5 secs to find the DataStore and less than 300 msecs to
> upload the data.
>
> If I create a loop and call this function more than once, even with
> some delay between the calls, from the second time onward it takes
> less than 20 msecs to find the datastore and approximately the same
> time (300 msecs) to upload the data. I am not sure whether this has
> to do with Java optimization, with the connection being kept from
> the first call, or with something else.
>
> The problem is that I want to call the function from a client app that
> will call quite often but only once each time, making the 4 secs a
> serious problem.
>
> I have tried searching for any related problems but I couldn't find
> anything helpful. So any ideas and thoughts on what might be the
> problem are highly appreciated.
>
> Thank you very much for your time.
>
> Best regards,
> Maria.

> _______________________________________________
> geomesa-dev mailing list
> geomesa-dev@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or
> unsubscribe from this list, visit
> https://dev.locationtech.org/mailman/listinfo/geomesa-dev



