Re: [geomesa-dev] Σχετ: Σχετ: Σχετ: DataStoreFinder too slow on first connection

From: Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx>
Date: Tue, 17 Apr 2018 09:07:16 -0400

Hello,

I don't think you'd want to use NiFi as a query platform. NiFi is designed to manage flows of data, which in my experience means ingestion pipelines. And indeed, we only offer ingestion processors for geomesa.

If you want to query over HTTP, I'd strongly suggest looking into GeoServer. It implements an OGC standard interface for querying, which means that there are lots of client libraries available that you can use to call it, including ones for JavaScript and Java. GeoMesa provides plugins that let you access it through GeoServer:

http://www.geomesa.org/documentation/user/accumulo/geoserver.html

If you want a custom endpoint, then you will need to translate your request into a Query object and call the data store featureReader/featureSource methods in code. The geotools documentation has a lot of cruft in it (related to the UI framework used), but covers the basics of querying a data store:

http://docs.geotools.org/latest/userguide/tutorial/filter/query.html
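
As a rough sketch of the first translation step for a custom endpoint, the helper below turns a comma-separated bounding-box request parameter into ECQL filter text of the form BBOX(attribute, minX, minY, maxX, maxY). The class name BboxFilter, the parameter layout "minX,minY,maxX,maxY" and the geometry attribute name "geom" are assumptions made for illustration; only the JDK is used here.

```java
/** Hypothetical helper: converts bounding-box request parameters into ECQL filter text. */
final class BboxFilter {
    private BboxFilter() {}

    /** Parses "minX,minY,maxX,maxY" (assumed parameter layout) and builds the filter text. */
    static String fromParam(String geomAttribute, String bbox) {
        String[] parts = bbox.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected minX,minY,maxX,maxY but got: " + bbox);
        }
        double minX = Double.parseDouble(parts[0].trim());
        double minY = Double.parseDouble(parts[1].trim());
        double maxX = Double.parseDouble(parts[2].trim());
        double maxY = Double.parseDouble(parts[3].trim());
        return toEcql(geomAttribute, minX, minY, maxX, maxY);
    }

    /** Builds ECQL text such as BBOX(geom, 23.5, 37.9, 23.9, 38.1). */
    static String toEcql(String geomAttribute, double minX, double minY, double maxX, double maxY) {
        if (minX > maxX || minY > maxY) {
            throw new IllegalArgumentException("min corner must not exceed max corner");
        }
        return "BBOX(" + geomAttribute + ", " + minX + ", " + minY + ", " + maxX + ", " + maxY + ")";
    }
}
```

With GeoTools on the classpath, the resulting text could then be parsed with ECQL.toFilter and wrapped in a Query for the feature type being read; those calls are left out here so that the sketch stays self-contained.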

Thanks,

Emilio

On 04/17/2018 04:20 AM, Maria Krommyda wrote:

Hello,

I would like to give NiFi a go before looking into JSP.

Let's just assume for now that I have figured out the HTTP request and that I can get the parameters that I need from the client.

So now I have a bounding box that I want to use in a query.

Can you please let me know if there is a Processor to query GeoMesa using parameters?

The Processors that are available are only to ingest data, or have I understood something wrong?

Thank you for your time!

Best regards,

Maria.
On Monday, 16 April 2018, at 4:13 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

You should be able to have some persistent state in a jsp page, so possibly the easiest thing for you is to modify your current jsp to use a shared data store instance.
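
A minimal sketch of such a shared instance, using only the JDK: SharedInstance is a hypothetical helper (not part of GeoMesa or GeoTools) that creates an expensive object on first use and hands the same object to every later caller. In the JSP case, the supplier would wrap the DataStoreFinder.getDataStore(parameters) call, which throws a checked IOException and so needs its own try/catch.

```java
import java.util.function.Supplier;

/** Hypothetical helper: lazily creates one expensive object and reuses it for every caller. */
final class SharedInstance<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedInstance(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared object, creating it on the first call only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                // Re-check under the lock so that concurrent first calls create one object
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }
}
```

Held in a static field, an instance like this would pay the slow lookup once per web application rather than once per request. The shared data store should still be released with dispose() when the application shuts down.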

If you want to continue with NiFi, I'd suggest asking on the NiFi user forums for ways to accept data from HTTP requests. The GeoMesa NiFi integration is mainly through our converter framework, which converts different data formats into GeoTools simple features that GeoMesa can ingest. So you would need to set up a flow that will generate data in e.g. CSV format, in order to pass it to the GeoMesa NiFi processor. Once you get to that point, we can help out with configuring the GeoMesa processor appropriately.

Thanks,

Emilio
On 04/14/2018 02:52 PM, Maria Krommyda wrote:
Hello,

I have no control over the client side of the application so the WFS-T is not an option for me.

Also the client does not have access to my file system.

What is currently set up (as a test while I figure out how to handle data and queries with GeoMesa) is a JavaServer Page (JSP), which is called with an HTTP request and the relevant parameters and returns a JSON object.

The preferable solution for me would include keeping the http request as the communication method.

Reading through the NiFi documentation, I thought that something like that is possible, but I am still unsure of how it should be properly set up to read the parameters and call GeoMesa using them and return the object to the client.

It would be great if you could point me in the right direction.

Best regards,

Maria.
On Friday, 13 April 2018, at 11:48 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

You really have a lot of different options. At a basic level, it sounds like you want some persistent service that will have a data store instance and respond to client requests. How do you plan to communicate between your client and your service? That will probably inform what solution is best, and how you configure NiFi will depend on how you plan to send messages. For example, if your client can just write files to your filesystem, it is fairly easy to configure NiFi to monitor a folder and ingest data written there.

In addition to NiFi, another option you might consider is using geoserver with WFS-T, which might be easier to set up. I believe there are a variety of clients that you could then use to send requests over http, depending on where your client runs.

I would guess that most of the time spent getting the data store is from class loading. But still, 5 seconds seems high. Using the geomesa command-line tools, I'm able to invoke java, get the datastore and retrieve the current feature types in ~2 seconds:

$ time ./geomesa-accumulo get-type-names -c geomesa -u user -p pass -i instance -z zoo
Current feature types:
example-csv

real    0m2.071s
user    0m3.332s
sys    0m0.164s

(user time is higher than real time because it sums the time spent across multiple CPU cores - my machine has 4 cores)

Thanks,

Emilio

On 04/13/2018 03:58 PM, Maria Krommyda wrote:
Hello Jim,

Let me start by thanking you for taking the time to give me such a detailed response.

I am still surprised that it needs 5 secs to find the only one existing datastore, and very curious as to what would have happened if there were many more, but at least I now understand the reason.

I tried the AccumuloDataStoreFactory but as you predicted it didn't improve the time.

Thank you for the dispose() tip, I had misunderstood it to mean the destruction of the schema (delete all data) and I was not using it.

NiFi seems to be the only solution for what I am trying to accomplish.

Could I ask you for some examples, as detailed as possible, of how to configure NiFi, if you know of any?

I searched for them but all the examples that I found assume a deep understanding of how NiFi works, which I do not have, and give only an overview of the steps that should be followed.

What I would like to achieve is to receive a request from the client, either to insert data into the DataStore or to run a query, pass it to GeoMesa, and send back a response, either confirming the data storage or returning the query results properly formatted.

Thank you once more for your time!

Best regards,

Maria.

On Friday, 13 April 2018, at 5:33 PM, Jim Hughes <jhughes@xxxxxxxx> wrote:

Hi Maria,

Great question. The DataStoreFinder.getDataStore call scans the JVM
classpath for all the DataStoreFactory implementations it can find,
instantiates them, and holds them in a registry object (1). It sounds
like the classpath scanning and class loading are what is taking
several seconds.
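
To illustrate the kind of lookup involved, here is a small sketch that times two consecutive provider scans with the JDK's own java.util.ServiceLoader. This is an analogy only: it assumes GeoTools' factory discovery behaves like a service-provider scan, and it uses the JDK's FileSystemProvider service as a stand-in, since no GeoTools classes are needed to see the effect. ProviderScanTiming is a hypothetical name.

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

/** Hypothetical demo: times how long a service-provider scan takes. */
final class ProviderScanTiming {
    private ProviderScanTiming() {}

    /** Scans for providers of the given service and returns {providerCount, elapsedNanos}. */
    static <S> long[] scan(Class<S> service) {
        long start = System.nanoTime();
        long count = 0;
        // Iterating forces each provider class to be located, loaded and instantiated
        for (S provider : ServiceLoader.load(service)) {
            count++;
        }
        return new long[] { count, System.nanoTime() - start };
    }

    public static void main(String[] args) {
        long[] first = scan(FileSystemProvider.class);
        long[] second = scan(FileSystemProvider.class);
        System.out.printf("first scan:  %d providers in %d us%n", first[0], first[1] / 1_000);
        System.out.printf("second scan: %d providers in %d us%n", second[0], second[1] / 1_000);
    }
}
```

The first scan will usually be the slower one because the classes have not been loaded yet, although the exact numbers depend on the JVM and the classpath. A large classpath with many factories would be expected to make that first pass correspondingly longer.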

The DataStoreFinder.getDataStore approach for getting a DataStore is a
general one, which is great for building general, reusable code.
Given the performance concern, you can instead create an
AccumuloDataStoreFactory (2) directly and call createDataStore. That may
be a little quicker for loading the necessary classes, etc.

If you have data coming in frequently, NiFi may be a fit. GeoMesa has
NiFi adapters (3) which would manage the DataStore connections, etc.

If you can cache/share/re-use the DataStore connection in the client
app, that might be helpful. DataStore objects tend to be somewhat
heavy-weight, so creating them frequently has some downsides. As
another option, could you set up a small server to post the incoming
data?

If none of those suggestions help out with your client app, it is worth
noting that DataStore objects should be cleaned up by calling the
dispose() method.

I hope that helps; let us know if you have any other questions.

Cheers,

Jim

1.
https://github.com/geotools/geotools/blob/master/modules/library/main/src/main/java/org/geotools/data/DataStoreFinder.java#L113-L131

2.
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStoreFactory.scala

3. https://github.com/geomesa/geomesa-nifi/

On 2018-04-13 10:03, Maria Krommyda wrote:
> Hello everyone,
>
> I am dealing with a very weird and unexpected problem that I would
> like to share with you in case you have any suggestions.
>
> I have set up my system with Zookeeper 3.4.6 on localhost, Hadoop
> 2.2.0, Accumulo 1.7.3, Geomesa 2.11-1.3.5 and Geotools 15.1.
>
> I have written a very simple script that connects to a GeoMesa
> Accumulo DataStore and uploads some data.
>
> I use the line:
>
> DataStore dataStore = DataStoreFinder.getDataStore(parameters); in my
> code
>
> I am importing the org.geotools.data.DataStoreFinder accordingly.
>
> The first time that I call the function, with the above line, it takes
> around 4 to 5 secs to find the DataStore and less than 300 msecs to
> upload the data.
>
> If I create a loop and call this function more than once, even with
> some delay between the calls, from the second time onward it takes
> less than 20msecs to find the datastore and approximately the same
> time (300 msecs) to upload the data. I am not sure if this has
> something to do with Java optimization, and the connection is
> maintained from the first call or with anything else.
>
> The problem is that I want to call the function from a client app that
> will call quite often but only once each time, making the 4 secs a
> serious problem.
>
> I have tried searching for any related problems but I couldn't find
> anything helpful. So any ideas and thoughts on what might be the
> problem are highly appreciated.
>
> Thank you very much for your time.
>
> Best regards,
> Maria.

> _______________________________________________
> geomesa-dev mailing list
> geomesa-dev@xxxxxxxxxxxxxxxx
> To change your delivery options, retrieve your password, or
> unsubscribe from this list, visit
> https://dev.locationtech.org/mailman/listinfo/geomesa-dev
