Re: [geomesa-users] Tips on ingesting a lot of data into GeoMesa


From: Damiano Albani <damiano.albani@xxxxxxxxx>
Date: Fri, 27 Jan 2017 17:03:05 +0100

On Fri, Jan 27, 2017 at 4:46 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:

That converter needs some more work... It generally works fine for smaller data sets, but for larger files (e.g. planet.gpx) it bogs down. If you're interested, it needs to be changed to use an arbitrary JDBC connection for storing the nodes; the in-memory/on-disk implementations don't hold up. I've actually been holding off because I'm not sure the underlying OSM converter libraries will be approved for use by Eclipse.

That's good to know.

I don't intend to ingest the whole planet -- at least not with a single file.

I actually planned to use the extractions from geofabrik.de.
The largest single regions are on the order of a few GB at most (as PBF).

But maybe that's already too big for the in-memory database?

Otherwise, I had the idea of preprocessing the OSM dataset offline, splitting it into a collection of many reasonably sized extracts of roughly equal size.
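
To make the splitting idea a bit more concrete, here is a minimal sketch, assuming "equally sized" simply means an equal number of features per extract and that the features arrive as a plain Python iterable. The `chunked` helper and the dictionary layout of the nodes are hypothetical, not part of GeoMesa or of any OSM library. It also ignores the fact that OSM ways reference nodes, which may end up in a different extract.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(features: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` features.

    Only one batch is held in memory at a time, so the input can be
    a lazy stream read from a large file.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(features)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical stream of node-like records standing in for parsed OSM data.
    nodes = ({"id": i, "lon": 4.0 + i * 1e-4, "lat": 52.0} for i in range(10))
    for n, batch in enumerate(chunked(nodes, 4)):
        # Each batch would be written to its own extract file here.
        print(n, [f["id"] for f in batch])
```

Each batch could then be written to its own file and ingested separately.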

That led me to wonder: what is the most space- and CPU-efficient format for vector data?

- All text-based formats (XML, CSV) are super verbose
- Shapefile is pretty limited (maximum DB size, length of field name, no datetime type)
- Avro is quite specific and has no support in tools like OGR

Would something like SQLite (or even SpatiaLite) be the most appropriate after all?
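
As a rough illustration of what I mean by the SQLite option, here is a minimal sketch using only Python's standard `sqlite3` and `struct` modules. The table name `features`, its columns, and the helper functions are all hypothetical. Point geometries are stored as plain little-endian WKB in a BLOB column, which is an assumption made for simplicity and not the geometry BLOB layout that SpatiaLite itself uses. Timestamps are stored as ISO 8601 text, since SQLite has no dedicated datetime type.

```python
import sqlite3
import struct
from typing import Iterable, Optional, Tuple

# (id, name, ISO 8601 timestamp, lon, lat) -- hypothetical record layout.
Row = Tuple[int, str, str, float, float]


def point_wkb(lon: float, lat: float) -> bytes:
    """Encode a 2D point as little-endian WKB (byte order, type 1, x, y)."""
    return struct.pack("<BIdd", 1, 1, lon, lat)


def write_points(path: str, rows: Iterable[Row]) -> int:
    """Insert point features into a SQLite file and return the row count."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS features ("
                "id INTEGER PRIMARY KEY, name TEXT, dtg TEXT, geom BLOB NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO features (id, name, dtg, geom) VALUES (?, ?, ?, ?)",
                ((fid, name, dtg, point_wkb(lon, lat))
                 for fid, name, dtg, lon, lat in rows),
            )
        return conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
    finally:
        conn.close()


def read_point(path: str, fid: int) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) for a feature id, or None if it does not exist."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT geom FROM features WHERE id = ?", (fid,)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return None
    _order, _gtype, lon, lat = struct.unpack("<BIdd", row[0])
    return lon, lat
```

Everything ends up in a single file with typed columns and no field-name length limit, which is what makes it attractive compared to the formats above.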

Support for SQLite / SpatiaLite in GeoTools is not really up to date, but I managed to fix that (I will publish the fix on GitHub when I have time...).

Or maybe I have missed an obvious format??

Damiano Albani
Geodan
