Re: [geomesa-users] Tips on ingesting a lot of data into GeoMesa

On Fri, Jan 27, 2017 at 4:46 PM, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
That converter needs some more work... it generally works fine for smaller data sets, but for larger files (e.g. planet.gpx) it bogs down. If you're interested, it needs to be changed to use an arbitrary JDBC connection for storing the nodes, since the in-memory and on-disk implementations don't hold up. I've actually been holding off because I'm not sure the underlying OSM converter libraries will be approved for use by Eclipse.

That's good to know.
I don't intend to ingest the whole planet -- at least not with a single file.
I actually planned to use the extractions from geofabrik.de.
The largest single regions are in the order of a few GB at most (as PBF).
But maybe that's already too big for the in-memory database?

Otherwise, I had the idea of preprocessing the OSM dataset offline to produce a collection of many reasonably sized extracts of roughly equal size.
That led me to wonder: what is the most space- and CPU-efficient format for vector data?
  • all text-based formats (XML, CSV) are super verbose
  • Shapefile is pretty limited (2 GB maximum file size, field names of at most 10 characters, no datetime type)
  • Avro is quite specific and has no support in tools like OGR
Would something like SQLite (or even SpatiaLite) be the most appropriate after all?
Support for SQLite / SpatiaLite in GeoTools is not really up to date, but I managed to fix that (I shall publish that on GitHub when I have time...).
Or maybe I have missed an obvious format?
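
To make the SQLite idea a bit more concrete, here is a minimal sketch of what one such extract could look like, using only the Python standard library. Everything in it is an assumption for illustration: the table name, the column names, and the choice to store point geometries as plain WKB blobs and timestamps as ISO 8601 text (SQLite has no native datetime type). It is not what GeoMesa, GeoTools, or SpatiaLite actually do internally.

```python
import sqlite3
import struct


def point_wkb(lon, lat):
    # Little-endian WKB point: byte order flag 1, geometry type 1, then x and y as doubles.
    return struct.pack("<BIdd", 1, 1, lon, lat)


def write_features(conn, features):
    """Store (name, timestamp, lon, lat) tuples in a hypothetical 'features' table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features ("
            "id INTEGER PRIMARY KEY, "
            "a_rather_long_attribute_name TEXT, "  # no 10-character limit here
            "observed_at TEXT, "                   # ISO 8601 string, by convention
            "geom BLOB)"                           # WKB-encoded point
        )
        conn.executemany(
            "INSERT INTO features "
            "(a_rather_long_attribute_name, observed_at, geom) VALUES (?, ?, ?)",
            [(name, ts, point_wkb(lon, lat)) for name, ts, lon, lat in features],
        )


def read_features(conn):
    """Read the features back and decode the WKB points."""
    rows = conn.execute(
        "SELECT a_rather_long_attribute_name, observed_at, geom "
        "FROM features ORDER BY id"
    ).fetchall()
    result = []
    for name, ts, geom in rows:
        _byte_order, _geom_type, lon, lat = struct.unpack("<BIdd", geom)
        result.append((name, ts, lon, lat))
    return result
```

Each extract would then be its own file, e.g. opened with sqlite3.connect("extract_0001.sqlite") (a made-up file name), and a binary WKB point costs 21 bytes instead of a verbose text representation.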

--
Damiano Albani
Geodan
