[geotrellis-user] simple zonal stats example

From: Charlie Hofmann <charlie.hofmann@xxxxxxx>
Date: Fri, 11 Aug 2017 18:33:44 +0000

Hi list,

I’m trying to run polygonalSum for a variety of polygons on a 10x10 degree float raster. I’ve forked the geotrellis-landsat-tutorial and put together some code here: https://github.com/wri/geotrellis-zonal-stats

I’m very new to Scala and even newer to GeoTrellis, so any help on code/style/convention is appreciated.

My code in ZonalStats.scala does the following:

- Reads a bunch of 256 x 256 raster tiles from S3 using S3GeoTiffRDD
- Converts this to a tiledRDD and then to a layerRDD
- Reads a GeoJSON file to get the geometries from it
- Maps over the geometries (115 in total), calculating polygonalSumDouble for each one
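
The steps above can be sketched roughly as follows. This is a minimal illustration assuming the GeoTrellis 1.x API, not the actual contents of ZonalStats.scala. The object name, the function name `zonalSums`, and its bucket, prefix, and path parameters are hypothetical placeholders, and the sketch assumes the polygons are in the same CRS as the rasters.

```scala
import geotrellis.raster.Tile
import geotrellis.raster.resample.NearestNeighbor
import geotrellis.spark._
import geotrellis.spark.io.s3.S3GeoTiffRDD
import geotrellis.spark.tiling.FloatingLayoutScheme
import geotrellis.vector._
import geotrellis.vector.io._
import geotrellis.vector.io.json.JsonFeatureCollection
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

import scala.io.Source

object ZonalStatsSketch {

  // bucket, prefix and geoJsonPath are hypothetical inputs for illustration.
  def zonalSums(bucket: String, prefix: String, geoJsonPath: String)
               (implicit sc: SparkContext): Vector[Double] = {

    // Step 1: read every GeoTIFF under the S3 prefix as (extent, tile) pairs.
    val source: RDD[(ProjectedExtent, Tile)] =
      S3GeoTiffRDD.spatial(bucket, prefix)

    // Step 2: derive layer metadata, then cut the input to a 256 x 256
    // layout at the rasters' native resolution.
    val (_, metadata) =
      TileLayerMetadata.fromRdd(source, FloatingLayoutScheme(256))
    val tiled: RDD[(SpatialKey, Tile)] =
      source.tileToLayout(metadata.cellType, metadata.layout, NearestNeighbor)
    val layer: TileLayerRDD[SpatialKey] = ContextRDD(tiled, metadata)

    // Step 3: parse the GeoJSON file and keep only its Polygon geometries.
    val file = Source.fromFile(geoJsonPath)
    val geoJson = try file.mkString finally file.close()
    val polygons: Vector[Polygon] =
      geoJson.parseGeoJson[JsonFeatureCollection].getAllPolygons()

    // Step 4: one polygonal sum per polygon; each call runs its own Spark job.
    polygons.map(polygon => layer.polygonalSumDouble(polygon))
  }
}
```

`getAllPolygons()` skips MultiPolygon features, so a file that mixes geometry types would need those handled separately.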

I’m running this on EMR using a YARN-managed cluster of m3.xlarge machines, and the job takes about half an hour. To package the code, I execute:

./sbt "project geotrellis-zonal-stats" assembly

And then to run it:

spark-submit --class tutorial.ZonalStats target/scala-2.11/demo-assembly-0.2.0.jar --master yarn --executor-memory 15g

Most of the GeoTrellis examples deal with making web services for tiled maps or on-the-fly geoprocessing. The workflow outlined above is for one-off analysis. While it works (the polygonalSum values are correct), I’d like to speed it up if possible. In particular, I’m wondering:

- Would following the ETL process to ingest and write GeoTrellis layers to S3 speed things up?
- Are there any shortcuts I can take regarding GeoTiffs > tiledRDD > layerRDD?
- Is it possible to get GeoJSON properties (not just geometry) when mapping over a JsonFeatureCollection?
- What colossal n00b mistakes am I making?

My ultimate goal is to store global coverage as 0.00025-degree GeoTIFFs on S3, then tabulate polygonalSums for all GADM admin level 2 boundaries.

Thanks to all you folks for your help, and for developing such a cool tool set! Looking forward to building this into our regular workflow!

Charlie

Follow-Ups:
- Re: [geotrellis-user] simple zonal stats example
  - From: Rob Emanuele
