[geotrellis-user] simple zonal stats example

Hi list,

 

I’m trying to run polygonalSum for a variety of polygons on a 10 x 10 degree float raster. I’ve forked the geotrellis-landsat-tutorial and put together some code here: https://github.com/wri/geotrellis-zonal-stats

I’m very new to Scala and even newer to GeoTrellis, so any help on code, style, or convention is appreciated.

 

My code in ZonalStats.scala does the following:

 

  1. Reads a set of 256 x 256 raster tiles from S3 using S3GeoTiffRDD
  2. Converts this to a tiledRDD and then to a layerRDD
  3. Reads a GeoJSON file and extracts the geometries from it
  4. Maps over the geometries (115 in total), calculating polygonalSumDouble for each one
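
For readers who don't want to open the repository, the four steps above can be sketched roughly as follows. This is not the code from ZonalStats.scala, only a minimal illustration assuming the GeoTrellis 1.x API used by the geotrellis-landsat-tutorial (method names such as `TileLayerMetadata.fromRdd` differ in later versions). The bucket, prefix, and GeoJSON path are hypothetical placeholders, and the polygons are assumed to be in the same CRS as the raster.

```scala
import geotrellis.raster._
import geotrellis.raster.resample.NearestNeighbor
import geotrellis.spark._
import geotrellis.spark.io.s3.S3GeoTiffRDD
import geotrellis.spark.tiling.FloatingLayoutScheme
import geotrellis.vector._
import geotrellis.vector.io._
import geotrellis.vector.io.json.JsonFeatureCollection
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ZonalStatsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("zonal-stats-sketch")
    implicit val sc: SparkContext = new SparkContext(conf)

    try {
      // 1. Read the GeoTIFF tiles from S3 (bucket and prefix are placeholders).
      val source: RDD[(ProjectedExtent, Tile)] =
        S3GeoTiffRDD.spatial("example-bucket", "example/prefix")

      // 2. Derive layer metadata, cut the rasters to a 256 x 256 layout
      //    (the "tiledRDD"), then attach the metadata (the "layerRDD").
      val (_, metadata) =
        TileLayerMetadata.fromRdd(source, FloatingLayoutScheme(256))
      val tiled: RDD[(SpatialKey, Tile)] =
        source.tileToLayout(metadata.cellType, metadata.layout, NearestNeighbor)
      val layer: TileLayerRDD[SpatialKey] = ContextRDD(tiled, metadata)

      // 3. Read the GeoJSON file and keep only the polygon geometries.
      val geoJson = scala.io.Source.fromFile("polygons.geojson").mkString
      val polygons: Vector[Polygon] =
        geoJson.parseGeoJson[JsonFeatureCollection].getAllPolygons()

      // 4. Map over the geometries on the driver; each call runs a Spark job.
      val sums: Vector[Double] = polygons.map(p => layer.polygonalSumDouble(p))

      sums.zipWithIndex.foreach { case (sum, i) => println(s"polygon $i: $sum") }
    } finally {
      sc.stop()
    }
  }
}
```

Note that in this sketch every polygon triggers its own pass over the layer, so the 115 geometries mean 115 Spark jobs.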

 

I’m running this on EMR using a YARN-managed cluster of m3.xlarge machines, and the job takes about half an hour. To package the code, I execute:

./sbt "project geotrellis-zonal-stats" assembly

 

And then to run it:

spark-submit --class tutorial.ZonalStats --master yarn --executor-memory 15g target/scala-2.11/demo-assembly-0.2.0.jar

 

Most of the GeoTrellis examples deal with making web services for tiled maps or on-the-fly geoprocessing. The workflow outlined above is for one-off analysis. While it works (the polygonalSum values are correct), I’d like to speed it up if possible. In particular, I’m wondering:

 

  1. Would following the ETL process to ingest and write GeoTrellis layers to S3 speed things up?
  2. Are there any shortcuts I can take regarding GeoTiffs > tiledRDD > layerRDD?
  3. Is it possible to get GeoJSON properties (not just geometry) when mapping over a JsonFeatureCollection?
  4. What colossal n00b mistakes am I making?

 

My ultimate goal is to store global coverage as 0.00025 degree GeoTIFFs on S3, then tabulate polygonalSum values for all GADM admin level 2 boundaries.

 

Thanks to all you folks for your help, and for developing such a cool tool set! Looking forward to building this into our regular workflow!

 

Charlie

 

