GeoGit in Action: Distributed Versioning for Geospatial Data

Every project working with geospatial information eventually faces the problem of managing change over time. At the center of the issue is data provenance: where the data originated, whom it belongs to, and which individual changes were made to a particular piece of information to reach its current state. Versioning approaches have existed for a while, but they are cumbersome and have been a challenge in many workflows, especially those that involve more than one person. We created GeoGit to help solve these problems.

GeoGit takes concepts and lessons learned from working with code in open source communities and applies them to managing geospatial information. GeoGit allows for decentralized management of versioned data and enables new and innovative workflows for collaboration. Users are able to track edits to geospatial information by importing raw data into repositories where they can view history, revert to older versions, branch into sandboxed areas, merge back in, and push to remote repositories.

Working with GeoGit

Once installed, a simple working session might look like this (data references are from the freely available Natural Earth collection):

1. Create a repository and import raw geospatial data (from Shapefiles and spatial databases such as PostGIS, Oracle Spatial or SQL Server):

    
mkdir repo
cd repo
geogit init
geogit shp import ne_110m_coastline.shp

  

2. Add the imported data to the staging area. This command signals that this is information to be versioned and tracked and prepares it for final insertion into the repository.

    
geogit add

  

3. Commit the information to the repository. Developers who know Git will recognize the API and command-line options. In this case, we are passing a commit message that will be associated with this change.

    
geogit commit -m "Add coastline"

  

4. In order to make changes and collaborate with others, a typical workflow involves creating branches to isolate changes from the master branch. Creating a branch in GeoGit is as easy as issuing the following command:

    
geogit branch branch1

  

This creates a new branch called branch1, which receives all subsequent commits until another branch is chosen. Branching is an important concept in GeoGit because it lets editors of geospatial content modify information without affecting the quality of the main version, usually stored in the master branch.

5. When changes are ready to be brought back into the main version, they can be merged into another branch using the merge command.

    
geogit checkout master    # switch to the master branch
geogit merge branch1

  

Upon merging, if conflicts are detected (for example, two users independently modify the same geometry with different outcomes) a merge conflict is returned and a commit cannot happen until the conflict is resolved. This is an important feature that prevents geospatial data corruption and enforces workflows that involve data quality assurance.
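
To make the conflict rule concrete, here is a minimal Python sketch of a three-way merge over features. It is an illustration of the general technique, not GeoGit's actual implementation: the function name `merge_features`, the feature ids, and the choice to represent each geometry as a WKT string (compared as plain text) are all assumptions made for this example.

```python
def merge_features(base, ours, theirs):
    """Three-way merge of {feature_id: geometry} mappings (hypothetical helper).

    Returns (merged, conflicts). A feature is in conflict when both sides
    changed it relative to the common ancestor and ended up with different
    values. A missing id (None) is treated as a deleted feature.
    """
    merged, conflicts = {}, []
    for fid in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(fid), ours.get(fid), theirs.get(fid)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only "theirs" changed the feature
            result = t
        elif t == b:      # only "ours" changed the feature
            result = o
        else:             # both changed it, with different outcomes
            conflicts.append(fid)
            continue
        if result is not None:
            merged[fid] = result
    return merged, conflicts


# Common ancestor and two independently edited versions (made-up data).
base = {"coast-1": "LINESTRING (0 0, 1 1)", "coast-2": "LINESTRING (2 2, 3 3)"}
ours = {"coast-1": "LINESTRING (0 0, 1 2)", "coast-2": "LINESTRING (2 2, 3 3)"}
theirs = {"coast-1": "LINESTRING (0 0, 2 1)", "coast-2": "LINESTRING (2 2, 3 4)"}

merged, conflicts = merge_features(base, ours, theirs)
print(merged)     # {'coast-2': 'LINESTRING (2 2, 3 4)'}
print(conflicts)  # ['coast-1']
```

In this sketch, a non-empty `conflicts` list corresponds to the situation described above: the merge cannot be committed until someone decides which version of `coast-1` should win.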

Voila!

Anyone familiar with tools like Git, which handles distributed version control for source code, will immediately see the advantages this approach brings.

GeoGit is an open source project based on the Java platform and is developed by committers across several organizations. It has recently been submitted as a project of the LocationTech working group within the Eclipse Foundation. GeoGit has also been designed to be extensible, and there is already a Python wrapper library that makes these operations easier and enables automation.
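
The wrapper library's own API is not covered in this post, so the following is only a rough stand-in for the kind of automation it enables: a small Python script that replays the command-line steps from the walkthrough through `subprocess`. The function names `session_commands` and `run_session` are hypothetical, and the script assumes the `geogit` executable is on the PATH and that the Shapefile path is valid from inside the repository directory.

```python
import os
import shutil
import subprocess


def session_commands(shapefile, message, branch):
    """Return the walkthrough's CLI calls as argument lists, in order."""
    return [
        ["geogit", "init"],
        ["geogit", "shp", "import", shapefile],
        ["geogit", "add"],
        ["geogit", "commit", "-m", message],
        ["geogit", "branch", branch],
    ]


def run_session(repo_dir, commands, dry_run=False):
    """Run each command inside repo_dir, stopping at the first failure.

    With dry_run=True nothing is executed. The command lines are
    returned as strings either way so callers can log them.
    """
    if not dry_run:
        os.makedirs(repo_dir, exist_ok=True)
    lines = []
    for argv in commands:
        lines.append(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, cwd=repo_dir, check=True)
    return lines


if __name__ == "__main__":
    cmds = session_commands("ne_110m_coastline.shp", "Add coastline", "branch1")
    # Fall back to a dry run when the geogit executable is not installed.
    missing = shutil.which("geogit") is None
    for line in run_session("repo", cmds, dry_run=missing):
        print(line)
```

Keeping the command list separate from the code that executes it makes the sequence easy to inspect or log before anything touches a repository.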

To learn more, visit http://geogit.org/ or watch the full presentation on how we're redefining geospatial data versioning with GeoGit.

About the Authors

Juan Marin
Boundless