
Re: [udig-devel] to R or not to R

Hey all,

Do we want to do spatial stats? Is rain wet? You bet we want to interact
with R! We want to even though it's going to be hard any way we go: not
only do we need to pass big data sets back and forth, not only do we
need a real interaction with R, but we also probably need to fix R. You
still want to play?


I have spent a bit of time figuring out how to roundtrip to R. Actually,
not just to R but to the sp objects in R (these are the new standard S4
classes for spatial objects). This would mean all the new spatial
analysis functions would work (many of the old packages may be ported
over eventually). There are, however, as you all appreciate, a number of
issues.


In R itself we have hard problems. R was built intentionally around the
idea that, unlike in S, everything was going to fit into memory. For
real GIS data, that assumption doesn't hold. There are rumors of efforts
to make PostgreSQL backed objects in R and then to rewrite the sp
classes to use incremental algorithms on such data. That will be the
ultimate end goal of our work but involves lots of hacking so it doesn't
get us the cheap satisfaction we all crave.

Exchanging objects with R is another issue. There are lots of pipe dream
projects by some of the core R folk to make this easy in various ways.
Ideally, we would share exactly the same object in memory as R does but
this seems unlikely to happen any time soon. The most elegant solution I
found in my work was to exchange objects through PostGIS.
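To make the PostGIS exchange idea concrete, here is a minimal sketch of the Java side, under stated assumptions: the table name udig_to_r and the columns name and geom are hypothetical, and the sketch only builds the statement and the WKT text, it does not open a database connection. The R side would read the same table with whatever database driver is available there.

```java
public class PostgisExchangeSketch {

    /** Builds WKT for a 2D point, e.g. "POINT(1.5 2.25)". */
    static String pointWkt(double x, double y) {
        // Double.toString always uses '.' as the decimal separator,
        // independent of the user's locale.
        return "POINT(" + Double.toString(x) + " " + Double.toString(y) + ")";
    }

    /**
     * Builds a parameterized INSERT for a hypothetical exchange table with
     * columns (name, geom). The three placeholders are: name, WKT text, SRID.
     */
    static String insertSql(String table) {
        // Only accept simple identifiers, since the table name is concatenated.
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "INSERT INTO " + table
                + " (name, geom) VALUES (?, ST_GeomFromText(?, ?))";
    }

    public static void main(String[] args) {
        System.out.println(insertSql("udig_to_r"));
        System.out.println(pointWkt(1.5, 2.25));
    }
}
```

In a real client the statement would go through a JDBC PreparedStatement, with the WKT bound as the second parameter and the SRID as the third.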

Once you exchange objects, the only issue for uDig/geotools is the
thread of control. The cheap solution is to simply have an R command
line where users can run an R session. I'm hoping the JGrass bean shell
will give us that without requiring us to live in their GRASS workspace
view of the world. The approach on which I spent a bunch of time was
Rjava. Rjava and Rserver were headed for identical interfaces last I was
involved so perhaps they have matured to be identical. It means we could
make that strategy decision a client choice. I learned a lot and have
notes on all this somewhere but since I have been focusing on more
mundane stuff recently, I can't conjure this up right now. I know
nothing about the other alternatives that Andrea Antonello proposes.
Adding yet another language into the mix on top of R, C and Java seemed
heavy and indirect. However, I am not against it either. 

If we want to do more and digest or replace R output (i.e. its plot and
print methods) we have a much bigger problem on our hands. There are a
number of ways to do this badly, all of which would be less satisfying
than simply using R. Doing it well is a hard problem that would take
lots of work to understand.

I have not explored any of the internationalization issues of these
various approaches either. I've never, for example, used R in any locale
other than C. So we will need to understand how our architecture would
look in Italian, French, Spanish, Basque...
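One locale pitfall that is easy to show from the Java side is number formatting when values are passed between programs as text. The sketch below is only an illustration of that one issue, not a survey of the problem; the class and method names are made up for the example.

```java
import java.util.Locale;

public class LocaleNumberSketch {

    /** Formats a value with two decimals using the given locale's conventions. */
    static String format(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    /** Returns true if the text parses as a plain (locale-independent) double. */
    static boolean parsesAsPlainDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String french = format(Locale.FRANCE, 1.5); // decimal comma
        String neutral = format(Locale.ROOT, 1.5);  // decimal point
        System.out.println(french + " parses: " + parsesAsPlainDouble(french));
        System.out.println(neutral + " parses: " + parsesAsPlainDouble(neutral));
    }
}
```

Text formatted under a French locale uses a decimal comma and no longer parses as a plain number, so any text-based exchange would probably want to pin a neutral locale on both sides.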

Ultimately any solution other than providing a simple command prompt for
the user will require yet more uDig DESIGN. That's my big kick with uDig
right now: we need formal DESIGN so we can all agree on what is going on
and all our hacks can play nicely with each other. For example, GUI
driven spatial analysis will require us to keep lots of different
selections around so we will need a really clear understanding of how
multiple selections get created, maintained, and replaced by the uDig
UI.


So I think we could set up something quick by having beanshell give us an
R prompt and have objects sent into and out of a PostGIS database. For
some simple analyses, we could even automate the sequence into something
that could be run from a uDig UI. If we want to do much more than that,
things get mega complicated very quickly.

I would love to explore some of these options, perhaps at the code
sprint which I assume will have some JGrass folk attending.

--adrian


