From: Andrey Platov <andrey@xxxxxxxxx>
Date: December 8, 2006 12:08:50 PM PST
To: ward.cunningham@xxxxxxxxxxx
Subject: DLTK: JDT generalization, type inferencing, etc
Hello Ward,
First of all, we were very pleased to meet you at the Creation
Review. Interesting issues were raised, but unfortunately the
discussion was not as productive as it could have been, primarily
because of the extremely bad line quality we had. So I'd like to
write down some thoughts on the issues, and possibly turn them into
articles later (if anything below proves really interesting and
useful to the community). Also, please consider this email
quick-and-dirty writing: I have not rechecked the facts, some of
which may already be obsolete.
============= JDT Generalization and potential common base for
language projects ====================
The goal of JDT generalization is to build an intermediate layer,
which I will call LTK from here on (because some work on this has
already started in the org.eclipse.ltk.* plugins). Physically this
could be part of the Platform or a separate set of plugins, but that
is not relevant to the goal. At the moment we cannot provide
concrete proposals for this kind of language layer; however, the
general directions seem to be visible for most language projects.
To understand the requirements for this layer better, we should look
at potential LTK users and also explore *failure* stories, because
they are very important. We could have long talks about the reasons
and requirements for an LTK layer for the JDT and CDT projects (I'll
call these and other successful projects sharing the JDT
architecture xDT), but this looks like several car vendors talking
about a common engine platform to optimize expenses and reach other
goals... Which is definitely important but does not give us the
whole picture.
Since we started development of the TruStudio PHP IDE in the fall of
2002, I have watched the evolution of some open-source Eclipse-based
language IDEs that were started from scratch by people without any
experience in xDT projects (non-xDT projects). Most of them passed
through the following steps:
Step 1 (birth): trivial features such as syntax highlighting and
code outline are implemented.
Step 2 (boyhood): code launching is implemented, possibly debugging.
Step 3 (youth): developers set out to implement must-have features
for a modern IDE such as code assistance, navigation, search, etc.,
but get stuck... [Actually this is part of Eclipse's hidden agenda:
Eclipse (2.x) is positioned as a tool for developing IDEs, and
newcomers can definitely implement Steps 1 & 2 extremely fast with
great results. After these steps, euphoria makes newcomers consider
Eclipse a Silver Bullet and expect other features (like code
assistance) to be developed very fast too. In fact there is a big
gap between the effort required to build a simple IDE and a
full-featured IDE, a gap which could be filled by LTK.] So after
some failures developers understand that they must have a powerful
project-wide structural model and related services (like a code
index/search engine).
Step 4 (stagnation): some teams try to copy/paste the JDT model and
go into stagnation (PHPEclipse); other teams implement the
model/services from scratch with nearly the same result (PHP IDE).
The reason is the same: a big gap which is not easy (cheap) to jump
over.
These steps are described very briefly and in simplified form, but
they are important for the analysis that follows.
Let me start with Step 1 and look at how LTK can help projects at
this step, and at some of the reasons behind it:
Generally the situation with syntax highlighting and other
preferences is the same: JDT copy/paste, or implementation from
scratch. Non-copy/paste implementations are very ugly and
feature-poor (as in the ANTLR plugin). During JDT's evolution a lot
of things were pushed down to the Platform (annotations, etc.);
however, such pushes could be a little more aggressive. For example:
if the Platform has a base for editors with syntax highlighting, why
is there no preference page implementation for it? If the Platform
has support for launching/debugging, why is there no model/UI for
runtime (environment) configuration?
Of course it is very disputable whether this kind of stuff should
move into the Platform, but an intermediate layer for it could be
very useful in the long run. For example, just as the Debian Linux
distribution has stable, testing, and unstable distros, the
Platform, LTK, and xDT projects can play the same roles: xDT
projects propose common things to LTK, and if a thing is found
useful by other projects it is pushed down to the Platform, stays in
LTK, or is discarded, depending on some criteria.
I think there are 2 big reasons to have a lot of things in the LTK
layer even if they are used by only 2-3 projects. The first is a
common look and feel. Some plugin authors definitely make Eclipse
look very ugly, which is annoying. The second, much more important
(and also disputable), is to *force projects to follow Eclipse
concepts* (by Eclipse concepts I mean those of xDT, WTP, and other
mature projects). For example, let's consider the PHP IDE project.
The project more or less copies the "JRE Installations" preference
page; however, there is a *huge* conceptual difference: in JDT the
end user selects/configures his/her environment and the IDE "adapts"
to this environment (starting from the language version, up to
available libraries and other settings). PHP IDE considers these
settings at project runtime only, which is definitely a violation
of the WYSIWYG-like (but applied to code) concept that makes Eclipse
a great tool. So it's very likely that an LTK layer with Eclipse
concepts "built in" would force developers to follow them (or at
least to think about how to do things the Eclipse way)... Note: I
mentioned JDT and PHP IDE, but actually many projects follow this
concept and many others ignore it.
Preliminary proposal 1: a more aggressive push of common and
relatively independent building blocks down to the LTK layer. Best
candidates: preference pages and utility (mostly GUI-related)
components. At first glance this activity seems useless, because
we're talking about trivial and relatively easy-to-implement things,
but even a trivial syntax highlighting preference page will take no
less than 1000 LOC. Environment configuration facilities will take
much more because of the more complex model/UI, model persistence,
environment change notifications, etc... The same is true for a
common JUnit-like GUI for launching tests (a kind of "Tests
Console") and a lot of other things.
So potentially there are 10-50 KLOC of code that could be shared
among the projects and could save days or weeks of development for
new ones. Also, such actions will not have a big impact on xDT
projects, because it is very likely that things like preference
pages are not reused in third-party projects; moreover, a flexible
and common framework could help third parties that would like to
plug into such things. Hypothetical example: the Mylar project would
like to focus on tasks (bugs) which are relevant only to the current
Java environment (Java 6, for example). At the moment there is no
way for Mylar to listen for project environment changes using a
"common protocol". A common API in LTK would resolve this problem
for all current and future xDT projects. It would also help new
projects with Step 1 in the scenario above.
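To make the "common protocol" idea a bit more concrete, here is a
rough sketch of what a language-neutral environment change API might
look like. Everything in it is an assumption made for illustration:
EnvironmentRegistry, EnvironmentChangeListener, and the use of plain
strings as environment ids are hypothetical names, not existing
Eclipse or LTK API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener that any tool (Mylar, for example) could implement. */
interface EnvironmentChangeListener {
    /** oldEnvironmentId is null when the project had no environment before. */
    void environmentChanged(String projectName, String oldEnvironmentId, String newEnvironmentId);
}

/** Hypothetical LTK-level registry shared by all language projects. */
final class EnvironmentRegistry {
    // Project name -> id of the environment currently selected for it.
    private final Map<String, String> environments = new HashMap<>();
    private final List<EnvironmentChangeListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(EnvironmentChangeListener listener) {
        listeners.add(listener);
    }

    void removeListener(EnvironmentChangeListener listener) {
        listeners.remove(listener);
    }

    String getEnvironment(String projectName) {
        return environments.get(projectName);
    }

    /** Stores the new environment and notifies listeners only on a real change. */
    void setEnvironment(String projectName, String newEnvironmentId) {
        String old = environments.put(projectName, newEnvironmentId);
        if (!Objects.equals(old, newEnvironmentId)) {
            for (EnvironmentChangeListener listener : listeners) {
                listener.environmentChanged(projectName, old, newEnvironmentId);
            }
        }
    }
}
```

With something like this, a tool would register one listener and
receive the same callback whether the project is a Java, C/C++, or
script project; the language-specific plugins would only have to
call setEnvironment when the user changes the configuration.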
-----------------
Step 4 (common structural model and dependent services) is the most
interesting issue. In fact a lot of JDT components (I guess >50%)
are built on top of the structural model (the IJavaElement
hierarchy), which is definitely a fundamental thing in the JDT
architecture. So sharing a common model between different languages
would give *huge* benefits. Just a few of them:
1) Huge simplification of IDE development (in our experience no
more than 25% of the logic is actually language-dependent; > 75% of
the code is language-independent or purely technical stuff like
caches, string utils, etc.).
2) Interesting benefits like language interoperability: for example,
there is no good way to "integrate" DLTK (scripting languages) with
JDT and/or CDT, though of course scripting in Java or C/C++ projects
is a popular thing. If the end user calls into a Java or C method
and we'd love to show code completion proposals, we have to ask
JDT/CDT for proposals, translate (adapt) the model elements (for
method proposals) into our (DLTK's) method descriptors, etc... If
the end user would like to navigate into Java code from a script we
have to perform nearly the same procedure in reverse, and there will
be N different implementations for N languages (even if such
implementations have 100%-equal logic for JDT and CDT, we will need
to provide both, because the JDT and CDT model hierarchies are
absolutely different from the Java point of view).
3) Easier and wider integration with other projects. For example,
Mylar at the moment has a generic resource bridge and works well
with any Eclipse project at the resource level. A common structural
project model (compilation units, classes, methods) would allow such
tools to work with all existing and future tools at the code
structure level.
We understand the huge number of problems which must be resolved to
have a common base in the long run. This was relatively easy for
DLTK because no external project relies on DLTK, we do not assume
that any parts of our work could be moved to LTK in the foreseeable
future, we do not have a huge codebase which must be supported
during the generalization/decomposition work, etc. And the DLTK work
looks like a JDT "port" in which all platform... sorry,
language-specific stuff is replaced with abstractions, and concrete
language stuff is instantiated using extension points (it does not
sound ambitious, but one of our main goals is to "clone" JDT for
scripting languages). This experience is of little use beyond a
practical understanding of which blocks and components are required
for other languages and how the JDT implementation could be
decoupled.
If the goal is a common layer (LTK), there are at least 2 possible
approaches.
1) The DLTK way described above. Its advantages are already
described: this approach would give us all 3 benefits, and it seems
it would reduce failures of new projects implementing IDEs because
of benefit (1). The disadvantages are:
- another abstraction layer we have to inherit from, with all the
corresponding disadvantages, or a huge number of adapters/extension
points, etc. if the goal is to build a layer that is flexible enough.
- a lot of refactoring work required for existing projects like CDT
and JDT.
2) The "opposite" approach. Let's, for example, start with a
"minimal" common model and let the xDT models be adaptable to it.
This common model will be minimal, but very easy to implement inside
every xDT project. Possibly this API will be useless for most tasks,
but the goal is just to start using it. It should be enough for
minimal needs like the Mylar task described above, or simple Monkey
scripts looking at code structure, etc. I have a feeling that
benefits (2) & (3) could be achieved without very big effort. The
next step is decomposition of the xDT cores, and there are
definitely some pieces which could be moved out of the cores, like
the "code indexer". Given a minimal common model, the LTK indexer
would work on top of it, and the common model may be improved to
fulfill common index needs... And so on. This looks like a realistic
approach, but of course much more research and collaboration between
interested projects is required.
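As a rough illustration of approach 2, the sketch below shows a
deliberately tiny common model plus an adapter that exposes a
language-specific model through it. All names are hypothetical:
CodeElement is an imagined LTK interface, and FakeJavaElement is only
a stand-in for a language-specific element, not a real JDT or CDT
class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal common element: a kind, a name, and children. */
interface CodeElement {
    enum Kind { UNIT, TYPE, METHOD }

    Kind kind();
    String name();
    List<CodeElement> children();
}

/** Stand-in for a language-specific model element (NOT a real JDT/CDT class). */
final class FakeJavaElement {
    static final int UNIT = 0, TYPE = 1, METHOD = 2;

    final int elementType;
    final String elementName;
    final List<FakeJavaElement> members = new ArrayList<>();

    FakeJavaElement(int elementType, String elementName) {
        this.elementType = elementType;
        this.elementName = elementName;
    }

    FakeJavaElement add(FakeJavaElement child) {
        members.add(child);
        return this;
    }
}

/** Adapter each language project would supply for its own model. */
final class FakeJavaElementAdapter implements CodeElement {
    private final FakeJavaElement target;

    FakeJavaElementAdapter(FakeJavaElement target) {
        this.target = target;
    }

    public Kind kind() {
        switch (target.elementType) {
            case FakeJavaElement.UNIT: return Kind.UNIT;
            case FakeJavaElement.TYPE: return Kind.TYPE;
            default: return Kind.METHOD;
        }
    }

    public String name() {
        return target.elementName;
    }

    public List<CodeElement> children() {
        List<CodeElement> result = new ArrayList<>();
        for (FakeJavaElement member : target.members) {
            result.add(new FakeJavaElementAdapter(member));
        }
        return result;
    }
}

/** Language-independent client code: it sees CodeElement only. */
final class StructureTools {
    static List<String> methodNames(CodeElement root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(CodeElement element, List<String> out) {
        if (element.kind() == CodeElement.Kind.METHOD) {
            out.add(element.name());
        }
        for (CodeElement child : element.children()) {
            collect(child, out);
        }
    }
}
```

The point of the design is that StructureTools never mentions the
language-specific classes, so the same client code would work for
any project that ships an adapter.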
Anyway, we will be happy to participate in this effort once it
starts, both by proposing things to be pushed down to LTK and by
using/testing LTK components... and of course in
development/research.
======== Type Inferencing ==========
We understood that type inferencing is a *must* for a dynamic
language IDE, especially if we want an IDE with JDT-comparable
features. In TruStudio we used abstract interpreters. Running from
the start of the function containing the expression whose type we
infer, the interpreter collects type information and follows some
empirical rules for performance optimization, such as not
interpreting statements which are most likely useless for further
calculations (e.g. a function call without any assignment). This is
relatively simple but provides very good results in most cases, and
was much appreciated by end users (especially since most leading
IDEs have no type inference at all, or very poor implementations
with less precise results).
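The following toy sketch shows the general shape of such an
interpreter. It is not the TruStudio implementation: the
mini-language (assignments and bare calls only), the class names,
and the use of plain strings as types are all assumptions made to
keep the example short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface Expr {}

/** A literal whose type is known directly, e.g. "int" or "string". */
final class Literal implements Expr {
    final String type;
    Literal(String type) { this.type = type; }
}

/** A reference to a local variable. */
final class VarRef implements Expr {
    final String name;
    VarRef(String name) { this.name = name; }
}

interface Stmt {}

/** target = value */
final class Assign implements Stmt {
    final String target;
    final Expr value;
    Assign(String target, Expr value) { this.target = target; this.value = value; }
}

/** A function call whose result is not assigned to anything. */
final class BareCall implements Stmt {
    final String function;
    BareCall(String function) { this.function = function; }
}

final class TinyTypeInterpreter {
    static final String UNKNOWN = "unknown";

    /**
     * Interprets the function body from its start up to (not including)
     * statement queryIndex and returns the type the variable has there.
     */
    static String inferTypeAt(List<Stmt> body, int queryIndex, String variable) {
        Map<String, String> env = new HashMap<>();
        for (int i = 0; i < queryIndex && i < body.size(); i++) {
            Stmt stmt = body.get(i);
            if (stmt instanceof BareCall) {
                continue; // heuristic: the result is unused, so skip it
            }
            if (stmt instanceof Assign) {
                Assign assign = (Assign) stmt;
                env.put(assign.target, eval(assign.value, env));
            }
        }
        return env.getOrDefault(variable, UNKNOWN);
    }

    private static String eval(Expr expr, Map<String, String> env) {
        if (expr instanceof Literal) {
            return ((Literal) expr).type;
        }
        if (expr instanceof VarRef) {
            return env.getOrDefault(((VarRef) expr).name, UNKNOWN);
        }
        return UNKNOWN;
    }
}
```

Because the walk stops at the query position, the answer reflects
only what has been assigned before that point in the function, which
is the context sensitivity discussed in this section.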
Between TruStudio and DLTK we thought about building a data
(type)-flow graph for the whole workspace and the projects it
contains. The idea was to update this graph incrementally as the
user modifies code. The advantage is having up-to-date type
information for any part of the code, which would allow us to
perform analysis as complex as we want without performance
degradation. This idea was not implemented, for the following
reasons (I hope fortunately, and that we're correct). First, the
graph gives less precise results than an abstract interpreter in
some cases, because the interpreter is sensitive to the context of
the expression (for which we infer the type), whereas the graph
gives the union of the results available for every execution path.
Of course there are a lot of cases where the graph would give more
precise results, because the whole project is covered. For example,
in the interpreter approach we do not infer the argument types of a
top-level function, and this is the reason for many failures.
Another reason to discard the graph approach is that we were not
sure we would be able to fit the whole graph into an adequate amount
of memory, and that a lot of research would be required to be sure
we would not have other performance issues. Building graphs only on
parts of the code makes this approach much less attractive, and we
can be assured of nearly the same or better results using improved
abstract interpretation.
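For comparison, here is a minimal sketch of the graph idea, again
with hypothetical names (TypeFlowGraph is not an existing class) and
without the incremental update part. Nodes are variables, edges
record that values flow from one variable to another, and types are
propagated along the edges until nothing changes.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TypeFlowGraph {
    // Node -> set of types known to reach it.
    private final Map<String, Set<String>> types = new HashMap<>();
    // Node -> nodes that receive its values (for "to = from").
    private final Map<String, Set<String>> successors = new HashMap<>();

    /** Records that a value of the given type is assigned to the node somewhere. */
    void addType(String node, String type) {
        types.computeIfAbsent(node, k -> new TreeSet<>()).add(type);
    }

    /** Records that values flow from one node to another. */
    void addFlow(String from, String to) {
        successors.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Worklist propagation: push types along edges until a fixed point. */
    void solve() {
        Deque<String> worklist = new ArrayDeque<>(types.keySet());
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            Set<String> nodeTypes = types.getOrDefault(node, Collections.emptySet());
            for (String next : successors.getOrDefault(node, Collections.emptySet())) {
                Set<String> target = types.computeIfAbsent(next, k -> new TreeSet<>());
                if (target.addAll(nodeTypes)) {
                    worklist.add(next); // its types grew, so revisit its successors
                }
            }
        }
    }

    Set<String> typesOf(String node) {
        return types.getOrDefault(node, Collections.emptySet());
    }
}
```

For a script like x = 1; y = x; x = "s", the graph holds addType("x",
"int"), addFlow("x", "y"), and addType("x", "string"), and after
solve() it reports both int and string for y, because it ignores
statement order. An interpreter that stops at y = x would see only
int. That is the precision difference described above.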
So finally we decided to improve the abstract interpreters with the
DDP algorithm (http://lexspoon.org/ti)... Or improve DDP with
abstract interpretation :)... DDP correlates perfectly with our
ideas about improving the abstract interpreter and the two are not
mutually exclusive (in fact DDP itself is a very abstract approach),
so we hope we'll get very acceptable inference quality.
Mikhail Kalugin is working on TI for DLTK, and he will be happy to
discuss TI issues further, as well as to update you with details and
other ideas we are going to try for DLTK, if you're interested.
Sorry for the messy thoughts and the long email; I hope this is
helpful for you, and I'd be glad to discuss any topic of interest in
detail.
Kind Regards,
Andrey Platov
xored software, Inc.