[technology-pmc] Fwd: DLTK: JDT generalization, type inferencing, etc

Friends -- I received this detailed response to questions John and I asked at the DLTK creation review. FYI. Best regards. -- Ward


Begin forwarded message:

From: Andrey Platov <andrey@xxxxxxxxx>
Date: December 8, 2006 12:08:50 PM PST
To: ward.cunningham@xxxxxxxxxxx
Subject: DLTK: JDT generalization, type inferencing, etc

Hello Ward,

First of all, we were very pleased to meet you at the Creation Review. Interesting issues were raised, but unfortunately the discussion was not as productive as it could have been, primarily because of the extremely bad line quality we had. So I'd like to write down some thoughts on these issues here, and possibly in articles later (if anything below turns out to be really interesting and useful to the community). Also, please consider this email quick-and-dirty writing: I have not rechecked the facts, some of which may already be obsolete.

============= JDT Generalization and potential common base for language projects ====================

The goal of JDT generalization is to build an intermediate layer, which I will call LTK from here on (because some work on this has already started in the org.eclipse.ltk.* plugins). Physically this could be part of the Platform or a separate set of plugins, but that is not relevant to the goal. At the moment we cannot provide concrete proposals for such a language layer, but the general directions seem to be visible for most language projects.

To understand the requirements for this layer better, we should look at potential LTK users and also explore *failure* stories, because they are very important. We could talk at length about the reasons and requirements for an LTK layer for the JDT and CDT projects (I'll call these and other successful projects sharing the JDT architecture xDT), but this looks like several car vendors talking about a common engine platform to optimize expenses and reach other goals... That is definitely important, but it does not give us the whole picture.

Since we started development of the TruStudio PHP IDE in the fall of 2002, I have watched the evolution of several open-source Eclipse-based language IDEs that were started from scratch by people without any experience in xDT projects (non-xDT projects). Most of them passed through the following steps:

Step 1 (birth): trivial features such as syntax highlighting and code outline are implemented.
Step 2 (boyhood): code launching is implemented, possibly debugging.
Step 3 (youth): developers set out to implement the must-have features of a modern IDE, such as code assistance, navigation, search, etc., but get stuck. [Actually this is one of Eclipse's hidden agendas: Eclipse (2.x) is positioned as a tool for developing IDEs, and newcomers definitely can implement Steps 1 & 2 extremely fast with great results. Once these steps are passed, euphoria leads newcomers to consider Eclipse a Silver Bullet and to expect other features (like code assistance) to be developed very fast too. In fact there is a big gap between the effort required to build a simple IDE and a full-featured IDE, and LTK could fill it.] So after some failures, developers understand that they badly need a powerful project-wide structural model and related services (like a code index/search engine).
Step 4 (stagnation): some teams try to copy/paste the JDT model and go into stagnation (PHPEclipse); some teams implement the model/services from scratch with nearly the same result (PHP IDE). The reason is the same: a big gap that is not easy (cheap) to jump over.

These steps are described very briefly and in simplified form, but they are important for the further analysis.

Let me start with Step 1 and look at how LTK can help projects at this step, and at some of the reasons behind it:

Generally the situation with syntax highlighting and other preferences is the same: JDT copy/paste, or implementation from scratch. Non copy/paste implementations are very ugly and feature-poor (as in the ANTLR plugin). During JDT's evolution a lot of things were pushed down to the Platform (like annotations, etc.); however, such pushes could be a little more aggressive. For example: if the Platform has a base for editors with syntax highlighting, why is there no preference page implementation for it? If the Platform has support for launching/debugging, why is there no model/UI for runtime (environment) configuration?

Of course it is very debatable whether this kind of stuff should move into the Platform, but an intermediate layer for it could be very useful in the long run. For example, just as the Debian Linux distribution has stable, testing, and unstable distros, the Platform, LTK, and xDT projects can play the same roles: xDT projects propose common things to LTK, and each such thing is then pushed down to the Platform, kept in LTK, or discarded, depending on criteria such as whether other projects find it useful.

I think there are 2 big reasons to have a lot of things in the LTK layer even if they are used by only 2-3 projects. The first is a common look and feel: some plugin authors definitely make Eclipse look very ugly, which is annoying. The second, much more important (and also debatable), is to *force projects to follow Eclipse concepts* (by Eclipse concepts I mean the concepts of xDT, WTP, and other mature projects). For example, let's consider the PHP IDE project. The project more or less copies the "JRE Installations" preference page, but there is a *huge* conceptual difference: in JDT the end user selects/configures his/her environment and the IDE "adapts" to this environment (from the language version up to available libraries and other settings). PHP IDE considers these settings at project runtime only, which is definitely a violation of the WYSIWYG-like (but applied to code) concept that makes Eclipse a great tool. So it's very likely that an LTK layer with Eclipse concepts "built in" would force developers to follow them (or at least to think about how to do things the Eclipse way)... Note: I mentioned JDT and PHP IDE, but actually many projects follow this concept and many others ignore it.

Preliminary proposal 1: a more aggressive push of common and relatively independent building blocks down to the LTK layer. Best candidates: preference pages and utility (mostly GUI-related) components. At first glance this activity seems useless, because we're talking about trivial and relatively easy-to-implement things, but even a trivial syntax highlighting preference page implementation will take no less than 1000 LOC. Environment configuration facilities will take much more because of the more complex model/UI, model persistence, environment change notifications, etc... The same is true for a common JUnit-like GUI for launching tests (a kind of "Tests Console") and a lot of other things.

So potentially there are 10-50 KLOC of code that could be shared among the projects and could save new ones days or weeks of development. Also, such actions will not have a big impact on xDT projects, because it is very likely that things like preference pages are not reused in third-party projects; moreover, a flexible common framework could help third parties that would like to plug into such things. Hypothetical example: the Mylar project would like to focus on tasks (bugs) that are relevant only for the current Java environment (Java 6, for example). At the moment there is no way for Mylar to listen for project environment changes using a "common protocol". A common API in LTK would resolve this problem for all current and future xDT projects. This would also help new projects with Step 1 in the scenario above.
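
A minimal sketch, in Java, of what a "common protocol" for environment changes might look like. Every type and method name here (EnvironmentRegistry, IEnvironmentChangeListener, EnvironmentChangeEvent) is hypothetical and invented for illustration; nothing like this exists in the Platform or in org.eclipse.ltk.*, and a real design would more likely be built on extension points and project-scoped preferences.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical LTK-level types; none of these exist in Eclipse.
final class EnvironmentChangeEvent {
    final String projectName;
    final String oldEnvironment; // null when the project had no environment before
    final String newEnvironment;

    EnvironmentChangeEvent(String projectName, String oldEnvironment, String newEnvironment) {
        this.projectName = projectName;
        this.oldEnvironment = oldEnvironment;
        this.newEnvironment = newEnvironment;
    }
}

interface IEnvironmentChangeListener {
    void environmentChanged(EnvironmentChangeEvent event);
}

final class EnvironmentRegistry {
    private final Map<String, String> environments = new ConcurrentHashMap<>();
    private final List<IEnvironmentChangeListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(IEnvironmentChangeListener listener) { listeners.add(listener); }
    void removeListener(IEnvironmentChangeListener listener) { listeners.remove(listener); }

    String getEnvironment(String projectName) { return environments.get(projectName); }

    // Called by a language project (JDT, CDT, DLTK, ...) when the user reconfigures a project.
    void setEnvironment(String projectName, String environmentId) {
        String old = environments.put(projectName, environmentId);
        if (environmentId.equals(old)) {
            return; // nothing changed, so nobody is notified
        }
        EnvironmentChangeEvent event = new EnvironmentChangeEvent(projectName, old, environmentId);
        for (IEnvironmentChangeListener listener : listeners) {
            listener.environmentChanged(event);
        }
    }
}
```

With something of this shape, a tool like Mylar would register one listener and receive the same event type regardless of which language project owns the environment.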

-----------------

Step 4 (a common structural model and dependent services) is the most interesting issue. In fact a lot of JDT components (I guess >50%) are built on top of the structural model (the IJavaElement hierarchy), which is definitely a fundamental thing in the JDT architecture. So sharing a common model between different languages will give *huge* benefits. Just a few of them:

1) Huge simplification of IDE development. In our experience no more than 25% of the logic is actually language-dependent; more than 75% of the code is language-independent or purely technical stuff like caches, string utils, etc.

2) Interesting benefits like language interoperability. For example, there is no good way to "integrate" DLTK (scripting languages) with JDT and/or CDT, although scripting in Java or C/C++ projects is of course a popular thing. If the end user calls into a Java or C method and we'd love to show code completion proposals, we have to ask JDT/CDT for proposals, translate (adapt) the model elements (for method proposals) into our (DLTK's) method descriptors, etc... If the end user would like to navigate into Java code from a script, we have to perform nearly the same procedure in reverse, and there will be N different implementations for N languages (even if such implementations have 100% identical logic for JDT and CDT, we will need to provide both, because the JDT and CDT model hierarchies are absolutely different from the Java point of view).

3) Better, easier, and wider integration with other projects. For example, Mylar at the moment has a generic resource bridge and works well with any Eclipse project at the resource level. A common structural project model (compilation units, classes, methods) will allow such tools to work with all existing and future tools at the code structure level.

We understand the huge number of problems that must be resolved to have a common base in the long run. This was relatively easy for DLTK because no external project relies on DLTK, we are not counting on any part of our work being moved to LTK in the foreseeable future, we do not have a huge codebase that has to be supported during generalization/decomposition work, etc. And DLTK work looks like a JDT "port", where all platform... sorry, language-specific stuff is replaced with abstractions, and concrete language stuff is instantiated using extension points (it sounds unambitious, but one of our main goals is to "clone" JDT for scripting languages). This experience is of little use beyond understanding in practice which blocks and components are required for other languages and how the JDT implementation could be decoupled.

If the goal is a common layer (LTK), there are at least 2 possible approaches.

1) The DLTK way described above. Its advantages are already described: this approach will give us all 3 benefits, and it seems that it will reduce failures of new projects implementing IDEs because of benefit (1). Disadvantages are:
- another abstraction layer we would have to inherit from, with all the corresponding disadvantages, or a huge number of adapters/extension points, etc., if the goal is to build a layer that is flexible enough;
- a lot of refactoring work required for existing projects like CDT and JDT.

2) The "opposite" approach. Let's, for example, start with a "minimal" common model and let the xDT models be adaptable to it. This common model will be minimal, but very easy to implement inside every xDT project. Possibly this API will be useless for most tasks, but the goal is just to start using it. It should be enough for minimal needs like the Mylar task described above, or simple Monkey scripts inspecting code structure, etc. I have a feeling that benefits (2) & (3) could be achieved without very big effort. The next step is decomposition of the xDT cores, and there are definitely some pieces that could be moved out of the cores, like the "code indexer". Given a minimal common model, an LTK indexer will work on top of it, and the common model may be improved to fulfill the common index needs... And so on. This looks like a realistic approach, but of course much more research and collaboration between interested projects is required.
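
As an illustration of approach 2, here is a rough Java sketch of a "minimal" common model with one language adapting to it. All names (ICodeElement, SimpleCodeElement, ScriptModule, ScriptFunction, ScriptModelAdapter, MethodCollector) are hypothetical; ScriptModule and ScriptFunction merely stand in for whatever richer model a language project already has. In Eclipse itself the adaptation step would probably go through IAdaptable and adapter factories, which this sketch leaves out to stay self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal common model; not an existing Eclipse API.
interface ICodeElement {
    enum Kind { UNIT, TYPE, METHOD }
    Kind kind();
    String name();
    List<ICodeElement> children();
}

final class SimpleCodeElement implements ICodeElement {
    private final Kind kind;
    private final String name;
    private final List<ICodeElement> children;

    SimpleCodeElement(Kind kind, String name, List<ICodeElement> children) {
        this.kind = kind;
        this.name = name;
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    public Kind kind() { return kind; }
    public String name() { return name; }
    public List<ICodeElement> children() { return children; }
}

// Stand-ins for a language project's own, richer model.
final class ScriptFunction {
    final String identifier;
    ScriptFunction(String identifier) { this.identifier = identifier; }
}

final class ScriptModule {
    final String fileName;
    final List<ScriptFunction> functions = new ArrayList<>();
    ScriptModule(String fileName) { this.fileName = fileName; }
}

// The only language-specific piece: written once, inside the language project.
final class ScriptModelAdapter {
    static ICodeElement adapt(ScriptModule module) {
        List<ICodeElement> methods = new ArrayList<>();
        for (ScriptFunction f : module.functions) {
            methods.add(new SimpleCodeElement(ICodeElement.Kind.METHOD, f.identifier,
                    Collections.<ICodeElement>emptyList()));
        }
        return new SimpleCodeElement(ICodeElement.Kind.UNIT, module.fileName, methods);
    }
}

// A language-independent client (an indexer, a Mylar-like bridge, a script) sees only ICodeElement.
final class MethodCollector {
    static List<String> methodNames(ICodeElement root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(ICodeElement element, List<String> names) {
        if (element.kind() == ICodeElement.Kind.METHOD) {
            names.add(element.name());
        }
        for (ICodeElement child : element.children()) {
            collect(child, names);
        }
    }
}
```

The point of the sketch is the cost split: each language writes one small adapter, while clients such as MethodCollector are written once for all languages instead of N times.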

Anyway, we will be happy to participate in this effort when it starts, both by proposing things to be pushed down to LTK and by using/testing LTK components... and of course in development/research.

======== Type Inferencing ==========

We understood that type inferencing is a *must* for a Dynamic Language IDE, especially if we want an IDE with JDT-comparable features. In TruStudio we used abstract interpreters. Running from the start of the function that contains the expression whose type we infer, the interpreter collects type information and follows some empirical rules for performance optimization, like not interpreting statements that are most likely useless for further calculations (e.g. a function call without any assignment). This is relatively simple, but it provides very good results in most cases and was much appreciated by end users (especially since most leading IDEs have no type inference at all, or very poor implementations with less precise results).
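
To make the idea concrete, here is a toy Java sketch of this style of abstract interpretation. It is not the TruStudio implementation: the tiny statement model (Assign, CallStmt), the three-value InferredType lattice, and the class names are all assumptions made up for illustration. The interpreter walks a function body from its start up to the point of interest, tracks the type of each variable, and skips bare calls, mirroring the "call without assignment" heuristic described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy type lattice, assumed for this sketch only.
enum InferredType { INT, STRING, UNKNOWN }

interface Expr {
    InferredType eval(Map<String, InferredType> env);
}

final class Literal implements Expr {
    private final InferredType type;
    Literal(InferredType type) { this.type = type; }
    public InferredType eval(Map<String, InferredType> env) { return type; }
}

final class VarRef implements Expr {
    private final String name;
    VarRef(String name) { this.name = name; }
    public InferredType eval(Map<String, InferredType> env) {
        return env.getOrDefault(name, InferredType.UNKNOWN);
    }
}

interface Stmt { }

final class Assign implements Stmt {
    final String target;
    final Expr value;
    Assign(String target, Expr value) { this.target = target; this.value = value; }
}

// A call whose result is not assigned; the interpreter treats it as irrelevant to types.
final class CallStmt implements Stmt {
    final String function;
    CallStmt(String function) { this.function = function; }
}

final class AbstractInterpreter {
    int skippedStatements; // how many statements the heuristic skipped

    // Infers the type of `variable` just before statement index `upTo` of `body`.
    InferredType inferVariable(List<Stmt> body, int upTo, String variable) {
        Map<String, InferredType> env = new HashMap<>();
        for (int i = 0; i < upTo && i < body.size(); i++) {
            Stmt stmt = body.get(i);
            if (stmt instanceof Assign) {
                Assign assign = (Assign) stmt;
                env.put(assign.target, assign.value.eval(env));
            } else {
                skippedStatements++; // e.g. a bare call: not interpreted at all
            }
        }
        return env.getOrDefault(variable, InferredType.UNKNOWN);
    }
}
```

Because the result depends on where in the body the question is asked, the same variable can be INT at one point and STRING at a later one. A real interpreter would also need branches, loops, and calls, which this sketch omits.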

Between TruStudio and DLTK we thought about building a data (type) flow graph for the whole workspace and the projects it contains. The idea was to update this graph incrementally when the user modifies code. The advantage is having up-to-date type information for any part of the code, which would allow us to perform analysis as complex as we want without performance degradation. This idea was not implemented (fortunately, I hope, if we are correct) for the following reasons. The first is less precise results compared to the abstract interpreter in some cases, because the interpreter is sensitive to the context of the expression (for which we infer the type). The graph, by contrast, gives the union of the results available for every execution path. Of course there are a lot of cases where the graph will give more precise results because the whole project is covered. For example, in the interpreter approach we do not infer the argument types of a top-level function, and this is the reason for many failures. Another reason to discard the graph approach is that we were not sure we would be able to fit the whole graph into a reasonable amount of memory, and a lot of research would be required to be sure that we would not have other performance issues. Building graphs on parts of the code makes this approach much less attractive, and we are confident we can get nearly the same or better results using improved abstract interpretation.

So finally we decided to improve the abstract interpreters with the DDP algorithm (http://lexspoon.org/ti)... Or to improve DDP with abstract interpretation :)... DDP correlates perfectly with our ideas about abstract interpreter improvement and the two are not mutually exclusive (in fact DDP itself is a very abstract approach), so we hope we'll get very acceptable inference quality.
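
For readers unfamiliar with DDP (demand-driven analysis with goal pruning), the following Java sketch shows only its general shape and is far simpler than the published algorithm: the type of a variable is a goal, solving a goal may raise subgoals, and once a work budget is exhausted the remaining goals are pruned by answering them with a conservative "any type". The class DemandDrivenInferencer, its methods, and the flat "variable flows into variable" program model are assumptions invented for this illustration, not DLTK code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DemandDrivenInferencer {
    static final String ANY = "<any>"; // conservative answer given to a pruned goal

    private final Map<String, Set<String>> literalTypes = new HashMap<>();
    private final Map<String, List<String>> flowsFrom = new HashMap<>();
    private int remaining; // goals that may still be solved in the current query

    // Records "variable = <literal of the given type>".
    void assignLiteral(String variable, String type) {
        literalTypes.computeIfAbsent(variable, k -> new TreeSet<>()).add(type);
    }

    // Records "target = source".
    void assignVariable(String target, String source) {
        flowsFrom.computeIfAbsent(target, k -> new ArrayList<>()).add(source);
    }

    // Answers the goal "which types can `variable` hold?", solving at most maxGoals goals.
    Set<String> infer(String variable, int maxGoals) {
        remaining = maxGoals;
        return solve(variable, new HashSet<>());
    }

    private Set<String> solve(String variable, Set<String> active) {
        Set<String> result = new TreeSet<>();
        if (active.contains(variable)) {
            return result; // goal already being solved higher up: a cycle adds nothing new
        }
        if (remaining <= 0) {
            result.add(ANY); // budget exhausted: prune the goal instead of exploring it
            return result;
        }
        remaining--;
        active.add(variable);
        result.addAll(literalTypes.getOrDefault(variable, Collections.<String>emptySet()));
        for (String source : flowsFrom.getOrDefault(variable, Collections.<String>emptyList())) {
            result.addAll(solve(source, active)); // subgoal raised on demand
        }
        active.remove(variable);
        return result;
    }
}
```

The trade-off shown is that a larger budget yields a more precise answer, while a small budget still terminates quickly with an answer that is imprecise but safe to show to the user.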

Mikhail Kalugin is working on TI for DLTK, and he will be happy to discuss TI issues further, as well as to update you with details and other ideas we are going to try for DLTK, if you're interested.

Sorry for the messy thoughts and the long email. I hope this is helpful to you, and I'd be glad to discuss any topic of interest in detail.

Kind Regards,
Andrey Platov
xored software, Inc.



