[technology-pmc] Fwd: DLTK: JDT generalization, type inferencing, etc



From: Ward Cunningham <ward.cunningham@xxxxxxxxxxx>
Date: Tue, 12 Dec 2006 08:03:48 -0800
Delivered-to: technology-pmc@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/listinfo/technology-pmc>
List-help: <mailto:technology-pmc-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/technology-pmc>, <mailto:technology-pmc-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/technology-pmc>, <mailto:technology-pmc-request@eclipse.org?subject=unsubscribe>

Friends -- I received this detailed response to questions John and I asked at the DLTK creation review. FYI. Best regards. -- Ward



Begin forwarded message:

From: Andrey Platov <andrey@xxxxxxxxx>
Date: December 8, 2006 12:08:50 PM PST
To: ward.cunningham@xxxxxxxxxxx
Subject: DLTK: JDT generalization, type inferencing, etc

Hello Ward,
First of all, we were very pleased to meet you at the Creation Review. Interesting issues were raised, but unfortunately the discussion was not as productive as it could have been, primarily because of the extremely bad line quality we had. So I'd like to write down some thoughts on the issues, and possibly turn them into articles later (if anything below turns out to be really interesting and useful to the community). Also, please consider this email quick-and-dirty writing: I have not rechecked the facts, some of which may already be obsolete.
============= JDT Generalization and potential common base for language projects ====================
The goal of JDT generalization is to build an intermediate layer, which I will call LTK from here on (because some work on this has already started in the org.eclipse.ltk.* plugins). Physically this could be part of the Platform or a separate set of plugins, but that is not relevant to the goal. At the moment we cannot provide concrete proposals for this kind of language layer; however, the general directions seem to be visible for most language projects.
To understand the requirements for this layer better, we should look at potential LTK users and also explore *failure* stories, because they are very important. We could talk at length about the reasons and requirements for an LTK layer for the JDT and CDT projects (I'll call these and other successful projects sharing the JDT architecture xDT), but this looks like several car vendors talking about a common car engine platform to optimize expenses and reach other goals... which is definitely important but does not give us the whole picture.
Since we started development of the TruStudio PHP IDE in the fall of 2002, I have watched the evolution of some open-source Eclipse-based language IDEs which were started from scratch by people without any experience in xDT projects (non-xDT projects). Most of them passed through the following steps:
Step 1 (born): trivial features are implemented, like syntax highlighting and code outline.
Step 2 (boyhood): code launching is implemented, possibly debugging.
Step 3 (youth): developers set out to implement must-have features for modern IDEs like code assistance, navigation, search, etc., but get stuck. [Actually this is one of Eclipse's hidden agendas: Eclipse (2.x) was positioned as a tool for developing IDEs, and newcomers definitely can implement Steps 1 & 2 extremely fast with great results. After these steps are passed, euphoria makes newcomers consider Eclipse a Silver Bullet and expect other features (like code assistance) to be developed very fast too. In fact there is a big gap between the effort required to build a simple IDE and a full-featured IDE, which could be filled by LTK.] So after some failures, developers understand that a powerful project-wide structural model and related services (like a code index/search engine) are a must.
Step 4 (stagnation): some teams try to copy/paste the JDT model and go into stagnation (PHPEclipse); some teams implement the model/services from scratch with nearly the same result (PHP IDE). The reason is the same: a big gap which is not easy (cheap) to jump over.
These steps are described very briefly and in simplified form, but they are important for further analysis.
Let me start with Step 1 and look at how LTK can help projects at this step, and at some of the reasons behind it:
Generally the situation with syntax highlighting and other preferences is the same: JDT copy/paste, or implementation from scratch. Non-copy/paste implementations are very ugly and featureless (like in the ANTLR plugin). During JDT's evolution a lot of things were pushed down to the Platform (like annotations, etc.); however, this push could be a little more aggressive. For example: if the Platform has a base for editors with syntax highlighting, why is there no preference page implementation for it? If the Platform has support for launching/debugging, why is there no model/UI for runtime (environment) configuration?
Of course, whether to move this kind of stuff into the Platform is very debatable, but an intermediate layer for it could be very useful in the long run. For example, just as the Debian Linux distribution has stable, testing, and unstable distros, the Platform, LTK, and xDT projects can play the same roles: xDT projects propose common things to LTK, and each thing is then pushed down to the Platform if other projects find it useful, stays in LTK, or is discarded, depending on some criteria.
I think there are 2 big reasons to have a lot of things in the LTK layer even if they are used by only 2-3 projects. The first is a common look and feel. Some plugin authors definitely make Eclipse look very ugly, which is annoying. The second, which is much more important (and also debatable), is to *force projects to follow Eclipse concepts* (by Eclipse concepts I mean xDT, WTP, and other mature project concepts). For example, let's consider the PHP IDE project. The project more or less copies the "JRE Installations" preference page; however, there is a *huge* conceptual difference: in JDT the end user selects/configures his/her environment and the IDE "adapts" to this environment (starting from the language version, up to available libraries and other settings). PHP IDE considers these settings at project runtime only, which is definitely a violation of the WYSIWYG-like (but applied to code) concept that makes Eclipse a great tool. So it's very likely that an LTK layer with Eclipse concepts "built in" may force developers to follow them (or at least to think about how to do things the Eclipse way)... Note: I mentioned JDT and PHP IDE, but actually many projects follow this concept, and many ignore it.
Preliminary proposal 1: a more aggressive push of common and relatively independent building blocks down to the LTK layer. Best candidates: preference pages and utility (mostly GUI-related) components. At first glance this activity seems useless, because we're talking about trivial and relatively easy-to-implement things, but even a trivial syntax highlighting preference page implementation will take no less than 1000 LOC. Environment configuration facilities will take much more because of the more complex model/UI, model persistence, environment change notifications, etc... The same is true for a common JUnit-like GUI for launching tests (a kind of "Tests Console") and a lot of other things.
So potentially there are 10-50 KLOC of code that could be shared among the projects and could save new ones days or weeks of development. Also, such actions will not have a big impact on xDT projects, because it is very likely that things like preference pages are not reused in third-party projects; moreover, a flexible and common framework could help third parties which would like to plug into such things. Hypothetical example: the Mylar project would like to focus on tasks (bugs) which are relevant only to the current Java environment (Java 6, for example). At the moment there is no way for Mylar to listen for project environment changes using a "common protocol". A common API in LTK would resolve this problem for all current and future xDT projects. It would also help new projects with Step 1 in the scenario above.
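As a rough illustration of the "common protocol" idea, here is a minimal Java sketch of a language-neutral environment change notification. Every name in it (EnvironmentChangeListener, EnvironmentRegistry, the string-valued environment) is hypothetical and invented for this example; nothing like it is claimed to exist in the Platform, LTK, or JDT.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener: a tool such as Mylar would implement this once
// and receive notifications from any language project.
interface EnvironmentChangeListener {
    void environmentChanged(String project, String oldEnvironment, String newEnvironment);
}

// Hypothetical language-neutral registry that xDT projects would report to.
class EnvironmentRegistry {
    private final List<EnvironmentChangeListener> listeners = new ArrayList<>();
    private final Map<String, String> environments = new HashMap<>();

    void addListener(EnvironmentChangeListener listener) {
        listeners.add(listener);
    }

    // Records the project's environment and notifies listeners only on a real change.
    void setEnvironment(String project, String environment) {
        String old = environments.put(project, environment);
        if (environment.equals(old)) {
            return; // nothing changed, so no notification
        }
        for (EnvironmentChangeListener listener : listeners) {
            listener.environmentChanged(project, old, environment);
        }
    }
}
```

A listener registered once would then see changes from a Java, C/C++, or scripting project alike, without knowing which xDT project produced them.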
-----------------
Step 4 (common structural model and dependent services) is the most interesting issue. In fact a lot of JDT components (I guess >50%) are built on top of the structural model (the IJavaElement hierarchy), which is definitely a fundamental thing in the JDT architecture. So sharing a common model between different languages will give *huge* benefits. Just a few of them:
1) Huge simplification of IDE development (in our experience no more than 25% of the logic is actually language-dependent; >75% of the code is language-independent or purely technical stuff like caches, string utils, etc.).
2) Interesting benefits like language interoperability: for example, there is no good way to "integrate" DLTK (scripting languages) with JDT and/or CDT. But of course scripting in Java or C/C++ projects is a popular thing. If the end user calls into a Java or C method and we'd love to show code completion proposals, we have to ask JDT/CDT for proposals, translate (adapt) the model elements (for method proposals) into our (DLTK's) method descriptors, etc... If the end user would like to navigate into Java code from a script, we have to perform nearly the same procedure in reverse, and there will be N different implementations for N languages (even if such implementations have 100%-equal logic for JDT and CDT, we will need to provide both, because the JDT and CDT model hierarchies are absolutely different from the Java point of view).
3) Better, easier, and wider integration with other projects. For example, Mylar at the moment has a generic resource bridge and works well with any Eclipse project at the resource level. A common structural project model (compilation units, classes, methods) will allow such tools to work with all existing and future tools at the code structure level.
We understand the huge number of problems which must be resolved to have a common base in the long run. This was relatively easy for DLTK because no external project relies on DLTK, we do not assume that any parts of our work could be moved to LTK in the foreseeable future, we do not have a huge codebase which has to be supported during generalization/decomposition work, etc. And DLTK work looks like JDT "port" work, where all platform... sorry, language-specific stuff is replaced with abstractions, and concrete language stuff is instantiated using extension points (it does not sound ambitious, but one of our main goals is to "clone" JDT for scripting languages). This experience is of little use beyond a practical understanding of which blocks and components are required for other languages and how the JDT implementation could be decoupled.
If the goal is a common layer (LTK), there are at least 2 possible approaches.
1) The DLTK way described above. The advantages are already described. Such an approach will also give us all 3 benefits, and it seems that it will reduce failures of new projects implementing IDEs because of benefit (1). Disadvantages are:
- another abstraction layer we would have to inherit from, with all the corresponding disadvantages, or a huge number of adapters/extension points, etc. if the goal is to build a layer that is flexible enough.
- a lot of refactoring work required for existing projects like CDT and JDT.
2) The "opposite" approach. Let's, for example, start with a "minimal" common model, and let the xDT models be adaptable to it. This common model will be minimal, but very easy to implement inside every xDT project. Possibly this API will be useless for most tasks, but the goal is just to start using it. It should be enough for minimal needs like the Mylar task described above, or simple Monkey scripts inspecting code structure, etc. I have a feeling that benefits (2) & (3) could be achieved without very big effort. The next step is decomposition of the xDT cores, and there are definitely some pieces which could be moved out of the cores, like the "code indexer". Given a minimal common model, an LTK indexer will work on top of it, and the common model may be improved to fulfill common index needs... and so on. This looks like a realistic approach, but of course much more research and collaboration between interested projects is required.
Anyway, we will be happy to participate in this effort once it starts, both by proposing things to be pushed down to LTK and by using/testing LTK components... and of course in development/research.
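To make the "minimal common model plus adapters" idea a bit more concrete, here is a small Java sketch. It is only an assumption about what such a model might look like: CodeElement, ScriptFile, ScriptFunction, ScriptFileAdapter, and StructureClient are all hypothetical names invented for this example, and the language-specific classes merely stand in for an xDT project's own model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal common model: just kind, name, and children.
interface CodeElement {
    String kind();               // e.g. "unit" or "method"
    String name();
    List<CodeElement> children();
}

// Stand-ins for one language project's own (non-shared) model classes.
class ScriptFunction {
    final String identifier;
    ScriptFunction(String identifier) { this.identifier = identifier; }
}

class ScriptFile {
    final String path;
    final List<ScriptFunction> functions = new ArrayList<>();
    ScriptFile(String path) { this.path = path; }
}

// Adapter that exposes the language-specific model through the common one.
class ScriptFileAdapter implements CodeElement {
    private final ScriptFile file;
    ScriptFileAdapter(ScriptFile file) { this.file = file; }

    public String kind() { return "unit"; }
    public String name() { return file.path; }

    public List<CodeElement> children() {
        List<CodeElement> result = new ArrayList<>();
        for (ScriptFunction function : file.functions) {
            final String functionName = function.identifier;
            result.add(new CodeElement() {
                public String kind() { return "method"; }
                public String name() { return functionName; }
                public List<CodeElement> children() { return Collections.emptyList(); }
            });
        }
        return result;
    }
}

// A language-independent client: it sees only CodeElement.
class StructureClient {
    static List<String> methodNames(CodeElement root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(CodeElement element, List<String> names) {
        if ("method".equals(element.kind())) {
            names.add(element.name());
        }
        for (CodeElement child : element.children()) {
            collect(child, names);
        }
    }
}
```

The client code never mentions the scripting model, so the same methodNames call would work against any other language whose project supplies its own adapter.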
======== Type Inferencing ==========
We understood that type inferencing is a *must* for a dynamic language IDE, especially if we want an IDE with JDT-comparable features. In TruStudio we used abstract interpreters. Running from the start of the function containing the expression whose type we infer, the interpreter collects type information and follows some empirical rules for performance optimization, such as not interpreting statements which are most likely useless for further calculations (e.g. a function call without any assignment). This is relatively simple, but gives very good results in most cases, and was much appreciated by end users (especially since most leading IDEs have no type inference at all, or very poor implementations with less precise results).
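The following toy Java sketch illustrates the general shape of this kind of abstract interpretation. It is not TruStudio or DLTK code: the Statement model, the string type names, and ToyAbstractInterpreter are all invented for the example, and a real interpreter would handle branches, calls, and much more. It walks the statements from the start of a function up to the point of interest, tracks the type last assigned to each variable, and skips bare calls as one example of the heuristic mentioned above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy statement model, invented for this sketch.
final class Statement {
    final String target;      // variable being assigned, or null for a bare call
    final String literalType; // type of an assigned literal, or null
    final String sourceVar;   // variable being copied, or null

    private Statement(String target, String literalType, String sourceVar) {
        this.target = target;
        this.literalType = literalType;
        this.sourceVar = sourceVar;
    }

    static Statement assignLiteral(String target, String type) {
        return new Statement(target, type, null);
    }

    static Statement assignVariable(String target, String source) {
        return new Statement(target, null, source);
    }

    static Statement bareCall() {
        return new Statement(null, null, null);
    }
}

final class ToyAbstractInterpreter {
    static final String UNKNOWN = "unknown";

    // Interprets the statements in order and returns the type the variable
    // holds after the last one, or "unknown" if it was never assigned.
    static String inferType(List<Statement> body, String variable) {
        Map<String, String> types = new HashMap<>();
        for (Statement statement : body) {
            if (statement.target == null) {
                continue; // heuristic: a call without assignment is skipped
            }
            String type = statement.literalType != null
                    ? statement.literalType
                    : types.getOrDefault(statement.sourceVar, UNKNOWN);
            types.put(statement.target, type);
        }
        return types.getOrDefault(variable, UNKNOWN);
    }
}
```

Because the walk follows statement order, the answer depends on where in the function the expression sits: a variable reassigned later does not affect what was inferred for a copy taken earlier.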
Between TruStudio and DLTK we were thinking about building a data (types)-flow graph for the whole workspace and the projects it contains. The idea was to update this graph incrementally when the user modifies code. The advantage is having up-to-date type information for any part of the code, which would allow us to perform analysis as complex as we want without performance degradation. This idea was not implemented (fortunately, I hope, if we are correct) for the following reasons. The first is less precise results compared to the abstract interpreter in some cases, because the interpreter is sensitive to the context of the expression (for which we infer the type). In contrast, the graph gives the union of the results available for every execution path. Of course there are a lot of cases where the graph will give more precise results, because the whole project is covered. For example, in the interpreter approach we do not infer the argument types of a top-level function, and this is a reason for many failures. Another reason to discard the graph approach is that we were not sure we would be able to fit the whole graph into an adequate amount of memory, and a lot of research is required to be sure that we will not have other performance issues. Having to build graphs on parts of the code makes this approach much less attractive, and we can get guaranteed nearly the same or better results using improved abstract interpretation.
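The "union of results" point can be seen in a very small Java sketch of a type-flow graph. This is only an illustration under simplifying assumptions (variables as graph nodes, string type names, no incremental update); ToyTypeFlowGraph and its methods are hypothetical names, not part of any implementation mentioned here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy flow-insensitive graph: it records which types and which other
// variables flow into each variable, ignoring statement order.
final class ToyTypeFlowGraph {
    private final Map<String, Set<String>> literalTypes = new HashMap<>();
    private final Map<String, Set<String>> flowsFrom = new HashMap<>();

    void assignLiteral(String variable, String type) {
        literalTypes.computeIfAbsent(variable, k -> new HashSet<>()).add(type);
    }

    void assignVariable(String target, String source) {
        flowsFrom.computeIfAbsent(target, k -> new HashSet<>()).add(source);
    }

    // Returns the union of every type that can reach the variable.
    Set<String> possibleTypes(String variable) {
        Set<String> result = new TreeSet<>();
        collect(variable, new HashSet<>(), result);
        return result;
    }

    private void collect(String variable, Set<String> visited, Set<String> result) {
        if (!visited.add(variable)) {
            return; // already handled; also guards against cycles
        }
        result.addAll(literalTypes.getOrDefault(variable, Collections.emptySet()));
        for (String source : flowsFrom.getOrDefault(variable, Collections.emptySet())) {
            collect(source, visited, result);
        }
    }
}
```

If a variable is assigned a string, copied into another variable, and then reassigned an int, this graph reports both types for the copy, whereas an order-sensitive interpretation of the same statements would report only the string.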
So finally we decided to improve abstract interpreters with the DDP algorithm (http://lexspoon.org/ti)... or improve DDP with abstract interpretation :)... DDP correlates perfectly with our ideas about abstract interpreter improvement and the two are not mutually exclusive (in fact DDP itself is a very abstract approach), so we hope we'll get very acceptable inference quality.
Mikhail Kalugin is working on TI for DLTK, and he will be happy to discuss TI issues further, as well as update you with details and other ideas we are going to try for DLTK, if you're interested.
Please excuse the messy thoughts and long email. I hope this is helpful for you, and I'd be glad to discuss any topic of interest in detail.
Kind Regards,
Andrey Platov
xored software, Inc.
