Re: [Dltk-dev] AST Discussion

From: Andrei Sobolev <andrei.sobolev@xxxxxxxxx>
Date: Sun, 4 May 2008 23:59:28 +0700 (NOVST)
Hi Mark,

By functionality:
1) Indexing/Search.
Indexing uses structure model functionality to build a map from references, type declarations, etc. to source modules.
Search uses some core AST elements, such as ModuleDeclaration and TypeDeclaration, to map an element found in the index (a declaration or reference) to an existing model element; the model elements are then shown to the user.
Open type functionality: uses index entries and builds model elements without using the AST hierarchy.

The use of AST in search could easily be moved to the language-specific layer, and I suppose we should do that in the near future.
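
To illustrate the indexing idea in point 1, here is a minimal sketch of an inverted index that maps a declaration or reference to the source modules containing it. The class, enum, and method names are hypothetical and are not the DLTK index API; module paths are plain strings for simplicity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical inverted index: (kind, name) -> source modules. Not the DLTK index API. */
final class SourceIndex {
    enum Kind { TYPE_DECLARATION, METHOD_DECLARATION, REFERENCE }

    private final Map<String, Set<String>> entries = new HashMap<>();

    private static String entryKey(Kind kind, String name) {
        return kind + ":" + name;
    }

    /** Called by the indexer for every element found while processing a module. */
    void add(Kind kind, String name, String modulePath) {
        entries.computeIfAbsent(entryKey(kind, name), k -> new LinkedHashSet<>())
               .add(modulePath);
    }

    /** Called by search: which modules contain this declaration or reference? */
    Set<String> modulesFor(Kind kind, String name) {
        Set<String> hit = entries.get(entryKey(kind, name));
        if (hit == null) {
            return Collections.<String>emptySet();
        }
        return Collections.unmodifiableSet(hit);
    }
}
```

Search would then open only the modules returned by modulesFor and map each hit to a model element, which is the step that currently relies on core AST nodes.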

2) Syntax highlighting/folding
There is no need for an AST at all. The only point is that, for advanced syntax highlighting, using an AST requires less code.
For folding we have a basic AST-based implementation. This implementation uses the same set of core AST nodes: TypeDeclaration, MethodDeclaration, etc.

3) Code assistance.
For all code assistance, e.g. code selection and code completion, the entry point is an ISourceModule and a position in the file.
The language-specific part then does the work and reports model elements.

The basic index/search model is not enough for correct code assistance. For example, in Tcl and Ruby, classes and namespaces can be spread over a set of modules, and for some selections/completions we need to parse about 80 modules. The mixin model uses a generic index, which is filled by the "Script builder" process. The mixin model contains all parts of composite classes and namespaces and provides faster search times.

The mixin model is a very simple thing:
it is a tree of keys with associated objects.

The mixin model is implemented using lazy building/loading with one rule: if we ask for some key, then all of its sub-keys will be loaded/built.
To obtain key information it uses index/search. The script builder reports only keys. When the model is built, it is possible to report any kind of object and associate it with a specific key.
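
The lazy-loading rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DLTK mixin API: the class names, the '/' key separator, and the resolver callback are hypothetical, and the index is assumed to contain every intermediate key.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical sketch of a lazily built key tree; not the DLTK mixin API. */
final class MixinModel {
    static final char SEP = '/'; // assumed separator between key segments

    /** One key together with its associated objects and its loaded sub-keys. */
    static final class Element {
        final String key;
        final List<Object> objects;
        final Map<String, Element> children = new TreeMap<>();

        Element(String key, List<Object> objects) {
            this.key = key;
            this.objects = objects;
        }
    }

    private final NavigableSet<String> index;              // keys reported by the builder
    private final Function<String, List<Object>> resolver; // builds the objects for one key
    private final Map<String, Element> cache = new HashMap<>();
    int loads = 0; // number of keys built so far, exposed for illustration

    MixinModel(Collection<String> keys, Function<String, List<Object>> resolver) {
        this.index = new TreeSet<>(keys);
        this.resolver = resolver;
    }

    /** Returns the element for a key, building it and all of its sub-keys on first access. */
    Element get(String key) {
        Element cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        return index.contains(key) ? load(key) : null;
    }

    private Element load(String key) {
        Element element = new Element(key, resolver.apply(key));
        loads++;
        cache.put(key, element);
        String prefix = key + SEP;
        // In a sorted set, all keys that start with the prefix are contiguous.
        for (String sub : index.tailSet(prefix, false)) {
            if (!sub.startsWith(prefix)) {
                break;
            }
            boolean direct = sub.indexOf(SEP, prefix.length()) < 0;
            if (direct) {
                Element child = cache.get(sub);
                element.children.put(sub, child != null ? child : load(sub));
            }
        }
        return element;
    }
}
```

Nothing is built until a key is requested; asking for a key such as "Foo" then builds "Foo" and every key below it, and later requests for those keys are served from the cache.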

The mixin model is only the first step towards a correct runtime model implementation. We are thinking about creating a generic runtime model infrastructure.

It would also be very interesting if we could separate some low-level frameworks, like indexing/basic search, from high-level frameworks like the runtime model, inferencing, etc.

The current situation in Tcl and Ruby is as follows:
1) Each implementation provides its own set of AST nodes. This is the language-specific layer.
2) Each implementation has its own mixin model with a different set of language-specific search elements.
3) Each implementation has its own completion/selection engine. We use some abstract classes based on core AST elements.

I suppose it would not be very difficult to remove the use of core AST from the Ruby and Tcl implementations.

As for a generic AST (etc.) model:
Some time ago I thought about creating a generic model for scripting languages. Each language would map all of its elements to this model's elements. It would then be possible to run algorithms such as selection/code completion on the generic model, receive results, and map those results back to the original language.
The generic language would need to be general enough to cover most possible scenarios for the different cases. I am not yet sure that this is possible for every language, but for some kinds of languages I suppose it is.

Best regards,
Andrei Sobolev.
----- Original Message -----
From: "Mark Howe" <Mark.Howe@xxxxxxxxxxxx>
To: "DLTK Developer Discussions" <dltk-dev@xxxxxxxxxxx>
Sent: Friday, May 2, 2008 6:22:18 AM GMT +06:00 Almaty, Novosibirsk
Subject: RE: [Dltk-dev] AST Discussion

Andrey, can you explain how functionality (i.e. search, syntax highlighting, code completion, inferencing etc) is currently split up, i.e. what is provided by IModelElement and ASTNode vs language-specific models (e.g. for Ruby, the mixin builder and its model)? Another way of asking is: which clients depend on the generic model and AST, and which clients have to depend on language-specific layers?

Mark

-----Original Message-----
From: dltk-dev-bounces@xxxxxxxxxxx [mailto:dltk-dev-bounces@xxxxxxxxxxx] On Behalf Of Andrey Platov
Sent: Tuesday, April 29, 2008 5:43 AM
To: DLTK Developer Discussions
Cc: DLTK Developer Discussions
Subject: Re: [Dltk-dev] AST Discussion

Hi folks,

Let me please try to summarize AST thoughts we have for now:

- Adopters (language implementors) want to be as flexible as possible with the node hierarchy.

- Core AST shall add some value to language implementation - without value it's better to abandon Core AST in favour of language-specific ones.

There are at least 3 observable fields where we can try to find value in Core AST:

1) Core services. As Andrei S. mentioned, there are a few services built on top of Core AST; however, language implementors may provide their own implementations of those services on top of a custom AST with minimal effort.

2) AST rewrite. At this moment I can't say if Core AST can add some value to concrete AST rewriters - I hope we'll know about this later. And we're anxiously waiting for Zend folks to see their initial implementation for AST Rewrite/PHP.

3) AST persistency. This looks like a place where Core AST framework can add great value for some languages/environments.

Problem: popular Ruby and TCL frameworks may include hundreds of files. The specifics of such languages may require the IDE to parse most of the source modules from those frameworks for simple operations (e.g. code assist - remember that a Ruby class may be built from 70+ sources ;). An average parse time of 30-50ms per module does not save us from long-running operations (parsing 1000 files may take up to 5 minutes).

So a persistent AST looks like a kind of universal solution to performance problems. Of course, the AST must be complete enough to fulfil the requirements of other services (e.g. the Source Element parser must be able to build the structural model from the persistent AST, and other services must be able to work without accessing source code).

The current idea is to employ EMF for AST persistency. Language implementors will be able to build an AST from EMF objects of any kind and provide hierarchies that best reflect the target language. Services may impose additional requirements on the hierarchy, but with EMF we can be much more flexible:

// Tell the service which class reflects a Statement in my language
// (assume the ctor accepts an EClass describing statements in the language-specific AST).
FoldingService(MyASTPackage.eINSTANCE.getMyStatementNode());

Also we'll be able to persist ASTs of any kind, including ones annotated with language-specific information (virtually any EObject).
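
To make the FoldingService idea above a little more concrete, here is a minimal sketch of a service that is told which node type represents a statement. It deliberately does not use EMF: a plain java.lang.Class token stands in for the EClass, and all class names (AstNode, MyStatementNode, MyExpressionNode, and this FoldingService) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical language-specific AST node; a plain-Java stand-in for an EObject. */
class AstNode {
    final int start;
    final int end;
    final List<AstNode> children = new ArrayList<>();

    AstNode(int start, int end) {
        this.start = start;
        this.end = end;
    }

    AstNode add(AstNode child) {
        children.add(child);
        return this;
    }
}

class MyStatementNode extends AstNode {
    MyStatementNode(int start, int end) { super(start, end); }
}

class MyExpressionNode extends AstNode {
    MyExpressionNode(int start, int end) { super(start, end); }
}

/** Folding service configured with the node type that counts as a statement. */
final class FoldingService {
    private final Class<? extends AstNode> statementType; // stand-in for an EClass

    FoldingService(Class<? extends AstNode> statementType) {
        this.statementType = statementType;
    }

    /** Collects the [start, end] offsets of every node of the configured type. */
    List<int[]> foldRegions(AstNode root) {
        List<int[]> regions = new ArrayList<>();
        collect(root, regions);
        return regions;
    }

    private void collect(AstNode node, List<int[]> out) {
        if (statementType.isInstance(node)) {
            out.add(new int[] { node.start, node.end });
        }
        for (AstNode child : node.children) {
            collect(child, out);
        }
    }
}
```

The service itself knows nothing about the language's node hierarchy; it only needs the type descriptor it was given, which is the flexibility the EMF approach is meant to provide.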

So the most value I see now from Core AST is persistence services for ASTs of any kind. Please share your thoughts.

Kind Regards,
Andrey


----- Original Message -----
From: "Andrei Sobolev" <andrei.sobolev@xxxxxxxxx>
To: "DLTK Developer Discussions" <dltk-dev@xxxxxxxxxxx>
Sent: Tuesday, April 29, 2008 1:33:56 PM GMT +06:00 Almaty, Novosibirsk
Subject: Re: [Dltk-dev] AST Discussion

Hi all,

Currently the DLTK AST is mostly used as the API for some core DLTK functionality, such as search.
My opinion is that we should separate it from such places and introduce special structures, as ISourceElementParser does for structure model creation.
This would give us extra room for AST modifications and allow us to create separate sub-frameworks, like a search framework, an AST framework, etc.

For remote functionality we need to implement a feature named "offline indexing".
We need a utility to index source code and create special files on remote systems (for interpreter libraries, etc.). If DLTK then finds such index information, it will be used to build the structure model, for search, etc. We also plan to make such indexes for interpreter libraries and store them in metadata. This will give a great performance benefit for remote projects and for some search and completion operations. We plan to store ASTs in that index.

To meet this requirement we need functionality to save and load AST trees.
In the current implementation this would be very difficult and would require some work for each language, because we have different AST trees.

We are thinking about making an EMF-based AST tree. This would allow easy persistence, along with some other benefits.
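
As a rough illustration of the save/load requirement, the sketch below round-trips one generic node tree through bytes using standard Java serialization. This is not the EMF-based design under discussion, only a stand-in that shows why a single shared node shape makes persistence straightforward; StoredNode and AstStore are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generic AST node that any language could map its nodes to. */
final class StoredNode implements Serializable {
    private static final long serialVersionUID = 1L;

    final String kind; // e.g. "module", "type", "method"
    final String name;
    final int start;
    final int end;
    final List<StoredNode> children = new ArrayList<>();

    StoredNode(String kind, String name, int start, int end) {
        this.kind = kind;
        this.name = name;
        this.start = start;
        this.end = end;
    }
}

/** Saves and loads whole trees; checked exceptions are wrapped for brevity. */
final class AstStore {
    static byte[] save(StoredNode root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static StoredNode load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (StoredNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With language-specific node classes, each language would need its own save/load code; with one shared representation, the same store works for every language, and the resulting bytes could be written to an offline index file.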

Best regards,
Andrei Sobolev.

> My current use of the existing DLTK structures is somewhat limited,
> coercing rules and attributes both as MethodDeclarations and the
> grammar statement as a TypeDeclaration.
>
> Thanks,
> Gerald
>
> At 11:39 AM 4/28/2008, Mark Howe wrote:
>>
>> That is the intent, hopefully it's possible. Do you use AST now?
>>
>> Thanks
>> Mark
>>
>> ------------------------------------------------------------------------
>>     From: dltk-dev-bounces@xxxxxxxxxxx [
>>     mailto:dltk-dev-bounces@xxxxxxxxxxx] On Behalf Of Gerald Rosenberg
>>     Sent: Tuesday, April 22, 2008 5:24 PM
>>     To: DLTK Developer Discussions
>>     Subject: Re: [Dltk-dev] AST Discussion
>>
>>     Is the intent to generalize the AST structure enough to handle a
>>     'language' such as Antlr?
>>
>>     Formally, an Antlr module is composed of a grammar statement,
>>     globally scoped attributes, rules, and rule scoped attributes.
>>     While not exact, in general an attribute can be treated as an
>>     expression and a rule as a statement.  The requirements for
>>     rewriting (refactoring?) and formatting will be different from
>>     classical expressions and statements, but hopefully within the
>>     scope of the new DLTK abstractions.
>>
>>     Happy to help flesh out the requirements.
>>
>>     Best,
>>     Gerald
>>
>>
>>
>>     At 04:10 PM 4/22/2008, Mark Howe wrote:
>>>
>>>         Andrey, Andrei and I have had some discussion about the need
>>>         for a rewriter for DLTK. The time frame is probably after
>>>         the release of 1.0 this summer. However, prior to 1.0 and
>>>         starting the rewriter we should discuss changes we may want
>>>         to make to the AST.
>>>
>>>         My reasons for suggesting changes to the AST are:
>>>
>>>         We should avoid having to work in multiple AST's on DLTK.
>>>         With a careful design we should be able to use the
>>>         generic AST for the rewriter and formatting. This is
>>>         important to avoid duplication of work among different
>>>         languages. That won't preclude languages from using a
>>>         dedicated AST.
>>>
>>>         I have some suggestions to kick start the discussion.
>>>
>>>         Generalize the ASTNode hierarchy
>>>
>>>         Generalize the ASTNode hierarchy so it better fits all
>>>         dynamic languages. Various languages have different notions
>>>         of what an 'expression' and a 'statement' are. I suggest
>>>         removing Expression and Statement from the ASTNode hierarchy
>>>         (i.e. flattening the hierarchy). Instead have a property on
>>>         ASTNode which returns whether it is a statement or an
>>>         expression. For instance a field declaration is an
>>>         expression in Ruby (in fact a method declaration is an
>>>         expression, although it returns a null) but is currently a
>>>         Statement -> Declaration -> FieldDeclaration.
>>>
>>>         Modify the ASTVisitor to support the flattened hierarchy,
>>>         currently it has
>>>
>>>         visit(Expression ...), visit(Statement ...),
>>>         visit(MethodDeclaration ...), visit(ModuleDeclaration ...) and
>>>         visit(TypeDeclaration ...)
>>>
>>>         change to something like
>>>
>>>         visitExpression(ASTNode ...), visitStatement(ASTNode ...), etc.
>>>
>>>         and each node would have to call the appropriate visit
>>>         method. AST's would probably have to be created from
>>>         factories so they can be configured for each language (ie
>>>         whether a type of node is a statement or expression).
>>>
>>>         Comments, other suggestions?
>>>
>>>         Mark
>>>         _______________________________________________
>>>         dltk-dev mailing list
>>>         dltk-dev@xxxxxxxxxxx
>>>         https://dev.eclipse.org/mailman/listinfo/dltk-dev