A parallel tree for comments would be fine.
Regards
Jonathan Gossage
----- Original Message -----
Sent: Tuesday, December 17, 2002 11:43 AM
Subject: RE: [jdt-core-dev] leading and trailing comments and whitespace for AST/DOM nodes
Here is the refactoring view:
Currently we don't handle comments in a special way. This means that in some cases we lose comments, or leave them at their original location, when we move or otherwise modify source code. So we would benefit greatly from a general comment story provided by the AST. Whitespace is less important for us, since we already have a solution for preserving formatting information (e.g. whitespace and comments) when modifying and rewriting an AST.
The following items are important for us:
- The rules that source positions adhere to (e.g. parent.start <= children[0].start, ...) should not be broken or modified when introducing support for comments.
- Simply adding the comments to the source range of an ASTNode would break our current refactorings. They assume that [getStartPosition(), getStartPosition() + getLength() - 1] covers only the characters relevant to the statement, and no preceding or trailing comments. IMO changing this would break the specification of ASTNode.
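A minimal sketch of the position rules described above, assuming a hypothetical `SourceRanges.Node` class in place of JDT's `ASTNode` (only a start offset and a length are modelled). A node covers offsets `start` through `start + length - 1`; the check requires every child to lie inside its parent (a child may share its parent's start offset) and siblings to appear in source order without overlapping. Widening a child to take in a comment that falls outside its parent's range is exactly what makes the check fail.

```java
import java.util.ArrayList;
import java.util.List;

class SourceRanges {
    // Hypothetical stand-in for an AST node's source range (not the JDT ASTNode API).
    static final class Node {
        final int start;   // offset of the first character of the first real token
        final int length;  // characters up to and including the last real token
        final List<Node> children = new ArrayList<>();

        Node(int start, int length) {
            this.start = start;
            this.length = length;
        }

        // Inclusive offset of the last covered character: start + length - 1.
        int lastOffset() {
            return start + length - 1;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // True if every child lies within its parent's range and siblings appear in
    // source order without overlapping, checked recursively for the whole subtree.
    static boolean wellNested(Node parent) {
        int previousLast = parent.start - 1;
        for (Node child : parent.children) {
            boolean inside = child.start >= parent.start && child.lastOffset() <= parent.lastOffset();
            boolean ordered = child.start > previousLast;
            if (!inside || !ordered || !wellNested(child)) {
                return false;
            }
            previousLast = child.lastOffset();
        }
        return true;
    }

    public static void main(String[] args) {
        String src = "int x = 1; // one";
        // The statement covers only "int x = 1;" (offsets 0..9), not the trailing comment.
        Node stmt = new Node(0, 10).add(new Node(0, 3)).add(new Node(4, 5));
        System.out.println(src.substring(stmt.start, stmt.lastOffset() + 1)); // int x = 1;
        System.out.println(wellNested(stmt)); // true
    }
}
```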
Introducing composite nodes as suggested by Jonathan would IMO lead to problems wherever a node expects a specific kind of child node. Consider a VariableDeclarationStatement, which contains a list of VariableDeclarationFragments. If we introduce a special node, what are the child nodes of the declaration statement when the declaration fragments have preceding or trailing comments?
So instead of merging comments into the existing AST, we could build a "comment tree" and provide methods to connect ASTNodes to comments and vice versa. The comment tree could be built on request, and it should handle all the cases described in Jonathan's mail.
Dirk
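The separate comment structure proposed above can be sketched as an index of comment ranges built on request and consulted next to the AST. This is a minimal sketch under stated assumptions: `CommentIndex`, its `Range` record, and the rule that a comment "leads" a node when only whitespace separates them are hypothetical, not JDT API. It shows both directions: node to comments (`leadingComments`) and comment to node (`innermostCovering`). Assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comment index kept beside the AST instead of inside it.
// Ranges are plain [start, end) offsets; none of these names are JDT API.
class CommentIndex {
    record Range(int start, int end) {
        boolean contains(Range other) {
            return start <= other.start && other.end <= end;
        }
    }

    private final String source;
    private final List<Range> comments; // sorted by start offset, non-overlapping

    CommentIndex(String source, List<Range> sortedComments) {
        this.source = source;
        this.comments = List.copyOf(sortedComments);
    }

    // Node -> comments: the comments directly before the node that are separated
    // from it, and from each other, by whitespace only.
    List<Range> leadingComments(Range node) {
        List<Range> result = new ArrayList<>();
        int cursor = node.start();
        for (int i = comments.size() - 1; i >= 0; i--) {
            Range c = comments.get(i);
            if (c.end() > cursor) {
                continue; // comment lies inside or after the node
            }
            if (!source.substring(c.end(), cursor).isBlank()) {
                break; // real code sits between this comment and the node
            }
            result.add(0, c);
            cursor = c.start();
        }
        return result;
    }

    // Comment -> node: the smallest of the given node ranges that contains the comment.
    static Range innermostCovering(Range comment, List<Range> nodes) {
        Range best = null;
        for (Range n : nodes) {
            if (n.contains(comment) && (best == null || best.contains(n))) {
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String src = "/* a */\n// b\nfoo();";
        CommentIndex index = new CommentIndex(src, List.of(new Range(0, 7), new Range(8, 12)));
        System.out.println(index.leadingComments(new Range(13, 19)).size()); // 2
    }
}
```

Keeping the index outside the tree leaves every existing node range, and the position rules above, untouched.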
"Jonathan Gossage" <jgossage@xxxxxxxx>
Sent by: jdt-core-dev-admin@xxxxxxxxxxx
12/17/2002 10:47 AM
Please respond to jdt-core-dev
To: <jdt-core-dev@xxxxxxxxxxx>
cc:
Subject: RE: [jdt-core-dev] leading and trailing comments and whitespace for AST/DOM nodes
> -----Original Message-----
> From: jdt-core-dev-admin@xxxxxxxxxxx [mailto:jdt-core-dev-admin@xxxxxxxxxxx] On Behalf Of Jim_des_Rivieres@xxxxxxxxxx
> Sent: December 16, 2002 5:36 PM
> To: jdt-ui-dev@xxxxxxxxxxx; jdt-core-dev@xxxxxxxxxxx
> Subject: [jdt-core-dev] leading and trailing comments and whitespace for AST/DOM nodes
>
> We're trying to decide how to deal with leading and trailing whitespace and comments for AST/DOM nodes, and we need your input.
>
> ref: http://bugs.eclipse.org/bugs/show_bug.cgi?id=28268
>
> Summary of where we are right now:
> - source range extends from the 1st character of the 1st real token through the last character of the last real token matched by the grammar rule for the node type; leading whitespace and comments, and trailing whitespace and comments, are NOT INCLUDED in the source range, with the exception of Javadoc comments
> - for BodyDeclarations, the Javadoc comment is treated like a token and is represented by a Javadoc node
> - Statement.get/setLeadingComment allows for a single comment before the statement; however, AST.parseCompilationUnit has never associated leading comments with any statement nodes it creates
>
> So where do we go from here? At the very least, we should
> (1) clarify this contract in the API spec
> (2) delete (deprecate) Statement.get/setLeadingComment
>
> This would at least give us a minimal, consistent approach for leading and trailing comments. The question is: is it worthwhile doing more?
>
> The general approach to date has been that AST/DOM clients interested in finer-grained lexical issues should rescan the source in the vicinity of the construct to find what they're interested in. This is reasonably straightforward, given org.eclipse.jdt.core.compiler.IScanner and accurate and consistent source ranges for all nodes and ancestors.
>
> We could add a second, "extended" source range to certain node types, like statements, body declarations, import declarations, and package declarations, that would "round up" to the "natural" source line boundary to better align with what a human author would consider the source range for a construct.
>
> Q: If the API had this, would clients use it (given that many of them already have their own scanners and have to do this sort of thing in other places too)? If yes, what are the "natural" boundaries that this API should recognize?
>
> Your input would be appreciated.
>
> Thanks,
> jeem
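One possible reading of the "extended" source range, written as a standalone helper. The policy is an assumption, since which "natural" boundaries to recognize is exactly the open question: the range is widened back to the start of the line when only indentation precedes the node, and forward to just past the line terminator when only spaces, tabs, or a `//` comment follow it. `ExtendedRange.extend` is hypothetical and does its own trivial character scanning rather than using `IScanner`; a block comment after the node on the same line simply leaves the original end in place.

```java
// A sketch of one possible "extended" source range policy (an assumption, not
// anything the JDT API defines): round a node's [start, start + length) range
// out to whole lines when nothing else shares those lines.
class ExtendedRange {
    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t';
    }

    private static boolean isLineBreak(char c) {
        return c == '\n' || c == '\r';
    }

    // Returns {start, length} of the extended range.
    static int[] extend(String src, int start, int length) {
        // Leading side: back up over spaces/tabs to the start of the line,
        // but only if nothing but indentation precedes the node on that line.
        int s = start;
        while (s > 0 && isBlank(src.charAt(s - 1))) s--;
        if (s > 0 && !isLineBreak(src.charAt(s - 1))) s = start;

        // Trailing side: skip spaces/tabs and an optional // comment; if the line
        // then ends, include the line terminator (CRLF counts as one terminator).
        int e = start + length;
        int i = e;
        while (i < src.length() && isBlank(src.charAt(i))) i++;
        if (src.startsWith("//", i)) {
            while (i < src.length() && !isLineBreak(src.charAt(i))) i++;
        }
        if (i == src.length()) {
            e = i;
        } else if (isLineBreak(src.charAt(i))) {
            e = i + 1;
            if (src.charAt(i) == '\r' && e < src.length() && src.charAt(e) == '\n') e++;
        }
        return new int[] {s, e - s};
    }

    public static void main(String[] args) {
        String src = "class A {\n    int x = 1; // one\n    int y;\n}\n";
        int[] r = extend(src, src.indexOf("int x"), "int x = 1;".length());
        System.out.print(src.substring(r[0], r[0] + r[1])); // prints "    int x = 1; // one" and a newline
    }
}
```

When two statements share a line (for example `a(); b();`), neither side is widened, so the extended range falls back to the real one.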
Ideally, I would like to see all comments and other whitespace accounted for in the AST. This becomes important if you want to provide a renderer that can take an AST and produce human-readable source code. Specifically, I would like to see the following kinds of nodes for dealing with whitespace:
1. A node that describes a consecutive run of whitespace characters, including runs of a single character. This would always be a leaf node. This node would be ignored by compilers and tools that are only interested in the Java content.
2. A node that describes a multi-line comment (i.e. /* */). This would also be a leaf node. Again, this node would be ignored by tools that are not interested.
3. A node describing a single-line comment (i.e. // ...). This would be a leaf node and would be ignored by tools that are not interested.
4. A composite node that would be a parent to any node with associated comments. The children would be the associated comment and whitespace nodes and the Java node.
To me, the following rules for associating comments and whitespace with Java constructs make sense.
a) Recognize and preserve a file comment block at the start of a compilation unit. This could take the form of a single multi-line comment, or it could be a consecutive range of single-line comments and blank lines. This should result in a composite node with one or more comment and whitespace nodes under it.
b) Recognize any consecutive run of blank lines, multi-line comments, or single-line comments as a comment block to be attached to the following Java node, using the list of node types you specified with the addition of field declarations. Here there would be a composite node with the whitespace/comment nodes and the Java node underneath.
c) If a single-line or multi-line comment is found immediately following any of the nodes defined above, on the same line, collect it and all single-line comments and whitespace that immediately follow. The intent is to deal with constructs such as the following:
    {...} // comment
          // comment2
or
    statement; // comment
               // comment2
or
    statement; /*
    comment1 comment2
    */
d) Multi-line or single-line comments embedded within a statement should simply generate a comment node without any composite node. For example,
    invoke( a, // comment 1
            b /* comment2 */ );
would simply generate two comment nodes plus the associated whitespace nodes.
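The leaf node kinds (items 1 to 3) and rule (c) can be sketched together. All names here are hypothetical: `Trivia.lex` splits the text between two tokens into whitespace, block comment, and line comment pieces, and `Trivia.trailing` applies rule (c) to the pieces that follow a node. The mail does not say what ends a trailing group, so the sketch assumes that a blank line or a second block comment does, and that whitespace after the last collected comment is not part of the group. Assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of leaf node kinds 1-3 and association rule (c); names are hypothetical.
class Trivia {
    enum Kind { WHITESPACE, BLOCK_COMMENT, LINE_COMMENT }

    record Piece(Kind kind, int start, int end) {}

    // Splits src[from, to), assumed to hold only whitespace and comments (for
    // example the gap between two tokens), into whitespace and comment leaves.
    static List<Piece> lex(String src, int from, int to) {
        List<Piece> pieces = new ArrayList<>();
        int i = from;
        while (i < to) {
            int start = i;
            if (Character.isWhitespace(src.charAt(i))) {
                while (i < to && Character.isWhitespace(src.charAt(i))) i++;
                pieces.add(new Piece(Kind.WHITESPACE, start, i));
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                i = (close < 0 || close + 2 > to) ? to : close + 2; // unterminated: take the rest
                pieces.add(new Piece(Kind.BLOCK_COMMENT, start, i));
            } else if (src.startsWith("//", i)) {
                while (i < to && src.charAt(i) != '\n' && src.charAt(i) != '\r') i++;
                pieces.add(new Piece(Kind.LINE_COMMENT, start, i));
            } else {
                throw new IllegalArgumentException("not whitespace or comment at offset " + i);
            }
        }
        return pieces;
    }

    // Counts '\n' characters inside a piece.
    private static int lineBreaks(String src, Piece p) {
        int n = 0;
        for (int i = p.start(); i < p.end(); i++) {
            if (src.charAt(i) == '\n') n++;
        }
        return n;
    }

    // Rule (c): given the pieces that follow a node, take a comment that starts on
    // the node's line plus the line comments and whitespace right after it.
    static List<Piece> trailing(String src, List<Piece> gap) {
        List<Piece> out = new ArrayList<>();
        int k = 0;
        if (k < gap.size() && gap.get(k).kind() == Kind.WHITESPACE) {
            if (lineBreaks(src, gap.get(k)) > 0) return out; // comment is on a later line
            k++;
        }
        if (k == gap.size()) return out;
        out.add(gap.get(k++)); // the same-line comment (block or line)
        int keep = 1;
        while (k < gap.size()) {
            Piece p = gap.get(k++);
            if (p.kind() == Kind.BLOCK_COMMENT) break;                              // assumption
            if (p.kind() == Kind.WHITESPACE && lineBreaks(src, p) > 1) break;       // blank line, assumption
            out.add(p);
            if (p.kind() == Kind.LINE_COMMENT) keep = out.size();
        }
        return new ArrayList<>(out.subList(0, keep)); // drop whitespace after the last comment
    }

    public static void main(String[] args) {
        String src = "x = 1; // comment\n       // comment2\n\n// next\ny = 2;";
        List<Piece> gap = lex(src, src.indexOf(';') + 1, src.indexOf("y = 2;"));
        for (Piece p : trailing(src, gap)) System.out.println(p);
    }
}
```

In the example, `// next` is left over for rule (b), which would attach it to the following statement `y = 2;`.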
Since there could be a substantial space penalty in generating these nodes, consideration should be given to making their generation optional. This would allow tools that need this kind of information to have it without penalizing users of conventional tools.
The composite node I mentioned above is a specialized instance of a more general capability that I would like to see in the AST. I would like to introduce the concept of an intermediate node that could be used by tools that generate source code fragments. Such a node would have two sub-trees under it: one consisting of nodes that only have meaning to a specific tool, and the other containing the generated code as an AST fragment. This type of node would allow tools to present constructs at a high conceptual level to the developer while also giving compilers etc. direct access to the generated source code.
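A sketch of the intermediate node idea with hypothetical types rather than the JDT DOM: `Intermediate` holds a tool-only subtree and the generated Java fragment, and a compiler-like traversal follows only the generated side. The "gui-builder" tool name in the example is invented for illustration. The composite comment node described earlier would have the same shape, with comment and whitespace leaves in place of the tool subtree. Assumes Java 17 (sealed interfaces, pattern matching for `instanceof`).

```java
import java.util.List;

// Sketch of the "intermediate node" idea with hypothetical types (not the JDT DOM).
class IntermediateNodes {
    sealed interface Node permits JavaNode, ToolNode, Intermediate {}

    // An ordinary Java AST node, reduced to a kind name and its children.
    record JavaNode(String kind, List<Node> children) implements Node {}

    // A node that only the generating tool understands.
    record ToolNode(String tool, String payload) implements Node {}

    // Two subtrees: the tool's high-level view and the generated code as an AST fragment.
    record Intermediate(Node toolView, Node generated) implements Node {}

    // What a compiler-like client sees: it follows only the generated subtree and
    // counts the Java nodes in it, skipping tool-specific nodes entirely.
    static int countJavaNodes(Node n) {
        if (n instanceof Intermediate im) return countJavaNodes(im.generated());
        if (n instanceof ToolNode) return 0;
        JavaNode j = (JavaNode) n;
        int total = 1;
        for (Node child : j.children()) total += countJavaNodes(child);
        return total;
    }

    public static void main(String[] args) {
        Node call = new JavaNode("MethodInvocation", List.of());
        Node stmt = new JavaNode("ExpressionStatement", List.of(call));
        Node generated = new Intermediate(new ToolNode("gui-builder", "button1 click handler"), stmt);
        Node block = new JavaNode("Block", List.of(generated));
        System.out.println(countJavaNodes(block)); // 3
    }
}
```

A tool-facing view would walk `toolView()` instead, so each kind of client reads only the subtree it understands.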
Regards
Jonathan Gossage
_______________________________________________
jdt-core-dev mailing list
jdt-core-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/jdt-core-dev