Hi,

Here are my thoughts regarding positions and AST. The major goal was to come up with a consistent view, even though this causes some more work for refactoring and for the implementation of the new AST. But I think it might be worth if we can come up with a consistent story for positions.

Some general statements

  1. sourceStart and sourceEnd should always cover the whole node. This is different to the current implementation where for some nodes declarationSourceStart and declararionSourcEnd covers the whole node and sourceStart and sourceEnd only covers the name (examples are: LocalVariableDeclaration, TypeDeclararion, ...)
  2. sourceStart and sourceEnd should also cover all subnodes
  3. whenever possible we should follow the grammar as defined in The Java Language Specification book. So if the grammar says that a production includes the semicolon then the AST node should include it too. For example the grammar defines a return statement like

  4.     return (expression) ;
    So the corresponding AST node should include the ;

Some statements from earlier discussions (mainly between Jim, Philippe, and me)

Open issues

Multiple local declarations

Currently multiple local declarations appear in the AST as n separate local declarations without any relationship to each others. This raises various questions:

Since the new AST isn't a 1:1 mapping of the compiler's AST anyway (we have the ExpressionStatement node) I opt to introduce new nodes as defined in the grammar. Since the semicolon doesn't belong to the variable declaration, it should be managed by the parent node that ties together multiple declarations. Here is an example:

int x= 10, x[]= null, i;

LocalVariableDeclaration node manages:
    the type (e.g. int)
    the positions of the commas (if needed)
    the actual variable declarators
    sourceStart= start of the type
    sourceEnd= ;

VariableDeclarator node manages:
    the variable name and its positions
    the initialization
    sourceStart= start of variable name
    sourceEnd= end of initialization. Doesn't include the comma.

If we want to do some optimization we could also have a node SingleLocalVariableDeclaration for declaration like int x; or int y= 10; The node would have the following fields:
    the type
    the variable name and its positions
    the initialization
    sourceStart= start of type
    sourceEnd= ;

Updates in for statements

Analogous to the local variable declaration, the comma to separate the update expressions can not be part of the expression (expressions don't contain a semicolon so they can't contain a comma either). To know the positions of the commas the for statement should manage them in a separate array.

The general rule is, that whenever language elements are separate using a comma (for example an interface list in the implements statement, arguments of a method declaration, ...) the node containing the separated nodes should manage the positions of the comma, if they are of any interest. In a first implementation we could leave these positions out and use the scanner to find them if they are of interest.

Treatment of semicolon

From our experiences with refactoring it is helpful in some cases to know where the position of the semicolon is. For example if the user extract a for statement and he doesn't select the action's semicolon we allow the extraction. So what can we do in these cases:

Answers to explicit questions from Olivier