Here are my thoughts regarding positions and the AST. The main goal is to come up with a consistent view, even though this causes more work for refactoring and for the implementation of the new AST. I think it is worth it if we can come up with a consistent story for positions.
for (int i= 0; i < 10; i++)
    foo();
The sourceEnd of the for statement will include the semicolon that terminates the expression statement in its body.
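A minimal illustration of the ranges involved, assuming inclusive [sourceStart, sourceEnd] offsets. The Range class and the hard-coded text lookups are hypothetical and only work for this one snippet; the point is that the for statement's range ends exactly where its body's range ends, so it covers the body's semicolon.

```java
/**
 * Hypothetical illustration: inclusive [sourceStart, sourceEnd] ranges for
 * "for (int i= 0; i < 10; i++) foo();". Lookups are hard-coded to this snippet.
 */
public class ForStatementRange {

    static final class Range {
        final int sourceStart;
        final int sourceEnd; // inclusive: offset of the last character

        Range(int sourceStart, int sourceEnd) {
            this.sourceStart = sourceStart;
            this.sourceEnd = sourceEnd;
        }

        String text(String source) {
            return source.substring(sourceStart, sourceEnd + 1);
        }
    }

    /** Body range: the expression statement, including its semicolon. */
    static Range bodyRange(String source) {
        int start = source.indexOf("foo");
        return new Range(start, source.indexOf(';', start));
    }

    /** The for statement runs from the keyword to the end of its body. */
    static Range forRange(String source) {
        return new Range(source.indexOf("for"), bodyRange(source).sourceEnd);
    }

    public static void main(String[] args) {
        String source = "for (int i= 0; i < 10; i++)\n\tfoo();";
        System.out.println(bodyRange(source).text(source));              // foo();
        System.out.println(forRange(source).text(source).endsWith(";")); // true
    }
}
```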
Currently, multiple local variables declared in one statement appear in the AST as n separate local declarations without any relationship to each other. This raises various questions:
Since the new AST isn't a 1:1 mapping of the compiler's AST anyway (we have the ExpressionStatement node), I propose introducing new nodes as defined in the grammar. Since the semicolon doesn't belong to any single variable declarator, it should be managed by the parent node that ties the declarators together. Here is an example:
int x= 10, y[]= null, i;
LocalVariableDeclaration node manages:
the type (e.g. int)
the positions of the commas (if needed)
the actual variable declarators
sourceStart= start of the type
sourceEnd= ;
VariableDeclarator node manages:
the variable name and its positions
the initialization
sourceStart= start of variable name
sourceEnd= end of the initialization, or end of the variable name if there is no initialization. Doesn't include the comma.
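The split between the two nodes can be sketched as follows. The node names follow the proposal, but the fields, the parse method and the character-level scanning are hypothetical stand-ins for the real parser; they assume a one-word type and initializers that contain no ',' or ';' (so no calls with several arguments and no array initializers). The sketch shows the commas and the semicolon landing in the parent node while each declarator stops before its separator.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical LocalVariableDeclaration / VariableDeclarator nodes filled in by a
 * toy scanner. Assumes a one-word type and initializers without ',' or ';'.
 */
public class LocalDeclarationPositions {

    static final class VariableDeclarator {
        final String name;
        final int sourceStart; // start of the variable name
        final int sourceEnd;   // end of the initializer, or of the name (plus any []) if none; never the comma

        VariableDeclarator(String name, int sourceStart, int sourceEnd) {
            this.name = name;
            this.sourceStart = sourceStart;
            this.sourceEnd = sourceEnd;
        }
    }

    static final class LocalVariableDeclaration {
        String type;
        final List<VariableDeclarator> declarators = new ArrayList<>();
        final List<Integer> commaPositions = new ArrayList<>(); // owned by the parent
        int sourceStart; // start of the type
        int sourceEnd;   // position of the ';'
    }

    static LocalVariableDeclaration parse(String src) {
        LocalVariableDeclaration decl = new LocalVariableDeclaration();
        int i = 0;
        while (Character.isWhitespace(src.charAt(i))) i++;
        decl.sourceStart = i;
        while (!Character.isWhitespace(src.charAt(i))) i++;
        decl.type = src.substring(decl.sourceStart, i);

        int segmentStart = i;
        for (; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c != ',' && c != ';') continue;
            // Trim blanks so the declarator covers neither whitespace nor the separator.
            int s = segmentStart;
            int e = i - 1;
            while (Character.isWhitespace(src.charAt(s))) s++;
            while (Character.isWhitespace(src.charAt(e))) e--;
            int n = s;
            while (Character.isJavaIdentifierPart(src.charAt(n))) n++;
            decl.declarators.add(new VariableDeclarator(src.substring(s, n), s, e));
            if (c == ',') {
                decl.commaPositions.add(i);
                segmentStart = i + 1;
            } else {
                decl.sourceEnd = i; // the ';' closes the whole declaration
                break;
            }
        }
        return decl;
    }

    public static void main(String[] args) {
        String src = "int x= 10, y[]= null, i;";
        LocalVariableDeclaration decl = parse(src);
        // prints: int [0, 23] commas=[9, 20]
        System.out.println(decl.type + " [" + decl.sourceStart + ", " + decl.sourceEnd
                + "] commas=" + decl.commaPositions);
        for (VariableDeclarator d : decl.declarators) {
            // prints: x -> "x= 10", then y -> "y[]= null", then i -> "i"
            System.out.println(d.name + " -> \"" + src.substring(d.sourceStart, d.sourceEnd + 1) + "\"");
        }
    }
}
```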
If we want to do some optimization, we could also have a node SingleLocalVariableDeclaration for declarations of a single variable, like "int x;" or "int y= 10;". The node would have the following fields:
the type
the variable name and its positions
the initialization
sourceStart= start of type
sourceEnd= ;
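A possible shape for the optimized node, as a hypothetical sketch: SingleLocalVariableDeclaration is the name proposed above, but the field names and the toy parse method (one-word type, exactly one declarator, no comments) are assumptions made only to show which offsets the node would hold. With a single declarator there are no commas to track, so the node can carry the name positions and the semicolon directly.

```java
/**
 * Hypothetical SingleLocalVariableDeclaration filled in by a toy scanner that only
 * handles "Type name;" and "Type name= init;" with a one-word type.
 */
public class SingleLocalDeclaration {

    static final class SingleLocalVariableDeclaration {
        String type;
        String name;
        int nameStart;      // first character of the variable name
        int nameEnd;        // last character of the variable name (inclusive)
        String initializer; // null when there is no "= ..." part
        int sourceStart;    // start of the type
        int sourceEnd;      // position of the ';'
    }

    static int skipWhitespace(String src, int i) {
        while (i < src.length() && Character.isWhitespace(src.charAt(i))) i++;
        return i;
    }

    static SingleLocalVariableDeclaration parse(String src) {
        SingleLocalVariableDeclaration d = new SingleLocalVariableDeclaration();
        int i = skipWhitespace(src, 0);
        d.sourceStart = i;
        while (!Character.isWhitespace(src.charAt(i))) i++;
        d.type = src.substring(d.sourceStart, i);

        i = skipWhitespace(src, i);
        d.nameStart = i;
        while (Character.isJavaIdentifierPart(src.charAt(i))) i++;
        d.nameEnd = i - 1;
        d.name = src.substring(d.nameStart, i);

        d.sourceEnd = src.indexOf(';', i);
        i = skipWhitespace(src, i);
        if (src.charAt(i) == '=') {
            // Initializer text between '=' and ';', without surrounding blanks.
            d.initializer = src.substring(i + 1, d.sourceEnd).trim();
        }
        return d;
    }

    public static void main(String[] args) {
        SingleLocalVariableDeclaration d = parse("int y= 10;");
        System.out.println(d.type + " " + d.name + "= " + d.initializer
                + " [" + d.sourceStart + ", " + d.sourceEnd + "]"); // int y= 10 [0, 9]
    }
}
```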
Analogous to the local variable declaration, the commas separating the update expressions of a for statement cannot be part of those expressions (an expression doesn't contain a terminating semicolon, so it can't contain a separating comma either). To know the positions of the commas, the for statement should manage them in a separate array.
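A minimal sketch of that bookkeeping, assuming a basic (not enhanced) for statement whose header contains no string or character literals or comments. ForStatement, updateRanges and updateCommaPositions are hypothetical names, and the character-level scan stands in for the real parser. Parentheses are tracked so that a comma inside a call such as bar(a, b) is not mistaken for a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: the for statement, not its update expressions, records the
 * commas separating the updates in an int[] array. Toy scanner, no literals/comments.
 */
public class ForUpdateCommas {

    static final class ForStatement {
        final List<int[]> updateRanges = new ArrayList<>(); // {sourceStart, sourceEnd}, inclusive
        int[] updateCommaPositions = new int[0];
    }

    static ForStatement scan(String src) {
        ForStatement f = new ForStatement();
        int open = src.indexOf('(');
        int close = -1, updateStart = -1, semicolons = 0, depth = 0;
        for (int i = open; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                if (--depth == 0) { close = i; break; }
            } else if (c == ';' && depth == 1 && ++semicolons == 2) {
                updateStart = i + 1; // updates follow the second ';' of the header
            }
        }
        if (updateStart < 0 || close < 0) return f; // not a basic for header

        List<Integer> commas = new ArrayList<>();
        int segmentStart = updateStart;
        depth = 0;
        for (int i = updateStart; i <= close; i++) {
            char c = src.charAt(i);
            if (i == close || (c == ',' && depth == 0)) {
                int s = segmentStart, e = i - 1;
                while (s <= e && Character.isWhitespace(src.charAt(s))) s++;
                while (e >= s && Character.isWhitespace(src.charAt(e))) e--;
                if (s <= e) f.updateRanges.add(new int[] {s, e}); // excludes the comma
                if (c == ',') commas.add(i);                     // kept by the for statement
                segmentStart = i + 1;
            } else if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            }
        }
        f.updateCommaPositions = commas.stream().mapToInt(Integer::intValue).toArray();
        return f;
    }

    public static void main(String[] args) {
        String src = "for (int i= 0, j= 10; i < j; i++, j--) foo();";
        ForStatement f = scan(src);
        System.out.println(Arrays.toString(f.updateCommaPositions)); // [32]
        for (int[] r : f.updateRanges) {
            System.out.println(src.substring(r[0], r[1] + 1)); // i++, then j--
        }
    }
}
```

The comma inside the initialization part (between i= 0 and j= 10) is deliberately ignored here: under the proposal it belongs to the LocalVariableDeclaration node, not to the for statement.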
The general rule is that whenever language elements are separated by commas (for example the interface list of an implements clause, the parameters of a method declaration, ...), the node containing the separated nodes should manage the positions of the commas, if they are of any interest. In a first implementation we could leave these positions out and use the scanner to find them when they are needed.
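If the comma positions are left out, they can be recovered on demand from the source between two sibling nodes. The sketch below is a hypothetical helper (findComma is not an existing API); it hand-skips whitespace and comments as a stand-in for the real scanner, which is why a plain indexOf is not enough: a comment between the nodes may itself contain a comma.

```java
/**
 * Hypothetical helper: find the separating comma between two sibling nodes given
 * the end of the first (inclusive) and the start of the second, skipping
 * whitespace and comments. Returns -1 if another token comes first.
 */
public class CommaFinder {

    static int findComma(String source, int prevSourceEnd, int nextSourceStart) {
        int i = prevSourceEnd + 1;
        while (i < nextSourceStart) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (source.startsWith("/*", i)) {
                int close = source.indexOf("*/", i + 2);
                if (close < 0) return -1; // unterminated block comment
                i = close + 2;
            } else if (source.startsWith("//", i)) {
                int newline = source.indexOf('\n', i + 2);
                if (newline < 0) return -1; // line comment runs to end of input
                i = newline + 1;
            } else if (c == ',') {
                return i;
            } else {
                return -1; // some other token: no separating comma here
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "class A implements I1 /* a, b */ , I2 {}";
        // I1 ends at offset 20, I2 starts at offset 35
        System.out.println(findComma(src, 20, 35)); // 33
        System.out.println(src.indexOf(',', 21));   // 26, the comma inside the comment
    }
}
```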
From our experience with refactoring, it is helpful in some cases to know the position of the semicolon. For example, if the user extracts a for statement but doesn't select the semicolon of its body statement, we allow the extraction. So what can we do in these cases: