Re: [cdt-dev] LR parser and token generation


From: Mike Kucera <mkucera@xxxxxxxxxx>
Date: Wed, 3 Mar 2010 11:17:18 -0500

Your understanding of the situation is exactly correct. I think normally with LPG you would provide a grammar for both the lexer and the parser. But in our situation we have a preprocessor sitting between the lexer and the parser, which complicates things terribly. So instead the LR parser reuses the lexer/preprocessor from the CDT core. This is also necessary because the CPreprocessor class has a lot of critical functionality, but it does make adding new tokens other than keywords difficult. In the worst case you might have to provide a patch to the core that adds support for the new token.

Since the LR parser is not using a lexer generated by LPG, there needs to be a token map that translates the token kinds produced by the core into the token kinds that the LPG-generated parser expects.
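
To illustrate the idea of such a token map, here is a minimal, self-contained sketch. It does not use the real CDT or LPG classes: the class name, the constants, and their numeric values are all made up for illustration, and the real map in the LR parser plugin works on core token objects rather than bare ints.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a token map; NOT the real CDT or LPG types.
final class TokenMapSketch {
    // Token kinds as a core-style lexer might number them (made-up values).
    static final int CORE_IDENTIFIER = 1;
    static final int CORE_INTEGER = 2;
    static final int CORE_SEMI = 3;

    // Token kinds as a generated parser symbol table might number them
    // (made-up values).
    static final int LR_IDENTIFIER = 10;
    static final int LR_INTEGER = 11;
    static final int LR_SEMICOLON = 12;
    static final int LR_INVALID = 99;

    private final Map<Integer, Integer> kinds = new HashMap<>();

    TokenMapSketch() {
        kinds.put(CORE_IDENTIFIER, LR_IDENTIFIER);
        kinds.put(CORE_INTEGER, LR_INTEGER);
        kinds.put(CORE_SEMI, LR_SEMICOLON);
    }

    // Translate a core token kind into the kind the parser expects.
    // Kinds with no mapping become LR_INVALID so the parser can report them.
    int mapKind(int coreKind) {
        return kinds.getOrDefault(coreKind, LR_INVALID);
    }
}
```

The point of the sketch is only that the parser never sees the core's numbering: every token passes through one translation step, which is also the natural place to hook in custom tokens.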

If all you need to support is the @ sign then you may be in luck. The CDT lexer has an option to support @ in identifiers. If this option is turned on, a lone @ sign should be returned as an identifier token, which you can then intercept and turn into the LPG token type that you want.
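
A rough sketch of that interception, assuming the lexer really does hand over a lone @ as an identifier token once the option is on. The Token class, the constants, and their values below are hypothetical stand-ins, not the CDT or LPG types; in the real plugin this logic would live in your customized token map.

```java
// Hypothetical stand-ins; these are NOT the real CDT or LPG types.
final class AtSignInterceptSketch {
    static final int CORE_IDENTIFIER = 1; // made-up core token kind

    // Made-up parser token kinds.
    static final int LR_IDENTIFIER = 10;
    static final int LR_AT = 11;
    static final int LR_INVALID = 99;

    // Minimal token: a kind plus the text the lexer matched.
    static final class Token {
        final int kind;
        final String image;

        Token(int kind, String image) {
            this.kind = kind;
            this.image = image;
        }
    }

    // Map a core token to a parser token kind, reclassifying a lone "@".
    static int mapKind(Token token) {
        if (token.kind != CORE_IDENTIFIER) {
            return LR_INVALID; // all other kinds are omitted in this sketch
        }
        // Only an identifier whose whole text is "@" becomes the custom
        // token; names that merely contain "@" stay ordinary identifiers.
        return "@".equals(token.image) ? LR_AT : LR_IDENTIFIER;
    }
}
```

Checking the full token text rather than just the first character keeps identifiers such as `foo@bar` (which the same lexer option would also allow) from being misclassified.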

Mike Kucera
Software Developer
Eclipse CDT/PTP
IBM Toronto
mkucera@xxxxxxxxxx


From:	"Mario Pierro" <Mario.Pierro@xxxxxxx>
To:	"CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Date:	03/03/2010 09:32 AM
Subject:	[cdt-dev] LR parser and token generation

Hello,

Another question on LR parser customization...

I am trying to add some custom extensions to the C99 language as specified in the LR parser plugin. The extensions require both additional keywords and additional grammar rules.

My ILanguage implementation extends the C99Language class, and provides the custom C99Parser via its getParser() method. Additional keywords are added via a custom ICLanguageKeywords implementation (as described in http://dev.eclipse.org/mhonarc/lists/cdt-dev/msg15788.html) which extends CLanguageKeywords and adds the new ones.

From what I understood, my custom parser will process tokens which have been produced by the CPreprocessor / Lexer classes - as the PDOM parser does - and use a customized version of the DOMToC99TokenMap class to map the preprocessor tokens (IToken interface) to the tokens in the generated C99Parsersym class.

So if the parser defines new tokens, the CPreprocessor needs to know about them as well. If I got it right, this can be done by having the language class supply an implementation of IScannerExtensionConfiguration, which associates the extended keywords to token ids in the IExtensionToken interface in its addKeyword(char[], int) method.

Alternatively, the lexer can ignore the extensions altogether, and the customized DOMToC99TokenMap class can determine if e.g. an "identifier" token supplied by the lexer is actually an "extended keyword" token in the parser.

A customized LR parser will thus be dependent on the tokens generated by the preprocessor, no matter what its grammar specifies. Circumventing this might be difficult, some characters might never be recognized as the Lexer might not be generating any token at all (e.g. the '@' char).

I would like to use the same grammar for the lexer and the parser, so that the token set is the same. Is this possible? Am I getting something terribly wrong here?

Thank you for your help!
/Mario
