Eclipse Community Forums: TMF (Xtext) » [xtext] is lexical analysis over syntactic analysis in xtext?

[xtext] is lexical analysis over syntactic analysis in xtext? [message #660252]

Thu, 17 March 2011 14:35

hanys

Messages: 188
Registered: July 2009

Senior Member

Hello,

I have noticed that Xtext first identifies lexical tokens and only then
tries to combine them according to the syntactic rules of the grammar.
I think this approach is bad.

Does identifying lexemes really happen before syntactic analysis?

Example:
we defined a token FOR:
terminal FOR: "for";

The problem was that, in the following example, Xtext identifies "for"
as a terminal instead of treating it as part of the longer word:
////////// parsed file:start
....
format
....
////////// parsed file:end

Xtext marked the "for" in the word "format" as a LEXICAL TOKEN, i.e. a terminal symbol.
This is wrong, IMHO.

So we had to do the following:
ForKeyword: F O R;

terminal F: ('f' | 'F');

terminal O: ('o' | 'O');

terminal R: ('r' | 'R');

But I would say that parsing with such a grammar is slower.

BR,
Jan


Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660276 is a reply to message #660252]

Thu, 17 March 2011 15:50

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

The context-free lexing (a priori lexing) is a property of the ANTLR
parser generator we use in the backend of Xtext. It is a tradeoff for a
lot of beautiful things we get from using ANTLR, such as error recovery.

As a result, you must be careful about which terminal rules you define
and in which order.

An easy workaround for your example should be to switch from a terminal
rule to a datatype rule (by leaving out the 'terminal' keyword). That
way, it won't be the lexer that decides whether it's a token or an
identifier. 'for' is a keyword in your language anyway, isn't it?
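
A minimal sketch of that workaround, assuming the grammar inherits the standard ID terminal from org.eclipse.xtext.common.Terminals. The grammar name, the namespace URI, and the rules Model, Statement, ForLoop and Call are hypothetical and only there to make the fragment complete. ForKeyword is declared without the 'terminal' keyword, so it is a datatype rule.

```xtext
grammar org.example.forkeyword.Demo with org.eclipse.xtext.common.Terminals

generate demo "http://www.example.org/forkeyword/Demo"

// Hypothetical entry rule: a file is a list of statements.
Model:
    statements+=Statement*;

Statement:
    ForLoop | Call;

// Uses the datatype rule below instead of a 'terminal FOR' rule.
ForLoop:
    ForKeyword variable=ID ';';

// Whole words such as "format" are expected to be matched by the inherited ID terminal.
Call:
    name=ID ';';

// Datatype rule: no 'terminal' keyword in front of the rule name.
ForKeyword:
    'for';
```

Writing the keyword inline ('for' directly inside ForLoop) should behave the same way; the separate rule only mirrors the wording of the advice above.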

--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com

---
Get professional support from the Xtext committers at www.typefox.io


Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660372 is a reply to message #660252]

Fri, 18 March 2011 07:37

Alexander Nittka

Messages: 1193
Registered: July 2009

Senior Member

Hi,

In addition to Jan's reply, note that Xtext has a switch for case-insensitive languages. You should find information about it in the documentation.

Alex


Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660406 is a reply to message #660276]

Fri, 18 March 2011 10:12

hanys

Messages: 188
Registered: July 2009

Senior Member

Yes, 'for' is our keyword.
Actually, we are going to implement an editor for extended JavaScript.
What is your opinion: is that possible with Xtext? I noticed that JavaScript's
syntax includes automatic semicolon insertion.

Thanks,
Jan

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660408 is a reply to message #660406]

Fri, 18 March 2011 10:24

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

Not sure, because I don't know extended JavaScript too well.
The challenge in such projects is usually to get the grammar right and
free of ambiguities. You might have to enable backtracking or use
syntactic predicates (Xtext 2 only).


Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660409 is a reply to message #660408]

Fri, 18 March 2011 10:36

hanys

Messages: 188
Registered: July 2009

Senior Member

Basically it's JSON + JavaScript.

Is there any documentation about "syntactic predicates"?

Thanks,

Jan

Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660453 is a reply to message #660409]

Fri, 18 March 2011 13:59

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

Syntactic predicates are new in Xtext 2.0 and the documentation is not
yet finished. But there's a thread "syntatic predicates in Xtext 2.0" in
this newsgroup.
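
For readers who only need the syntax: a small sketch of a syntactic predicate, assuming the `=>` operator as introduced with Xtext 2.0. The grammar is a hypothetical dangling-else example, not a fragment of the JavaScript grammar discussed in this thread, and all rule and feature names are made up for illustration.

```xtext
grammar org.example.predicates.Demo with org.eclipse.xtext.common.Terminals

generate demo "http://www.example.org/predicates/Demo"

Expression:
    IfExpression | Literal;

// Without the predicate, "if 1 then if 2 then 3 else 4" is ambiguous:
// the 'else' could belong to either 'if'.
// '=>' tells the parser to take the 'else' branch whenever it sees 'else',
// which binds it to the innermost 'if'.
IfExpression:
    'if' condition=Expression
    'then' thenBranch=Expression
    (=>'else' elseBranch=Expression)?;

Literal:
    value=INT;
```

The predicate resolves the ambiguity locally, so backtracking does not have to be enabled for the whole grammar just for this one decision.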


Re: [xtext] is lexical analysis over syntactic analysis in xtext? [message #660509 is a reply to message #660409]

Fri, 18 March 2011 17:20

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

I don't think syntactic predicates alone are enough to specify a JS
parser; you probably also need semantic predicates (which are not
supported in Xtext). JS is a *bitch* to parse if you aim to correctly
cover the entire language.

I suggest you get the ANTLR book and look at some JS parser samples
written for ANTLR, so that you know what sort of challenges you will
encounter before you start.

Regards
- henrik

