Re: [asciidoc-lang-dev] Whitespace handling

Hi Sylvain,

I see that you have licensed your project under GPL3.0.  While I don’t think the terms of GPL3.0 apply sensibly to interpreted code, my personal interpretation is that any code that calls your parser in-process would be GPL3.0.  I’m curious what your interpretation or intent is.  In any case, although I was hoping to use your PEG grammar as a starting point for an ANTLR 4 grammar, I don’t feel comfortable looking at it due to possible license implications.

Despite this, I do appreciate your making your code available.

Thanks,
David Jencks

> On Mar 9, 2021, at 12:08 AM, Sylvain Leroux <sylvain@xxxxxxxxxxx> wrote:
> 
> On 09/03/2021 01:20, Lex Trotman wrote:
>> [...] Remember mine is an experiment, so using
>> plain code allows me to play with all sorts of things, several of which
>> turned out to be bad ideas, but now I know that :-).
> 
> Same thing here. The difference is I implemented my own PEG engine. But
> all this is highly experimental to say the least.
> 
>> 
>> Of course since the PEG is only text in the spec it's cheap for me to
>> extend it :-)
>>  
>> 
>>> 
>>> I have found I needed to extend the PEG to handle nesting of sections
>>> and lists without writing out a limited depth set, how do you
>>    address it?
>>> 
>>    For now, I use two separate PEGs. One is for the inline parser, the
>>    other one for block-level parsing.
>> 
>>    The PEG for the inline parser is quite stable, and I don't encounter
>>    significant difficulties when adding new features.
>> 
>>    The PEG for the block-level parsing is a different beast, though. It
>>    replaces a hand-written parser (itself rewritten multiple times). For now, it is
>>    used as a tokenizer rather than a recursive descent parser. And the
>>    actual block hierarchy construction is delegated to a stateful object in
>>    the spirit of the factory method pattern
>>    (https://en.wikipedia.org/wiki/Factory_method_pattern).
>> 
>>    I tried to implement a recursive grammar for the block-level parser. But
>>    I struggled to find a way to match the delimiter in nested blocks
>>    like in:
>> 
>>      ====
>> 
>>       ======
>> 
>>       ======
>> 
>>      ====
>> 
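
A minimal sketch of the approach described above, where a tokenizer recognizes delimiter lines and a stateful object builds the block hierarchy. This is not Sylvain's code: the `DELIMITER` pattern, the `build_blocks` name, and the dictionary layout are assumptions made for illustration. A delimiter closes the innermost open block only when it is identical to that block's opening delimiter; any other delimiter opens a nested block.

```python
import re

# Hypothetical token pattern: a line made of four or more '=' characters.
DELIMITER = re.compile(r"^(={4,})$")


def build_blocks(lines):
    """Build a nested block tree, pairing delimiters by their exact text."""
    root = {"delimiter": None, "children": []}
    stack = [root]  # the innermost open block is always stack[-1]
    for line in lines:
        stripped = line.strip()
        match = DELIMITER.match(stripped)
        if match is None:
            if stripped:  # skip blank lines, keep text lines
                stack[-1]["children"].append(stripped)
            continue
        delimiter = match.group(1)
        if stack[-1]["delimiter"] == delimiter:
            stack.pop()  # same length as the innermost opener: close it
        else:
            block = {"delimiter": delimiter, "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)  # different length: open a nested block
    if len(stack) > 1:
        raise ValueError("unclosed block: " + stack[-1]["delimiter"])
    return root


tree = build_blocks(["====", "", " ======", "inner", " ======", "", "===="])
```

The stack is the state that a plain recursive grammar lacks: the length of the opening delimiter is remembered and compared when a later delimiter line arrives.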
>> 
>> Ahh yes another extension I did was that all non-terminals and tokens
>> can return a numeric value (if they match) and I added a simple syntax
>> for assigning that value if it's needed, and comparing it to the
>> parameter passed to the non-terminal.  Tokens return a count of relevant
>> characters that varies from token to token but is effectively the level
>> for section/list tokens or length of delimiter for block delimiters. 
>> The (simplified) syntax  for a "section" would be:
>> 
>> Section(level) ::= token_level = <start_line_equals>
>> (:token_level==level:) Markup_text_line Section_contents *Section(level+1)
>> 
>> where <start_line_equals> is the token from the lexer, (:expression:) is
>> a test that will fail the non-terminal if it fails, and name= assigns
>> the value to name. These have the obvious implementation in code.
>> 
>>    I suspect there is an elegant way of doing that since PEG "can count",
>>    but I have yet to find how. FWIW, I discovered PEG with this project, so
>>    my knowledge of the technology is still fragile.
>> 
>> 
> 
>> I'm not aware that a formal PEG
>> (https://en.wikipedia.org/wiki/Parsing_expression_grammar) can count in
>> a way that lets it check context, e.g. whether the number of equals signs
>> is less than, equal to, or more than the current section level.  Of course,
>> any implementation of PEG in a programming language can probably use that
>> language for the purpose,
>> but the specification either needs some formal extension or the use of
>> (shudder) words.  I'm in no way pushing my extensions, it just "works
>> for me"(TM).
> 
> When I first met the "PEG" acronym, I searched for it on Wikipedia.
> I didn't find the article very enlightening. Fortunately, someone in the
> V8 regex team pointed me toward the work of Roberto Ierusalimschy from
> PUC-Rio. He and his colleagues wrote great articles about Parsing
> Expression Grammars. My PEG engine is an implementation of the virtual
> machine described in "A Text Pattern-Matching Tool based on
> Parsing Expression Grammars (2008)"
> [http://www.inf.puc-rio.br/%7Eroberto/docs/peg.pdf]
> 
> Also take a look at "Converting regexes to Parsing Expression Grammars
> (2010)" [http://www.inf.puc-rio.br/%7Eroberto/docs/ry10-01.pdf]
> 
> When I said "PEG can count", I meant it can find balanced expressions of
> arbitrary depth, something _formal_ regular expressions cannot (though
> many implementations support some form of recursion as an extension).
> Since blocks in AsciiDoc are defined as nested balanced markups, I
> suspect we could find a way to express that without requiring extensions
> to PEGs. But my attempts at doing that were unfruitful. And I lack
> experience in the field to spot a possible flaw in my reasoning.
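
To make the "balanced expressions of arbitrary depth" point concrete, here is a small sketch of the recursive PEG rule `B <- ( '(' B ')' )*` written as a plain function. The names `match_balanced` and `is_balanced` are illustrative. The sketch covers only generic balanced nesting; it does not address the harder requirement from earlier in the thread, that a closing delimiter must repeat the length of its opening one.

```python
def match_balanced(text, pos=0):
    """Apply the rule  B <- ( '(' B ')' )*  starting at pos.

    Returns the position just after the longest match. Like a PEG
    repetition, it never fails: it can always match zero iterations.
    """
    while pos < len(text) and text[pos] == "(":
        inner = match_balanced(text, pos + 1)  # recursive call for B
        if inner >= len(text) or text[inner] != ")":
            # This iteration failed, so the repetition stops
            # at the position before the unmatched '('.
            return pos
        pos = inner + 1
    return pos


def is_balanced(text):
    """True when the whole input is consumed by rule B."""
    return match_balanced(text) == len(text)
```

The recursion depth tracks the nesting depth of the input, which is the sense in which the grammar "counts" here.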
> 
>> 
>> At the moment the spec and the code are out of sync and the code
>> crashes; I want to fix that before I push to github "soon".
> 
> My code doesn't crash (thanks to JS), and all tests pass (most of the
> time), but I'm not sure this can be useful to anyone. Anyhow it's on
> github. If you want to take a look, the inline parser is here:
> https://github.com/s-leroux/Asciishaman/blob/ce269f0a10ea7241cc8c1608db32d4a829150fc7/lib/inline-parser.js#L12
> Feel free to comment ;)
> 
> 
> Regards,
> - Sylvain
> 
> 
> 
> _______________________________________________
> asciidoc-lang-dev mailing list
> asciidoc-lang-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://dev.eclipse.org/mailman/listinfo/asciidoc-lang-dev


