Re: [che-dev] XML: line endings normalization problem

Hi.
Thank you for your answer!

> According to the spec, only CRLF instances that occur inside external parsed entities are normalized.

Yes, that is true, but if we replace only CRLF with LF before parsing, then:

before parsing: CRCRLF becomes CRLF
after parsing:  CRLF becomes a single LF

So we need to replace both \r\n -> \n and \r -> \n, which at least keeps the
positions reported by the parser in sync with the content.
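
To make the CRCRLF case concrete, here is a minimal sketch. The class and method names are hypothetical and are not part of XMLTree; the single regex is just one way to express "\r\n first, then a lone \r".

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not XMLTree code).
final class LineEndings {
    // Matches CRLF as one unit, or a CR that is not followed by LF.
    private static final Pattern CR_OR_CRLF = Pattern.compile("\\r\\n?");

    // Incomplete approach: only CRLF pairs are replaced, lone CRs survive.
    static String replaceCrLfOnly(String text) {
        return text.replace("\r\n", "\n");
    }

    // Replaces both CRLF and lone CR with LF in a single pass.
    static String normalize(String text) {
        return CR_OR_CRLF.matcher(text).replaceAll("\n");
    }
}
```

With the input "a\r\r\nb", replaceCrLfOnly leaves a CR followed by an LF in the text, which the parser would then collapse again, while normalize leaves only LF characters, so there is nothing left for the parser to change.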

> The problem is that if the XML document is normalized like you suggest, 
> then the semantic content of the document may be affected. 
> The best example is that if a \r\n combination occurs inside a CDATA section, 
> then performing this global change also changes the semantic meaning of the XML document. 
> I would consider this a bug in XMLTree.

The Java parser will report every occurrence of \r\n as a single \n, even inside a CDATA section.
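
A small way to check this, assuming the JDK's default DocumentBuilder applies the end-of-line handling from section 2.11 of the XML spec to the document entity. The class name is hypothetical; only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical probe, for illustration only (not XMLTree code).
final class CdataLineBreaks {
    // Parses the XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers do not need to handle checked parser exceptions.
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }
}
```

Calling rootText("<a><![CDATA[x\r\ny]]></a>") and looking for a '\r' in the result shows whether the CDATA content came through the parser with its original line break.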

Do you have any other solution for this?

On Tue, Jul 14, 2015 at 9:19 AM, Okman, Lior <lior.okman@xxxxxxx> wrote:

Hi,

 

Actually, the git client performs this normalization only if the core.autocrlf setting is set to true on Windows OS.

 

According to the spec, only CRLF instances that occur inside external parsed entities are normalized.

 

The problem is that if the XML document is normalized like you suggest, then the semantic content of the document may be affected. The best example is that if a \r\n combination occurs inside a CDATA section, then performing this global change also changes the semantic meaning of the XML document. I would consider this a bug in XMLTree.

 

--

Lior

 

From: che-dev-bounces@xxxxxxxxxxx [mailto:che-dev-bounces@xxxxxxxxxxx] On Behalf Of Evgenii Voevodin
Sent: Monday 13 July 2015 19:10
To: che developer discussions
Subject: [che-dev] XML: line endings normalization problem

 

Hi.

I ran into a problem parsing XML with XMLTree on Windows OS. The problem is how to handle line separators (\r\n).

 

XMLTree provides the ability to modify and search information in an XML document without affecting the existing formatting, comments, etc.

On the other hand, the XML spec says that every \r\n combination and every single \r should be replaced with \n while parsing an XML document.

As XMLTree is built on top of the standard Java parser, it follows the spec as well.

 

So the actual problem:

when the content contains \r\n or \r, the parser replaces it with \n, but the source content still contains \r\n, so it is impossible to determine the correct positions of the XML document's elements, and therefore impossible to modify the content properly.
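
The drift can be illustrated with a tiny sketch. It assumes element positions are computed against the normalized text that the parser reports; the class and method names are hypothetical.

```java
// Hypothetical illustration of the offset mismatch (not XMLTree code).
final class OffsetDrift {
    // Offset of the token once \r\n and lone \r have been turned into \n,
    // which is how the text looks after the parser's line-end handling.
    static int normalizedOffset(String raw, String token) {
        String normalized = raw.replace("\r\n", "\n").replace('\r', '\n');
        return normalized.indexOf(token);
    }

    // How far the offset in the raw source is ahead of the normalized offset.
    static int drift(String raw, String token) {
        return raw.indexOf(token) - normalizedOffset(raw, token);
    }
}
```

For the raw source "<a>\r\n<b/>\r\n</a>", the drift grows by one character for every \r\n that precedes the token, so an edit applied at the normalized position lands in the wrong place in the original content.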

 

The solution I have found is to replace all \r\n, and then all remaining \r, with a single \n before parsing the actual content. Then, if it is necessary to write the modified content, all occurrences of \n are replaced with the system line separator. This is pretty close to what the git client does, which also replaces \r\n with \n on Windows OS.
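
Roughly, the idea looks like the sketch below. The names are hypothetical and this is not the actual XMLTree code; on write, the separator would be System.lineSeparator().

```java
// Hypothetical sketch of the normalize-then-restore idea (not XMLTree code).
final class LineSeparatorRoundTrip {
    // Before parsing: \r\n first, then any remaining lone \r, both become \n.
    static String toLf(String raw) {
        return raw.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Before writing: every \n becomes the requested separator,
    // e.g. System.lineSeparator() for the current platform.
    static String fromLf(String normalized, String separator) {
        return normalized.replace("\n", separator);
    }
}
```

Note that the round trip is lossy when the original document mixes separators: whatever the source used, every line break is written back with the same separator.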

 

Do you guys know any other solution for this "normalization problem"?

 

