Re: [che-dev] XML: line endings normalization problem

Hi,

 

Actually, the Git client performs this normalization on Windows only if the core.autocrlf setting is set to true.

 

According to the spec, only CRLF instances that occur inside external parsed entities are normalized.

 

The problem is that if the XML document is normalized as you suggest, the semantic content of the document may be affected. The clearest example is a \r\n combination inside a CDATA section: performing this global replacement also changes the semantic meaning of the XML document. I would consider this a bug in XMLTree.

 

--

Lior

 

From: che-dev-bounces@xxxxxxxxxxx [mailto:che-dev-bounces@xxxxxxxxxxx] On Behalf Of Evgenii Voevodin
Sent: Monday 13 July 2015 19:10
To: che developer discussions
Subject: [che-dev] XML: line endings normalization problem

 

Hi.

I ran into a problem parsing XML with XMLTree on Windows: how should line separators (\r\n) be handled?

 

XMLTree makes it possible to search and modify information in an XML document without affecting the existing formatting, comments, etc.

On the other hand, the XML spec says that every \r\n combination, and every single \r, should be replaced with \n while parsing an XML document.

As XMLTree is built on top of the standard Java parser, it follows the spec as well.

 

So the actual problem is this: when the content contains \r\n or \r, the parser replaces it with \n, but the source content still contains \r\n. The parsed text and the source text then no longer line up, so it is impossible to determine the correct positions of the XML document's elements, and therefore impossible to modify the content properly.
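To make the mismatch concrete, here is a minimal sketch using the JDK's built-in DOM parser. The class and method names (OffsetDrift, parsedRootText, drift) are made up for illustration and are not part of XMLTree. It assumes the parser normalizes line breaks as the spec requires, so every \r\n in the source shifts later positions by one character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of XMLTree.
final class OffsetDrift {

    // Text of the root element as reported by the JDK's DOM parser.
    static String parsedRootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    // Number of characters by which the source text is longer than the parsed text.
    static int drift(String sourceText, String parsedText) {
        return sourceText.length() - parsedText.length();
    }

    public static void main(String[] args) {
        String sourceText = "line1\r\nline2\r\nline3";
        String parsedText = parsedRootText("<a>" + sourceText + "</a>");
        System.out.println("source length: " + sourceText.length());
        System.out.println("parsed length: " + parsedText.length());
        System.out.println("drift: " + drift(sourceText, parsedText));
    }
}
```

Any offset computed from the parsed text has to be corrected by the drift accumulated so far before it can be applied to the original source.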

 

The solution I have found is to replace every \r\n, and then every remaining \r, with a single \n before parsing the actual content. Then, if the modified content needs to be written out, every \n is replaced with the system line separator. This is pretty close to what the Git client does, which also replaces \r\n with \n on Windows.
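The approach described above could be sketched roughly as follows. The class and method names (LineEndings, normalize, restore) are assumptions made for illustration, not XMLTree's actual API.

```java
// Hypothetical helper; not part of XMLTree.
final class LineEndings {

    // Replace every \r\n first, then every remaining lone \r, with \n.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Replace every \n with the given separator when writing content back out.
    static String restore(String text, String separator) {
        return text.replace("\n", separator);
    }

    public static void main(String[] args) {
        String source = "<a>\r\n  <b/>\r\n</a>";
        String normalized = normalize(source);
        // ... parse and modify the normalized content here ...
        String output = restore(normalized, System.lineSeparator());
        System.out.println(output.length());
    }
}
```

Note that the round trip is lossy: a file with mixed line endings comes back with a single separator everywhere, and any \r that was meant to be part of the data is changed as well.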

 

Do you guys know any other solution for this "normalization problem"?

 

