Eclipse Community Forums: Eclipse Platform » ITextContentDescriber: BYTE_ORDER


ITextContentDescriber: BYTE_ORDER_MARK overrides CHARSET [message #335354]

Fri, 03 April 2009 15:28

Eclipse User

Originally posted by: fmisiak.curl.com

We have a plugin that provides a content describer for our file extensions.
If the file content has a BOM and our describer detects a specific
charset encoding, Eclipse seems to give preference to the BOM and
ignores the CHARSET.

It seems this override decision is taken in
org.eclipse.core.internal.content.ContentDescription.getCharset()

public String getCharset() {
    byte[] bom = (byte[]) getProperty(BYTE_ORDER_MARK);
    if (bom == BOM_UTF_8)
        return CHARSET_UTF_8;
    else if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
        // UTF-16 will properly recognize the BOM
        return CHARSET_UTF_16;
    return (String) getProperty(CHARSET);
}

Somehow this use case is quite common with our Japanese partners (not
sure yet why these files end up having a BOM while the encoding is
Shift-JIS, though).
The workaround is to force the encoding using Eclipse's property dialog
on the file. It's not perfect, because the editor will display a few
weird characters at the beginning of the buffer, but at least you can
update and save the file.
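
The stray characters come from the BOM bytes being decoded as text once the encoding is forced. As a minimal sketch of the idea, assuming the stray bytes are a UTF-8 BOM (EF BB BF), which is not confirmed for these files, a reader could skip them before decoding. The class and method names below are hypothetical, and this is not what Eclipse's editor does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: drops a leading UTF-8 BOM before decoding with a forced charset. */
final class BomSkippingDecoder {
    // Assumption: the stray marker is the three-byte UTF-8 BOM.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns 3 if the data starts with the UTF-8 BOM, otherwise 0. */
    static int bomLength(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return 0;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return 0;
            }
        }
        return UTF8_BOM.length;
    }

    /** Decodes the bytes after any leading BOM using the caller-supplied charset. */
    static String decode(byte[] data, Charset charset) {
        int offset = bomLength(data);
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args) {
        byte[] file = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        System.out.println(decode(file, StandardCharsets.ISO_8859_1)); // prints "abc"
    }
}
```

For the files in question the charset argument would be `Charset.forName("Shift_JIS")`, assuming the Java runtime ships that charset; ISO-8859-1 is used in `main` only because every runtime provides it.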

Is there any way to customize this override rule (method getCharset() above)?

-Francois


Re: ITextContentDescriber: BYTE_ORDER_MARK overrides CHARSET [message #335392 is a reply to message #335354]

Mon, 06 April 2009 10:01

Dani Megert


Francois Misiak wrote:
> We have a plugin that provides a content describer for our file
> extensions.
> If the file content has a BOM and our describer detects a specific
> charset encoding, Eclipse seems to give preference to the BOM and
> ignores the CHARSET.
>
> It seems this override decision is taken in
> org.eclipse.core.internal.content.ContentDescription.getCharset()
>
> public String getCharset() {
>     byte[] bom = (byte[]) getProperty(BYTE_ORDER_MARK);
>     if (bom == BOM_UTF_8)
>         return CHARSET_UTF_8;
>     else if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
>         // UTF-16 will properly recognize the BOM
>         return CHARSET_UTF_16;
>     return (String) getProperty(CHARSET);
> }
>
>
> Somehow this use case is quite common with our Japanese partners (not
> sure yet why these files end up having a BOM while the encoding is
> Shift-JIS, though).
> The workaround is to force the encoding using Eclipse's property
> dialog on the file. It's not perfect, because the editor will display a
> few weird characters at the beginning of the buffer, but at least you
> can update and save the file.
>
> Is there any way to customize this override rule (method getCharset()
> above)?
Your describer could simply not set the BOM property, but to me it looks
like something is wrong with your files in the first place.
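
To make the suggestion concrete, here is a small stand-in model, not the Eclipse classes themselves. A plain Map plays the role of the content description, `resolveCharset` mirrors the precedence in the quoted getCharset() (reduced to the UTF-8 BOM case), and the two `describe...` methods are hypothetical describers that do or do not record the BOM. All names and property keys are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in model of the quoted precedence rule; not the Eclipse API. */
final class CharsetPrecedenceSketch {
    // Hypothetical keys standing in for the BYTE_ORDER_MARK and CHARSET properties.
    static final String BYTE_ORDER_MARK = "bom";
    static final String CHARSET = "charset";

    // Shared constant, compared by identity just as in the quoted method.
    static final byte[] BOM_UTF_8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** A recorded UTF-8 BOM wins; otherwise the CHARSET property is returned. */
    static String resolveCharset(Map<String, Object> properties) {
        byte[] bom = (byte[]) properties.get(BYTE_ORDER_MARK);
        if (bom == BOM_UTF_8) {
            return "UTF-8";
        }
        return (String) properties.get(CHARSET);
    }

    /** Hypothetical describer that records both the BOM and the detected charset. */
    static Map<String, Object> describeWithBom() {
        Map<String, Object> properties = new HashMap<>();
        properties.put(BYTE_ORDER_MARK, BOM_UTF_8);
        properties.put(CHARSET, "Shift_JIS");
        return properties;
    }

    /** Hypothetical describer that records only the charset and leaves the BOM unset. */
    static Map<String, Object> describeWithoutBom() {
        Map<String, Object> properties = new HashMap<>();
        properties.put(CHARSET, "Shift_JIS");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset(describeWithBom()));    // prints "UTF-8"
        System.out.println(resolveCharset(describeWithoutBom())); // prints "Shift_JIS"
    }
}
```

In a real describer the equivalent step would be to set only the CHARSET property on the description and leave BYTE_ORDER_MARK unset.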

Dani
