Eclipse Community Forums
ITextContentDescriber: BYTE_ORDER_MARK overrides CHARSET [message #335354] Fri, 03 April 2009 15:28
Eclipse User
Originally posted by: fmisiak.curl.com

We have a plugin that provides a content describer for our file extensions.
If the file content has a BOM and our describer detects a specific
charset encoding, Eclipse seems to give preference to the BOM and
ignores the CHARSET.

It seems this override decision is made in
org.eclipse.core.internal.content.ContentDescription.getCharset():

    public String getCharset() {
        byte[] bom = (byte[]) getProperty(BYTE_ORDER_MARK);
        if (bom == BOM_UTF_8)
            return CHARSET_UTF_8;
        else if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
            // UTF-16 will properly recognize the BOM
            return CHARSET_UTF_16;
        return (String) getProperty(CHARSET);
    }
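
To make the precedence concrete, here is a minimal stand-alone model of the method quoted above. It is a sketch and not the Eclipse class: the class name ContentDescriptionModel and the string keys "bom" and "charset" are hypothetical stand-ins for the real ContentDescription and its property keys.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone model of the quoted precedence rule; not the Eclipse class itself. */
final class ContentDescriptionModel {
    // Stand-ins for the BOM constants referenced by the quoted method.
    static final byte[] BOM_UTF_8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    static final byte[] BOM_UTF_16BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] BOM_UTF_16LE = {(byte) 0xFF, (byte) 0xFE};
    static final String BYTE_ORDER_MARK = "bom"; // hypothetical key
    static final String CHARSET = "charset";     // hypothetical key

    private final Map<String, Object> properties = new HashMap<>();

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    /** Same decision order as the quoted getCharset(): a known BOM wins over CHARSET. */
    String getCharset() {
        Object bom = properties.get(BYTE_ORDER_MARK);
        if (bom == BOM_UTF_8) {
            return "UTF-8";
        }
        if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE) {
            return "UTF-16";
        }
        return (String) properties.get(CHARSET);
    }

    public static void main(String[] args) {
        ContentDescriptionModel both = new ContentDescriptionModel();
        both.setProperty(CHARSET, "Shift_JIS");
        both.setProperty(BYTE_ORDER_MARK, BOM_UTF_8);
        System.out.println(both.getCharset()); // prints UTF-8: the BOM branch is taken

        ContentDescriptionModel charsetOnly = new ContentDescriptionModel();
        charsetOnly.setProperty(CHARSET, "Shift_JIS");
        System.out.println(charsetOnly.getCharset()); // prints Shift_JIS
    }
}
```

As in the quoted code, the BOM is compared with ==, so in this model only the constant instances themselves trigger the override.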


Somehow this use case is quite common with our Japanese partners (we are not
sure yet why these files end up having a BOM while the encoding is
Shift-JIS).
The workaround is to force the encoding using Eclipse's property dialog
on the file. It's not perfect, because the editor will display a few
weird characters at the beginning of the buffer, but at least you can
update and save the file.
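
The stray characters are presumably the three UTF-8 BOM bytes (EF BB BF) being decoded as Shift-JIS text. Outside of Eclipse, one way to read such a file cleanly is to skip those bytes before decoding. The following is only a sketch under that assumption; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BomStrippingSketch {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the data starts with the three UTF-8 BOM bytes. */
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    /** Decodes data with the given charset, dropping a stray leading UTF-8 BOM first. */
    static String decodeIgnoringUtf8Bom(byte[] data, Charset charset) {
        int offset = startsWithUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, charset);
    }

    public static void main(String[] args) {
        // Shift_JIS is not guaranteed on every Java runtime, so fall back if it is missing.
        Charset cs = Charset.isSupported("Shift_JIS")
                ? Charset.forName("Shift_JIS")
                : StandardCharsets.ISO_8859_1;
        byte[] file = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        System.out.println(new String(file, cs));            // BOM bytes show up as text
        System.out.println(decodeIgnoringUtf8Bom(file, cs)); // prints abc
    }
}
```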

Is there any way to customize this override rule (the getCharset() method above)?

-Francois
Re: ITextContentDescriber: BYTE_ORDER_MARK overrides CHARSET [message #335392 is a reply to message #335354] Mon, 06 April 2009 10:01
Dani Megert
Francois Misiak wrote:
> We have a plugin that provides a content describer for our file
> extensions.
> If the file content has a BOM and our describer detects a specific
> charset encoding, Eclipse seems to give preference to the BOM and
> ignores the CHARSET.
>
> It seems this override decision is made in
> org.eclipse.core.internal.content.ContentDescription.getCharset():
>
>     public String getCharset() {
>         byte[] bom = (byte[]) getProperty(BYTE_ORDER_MARK);
>         if (bom == BOM_UTF_8)
>             return CHARSET_UTF_8;
>         else if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
>             // UTF-16 will properly recognize the BOM
>             return CHARSET_UTF_16;
>         return (String) getProperty(CHARSET);
>     }
>
>
> Somehow this use case is quite common with our Japanese partners (we are
> not sure yet why these files end up having a BOM while the encoding is
> Shift-JIS).
> The workaround is to force the encoding using Eclipse's property
> dialog on the file. It's not perfect, because the editor will display a
> few weird characters at the beginning of the buffer, but at least you
> can update and save the file.
>
> Is there any way to customize this override rule (the getCharset() method
> above)?
Your describer could simply not set the BYTE_ORDER_MARK property, but to me it
looks like something is wrong with your files in the first place.

Dani
>
> -Francois
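
Dani's suggestion could look roughly like the sketch below: the describer notices a leading UTF-8 BOM but records it only when it agrees with the charset it detected, so a consumer that prefers the BOM has nothing to override with. Everything here is a hypothetical stand-in: the Map plays the role of the content description, the "bom" and "charset" keys replace the real property keys, and detectedCharset represents whatever the plug-in's own detection produced. Recording the BOM only on agreement is one possible policy, not something the thread prescribes. In a real plug-in this logic would sit in the describer's describe method and call setProperty on the description it is given.

```java
import java.util.HashMap;
import java.util.Map;

final class CharsetOnlyDescriberSketch {
    static final String BYTE_ORDER_MARK = "bom"; // hypothetical key
    static final String CHARSET = "charset";     // hypothetical key
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /**
     * Builds a description from the first bytes of a file and an already detected
     * charset. The BOM is recorded only when it agrees with that charset.
     */
    static Map<String, Object> describe(byte[] head, String detectedCharset) {
        Map<String, Object> description = new HashMap<>();
        boolean hasUtf8Bom = head.length >= 3
                && head[0] == UTF8_BOM[0]
                && head[1] == UTF8_BOM[1]
                && head[2] == UTF8_BOM[2];
        if (hasUtf8Bom && "UTF-8".equalsIgnoreCase(detectedCharset)) {
            description.put(BYTE_ORDER_MARK, UTF8_BOM);
        }
        // A conflicting BOM is deliberately left out so that CHARSET is used.
        description.put(CHARSET, detectedCharset);
        return description;
    }

    public static void main(String[] args) {
        byte[] head = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'x'};
        System.out.println(describe(head, "Shift_JIS").containsKey(BYTE_ORDER_MARK)); // prints false
        System.out.println(describe(head, "UTF-8").containsKey(BYTE_ORDER_MARK));     // prints true
    }
}
```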