ITextContentDescriber: BYTE_ORDER_MARK overrides CHARSET [message #335354]
Fri, 03 April 2009 15:28
Eclipse User
Originally posted by: fmisiak.curl.com
We have a plugin that provides a content describer for our file extensions.
If the file content has a BOM and our describer also detects a specific
charset, then Eclipse seems to give preference to the BOM and
ignores the CHARSET property.
It seems this override decision is made in
org.eclipse.core.internal.content.ContentDescription.getCharset():
    public String getCharset() {
        byte[] bom = (byte[]) getProperty(BYTE_ORDER_MARK);
        if (bom == BOM_UTF_8)
            return CHARSET_UTF_8;
        else if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
            // UTF-16 will properly recognize the BOM
            return CHARSET_UTF_16;
        return (String) getProperty(CHARSET);
    }
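To make the effect easier to see outside Eclipse, here is a minimal standalone sketch of the same decision logic. It is not the Eclipse implementation: the class name, the string property keys, and the map-based property bag are stand-ins I made up for illustration (in Eclipse the keys are the constants on IContentDescription). It only shows that once a BOM property is present, the CHARSET property is never consulted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ContentDescription; not the Eclipse class.
class CharsetOverrideSketch {
    // Assumed keys; the real ones are constants on IContentDescription.
    static final String BYTE_ORDER_MARK = "bom";
    static final String CHARSET = "charset";

    static final byte[] BOM_UTF_8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    static final byte[] BOM_UTF_16BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] BOM_UTF_16LE = {(byte) 0xFF, (byte) 0xFE};

    // Mirrors the quoted getCharset(): the BOM is checked first (by identity,
    // as in the quoted code) and CHARSET is only a fallback.
    static String getCharset(Map<String, Object> properties) {
        byte[] bom = (byte[]) properties.get(BYTE_ORDER_MARK);
        if (bom == BOM_UTF_8)
            return "UTF-8";
        if (bom == BOM_UTF_16BE || bom == BOM_UTF_16LE)
            return "UTF-16";
        return (String) properties.get(CHARSET);
    }

    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();
        // What our describer reports for the file content.
        properties.put(CHARSET, "Shift_JIS");
        System.out.println(getCharset(properties));

        // Once a BOM is also recorded, the CHARSET value is ignored.
        properties.put(BYTE_ORDER_MARK, BOM_UTF_8);
        System.out.println(getCharset(properties));
    }
}
```

With only CHARSET set, the sketch returns the describer's value; as soon as BYTE_ORDER_MARK is set as well, the BOM decides the result, which matches what we observe.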
Somehow this use case is quite common with our Japanese partners (I am not
sure yet why these files end up having a BOM while the encoding is
Shift-JIS, though).
The workaround is to force the encoding using Eclipse's properties dialog
on the file. It's not perfect, because the editor will display a few
weird characters at the beginning of the buffer, but at least you can
update and save the file.
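As far as I can tell, the weird characters are simply the BOM bytes being decoded as content in the forced encoding. The sketch below is only an illustration under that assumption: the class and method names are hypothetical, it handles the UTF-8 BOM only, and it is plain Java rather than anything Eclipse does. It skips a leading BOM before decoding with the declared charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Eclipse.
class BomStripSketch {
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data begins with the three UTF-8 BOM bytes.
    static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length)
            return false;
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i])
                return false;
        }
        return true;
    }

    // Decodes with the declared charset, skipping a leading UTF-8 BOM if present.
    static String decodeIgnoringBom(byte[] data, Charset declared) {
        int offset = startsWithUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, offset, data.length - offset, declared);
    }

    public static void main(String[] args) {
        // BOM followed by the ASCII bytes for "abc".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        // Shift_JIS is not guaranteed on every JVM, so fall back if it is missing.
        Charset declared = Charset.isSupported("Shift_JIS")
                ? Charset.forName("Shift_JIS")
                : StandardCharsets.ISO_8859_1;
        System.out.println(decodeIgnoringBom(data, declared));
    }
}
```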
Is there any way to customize this override rule (the getCharset() method above)?
-Francois