UTF-8 Handling [message #203916] |
Thu, 30 November 2006 05:11 |
Eclipse User |
Originally posted by: milton.corigo.com
Hi All,
We are reading UTF-8 strings from our database that are in the form of
&#1234;. When I created a simple report to test whether BIRT would
handle UTF-8, it displayed the characters as plain ASCII and not the
UTF-8 equivalent. I see that BIRT has a language locale selection;
however, there isn't an option for UTF-8. We typically display pages
with this content type:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
When I view the report in a browser the meta tag is set correctly, but
the strings are not being rendered properly.
Has anyone else gotten this to work?
Does the BIRT engine understand UTF-8?
Is there configuration that needs to happen?
Thanks in advance.
Re: UTF-8 Handling [message #208256 is a reply to message #207830] |
Mon, 25 December 2006 11:24 |
Eclipse User |
Originally posted by: milton.corigo.com
Gary,
The DB columns are varchar and already contain properly formatted UTF-8
strings in decimal numeric format (&#1234; = unicode char). What's
happening is that BIRT is converting all ampersands into their HTML
encoded equivalent, as in:
&#456; ==> &amp;#456;
The conversion is breaking the unicode strings and thus they won't
display. What's needed is a way to set the encoding type in BIRT so that
when the report is generated the correct encoding is used. Currently, it
is set to "UTF-8", which in our case breaks our strings, when what we
need is "text/html;charset=ISO-8859-1".
In addition, when a report is fed to a browser, the response object
content-type should also be adjustable rather than fixed to
"Content-Type: text/html" as is currently done. This would allow the
browser to be told specifically how to treat the content depending on
its requirements. Again, in our case we need
"Content-Type: text/html;charset=ISO-8859-1" to essentially tell the
engine to treat the chars as plain ASCII and to not encode them.
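A possible reason the ISO-8859-1 setting appears to help, sketched in Python under the assumption that the stored values are ASCII-only numeric references: such text has identical bytes in either charset, whereas the character it refers to can only be sent as UTF-8. The variable names here are illustrative only.

```python
reference = "&#456;"   # ASCII-only numeric character reference
character = "\u01c8"   # the character that reference stands for

# ASCII text encodes to the same bytes in ISO-8859-1 and in UTF-8.
same_bytes = reference.encode("iso-8859-1") == reference.encode("utf-8")

# The real character cannot be represented in ISO-8859-1 at all.
try:
    character.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# In UTF-8 it becomes a two-byte sequence.
utf8_bytes = character.encode("utf-8")
print(same_bytes, latin1_ok, utf8_bytes)
```

So the declared charset matters only once real non-ASCII characters are emitted; for reference-only text the escaping step is the part that changes what the browser shows.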
We had the same problem with Velocity, which defaults the content type
to "UTF-8" and was also breaking our strings, but it was fixed by
setting the content type to "text/html;charset=ISO-8859-1".
The ampersand encoding doesn't make sense anyhow because &amp; isn't a
valid UTF-8 char and conflicts completely with the decimal numeric form
of UTF-8. All the reporting systems we reviewed are doing the same thing
and all have issues with UTF-8. There must be a reason for this, but I
don't understand why.
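One likely reason for the ampersand escaping, sketched in Python rather than BIRT (so `html.escape` only stands in for whatever the report engine does, which is an assumption): an HTML emitter treats database values as literal text and escapes `&` so that the text displays verbatim. Decoding the numeric references into real characters before output avoids the double encoding.

```python
import html

# A numeric character reference as stored in the database (plain ASCII text).
stored = "&#456;"

# Escaping the ampersand, as an HTML emitter typically does for text content,
# turns the reference into literal text that the browser shows verbatim.
escaped = html.escape(stored)

# Decoding the reference first yields the actual character (U+01C8),
# which has nothing left to escape and can be emitted as UTF-8.
decoded = html.unescape(stored)

print(escaped)
print(decoded == "\u01c8")
```

With this approach the report data would hold real Unicode characters and the page could stay declared as UTF-8.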
I'm pretty certain that allowing adjustment of the encoding and content
type should provide enough flexibility to solve the problem. It might be
worthwhile to review the other properties used for encoding and response
content type (e.g. charset, ...) and open up the system configuration as
much as is reasonable.
Finally, here's a link to a site
(http://billposer.org/Software/uni2ascii.html) by Bill Poser that has a
program for converting from ascii to unicode in its various forms. The
description has a concise listing of the various unicode forms that I
found helpful.
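The decimal numeric form mentioned here can be approximated with Python's built-in `xmlcharrefreplace` error handler. This is only a sketch of the idea, not uni2ascii's own implementation, and the sample text is made up for illustration.

```python
import html

text = "\u01c8 and \u04d2"

# Replace every non-ASCII character with its decimal numeric character reference.
as_ascii = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Reverse direction: turn the references back into Unicode characters.
round_trip = html.unescape(as_ascii)

print(as_ascii)
```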
Cheers,
Milton
On Thu, 21 Dec 2006 04:55:17 +0700, Gary Xue <gxue@actuate.com> wrote:
> Milton,
> What data type is the DB column that you use to store your UTF-8 text?
> And what DB platform / JDBC drivers are you using?
>
Re: UTF-8 Handling [message #208264 is a reply to message #208256] |
Mon, 25 December 2006 12:12 |
Eclipse User |
Originally posted by: milton.corigo.com
Here's another reference link:
http://www.javaworld.com/javaworld/jw-04-2004/jw-0419-multibytes.html?page=1