Eclipse Community Forums
UTF-8 Handling [message #203916] Thu, 30 November 2006 05:11
Eclipse User
Originally posted by: milton.corigo.com

Hi All,

We are reading UTF-8 strings from our database that are in the form
&#1234;. When I created a simple report to test whether BIRT would handle
UTF-8, it displayed the characters as plain ASCII rather than the UTF-8
equivalent. I see that BIRT has a language/locale selection, but there
isn't an option for UTF-8. We typically serve pages with this content
type: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
When I view the report in a browser, the meta tag is set correctly, but
the strings are not rendered properly.

Has anyone else gotten this to work?
Does the BIRT engine understand UTF-8?
Is there configuration that needs to happen?
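
For readers hitting the same symptom: the value &#1234; is an HTML numeric character reference, seven ASCII characters that a browser turns into code point U+04D2 only when it parses them as markup. It is not a UTF-8 byte sequence. The minimal Java sketch below (class and method names are illustrative, not from BIRT) compares the stored text with the character it names.

```java
import java.nio.charset.StandardCharsets;

public class NcrVsUtf8 {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // What the varchar column holds: a reference, stored as literal ASCII text.
        String stored = "&#1234;";
        // The character the reference names: U+04D2 (1234 decimal = 0x4D2).
        String actual = new String(Character.toChars(1234));

        System.out.println(stored.length() + " chars, " + utf8ByteCount(stored) + " UTF-8 bytes");
        System.out.println(actual.length() + " char, " + utf8ByteCount(actual) + " UTF-8 bytes");
    }
}
```

Running it prints "7 chars, 7 UTF-8 bytes" followed by "1 char, 2 UTF-8 bytes"; only the second value is the Unicode character itself.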

Thanks in advance.
Re: UTF-8 Handling [message #207830 is a reply to message #203916] Wed, 20 December 2006 21:55
Gary Xue
Milton,
What data type is the DB column that you use to store your UTF-8 text? And
what DB platform / JDBC drivers are you using?

--
Gary Xue [Actuate Corporation | BIRT Committer]
Re: UTF-8 Handling [message #208256 is a reply to message #207830] Mon, 25 December 2006 11:24
Eclipse User
Originally posted by: milton.corigo.com

Gary,

The DB columns are varchar and already contain properly formatted UTF-8
strings in decimal numeric format (&#1234; = Unicode char). What's
happening is that BIRT converts every ampersand into its HTML-encoded
equivalent, as in:

&#456; ==> &amp;#456;

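The rewrite of &#456; into &amp;#456; is what an HTML emitter normally does to a text value, so that the browser displays the value literally instead of parsing it as markup. The sketch below reproduces it with a hypothetical escapeHtml helper (an illustration only, not BIRT's emitter code) and shows that a value holding the real character instead of its reference passes through untouched.

```java
public class EscapeDemo {
    // Hypothetical minimal escaper for HTML text content.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A stored reference gets its '&' escaped, so the browser shows "&#456;" as text.
        System.out.println(escapeHtml("&#456;"));                  // &amp;#456;
        // The real character U+01C8 (456 decimal) contains no '&' and is left alone.
        System.out.println(escapeHtml("\u01C8").equals("\u01C8")); // true
    }
}
```
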
The conversion breaks the Unicode strings, so they won't display. What's
needed is a way to set the encoding type in BIRT so that the correct
encoding is used when the report is generated. Currently it is set to
"UTF-8", which in our case breaks our strings, when what we need is
"text/html;charset=ISO-8859-1".

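Before switching the output to ISO-8859-1, note that this charset covers only U+0000 to U+00FF, so a character like U+04D2 can appear in an ISO-8859-1 page only as a reference such as &#1234;, never as an actual byte. A short Java check using the standard CharsetEncoder.canEncode method (the wrapper class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {
    // True if every character in text can be represented in the given charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String cyrillic = "\u04D2"; // the character behind &#1234;
        System.out.println(canEncode(cyrillic, StandardCharsets.UTF_8));       // true
        System.out.println(canEncode(cyrillic, StandardCharsets.ISO_8859_1));  // false
        // The reference itself is pure ASCII, so it survives in either charset.
        System.out.println(canEncode("&#1234;", StandardCharsets.ISO_8859_1)); // true
        // String.getBytes substitutes the replacement byte '?' for unmappable chars.
        byte[] latin1 = cyrillic.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " byte: " + (char) latin1[0]);      // 1 byte: ?
    }
}
```
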
In addition, when a report is sent to a browser, the content type of the
response object should also be adjustable rather than fixed to
"Content-Type: text/html" as it is now. This would allow the browser to
be told specifically how to treat the content, depending on its
requirements. Again, in our case we need "Content-Type:
text/html;charset=ISO-8859-1" to essentially tell the engine to treat the
characters as plain ASCII and not to encode them.
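
On the header side, the charset parameter of Content-Type tells the browser how to turn the response bytes back into characters. This sketch (illustrative names) encodes U+04D2 as UTF-8 and then decodes the same two bytes as UTF-8 and as ISO-8859-1, approximating a browser that trusts a matching or a mismatched header. One caveat: browsers actually treat the ISO-8859-1 label as windows-1252, which would render byte 0x92 as a curly apostrophe rather than a control character, but the mismatch is the same.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderCharsetDemo {
    // Decode a response body with the charset announced in Content-Type.
    static String decodeAs(byte[] body, Charset declared) {
        return new String(body, declared);
    }

    public static void main(String[] args) {
        byte[] body = "\u04D2".getBytes(StandardCharsets.UTF_8); // bytes 0xD3 0x92
        String matching = decodeAs(body, StandardCharsets.UTF_8);
        String mismatched = decodeAs(body, StandardCharsets.ISO_8859_1);
        System.out.println(matching.equals("\u04D2"));         // true
        // Java's ISO-8859-1 maps each byte to one char: U+00D3 and the C1 control U+0092.
        System.out.println(mismatched.equals("\u00D3\u0092")); // true
    }
}
```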

We had the same problem with Velocity, which defaults the content type to
"UTF-8" and was also breaking our strings; that was fixed by setting the
content type to "text/html;charset=ISO-8859-1".

The ampersand encoding doesn't make sense anyhow, because &amp; isn't a
valid UTF-8 character and conflicts completely with the decimal numeric
form of UTF-8. All the reporting systems we reviewed do the same thing,
and all have issues with UTF-8. There must be a reason for this, but I
don't understand it.
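
A likely reason for the behavior: an HTML emitter has to escape & in data values, because otherwise any value containing &, < or > would be parsed as markup, which can break the page or allow script injection. A workaround that needs no charset change is to turn the stored references into real characters before the value reaches the report, for example in a computed column or data set script (where to hook this into BIRT is an assumption, not something confirmed in this thread). A minimal Java decoder for decimal and hexadecimal numeric references, with a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NcrDecoder {
    // Decimal (&#1234;) or hexadecimal (&#x4D2;) numeric character reference.
    private static final Pattern NCR =
            Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    // Replaces each valid numeric reference with the code point it names.
    // Named entities such as &amp; and out-of-range numbers are left as-is.
    static String decode(String text) {
        Matcher m = NCR.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(); // default: keep the original text
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
                if (Character.isValidCodePoint(cp)) {
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException e) {
                // Too many digits to fit in an int: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("A&#1234;B").equals("A\u04D2B")); // true
        System.out.println(decode("&#x1C8;").equals("\u01C8"));     // true
        System.out.println(decode("Tom &amp; Jerry"));              // Tom &amp; Jerry
    }
}
```

Once the value holds the real character, the emitter's escaping has no & to rewrite, and UTF-8 output can carry the character directly. The sketch uses Matcher.appendReplacement with a StringBuilder, which requires Java 9 or later.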

I'm fairly certain that making the encoding and content type adjustable
would provide enough flexibility to solve the problem. It might also be
worthwhile to review the other properties used for encoding and the
response content type (e.g. charset, ...) and to open up the system
configuration as much as is reasonable.

Finally, here's a link to a site by Bill Poser
(http://billposer.org/Software/uni2ascii.html) with a program for
converting between ASCII and Unicode in its various forms. The
description has a concise listing of the various Unicode forms, which I
found helpful.
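
The reverse direction, one of the forms listed on the uni2ascii page, replaces every non-ASCII code point with a decimal reference so that the text is pure ASCII. A minimal Java sketch with a hypothetical class name (not uni2ascii's own code):

```java
public class NcrEncoder {
    // Emits ASCII as-is and every other code point as a decimal reference such as &#1234;.
    // Note: a literal '&' is passed through unchanged, so this is not a complete HTML escaper.
    static String toDecimalNcr(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDecimalNcr("A\u04D2B"));     // A&#1234;B
        System.out.println(toDecimalNcr("\uD83D\uDE00")); // &#128512;
    }
}
```

Iterating over code points rather than chars means a character outside the Basic Multilingual Plane becomes a single reference instead of two surrogate halves.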

Cheers,
Milton


Re: UTF-8 Handling [message #208264 is a reply to message #208256] Mon, 25 December 2006 12:12
Eclipse User
Originally posted by: milton.corigo.com

Here's another reference link:
http://www.javaworld.com/javaworld/jw-04-2004/jw-0419-multibytes.html?page=1
