[udig-devel] Re: Why utf-8 (0.8 feedback)

Hey Jesse,

I read your answer to be: "ASCII was not enough so I guessed UTF-16
would be better, seemed one step beyond UTF-8".

	That reasoning is totally *wrong*.

UTF-16 is not bigger than UTF-8! It's a completely different encoding of
the same character set (think ASCII vs. EBCDIC), and is not used that
often for text files. UTF-8 maps the whole Universal Character Set!
UTF-8 is brilliant! There are a few (very few) reasons to use something
else, but probably not ones that are relevant here.
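
To make the "same character set, different mapping" point concrete,
here is a minimal sketch in Python (not the Java/EMF code uDig uses).
The sample string is made up for illustration and is not taken from a
.udig file. It shows that both encodings round-trip the same text and
only the bytes differ.

```python
# Illustrative sample text (an assumption, not real .udig content):
# ASCII letters, two accented Latin letters and two CJK characters.
text = "map: Zürich café 東京"

utf8_bytes = text.encode("utf-8")
# Python's "utf-16" codec prepends a 2-byte byte order mark (BOM).
utf16_bytes = text.encode("utf-16")

# Both encodings cover the same character set, so both decode back
# to exactly the same string.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16") == text

# Only the byte representation differs. For mostly-ASCII text such as
# XML markup, UTF-8 is the more compact of the two.
print("characters:  ", len(text))
print("UTF-8 bytes: ", len(utf8_bytes))
print("UTF-16 bytes:", len(utf16_bytes))
```

For text dominated by CJK characters the size comparison can go the
other way, which is one of the few reasons to prefer UTF-16.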

If you don't have time for this, pick UTF-8. Check that it works on
Windows and you should be fine. XML people have thought a lot about this
and UTF-8 is becoming the world-wide standard.
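
The saving problem Jesse describes in the quoted message (ASCII could
not encode certain URLs) can be sketched as follows. This is Python
rather than the EMF code in question, and the URL is a hypothetical
example, so treat it only as an illustration of the encoding behaviour.

```python
# Hypothetical URL with one non-ASCII character (an assumption made
# for illustration; not a URL from an actual bug report).
url = "http://example.org/wms?layer=Zürich"

# ASCII has no code for "ü", so encoding fails.
try:
    url.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 and UTF-16 can both represent every character in the URL.
utf8_bytes = url.encode("utf-8")
utf16_bytes = url.encode("utf-16")

print("ASCII can encode it:", ascii_ok)
print("UTF-8 round-trips:  ", utf8_bytes.decode("utf-8") == url)
print("UTF-16 round-trips: ", utf16_bytes.decode("utf-16") == url)
print("UTF-8 bytes:", len(utf8_bytes), " UTF-16 bytes:", len(utf16_bytes))
```

Moving from ASCII to either encoding fixes the failure; UTF-8 simply
does it with fewer bytes for this kind of content.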

If you didn't know what UTF-16 was, you owe yourself a few hours of
education. Seriously. This is not meant as a criticism. We all need to
understand Unicode. You can start with Joel's column which is, for me,
not quite enough. Budget a few hours, over separate days, on the FAQ.
Think of this as refreshing your education on what ASCII was all
about :-).

        
        The Absolute Minimum Every Software Developer Absolutely,
        Positively Must Know About Unicode and Character Sets (No
        Excuses!)
        http://www.joelonsoftware.com/articles/Unicode.html
        
        UTF-8 and Unicode FAQ for Unix/Linux
        http://www.cl.cam.ac.uk/~mgk25/unicode.html
        
        Unicode
        http://en.wikipedia.org/wiki/Unicode

ciao,
adrian

        Jody Garnett wrote:
        
        > Adrian Custer wrote:
        >
        >> 8) Why are .udig files utf-16?
        >> Wouldn't utf-8 be more readable, compact and generally desirable?
        >>  
        >>
        > Unsure - it may be an oversight or it may be something required as we
        > move between Windows and Linux. The whole point of project files is to
        > be able to email them to others.
        >
        > Let's ask Jesse as this is EMF generated code.
        >
        I was only concerned about the restrictions of UTF-8 because there is a
        more limited number of characters permitted, but I don't know too much
        about the specification.  I acknowledge that it is more readable and
        compact, but how much would it impact our ability to internationalize?
        From a programming point of view it is not important; it requires a
        total of 1 line of change.  If we could have a vote so I can get some
        opinions, that'd be great.
        The default was ASCII and I found that certain URLs couldn't be encoded
        correctly, so I received a number of bug reports about maps not being
        saved correctly.  I bumped it to UTF-16 to virtually guarantee that the
        problem would no longer occur, but maybe that was overkill.
        
        Jesse


