Forum Moderators: phranque

Why don't we all use UTF-8?

         

Karma

8:49 am on Apr 27, 2007 (gmt 0)

10+ Year Member



Hi,

This may be a dumb question, so please enlighten me if so, but if UTF-8 can display any character, what's the point in using other character sets?

phranque

9:19 am on Apr 27, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



it makes little difference for plain ASCII strings.

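UTF-8 is a variable-length (1-4 bytes per character) character encoding.
from a programming point of view, it's more work to handle the multi-byte and/or variable-length cases.
in some cases UTF-8 can consume more space than other possible encodings: anything outside ASCII takes 2-4 bytes, where a single-byte encoding like Latin1 takes 1 and a double-byte encoding takes 2.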
there could also be database compatibility/conversion issues.

if 100% of your website can be displayed in Latin1, it's extra work for no apparent gain.
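to make the variable-length point concrete, here is a minimal Python sketch. the sample characters and the helper name utf8_len are my own picks for illustration, nothing more; it only shows how many bytes each character needs once encoded.

```python
# Sample characters chosen for illustration (an assumption, not from the thread):
# ASCII, Latin1 range, rest of the Basic Multilingual Plane, and beyond it.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def utf8_len(ch: str) -> int:
    """Return the number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

utf8_sizes = {ch: utf8_len(ch) for ch in samples}
for ch, n in utf8_sizes.items():
    # Print the code point rather than the glyph so any terminal can show it.
    print(f"U+{ord(ch):04X} -> {n} byte(s) in UTF-8")

# Latin1 always uses exactly one byte, but only covers U+0000..U+00FF.
latin1_e = len("\u00e9".encode("latin-1"))
```

plain ASCII stays at one byte per character, which is why it makes little difference for ASCII-only pages.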

Karma

9:57 am on Apr 27, 2007 (gmt 0)

10+ Year Member



Is the argument that UTF-8 encoding uses more space still an issue in 2007?

Also, I can't think of many examples where a website programmer would need to go down to byte-level programming. For the vast majority of web developers, I see UTF-8 as a more future-proof character set for their websites and databases.

lammert

10:14 am on Apr 27, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I use UTF-8 on some of my websites (mainly multilingual sites where languages like Russian are used) and ISO-8859-1 on others (the older sites). The main problem with UTF-8 is that for most languages you won't see the difference, and many tools generate code and data for one-byte encodings only. It is more work to get a UTF-8 site working and more work to keep everything UTF-8 compatible, and most sites are focussed on one language only, so neither the webmaster nor the user sees the difference.

phranque

10:19 am on Apr 27, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is the argument that UTF-8 encoding uses more space still an issue in 2007?

3 bytes vs 2 bytes per character (e.g. East Asian text in UTF-8 vs a double-byte encoding) is a 50% increase...
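a rough way to check that figure, assuming the comparison is against a double-byte encoding such as Shift_JIS or UTF-16 (no specific encoding is named here, and the sample text is my own):

```python
# Eight Japanese characters (kanji, hiragana, katakana), written as escapes
# so the source file's own encoding cannot affect the result.
text = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"

utf8_bytes = len(text.encode("utf-8"))        # 3 bytes per character here
sjis_bytes = len(text.encode("shift_jis"))    # 2 bytes per character here
utf16_bytes = len(text.encode("utf-16-le"))   # 2 bytes per character, no BOM

# Relative growth when moving from the double-byte encoding to UTF-8.
increase = (utf8_bytes - sjis_bytes) / sjis_bytes
print(utf8_bytes, sjis_bytes, utf16_bytes, f"{increase:.0%}")

# For ASCII-only text the two sizes are identical.
ascii_same = len("hello".encode("utf-8")) == len("hello".encode("latin-1"))
```

for a page that is mostly markup and ASCII, the overall increase will be much smaller than for the raw text alone.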

Also, I can't think of many examples where a website programmer would need to go down to byte-level programming.

counting characters? with a multi-byte encoding, the length in bytes is no longer the length in characters.
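a small sketch of where this bites, in Python 3. the sample string and the truncate_utf8 helper are hypothetical, made up for this example; the idea is just that byte counts and character counts diverge, and that cutting at a byte offset can land in the middle of a character.

```python
# "naive cafe" with two accented letters, written as escapes (precomposed forms).
text = "na\u00efve caf\u00e9"

char_count = len(text)                   # counts code points
byte_count = len(text.encode("utf-8"))   # counts encoded bytes

def truncate_utf8(s: str, max_bytes: int) -> str:
    """Cut s to at most max_bytes of UTF-8 without leaving half a character.

    errors="ignore" drops the incomplete multi-byte sequence at the end.
    """
    return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

print(char_count, byte_count)
print(truncate_utf8(text, 3).encode("unicode_escape"))
```

the same issue shows up with fixed-width database columns or form-field limits that are measured in bytes rather than characters.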

For the vast majority of web developers, I see UTF-8 as a more future-proof character set for their websites and databases.

certainly makes sense if planning for future translation/internationalization.

Karma

10:32 am on Apr 27, 2007 (gmt 0)

10+ Year Member



Many thanks all, I think I understand a little better now. It turns out that I do need to use UTF-8, something I simply didn't think about when I first started my website (hence the future-proofing topic).

Does anyone know what impact switching from Latin1 to UTF-8 will have on the search engines, etc.?