Poor unicode implementation in Chrome?

         

ergophobe

2:24 am on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was looking at a page being served up as UTF-8 that was showing "boxes" for unrenderable characters.

I thought the webmaster might have made a mistake, but since the site is the best reference dictionary in French and is wonderfully done, that seemed unlikely. Sure enough, I looked in Opera 10, Firefox 3.6, IE8, Safari for Win - all render the text correctly. If I paste it from Chrome into MS Word it renders correctly. Only Chrome chokes.

Anyone else seen this?

Unfortunately, WebmasterWorld is ISO-8859-1 only, so it is going to massacre this when I paste it in. You'll have to paste this into an HTML doc with the encoding set to UTF-8 and have a look for yourself:

[ε̃ʒeʀe], (il) ingère [ε̃ʒε:ʀ].

BTW, the translation to entities is happening WebmasterWorld side - I'm pasting this in as the straight characters, which is how it is coded on the page itself. So the entity translation is server side on WebmasterWorld.
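As a minimal sketch of what such a server-side translation can look like (an assumption about the general technique, not WebmasterWorld's actual code), Python's built-in `xmlcharrefreplace` error handler turns every character that ISO-8859-1 cannot hold into a decimal numeric character reference:

```python
# The first bracketed transcription from the post, written with escapes so
# this source file stays ASCII-only.
text = "[\u03b5\u0303\u0292e\u0280e]"

# ISO-8859-1 has no slot for these code points, so a strict encode fails.
try:
    text.encode("iso-8859-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "xmlcharrefreplace" swaps each unencodable character for a decimal
# numeric character reference instead of raising an error.
as_entities = text.encode("iso-8859-1", errors="xmlcharrefreplace").decode("ascii")

print(strict_ok)
print(as_entities)
```

The plain `e` characters survive untouched because they exist in ISO-8859-1; only the phonetic characters become entities.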

drhowarddrfine

3:40 am on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It doesn't matter and I'm not aware of any such issues in Chrome. What does matter is how it is served. Chrome is fairly new and, perhaps, it is being served differently than other browsers. What is helpful is to look at the http headers and see what is being sent there.

[edited by: tedster at 4:59 am (utc) on Mar 23, 2010]

ergophobe

7:36 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>It doesn't matter


What doesn't matter? It matters to me.

>>What does matter is how it is served


There are many places along the chain where character encoding can go wrong. See this thread and the links from there
[webmasterworld.com...]

>>Chrome is fairly new and, perhaps, it is being served differently than other browsers


Nope. Nothing to indicate that and in any case, if I visit with Firefox but have it send the UA string for Chrome


Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.1.249.1025 Safari/532.5


That renders just fine. And in any case, why would you render Unicode correctly for every browser but not Chrome?

>>http headers and see what is being sent there


It doesn't have anything to do with that. Chrome correctly interprets the page as UTF-8, which it is. What Chrome doesn't do is render the text correctly. In other words, it seems to be able to render only a subset of the Unicode code points, and that subset excludes code points that every other major browser does render correctly.

I would say that Chrome is not fully Unicode compatible yet.

drhowarddrfine

9:30 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>There are many places along the chain where character encoding can go wrong.

Yes but it doesn't take away the fact that the http header overrules everything else.

>>why would you render Unicode correctly for every browser but not Chrome?

Servers don't render anything but they do look at what browsers send them. An incorrectly configured or unaware server may send the wrong headers to a different browser.

>>I would say that Chrome is not fully Unicode compatible yet.

Then you are saying Safari isn't either since they both use the same rendering engine. Granted, changes could be made that affect this but I'm not aware yet.

EDIT: There are no bug reports for "unicode" and "french" on the bug list. Nor are there any reports about this issue that I was able to find.

ergophobe

10:40 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>http header overrules everything else.

Only for telling the browser which encoding to use. As I keep saying, Chrome sees the doc as UTF-8, it just doesn't render it correctly.

>>Servers don't render

I understand that. I should have said "encode it one way for one browser, but not for another".

>>wrong headers to a different browser

Even though I'm sending the same UA string in Firefox as in Chrome? Or, more to the point, even if I save the file locally and open it with the browser?

>>no bug reports for "unicode" and "french"

It's not an issue with French characters. It's an issue with phonetic transcription characters which are far less common.
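To see which code points are involved, here is a small Python sketch (added for illustration, using only the standard `unicodedata` module; the `describe` helper is a hypothetical name) that names the non-ASCII characters from the first post:

```python
import unicodedata

# The four non-ASCII code points that appear in the transcription.
sample = "\u03b5\u0303\u0292\u0280"

def describe(text):
    """Return (code point, official Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code, name in describe(sample):
    print(code, name)
```

None of these is an accented French letter: they are a Greek epsilon, a combining tilde, and two letters from the phonetic range, which is why ordinary "international" pages do not exercise them.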

If you search for "unicode" and "chrome" you will in fact find a variety of bug reports.

You'll also find reports of similar problems for Safari
[webmasterworld.com...]

There were definitely Unicode glitches in Safari 3. I don't know about Safari 4.

The strange thing in my case is that Safari renders the page correctly. It could be minor differences in the underlying rendering engine - Chrome is the latest release, but my Safari version is probably a bit behind.

encyclo

11:59 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I can find no difference between IE8, Firefox 3.6 or Chrome 4.1.249.1036 (under Windows 7) for the above text - either as "straight" UTF-8 or using entity references.

What font are you using?

ergophobe

12:55 am on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



encyclo - not my page, just one I'm visiting.

font-family: Arial, Helvetica, sans-serif;

Chrome 4.1.249.1036 (under XP).

Which brings up something I meant to mention. Let's assume it's an issue on my system with fonts. If the font is available to all the other browsers, why not Chrome?

[update: I fired up a Vista machine and it looks okay in Chrome, so it has something to do with fonts on the other machine]

encyclo

1:24 am on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Chrome 4.1.249.1036 (under XP).


OK, it's the "under XP" part that makes a difference - I can see the problem also when I test with an XP machine, with the same version of Chrome. The missing glyphs occur in the six common fonts I tried (Times New Roman, Verdana, Arial, Trebuchet MS, Comic Sans MS and Courier New) - so it is not a font issue as such. This bug does not occur in Windows 7, so the bug is related to the XP version of Chrome only.

XP is dead anyway, time to upgrade ;)

ergophobe

2:19 am on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>upgrade

I have a rule - never upgrade OS on a functioning machine. The laptop with XP is doing fine thank you, aside from a few Unicode code points in Chrome.

When it feels dog slow, I'll upgrade hardware and OS.

drhowarddrfine

3:21 am on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>If you search for "unicode" and "chrome" you will in fact find a variety of bug reports.

If you search through WebKit's and Chrome's official reporting site you won't.

>>XP is dead anyway, time to upgrade ;)

Hmm. I'll alert the 30,000 restaurants that use XP on our system.

>>The laptop with XP is doing fine thank you, aside from a few Unicode code points in Chrome.

Since the issue is only with one site, I don't think the issue is with Chrome. I installed Chrome on the XP box in the office and checked a few international sites. No issues.

>>When it feels dog slow, I'll upgrade hardware and OS.

I don't run Windows. Generally FreeBSD but one Linux box. Everything from a PIII with 192MB to some high end thing with 2GB. People upgrading their Windows machines give me these things all the time, including their laptops. I've not bought a machine since 2000.

penders

10:08 am on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>Chrome 4.1.249.1036 (under XP).


Yes, I see the same. Chrome under XP does not display those glyphs correctly. Other browsers are OK.

Edit:

However, if I explicitly specify a unicode font, such as "Lucida Sans Unicode" then Chrome(XP) does display the glyphs OK. (Which is how Notepad++ and Windows "Character Map" behave.)

encyclo

4:44 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>XP is dead anyway, time to upgrade ;)

>>Hmm. I'll alert the 30,000 restaurants that use XP on our system.


Please note the smiley after my comment, which is usually associated with humor.

;)

>>Since the issue is only with one site, I don't think the issue is with Chrome.


It's not just with one site, this is reproducible with Chrome under XP. In fact, the same issue is present with IE6 - but that's a (nearly) dead browser (no smiley required here). So maybe an OS/font-handling problem - Google may well appreciate receiving a bug report seeing as other browsers can handle the issue better. Other than that, penders' solution above is surely the best one - use CSS to specify a few safe Unicode fonts.

ergophobe

5:22 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>if I explicitly specify a unicode font, such as "Lucida Sans Unicode"


Good test! That makes sense. Lucida Sans Unicode is probably the most reliable Unicode font I've found on an XP machine.

I should have thought of it, because I recently needed to insert phonetic transcription characters into a Word doc. The doc was in Times New Roman, but I had to switch to Lucida Sans Unicode to get them to render properly.

I have some test code using hex and decimal entities in TNR and LSU fonts. It confirms what you report, and shows that the encoding doesn't really matter (within obvious limits - if the underlying doc text is UTF-16, things go nuts).

>>Since the issue is only with one site,


You are simply not reading my responses or the other responses here. It concerns the glyphs I gave in the first post and it happens no matter how and where you view them. I have viewed them locally. Encyclo has viewed them on a test page he created. Penders has obviously viewed them on a test page he created and played with fonts.

For the last time, this has nothing to do with the server, the server headers, or anything except the fact that these glyphs in the most common fonts render correctly in all browsers including Safari, but not in Chrome.

>>checked a few international sites. No issues.


I appreciate your efforts - installing Chrome and having a look. I really do.

But would it be in order to say one more time that this does not concern foreign language characters, it concerns the glyphs that I mention in the first post, which are phonetic transcription characters?

I spend most of my day on "international" sites and this is a very specific issue that concerns the exact glyphs I mention. You can cruise "international" sites from now until next summer and not see this.

Try viewing this code on a page

<div style="font-family:'Times New Roman'">
<p>[&#949;&#771;&#658;e&#640;e], (il) ingère [&#949;&#771;&#658;&#949;:&#640;]. </p>
<p>Hex codes: [&#x3b5;&#x303;&#x292;e&#x280;e] (il) ingère [&#x3b5;&#x303;&#x292;&#x3b5;:&#x280;]</p>
</div>
<div style="font-family:'Lucida Sans Unicode'">
<p>[&#949;&#771;&#658;e&#640;e], (il) ingère [&#949;&#771;&#658;&#949;:&#640;]. </p>
<p>Hex codes: [&#x3b5;&#x303;&#x292;e&#x280;e] (il) ingère [&#x3b5;&#x303;&#x292;&#x3b5;:&#x280;]</p>
</div>


PLEASE NOTE this is not an entirely representative test because I'm talking about pages that use those characters as characters, not as entities, but WebmasterWorld won't let me paste those in.

It will, however, demonstrate what penders is talking about in his post.
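The decimal and hex forms in the test markup are two spellings of the same code points. A short Python check (a sketch added for illustration, using the standard `html` module rather than a browser) makes that explicit:

```python
import html

# First bracketed group from the test markup, in both notations.
decimal_form = "[&#949;&#771;&#658;e&#640;e]"
hex_form = "[&#x3b5;&#x303;&#x292;e&#x280;e]"

# html.unescape resolves numeric character references to real characters.
decoded_decimal = html.unescape(decimal_form)
decoded_hex = html.unescape(hex_form)

print(decoded_decimal == decoded_hex)
print([f"U+{ord(ch):04X}" for ch in decoded_decimal])
```

Since both notations decode to identical text, any difference between the two `div` blocks on screen comes from the font, not from how the characters were written in the source.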

kaled

10:17 pm on Mar 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've only skimmed this thread, however...

If this is a font issue, then maybe different browsers are using different fonts, presumably as a result of different interpretation of CSS rules. In this case, the fault is not with the browser but with the website (for not being explicit as to which font to use) and/or the operating system (for not including all the required glyphs in the font being used or not including the specified font at all).

Kaled.

ergophobe

1:10 am on Mar 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Kaled,

Actually, the tests above use rules that stipulate only one font, not a font stack. And the site in question allows the user to specify any of five fonts (Helvetica, Verdana, Arial, Times, Times New Roman).

So the font specifications are fairly precise. In the case of all the five fonts listed above, none of them have these glyphs and if you try to use them in Word, those glyphs will not render correctly.

So in that sense, in light of what we're finding in this thread, the title of the thread is misleading. Chrome's Unicode implementation is, in some sense, the expected result.

What's curious is that these glyphs render fine in all other browsers. Do all other browsers fall back to a Unicode safe font when they encounter code points that don't render in the current font?

penders

10:39 am on Mar 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>>What's curious is that these glyphs render fine in all other browsers. Do all other browsers fall back to a Unicode safe font when they encounter code points that don't render in the current font?


Wikipedia... [en.wikipedia.org...]
>>Some web browsers, such as Mozilla Firefox, Opera, Safari and Internet Explorer (from version 7 on), are able to display multilingual web pages by intelligently choosing a font to display each individual character on the page. They will correctly display any mix of Unicode blocks, as long as appropriate fonts are present in the operating system.
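The per-character font choice described in that quote can be sketched as a toy model in Python. Everything here is hypothetical: the `FONT_COVERAGE` table and the `pick_fonts` helper are made up for illustration, real coverage depends on the font files installed, and real browsers work on glyph runs and handle combining marks with far more care.

```python
# Hypothetical coverage tables: the code points each font has glyphs for.
# Simplified to echo what the thread observed, not read from real font files.
BASIC = set(range(0x20, 0x7F)) | {0xE8}  # printable ASCII plus e-grave
FONT_COVERAGE = {
    "Arial": BASIC,
    "Lucida Sans Unicode": BASIC | {0x3B5, 0x303, 0x292, 0x280},
}

def pick_fonts(text, preferred, fallbacks):
    """Return (character, font) pairs, trying the preferred font first."""
    chosen = []
    for ch in text:
        font = next(
            (f for f in [preferred, *fallbacks]
             if ord(ch) in FONT_COVERAGE.get(f, ())),
            None,  # None stands for the "box" drawn when no font has the glyph
        )
        chosen.append((ch, font))
    return chosen

text = "[\u03b5\u0303\u0292e\u0280e]"
no_fallback = pick_fonts(text, "Arial", [])
with_fallback = pick_fonts(text, "Arial", ["Lucida Sans Unicode"])

print(sum(font is None for _, font in no_fallback), "boxes without fallback")
print(sum(font is None for _, font in with_fallback), "boxes with fallback")
```

In this model a renderer with no fallback list shows a box for every code point the named font lacks, while a renderer that consults a fallback font fills them in. That matches the behaviour reported in this thread, though it does not establish what Chrome on XP actually does internally.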


(Just to note, my machine (any browser) fails to display the last 6 characters/glyphs from the table on that page)

Firefox appears to choose a different (more appropriate) font to the other browsers. My default browser font is 'Times New Roman' (a serif font). Firefox is choosing another serif font to display those missing glyphs, whereas Opera, IE8 and Safari have picked a sans serif font (what looks similar to 'Lucida Sans Unicode' - although it is not the same!). I had to zoom the page to really see the difference. So, it looks like I have at least 3 fonts on my machine that contain these missing glyphs?!

Aside... if I specify 'New Century Schoolbook' as the preferred font (even with some defaults) then Chrome(XP) displays nothing (literally nothing)!? Other browsers are OK and display the text correctly in 'New Century Schoolbook'. Other (non-web) fonts appear to render OK in Chrome! Hhhmmm... this appears to be something to do with the utf-8 encoding?!

ergophobe

4:28 pm on Mar 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for that info penders.

I've done a fair bit with Unicode processing/conversion, but I haven't paid that much attention to how browsers handle fonts that aren't Unicode safe when confronted with Unicode characters.