I am building a new website, and everything is written in UTF-8. I have several JS and CSS files, along with HTML template files. The output is XML and XHTML, and I have tried to write everything strictly.
Generally, my Perl program reads from a CSV text database and mixes some of it with HTML templates, which also contain CSS and JavaScript references to files I wrote or to libraries like Reflection, Prototype, etc.
I have two servers to run and test these. The first is local on my own computer, running Linux Mint, Apache2, Perl 5.8.8, etc. The second is a Debian server on which the final results will be hosted.
1. When I browse a static HTML file containing the same CSS, JS, etc., without passing the template through the Perl file (there is one static index.html containing the same as what Perl generates as index.html), there is no problem: everything is stable, and a thousand refreshes on either of the two servers show no difference or instability.
2. When I browse the same code generated by my Perl program, every now and then, when I refresh the page, it shows a small difference (a change in font size, line height, etc.).
I guess this may be because I mix many files and a few of them might not be UTF-8 (although I have tried to read them all and save them as UTF-8 on Linux), so that the generated output misses some of the CSS info. But why only sometimes? Why doesn't this happen every time?
Can anybody please help?!
Paymaan.
Can it be because some of my JavaScript (maybe, only maybe, script.aculo.us or Prototype) is not in UTF-8 while the rest is?
If your Perl program generates a UTF-8 encoded document but your server's Content-Type header defaults to charset=iso-8859-1, for instance, then your layout may change rather unpredictably, as the browser doesn't know how to render those invisible BOM bytes.
You have to ensure that the default Content-Type header sent by both of your servers is set to utf-8.
For instance, google for "server header checker" and enter the full URI of the document whose server headers you are checking. In your case that will be one of the pages showing layout differences, that is, one that passes the template through the Perl file before the HTML is generated and served to the browser. The test must show Content-Type: text/html; charset=utf-8.
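To compare what each server declares, it can help to pull the charset out of the Content-Type value. The following is only a minimal sketch using core Perl; `charset_of` is a hypothetical helper name (not something from this thread), and the two sample values are just the header forms that matter here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the charset named in a Content-Type header
# value, lowercased, or undef when the header declares none.
sub charset_of {
    my ($content_type) = @_;
    return unless defined $content_type;
    if ($content_type =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i) {
        return lc $1;
    }
    return;
}

# Compare a header without a charset against one that declares utf-8.
for my $value ('text/html', 'text/html; charset=utf-8') {
    my $charset = charset_of($value);
    printf "%-28s => %s\n", $value,
        defined $charset ? $charset : '(no charset declared)';
}
```

A header without a charset leaves the choice of encoding to the browser or to other hints such as the meta tag, which is why the second form is the one to aim for.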
If the server headers are set to utf-8, then you might need to look for some other bottleneck.
Say, check whether your Perl program correctly pulls and represents UTF-8 encoded data from your database. BTW, are you sure your database is correctly set up to store UTF-8 encoded data? One possibility is that the SQL files are written in Unicode with a BOM, which your MySQL cannot interpret correctly.
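If a BOM is suspected in a data or template file, one way to rule it out is to strip it when the file is read. This is a rough sketch under the assumption that the files are read as raw bytes; `strip_utf8_bom` is a hypothetical helper name and the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a leading UTF-8 byte order mark (EF BB BF)
# from a string of raw bytes and return the rest unchanged.
sub strip_utf8_bom {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;
    return $bytes;
}

# Made-up sample: a template fragment that starts with a BOM.
my $template = "\xEF\xBB\xBF<html>";
my $clean    = strip_utf8_bom($template);
print "BOM removed\n" if length($clean) < length($template);
```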
by the way, this thread probably has quite a bit more useful information than you need about utf-8:
Character encoding, entity references and UTF-8 [webmasterworld.com]
my perl program reads from a csv text database
are you sure your database is correctly setup to store utf-8 encoded data?
i misread the OP to mean text in csv files, not necessarily an actual db.
the db must indeed be configured for utf8.
if you are using mysql this may help:
MySQL :: MySQL 5.0 Reference Manual :: 9.1 Character Set Support [dev.mysql.com]
Both sites show the same behaviour. As for the database: I wrote it myself, and it is a kind of CSV; better to say it is a pipe-separated text file in UTF-8.
And yes, all my HTML/XHTML files declare the correct content type, both in the XML header and in the HTML head, something like this:
--------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
-------
included on every page.
So what else might be the reason?
Something really simple came to my mind: I use XHTML for all tags, so does anything like "text/xhtml; charset=utf-8" exist?
Date: Wed, 31 Dec 2008 21:47:56 GMT
Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch mod_perl/2.0.3 Perl/v5.8.8
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
200 OK
i just checked on a IIS-hosted site i've been working on and i thought i was ok because i have the following meta tag in the head:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
if i do a "lwp-request -eSd 'http://example.com/'" i get the following Content-Type headers:
Content-Type: text/html
Content-Type: text/html; charset=utf-8
however if i use the firefox Web Developer plugin the response headers only show:
Content-Type: text/html
so it looks like IIS is adding its own HTTP Response header and that is getting precedence over the meta tag.
maybe the same thing is happening with your apache servers.
phranque, yes, not just IIS: the response header of every web server (incl. Apache) takes precedence over the meta tag if BOMs are involved, because those byte order marks are sent just after the server headers, i.e. before any meta tag in the HTML head section.
Paymaan, if you have no access to the web server settings where your site will be hosted, you can make a small modification to your template/Perl program mix. You might want to include something like this:
<?perl
Header (type=>'content',val=>'text/html; charset=utf-8'); # HTTP Content-type header
?>
Hope that will resolve your issue.
lwp-request -eSd 'http://example.com/'
GET http://example.com/ --> 200 OK
Connection: close
Date: Thu, 01 Jan 2009 18:07:03 GMT
Accept-Ranges: bytes
ETag: "7ac0e6-367f-9c6bb1c0"
Server: Apache/2.0.54 (Debian GNU/Linux) FrontPage/5.0.2.2635 mod_python/3.1.3 Python/2.3.5 PHP/4.3.10-16 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_perl/1.999.21 Perl/v5.8.4
Content-Length: 13951
Content-Type: text/html
Content-Type: text/html; charset=utf-8
Last-Modified: Sun, 28 Dec 2008 10:13:35 GMT
Client-Date: Thu, 01 Jan 2009 18:13:44 GMT
Client-Response-Num: 1
Link: </css/site3print.css>; /="/"; media="print"; rel="stylesheet"; type="text/css"
Link: </css/site3.css>; /="/"; media="screen"; rel="stylesheet"; type="text/css"
Almost the same happened with the http://localhost/ server, and its content type matches. Also, the Web Developer plugin in Firefox reports utf-8 as the content encoding on both servers, so do you still think this can be a BOM problem?
And wildbest, you pointed out something which I guess might lead us to something useful. I have my start page in two versions: one is a static, all-HTML page, and the other is the same but passes through the Perl script. The static page is always the same, with no problem. You made me think that my Perl script gives the wrong header before the templates, so is that the problem? Let me check this and I will report back in a few minutes.
[edited by: phranque at 7:10 am (utc) on Jan. 2, 2009]
[edit reason] exemplified/unlinked urls [/edit]
sub html_header
{
    # Print the HTTP header only once; $HEADER records that it was sent.
    if ($HEADER != 1)
    {
        $HEADER = 1;
        print "Content-Type: text/html; charset=utf-8\n\n";
    }
}
and Firefox now says the content type is text/html and the charset is utf-8. I also have "use utf8" at the start of my Perl script.
But the same problem still happens on my localhost. Any other suggestions?
I have access to my own localhost Apache server, but the main server is not easily accessible, and I would prefer to resolve the problem without altering the server, if that is possible at all.
Something else: I have two content types produced, one inside the templates and one in the Perl script. Could this be a source of the problem?
if ($HEADER != 1)
From what you've posted I can see that your web server's Content-Type header is:
Content-Type: text/html
This must read:
Content-Type: text/html; charset=utf-8
checks to see if the header is sent before or not
That's the wrong approach to avoiding a "headers already sent" error!
If you have some heavy scripting/checks before the HTML output, you have to open a buffer, do whatever you have to do, get the buffer contents into a variable, clean the buffer, send the headers, and then print that variable. Voila...
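The buffer-first idea can be sketched in plain Perl roughly like this. It is an illustration only, not the poster's code: `build_response` is a hypothetical helper name, and it assumes the page fragments are already available as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the whole page in memory first, then return
# the header and the body together so the header is always emitted first.
sub build_response {
    my (@parts) = @_;
    my $body = '';
    $body .= $_ for @parts;    # accumulate output instead of printing it
    return "Content-Type: text/html; charset=utf-8\n\n" . $body;
}

# Nothing reaches the browser until the full response has been assembled.
print build_response("<html><body>", "<p>Hello</p>", "</body></html>\n");
```

Because the header is attached only once, at the point of output, there is no need for a flag that tracks whether it has already been printed.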
If no problem happens, it will give the header and the output. That's all. So where am I making a mistake? Please describe it so I can understand.
Also I have this line in the start of my perl script is "use utf8".
That has nothing to do with the output of your Perl program; it only enables UTF-8 in the source code.
Quoted from the documentation of the utf8 pragma:
Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
[perldoc.perl.org...]
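Since "use utf8" only describes the source file, data read from a file and text sent to the browser have to be decoded and encoded explicitly. Below is a small sketch using the core Encode module; `parse_record` is a hypothetical helper name, and the pipe-separated sample line is made up for illustration.

```perl
use strict;
use warnings;
use utf8;                        # only declares that this source file is UTF-8
use Encode qw(encode decode);

# Hypothetical helper: split one pipe-separated record (raw UTF-8 bytes, as
# read from the data file) into decoded character strings.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    return map { decode('UTF-8', $_) } split /\|/, $line, -1;
}

# Made-up sample record; \xC3\xA9 is the UTF-8 byte sequence for "é".
my @fields = parse_record("Tehran|caf\xC3\xA9\n");

# Encode explicitly on the way out; "use utf8" alone does not do this.
print encode('UTF-8', join(' / ', @fields)), "\n";
```

Decoding at input and encoding at output keeps string operations working on characters rather than bytes in between.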
The first thing my code sends to the output is that Content-Type line, now including charset=utf-8, but the instability still exists with dynamic pages.
I still think this is a browser/server/caching problem. The fact that the pages are delivered dynamically by your script might be contributing to the problem. But the fact that it seems to happen only occasionally in thousands of page refreshes suggests it's nothing to worry about either. Few people are going to refresh your page like that over and over.
The static page is probably cached locally on your computer, and the dynamic page is not. Every time you refresh the dynamic page, the page and the style sheet have to be fetched and parsed, and there is sometimes a bit of a timing issue where the page loads before the style sheet does. There might also be something in your HTML/CSS that contributes to the problem.
If it were once in a thousand times, as you say, there would be no concern. Actually, IE on Windows behaves worse than that when it gets the page first and the images and CSS later, and I am not too concerned with the IE problem.
But the one we are talking about happens roughly once every 5 to 10 refreshes, and when it happens, while nothing major breaks, most of the page content except one div is shifted about half a centimeter upwards. This is clearly visible, which makes me very uncomfortable.
actually IE on windows shows worse than that when it gets the page first and images and CSS later and I am not too much concerned with IE problem.
But this one we talk happens something like every 5 to 10 times
That problem is that IE loads the page and sometimes the paragraphs and images are not laid out correctly (text over an image, over a table, or one image over another!), and when you refresh the page, everything is fixed from then on.
As for the main problem and the zoom level: all zoom levels are normal, and I don't use 1px divs. I only simulate a spacer in one of the templates, and that is with a PNG image. The problem happens on every dynamic page.
i would imagine the css and javascript content should also be served utf8 encoded.