
PHP: problem inserting Chinese into the database

kbts

7:22 am on Jun 23, 2008 (gmt 0)




Hi,

In phpMyAdmin:
MySQL connection collation is utf8_unicode_ci.
My tables in the database also have collation utf8_unicode_ci.
In my web pages, I have <meta http-equiv="content-type" content="text/html; charset=utf-8"/> within <head></head>.

But when I enter some Chinese via PHP (actually, input from a form), it shows up as gibberish (weird symbols) in the database, even though it displays correctly on the web pages. Before, the Chinese showed up as numeric references (e.g. &#23498;). I don't know what caused it to show as gibberish now.

Note: If I insert Chinese characters into the database through phpMyAdmin, they are stored as the actual Chinese characters.

I want to store the Chinese in &#23456; format (because I believe that's how it is usually stored in a database; if not, please tell me). Can anyone give me an idea of how to store it in &#23456; format?

Any help is appreciated.
Thank you,
kbts
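For reference, PHP's mbstring extension can produce the &#23456; format asked about above. This is a minimal sketch, assuming PHP 7 or later (for the \u{...} string escape) with mbstring enabled. The helper names to_numeric_refs() and from_numeric_refs() and the ENTITY_CONVMAP constant are hypothetical; mb_encode_numericentity() and mb_decode_numericentity() are real mbstring functions.

```php
<?php
// Minimal sketch (assumes PHP 7+ with the mbstring extension enabled).
// Converts every non-ASCII character in a UTF-8 string into a decimal
// numeric character reference such as &#23456;, and back again.

// Hypothetical conversion map: code points 0x80..0x10FFFF, no offset,
// full mask. ASCII (below 0x80) is left untouched.
const ENTITY_CONVMAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Hypothetical helper: UTF-8 text -> text with &#NNNNN; references.
function to_numeric_refs(string $utf8): string
{
    return mb_encode_numericentity($utf8, ENTITY_CONVMAP, 'UTF-8');
}

// Hypothetical helper: text with &#NNNNN; references -> UTF-8 text.
function from_numeric_refs(string $refs): string
{
    return mb_decode_numericentity($refs, ENTITY_CONVMAP, 'UTF-8');
}

$original = "abc \u{5BA0}";  // U+5BA0 is code point 23456 in decimal
$encoded  = to_numeric_refs($original);

echo $encoded, "\n";                                  // abc &#23456;
var_dump(from_numeric_refs($encoded) === $original);  // bool(true)
```

As the replies point out, storing plain UTF-8 is usually the better choice; this only shows what the conversion looks like.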


RonPK

11:25 am on Jun 23, 2008 (gmt 0)




Character sets can be messy business. What happens if you run this before the query:

mysql_query("SET NAMES 'utf8'");

penders

1:23 pm on Jun 23, 2008 (gmt 0)




> I want to store the Chinese as &#23456; format (because I believe that's how they're usually stored in the database. If not, please tell me).

Just a thought... I'm not sure this is necessarily a good idea. &#23456; is an HTML numeric character reference, which is OK if you only want to retrieve and display the text in an HTML page, but not much else. What if you want to search for this character? You would first need to convert your search term into the same numeric reference, and where do you draw the line? Also, &#23456; takes up 8 bytes, whereas the UTF-8 encoded character will take at most 4 (3 for common CJK characters).

You would need to use a reference like '&#23456;' if you weren't using a Unicode character encoding (UTF-8 in this case), but the big advantage of using UTF-8 is that you don't need to.
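The size and searching points can be checked with a short script. This is a minimal sketch, assuming PHP 7+ with mbstring: it compares byte counts with strlen() and shows that a plain strpos() search for the UTF-8 character does not match text stored as references, so the search term has to be converted the same way.

```php
<?php
// Minimal sketch (assumes PHP 7+ with mbstring). Compares the storage size of
// a numeric character reference with the raw UTF-8 bytes, and shows why a
// plain substring search fails on reference-encoded text.

$asUtf8   = "\u{5BA0}";  // code point 23456 (0x5BA0) as UTF-8
$asEntity = "&#23456;";  // the same character as a numeric reference
$stored   = "abc $asEntity";

echo strlen($asEntity), "\n";    // 8 bytes
echo strlen($asUtf8), "\n";      // 3 bytes (BMP CJK characters need 3)
echo strlen("\u{20000}"), "\n";  // 4 bytes, the UTF-8 maximum

// Searching reference-encoded text for the UTF-8 character finds nothing...
var_dump(strpos($stored, $asUtf8));  // bool(false)

// ...so the search term would have to be converted the same way first.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$needle  = mb_encode_numericentity($asUtf8, $convmap, 'UTF-8');
var_dump(strpos($stored, $needle));  // int(4)
```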

kbts

6:49 pm on Jun 23, 2008 (gmt 0)




Thank you both for your responses.

penders, with UTF-8 encoding, is it normal for the text to be stored as "gibberish" in the MySQL database? I just want to make sure I'm doing the right thing.

Thanks for your help,
kbts

penders

11:05 pm on Jun 23, 2008 (gmt 0)




> penders, for utf-8 encodings, is it stored as "gibberish" in the mySQL database?

I would guess you might see "gibberish" if you view the UTF-8 encoded content (from the database) on a page that is not UTF-8 encoded. I can't really say whether that is what's happening, but you seem to imply it is being stored OK...

> ...it shows up as gibberish (weird symbols) in the database. But the Chinese shows up correctly in the webpages.

If the Chinese characters show up in the web page, then at the moment I can't see how that differs from this:
> Note: If I do an Insert of chinese characters in the database through phpMyAdmin, it stores as the actually Chinese characters.

Are you using the same method to view the content of the database in both instances?

kbts

11:43 pm on Jun 23, 2008 (gmt 0)




Please let me restate what's happening now:

If I select something from the database to be displayed on a webpage, the Chinese shows up fine.

If I insert something into the database via an HTML form, the Chinese shows as gibberish when I view it with phpMyAdmin.

If I insert something into the database via phpMyAdmin and then view that record in phpMyAdmin, it shows up as the Chinese characters (not gibberish).

My question is, can I assume everything is "OK" if the Chinese shows up correctly on the webpage (while the Chinese shows up as gibberish if I view via phpMyAdmin)?

Thanks,
kbts
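One plausible explanation for this pattern, not confirmed in the thread, is double encoding. If the MySQL connection from PHP is left at a latin1 character set while the form sends UTF-8 bytes, MySQL reads each byte as a separate latin1 character and converts it to UTF-8 for the utf8 column. Reading back through the same latin1 connection undoes that conversion, so the web page looks right, while phpMyAdmin, whose connection uses utf8 (per the collation settings above), shows the double-encoded bytes. The sketch below only simulates that round trip in PHP with mb_convert_encoding(), assuming PHP 7+ with mbstring; ISO-8859-1 stands in for MySQL's latin1, and the two agree for the bytes used here.

```php
<?php
// Minimal simulation of "double encoding" (assumes PHP 7+ with mbstring).
// ISO-8859-1 stands in for the connection's latin1 character set; this is
// a PHP model of the conversions, not what MySQL literally executes.

$fromForm = "\u{5BA0}";  // UTF-8 bytes E5 AE A0 sent by the browser

// Storing: each byte is read as a latin1 character and converted to UTF-8.
$stored = mb_convert_encoding($fromForm, 'UTF-8', 'ISO-8859-1');
echo bin2hex($stored), "\n";  // c3a5c2aec2a0 (shown by a UTF-8 client as gibberish)

// Reading back through the same latin1 connection reverses the conversion,
// so the web page receives the original bytes and displays correctly.
$readBack = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
var_dump($readBack === $fromForm);  // bool(true)
```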

npwsol

4:15 pm on Jun 24, 2008 (gmt 0)




It sounds like an encoding issue. If the text is showing up correctly, then you should have no problems. Just make sure you set the character set on every web page that displays the characters.

It could be the way your forms submit the data or the way PHP processes the characters. It sounds like a single-byte character set (ISO-8859-something) is being used at some point in the form processing, database storage, or display. Since the characters are displaying correctly and you say the database has UTF-8 settings, I would check the form processing.

I don't believe PHP currently has native Unicode support: strings are treated as plain byte sequences, so byte-oriented functions such as strlen() and substr() work on bytes rather than characters, and you need the mbstring functions for character-aware handling. I don't work with character sets in PHP, so I can't tell you any more than that.
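To illustrate how PHP treats this text: a PHP string is a sequence of bytes with no encoding attached, so UTF-8 from a form passes through unchanged; what differs is whether a function counts bytes or characters. A minimal sketch, assuming PHP 7+ with mbstring:

```php
<?php
// Minimal sketch (assumes PHP 7+ with mbstring). PHP strings are byte
// sequences, so UTF-8 input is kept as-is, but byte-oriented functions
// count and cut bytes rather than characters.

$text = "\u{5BA0}\u{5BA1}";  // two CJK characters, 3 UTF-8 bytes each

echo strlen($text), "\n";              // 6 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n";  // 2 (characters)

// substr() can split a character in half, producing invalid UTF-8...
var_dump(mb_check_encoding(substr($text, 0, 2), 'UTF-8'));  // bool(false)

// ...whereas mb_substr() works on whole characters.
var_dump(mb_substr($text, 0, 1, 'UTF-8') === "\u{5BA0}");   // bool(true)
```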

kbts

12:13 am on Jun 29, 2008 (gmt 0)




Thanks for your help. I've already put the following in my PHP code before any data processing:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

I believe these tell PHP to use UTF-8 as the encoding.

Anyone have other suggestions for this?

Thank you,
kbts
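For context on the two mbstring calls above: mb_internal_encoding() sets the default encoding used by mbstring functions, and mb_regex_encoding() does the same for the mb_ereg* family. Neither changes how strlen() or substr() behave, and neither affects the character set of the MySQL connection, which is configured separately. A minimal sketch, assuming PHP 7+ with mbstring:

```php
<?php
// Minimal sketch (assumes PHP 7+ with mbstring). These calls only set the
// default encoding for mbstring functions; plain string functions still
// count bytes, and the MySQL connection character set is unaffected.

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

$text = "\u{5BA0}";  // one CJK character, 3 bytes in UTF-8

echo mb_internal_encoding(), "\n";  // UTF-8
echo mb_strlen($text), "\n";        // 1 (uses the default set above)
echo strlen($text), "\n";           // 3 (still counts bytes)
```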

kbts

9:06 pm on Jun 29, 2008 (gmt 0)




The solution is to add the following before any PHP data processing:

$connection = @mysql_connect(DATABASE_HOST, DATABASE_USER, DATABASE_PASSWORD);

# Set character_set_results
mysql_query("SET character_set_results=utf8", $connection);

# Set character_set_client and character_set_connection
mysql_query("SET character_set_client=utf8", $connection);
mysql_query("SET character_set_connection=utf8", $connection);

Regards,
kbts

RonPK

9:46 pm on Jun 29, 2008 (gmt 0)




Allow me to refer to the manual, at [dev.mysql.com...] :

A SET NAMES 'x' statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;