Forum Moderators: open

Message Too Old, No Replies

Importing Excel into MySQL puts question marks instead of apostrophes

character encoding issues with importing from Excel to mySQL

         

tensai288

4:41 pm on Apr 5, 2009 (gmt 0)

10+ Year Member



Hi,

I'm using Navicat to import an Excel file into a mySQL database. When I go to view my webpage, question marks appear instead of apostrophes, hyphens and other special characters.

I know this is a character encoding issue, with the format Excel uses causing the issue however I'm not sure how to solve it.

I want to still use MS Excel (workbooks) to manage my database content so I'm not sure how to get round the character encoding issue without using a text file etc.

Anyone have any experience with solving this?

Cheers

rocknbil

6:25 pm on Apr 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, be sure it's not an encoding issue with your server or output documents. Paste a chunk of the errant code into a web document, using the doctype you normally use, upload to your server, call from a browser (be sure to check FireFox; the MS browser will be "forgiving" of proprietary MS characters.)

If it's still hosed up, you may just have to change your document encoding or change that the proper encoding is being sent from your server. More information [webmasterworld.com].

If it's fine that way and only messed up after storing it in your database, the issue is probably related to the mysql server collation or database and/or table collation [dev.mysql.com] you are using. Alter the tables, change the collation; you may have to re-import the data (but maybe not.)

willybfriendly

7:17 pm on Apr 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know this is a character encoding issue, with the format Excel uses causing the issue however I'm not sure how to solve it.

Ditto Bill's response, and I will add, are you sure it is not Word stuff pasted into Excel?

Curly apostrophes, em-dashes, etc. will drive a sane man daft. There are a few recent threads discussing this stuff.

tensai288

8:21 pm on Apr 5, 2009 (gmt 0)

10+ Year Member



Thanks for the replies guys

willbfriendly - i do believe it is Word stuff pasted into Excel being the problem, however pasting it first into Notepad, and then into Excel doesn't fix the problem. I have to manually change the apostrohpes in notepad/or in excel in order to get them to appear correctly online.

Going to read the threads rocknbil has linked to now.

willybfriendly

11:31 pm on Apr 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am not familiar wtih Navicat.

If you are using PHP, you can try something like this before you store the data:

$text = mb_convert_encoding($text, "EUCJP-WIN", "UTF-8");

Also, see here [webmasterworld.com] for a discusssion where I was dealing with a similar issue. I ended up with the following PHP function which does what I needed (you would probably have to tweak it for your needs).

function cleanUpHTML($text)
{
//discard unwanted tags
$text = strip_tags($text, '<p><b><i><ol><ul><li>');
//strip header stuff
$text = stristr($text, '<P');
//strip all attributes (Word garbage)
$text = preg_replace("/<(\w)[^>]*?>/s", "<$1>", $text);
//get rid of useless non breaking spaces
$text = preg_replace("/&nbsp;/", "", $text);
//get rid of empty p's
$text = preg_replace("/<p><\/p>/i", "", $text);
$text = mb_convert_encoding($text, "EUCJP-WIN", "UTF-8");
return $text;
}