This is true, but I honestly can't figure out how to fix it. As an example, I have a form with a simple textarea; someone on their mobile device types what appears to be a simple description that includes a single or double quote, and their phone substitutes a curly quote instead.
(I checked: it mostly happens on mobile. Desktop users only run into it when they copy something from another website that doesn't use UTF-8.)
So they type the curly quote in the textarea, then click submit. On my end, it goes through a Perl script that eventually adds it to MySQL (which shows "Server charset: UTF-8 Unicode (utf8)"). And then MySQL has it stored with characters like
.
Then on the user's end, the page loads and shows them the quote they used, so they think it's all good. But on desktop, the user just sees a diamond or a square block for the unrecognized character.
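To make the symptom concrete, here is a minimal sketch of one way a single curly quote can turn into several stored characters. It assumes (I haven't confirmed this is what my script does) that the UTF-8 bytes from the form get treated as cp1252 characters somewhere and are then encoded to UTF-8 a second time. The helper names `hex_bytes` and `double_encode` are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Hypothetical helper reproducing the assumed fault: UTF-8 bytes are read
# as cp1252 characters and then encoded to UTF-8 a second time.
sub double_encode {
    my ($utf8_bytes) = @_;
    my $misread = decode('cp1252', $utf8_bytes);
    return encode('UTF-8', $misread);
}

my $typed  = "it\x{2019}s";            # U+2019, the curly apostrophe
my $sent   = encode('UTF-8', $typed);  # what a UTF-8 form would submit
my $stored = double_encode($sent);

print 'sent:   ', hex_bytes($sent),   "\n";  # 69 74 E2 80 99 73
print 'stored: ', hex_bytes($stored), "\n";  # 69 74 C3 A2 E2 82 AC E2 84 A2 73
```

If that is what is happening, the three bytes of the quote become three separate characters, which would explain why a page that reverses the mistake looks fine while anything else shows junk.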
Other than a series of regexes to replace those characters, I don't know how to change the charset in Perl. I found this from about 10 years ago, but I haven't tested it and don't know if it's still applicable:
use strict;
use warnings;
use Encode;
use Encode::Detect::Detector;
# Raw bytes in an unknown encoding
my $unknown = "\x{54}\x{68}\x{69}\x{73}\x{20}\x{79}\x{65}\x{61}\x{72}\x{20}".
"\x{49}\x{20}\x{77}\x{65}\x{6e}\x{74}\x{20}\x{74}\x{6f}\x{20}".
"\x{b1}\x{b1}\x{be}\x{a9}\x{20}\x{50}\x{65}\x{72}\x{6c}\x{20}".
"\x{77}\x{6f}\x{72}\x{6b}\x{73}\x{68}\x{6f}\x{70}\x{2e}";
# Guess the encoding from the bytes alone
my $encoding_name = Encode::Detect::Detector::detect($unknown);
print $encoding_name; # gb18030
# Decode the bytes into a Perl character string using the guessed encoding
my $string = decode($encoding_name, $unknown);
If that's still a working solution, is it better than using a handful of regex to fix them as they show up?
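For comparison, the alternative to guessing would be to decode explicitly at the boundaries. This is only a sketch, assuming the form really does submit UTF-8; the helper names `form_bytes_to_text` and `text_to_utf8_bytes` are hypothetical, and the database side is only mentioned in a comment because I haven't tried it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw form bytes into a Perl character string.
# FB_CROAK makes decode() die on invalid UTF-8 instead of silently
# substituting a replacement character.
sub form_bytes_to_text {
    my ($raw) = @_;
    return decode('UTF-8', $raw, Encode::FB_CROAK);
}

# Hypothetical helper: turn a character string back into UTF-8 bytes
# just before it leaves the script (page output, file, and so on).
sub text_to_utf8_bytes {
    my ($text) = @_;
    return encode('UTF-8', $text);
}

my $raw  = "it\xE2\x80\x99s";          # bytes as a UTF-8 form post would send them
my $text = form_bytes_to_text($raw);   # 4 characters, the third is U+2019
my $out  = text_to_utf8_bytes($text);  # same 6 bytes again

printf "%d characters in, %d bytes out\n", length($text), length($out);

# On the database side, DBD::mysql has a mysql_enable_utf8mb4 connect
# attribute that is meant to handle this conversion for query data;
# whether it fits this setup is untested here.
```

The idea is that every string inside the script is decoded text, and bytes only exist at the edges, so nothing gets encoded twice.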
For an immediate solution, is there any reason why I shouldn't just add another regex to the end of the list, like:
# the invisible character is in the first section, simply copied and pasted
s###g;
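If I do stay with substitutions, a sketch like the following might be easier to maintain than pasting invisible characters into the source, since the code points are written as escapes. It assumes the text has already been decoded into characters (it will not match raw UTF-8 bytes), and `straighten_quotes` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: map the four common curly quotes to ASCII quotes.
# Expects a decoded character string, not raw bytes.
sub straighten_quotes {
    my ($text) = @_;
    # U+2018/U+2019 become ', U+201C/U+201D become "
    $text =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;
    return $text;
}

my $clean = straighten_quotes("\x{201C}it\x{2019}s fine\x{201D}");
print "$clean\n";  # "it's fine"
```

This only covers quotes; any other non-ASCII character (dashes, accented letters) would still need the encoding itself to be handled correctly.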