Forum Moderators: phranque
# under /x a bare # starts a comment, so it is escaped as \# below
s{\b(?:
\w+(?:&\#\d+;)+\w*(?:&\#\d+;)*|
(?:&\#\d+;)+\w+(?:&\#\d+;)*\w*|
(?:&\#\d+;){3,}
)\b}
{****}xg;

I guess anything found on the standard keyboard without using tricks to add umlauts or something.

If your users are in the US, that’s ASCII. But if they have a non-English keyboard (any French-Canadians use your site?) certain common modified characters will be right there on the keyboard. Other common non-keyboard characters are things like currency: € £ ¢ and so on.
// HTML
<div id="contenteditable">
thïš ïš ä prétty<b></b> thöröûgh štrïng
</div>
<textarea name="data" id="data" hidden></textarea>
// jQuery
// I know jQuery isn't necessary here, but I'm already using it for other things
String.prototype.encodeHTML = function () {
return this.replace(/[\u0080-\u024F]/g,
function (v) { return '&#' + v.charCodeAt() + ';'; }
);
}
$('#data').text(
$('#contenteditable').html().encodeHTML()
);
// at this point, the textarea contains:
// th&#239;&#353; &#239;&#353; &#228; pr&#233;tty<b></b> th&#246;r&#246;&#251;gh &#353;tr&#239;ng
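One thing to watch with the range above: it stops at U+024F, so a character like € (U+20AC) goes through unencoded. A minimal sketch, assuming you would rather encode everything outside ASCII. The name encodeNonAscii is hypothetical and not part of the original code:

```javascript
// Hypothetical helper: encode every non-ASCII character as a decimal
// numeric character reference. The `u` flag makes the regex match whole
// code points, so characters beyond U+FFFF are not split into surrogates.
function encodeNonAscii(str) {
  return str.replace(/[^\x00-\x7F]/gu, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  });
}

console.log(encodeNonAscii('prétty €5'));
```

Markup such as <b></b> is plain ASCII, so it passes through untouched, the same as with the narrower range.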
// Perl
$_ = param('data');
%asciiChars = (
'239' => 'i',
'353' => 's',
# and so on
);
# convert approved entities to letters; leave the rest in place for the filter
s{&#(\d+);}
{exists $asciiChars{$1} ? $asciiChars{$1} : "&#$1;"}ge;
# filter anything else
# (under /x a bare # starts a comment, so it is escaped as \# below)
s{\b(?:
\w+(?:&\#\d+;)+\w*(?:&\#\d+;)*|
(?:&\#\d+;)+\w+(?:&\#\d+;)*\w*|
(?:&\#\d+;){3,}
)\b}
{****}xg;

Can you suggest how I can view the entire list of decimal references for 0100-024F?

Criminy. I knew That Other OS was weird about text input, but is there really nothing that lets you view the characters?
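If nothing on the system will show them, one rough way to get the list is to generate it yourself. A sketch, assuming a browser console or Node is available. listReferences is a made-up name, and whether every character actually displays depends on the fonts installed:

```javascript
// Hypothetical helper: list each code point in a range together with
// its character and its decimal numeric character reference.
function listReferences(first, last) {
  var rows = [];
  for (var cp = first; cp <= last; cp++) {
    var hex = cp.toString(16).toUpperCase().padStart(4, '0');
    rows.push('U+' + hex + '  ' + String.fromCodePoint(cp) + '  &#' + cp + ';');
  }
  return rows;
}

// Latin Extended-A and Latin Extended-B: U+0100 through U+024F.
console.log(listReferences(0x0100, 0x024F).join('\n'));
```

Each row gives the hex code point, the character itself, and the decimal reference to use as a key in the Perl lookup table.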
Yup, the British keyboard has the £ sign in the location occupied on US keyboards by #. (! Is that why phone menus inexplicably call # the “pound sign”?)