Forum Moderators: open

Message Too Old, No Replies

MySQL REGEXP within a Perl script

Sending a param to a REGEXP

         

GoNC

10:46 pm on Oct 5, 2014 (gmt 0)

10+ Year Member



This is in a Perl script using the DBI module, so I wasn't really sure whether to put this under Perl or MySQL.

The first part of the script checks the database to see if there's already a match; if so, $dupe is set to a true value:

##
$var = param('submitted_content');
$dupe = 0;

my $temp = $dbh->prepare("SELECT COUNT(1) FROM table WHERE col=?");
..$temp->execute($var);
..$dupe = $temp->fetchrow_array();
##

That's not the exact code, of course, but pretty close. It's been working for a few years, so please ignore any typos.

Now, though, I'm wanting to test for people trying to create a value that LOOKS like a match, but isn't; ie, exchanging a capital "i" with a lowercase "L", or a capital "o" with a number "0".

This is what I've written to follow the above, but not tested:

##
if (!$dupe && $var =~ /il10\./i) {
..# Remove opening or trailing .
..if ($var =~ /^\.|\.$/) { $t_user = $var =~ s/^\.|\.$//; }

..# Replace common hacks with _ wildcard
..$t_user = $var =~ s/[il10\.]/_/gi;

..my $temp = $dbh->prepare("SELECT COUNT(1) FROM table WHERE col REGEXP ?");
..$temp->execute($t_user);
..$dupe = $temp->fetchrow_array();
}
##

I'm not sure that this will work, though; shouldn't using the ? wildcard automatically escape the _ so that it's no longer a regex wildcard?

I'm sure that I could just send "...REGEXP $t_user", but then of course I'm opened up to all kinds of problems.

Of course, it's not really perfect, anyway; even if the code was fine, then a submission of "live10" would match "a very", which really isn't even close to a duplicate.

Can you guys suggest a better way to catch these types of intentional copies?

# Note: the .. are added above to space the code out, hopefully making it more legible.