Forum Moderators: open

Message Too Old, No Replies

MySQL, using regex pattern in the data

         

csdude55

6:14 pm on Sep 16, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As I'm moving raw data to MySQL, I've hit upon one little snag that I'm hoping you guys and gals can help me fix.

In the original script, I was comparing input data to a regex in Perl, like so:

if ($text =~ m{^(
csdude\d+ |
nickmns |
phranque |
lucy
)}x) { ... }


So it would match "csdude5" or "csdude55", but not "csdudeabc" or "that csdude is soooo cool".

With "nickmns", "phranque", and "lucy" as rows of data in MySQL, I'm selecting using LIKE (in Perl). So inputting, say, "nick" would match "nickmns":

# untested, so please ignore any typos
$input = 'nick';

@arr = $dbh->selectall_array(<<EOF,
SELECT foo, bar FROM table WHERE column LIKE '?%'
EOF
undef,
$input);


But if I enter "csdude" as a row in MySQL and then use that statement, "csdudeabc" would match "csdude", which I do not want. I only want it to match "csdude" if the input data ends with a number.

I can't use REGEXP '?\d+' because the number-only limitation doesn't apply to every row, just this one.

I could put the regex in MySQL, then select all of them and build a pattern; eg:

# is there a better way to select a column and join it all
# to a | delimited string?
@arr = $dbh->selectall_array("SELECT pattern FROM table");

foreach (@arr) {
if ($pattern) { $pattern .= '|'; }
$pattern .= join('|', @$_);
}

if ($input=~ /($pattern)/) {
,,,
}


But in practice I have about 300 rows, so that's a pretty large pattern. Plus it would require a second MySQL query, so I would consider this a last resort.

Before I go that route, is there another way to do this?

robzilla

7:18 pm on Sep 16, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm a little disappointed that I didn't make it into your regex/table.

# is there a better way to select a column and join it all
# to a | delimited string?

This one I can answer, you could use GROUP_CONCAT:
GROUP_CONCAT(pattern SEPARATOR '|')

As for your other problem, no ideas come to mind right now, it's a little messy.