Fun with named groups, replace the match with the pattern name?

         

csdude55

5:25 am on Oct 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is there a way to get the matched pattern's name within the regex? For example:

$_ = "hey ipsum, let's all go to the bar!";

s{
    (?<lobby>bar) |
    (?<lorem>ipsum)
} { # do something }xgi;

# Desired result:
# hey lorem, let's all go to the lobby!
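
One possible way to get there in Perl 5.10 or later is the %+ hash, which lists only the named groups that actually captured in the last successful match. With the /e modifier the replacement is evaluated as code, so it can return the name of the group that matched. A minimal sketch, assuming each alternative has exactly one named group; the variable $text is only for illustration:

```perl
use strict;
use warnings;

my $text = "hey ipsum, let's all go to the bar!";

# /e evaluates the replacement as Perl code. keys %+ lists only the
# named groups that captured, so with one group per alternative the
# single key is the name of the branch that matched.
$text =~ s{
    (?<lobby>bar) |
    (?<lorem>ipsum)
}{ (keys %+)[0] }xgie;

print "$text\n";   # hey lorem, let's all go to the lobby!
```

If two named groups could capture in the same match, keys %+ would return more than one name, so this only fits the one-group-per-branch layout shown in the question.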



In the more complicated reality, what I really have is an array like this:

@matches = (
    'blank',
    '[o0]',
    '[$s]',
    '[t+]',
    '(?!this|that)',
    '[dr]'
);


and I want to substitute all of those in a string with its corresponding index; e.g.:

$this = "that'[$s] some[t+]hing";

# Desired result:
# that2 some3hing


My theory was that if I could replace the match with the name, then I could name them something strategic, like (?<F::1>[o0]), substitute them all, then go through one more time and remove the F::.

I got it to work in a loop easily enough, but in practice that loop ends up making me run close to 300 expressions! So I'm hoping to do it all in one and eliminate the loop :-)
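
A rough sketch of how the single-pass version might look, under a few assumptions. Perl group names have to be plain identifiers, so a name like F::1 would not compile; the sketch uses F0, F1, ... instead and strips the F in the same step, which avoids the second pass. It relies on the %+ hash (Perl 5.10+), which lists only the named groups that captured. The pattern list is a hypothetical subset of the one above: the zero-width entry (?!this|that) is left out because it would match at almost every position under /g. The input string is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical subset of the pattern list from the post
my @patterns = ('blank', '[o0]', '[$s]', '[t+]');

# Build one alternation: (?<F0>blank)|(?<F1>[o0])|(?<F2>[$s])|(?<F3>[t+])
my $alternation = join '|',
    map { "(?<F$_>$patterns[$_])" } 0 .. $#patterns;
my $regex = qr/$alternation/;

my $text = 'ca$h + zer0';   # hypothetical input

# The matched group's name is F<index>; strip the leading F to get the index.
$text =~ s{$regex}{ my ($name) = keys %+; substr($name, 1) }ge;

print "$text\n";   # ca2h 3 zer1
```

The regex is compiled once with qr//, so the cost per string is one substitution rather than one per array element. Whether that is actually faster with roughly 300 alternatives would need benchmarking.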

lucy24

10:01 pm on Oct 18, 2021 (gmt 0)




Can it be done with a two-dimensional array? (Keeping in mind that I don’t speak perl.)

Conceptually: if some part of teststring matches arrayname[n,1] then use arrayname[n,0] for the output.

csdude55

11:12 pm on Oct 18, 2021 (gmt 0)




I don't think this is Perl-specific; for the most part, regexes seem to be similar enough across the board. I might not be saying this right, but I guess if they're POSIX? Whatever that means... LOL

In your concept, though, I would still have to do a substitution in a loop, right? Loop through the array and then substitute on each index? That's essentially what I'm doing now, but I'm hoping to eliminate the expressions in a loop because the quantity creates a little bit of a performance bottleneck.

lucy24

12:20 am on Oct 19, 2021 (gmt 0)




Yes, I don’t see how it can be done without a loop. At most, you could first find the match--using your (blahblah|foo|bar|widget|muggle) RegEx--and then take an extra step to compare the resulting $1 or \1 against an array. That seems like two steps forward and one step back, though at least you wouldn't be looping through 300 Regular Expressions.
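
In Perl, the "find the match, then look it up" idea could be sketched with a hash instead of an array, so the lookup is a single step with no inner loop. This only works when the things being matched are literal strings rather than patterns; the hash name %replacement_for and the sample data are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical lookup table: matched text => replacement
my %replacement_for = (
    bar   => 'lobby',
    ipsum => 'lorem',
);

# Build the alternation from the keys; quotemeta escapes regex metacharacters
my $alternation = join '|', map { quotemeta($_) } sort keys %replacement_for;

my $text = "hey ipsum, let's all go to the bar!";

# lc() normalizes the capture so a case-insensitive match still finds its key
$text =~ s/($alternation)/$replacement_for{lc $1}/gie;

print "$text\n";   # hey lorem, let's all go to the lobby!
```

If one key is a prefix of another, the longer key would need to come first in the alternation; the plain sort here does not handle that.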

“Looping through 300 Regular Expressions” sounds like the title of a work of art, but I can’t decide which art.

csdude55

7:32 pm on Oct 19, 2021 (gmt 0)




lucy24 wrote: (Keeping in mind that I don’t speak perl.)

Ya know, @lucy24, I kinda get the feeling that I might be the last person in the world using Perl! LOL I noticed that, with 1 exception, I'm the only one that has started any threads in this subforum for almost 3 years!

Which blows my mind, really. Perl was the first language I learned, and it's WAY more powerful than any of the others I've used! I honestly think that if it weren't for the cgi-bin requirement (so that the user could use /home/example/www/index.cgi) then it would still be the most popular.

But anyway.

lucy24 wrote: “Looping through 300 Regular Expressions” sounds like the title of a work of art, but I can’t decide which art.

Kinda reminds me more of "Death by a Thousand Cuts" :-O LOL

I'm finding that I use this same type of loop repeatedly across scripts, and combined it's definitely a bottleneck. There HAS to be a better way! Just 15 minutes ago I found where I'd done this same thing several years ago:

%asciiChars = (
    '592' => 'a',
    '596' => 'c',
    # 101 more of these
    '347' => 's'
);

if (/&#\d+;/) {
    foreach $key (keys %asciiChars) {
        s/&#$key;/$asciiChars{$key}/g;
    }
}


At least in that one I could easily test for matches before getting in the loop, but if a submission included &#347; then it still resulted in 104 expressions :-/

I was able to make it better using:

@matches = /&#(\d+);/g;

foreach $key (@matches) {
    s/&#$key;/$asciiChars{$key}/g;
}


Now it only runs the expressions that have prematched; meaning, if a submission includes &#347; then it just runs the 2 expressions (the one to set @matches, and then the one in the loop).
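
For this particular case the loop can probably go away entirely: capture the number and look it up in the hash inside a single substitution with /e. A minimal sketch, assuming a three-entry subset of the table and a made-up input string:

```perl
use strict;
use warnings;

# Subset of the table from the post (the full table is not shown there)
my %asciiChars = (
    '592' => 'a',
    '596' => 'c',
    '347' => 's',
);

my $text = 'pa&#347;s &#596;ode &#9999; end';   # hypothetical input

# One pass: look up each captured number; leave unknown entities untouched.
$text =~ s/&#(\d+);/ exists $asciiChars{$1} ? $asciiChars{$1} : "&#$1;" /ge;

print "$text\n";   # pass code &#9999; end
```

The exists check matters: in the loop version, a number that is not in the hash gets replaced with an empty string, whereas here it is left as it was.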

The problem that prompted this thread doesn't have that option, but I MAY have found a workaround... I'll update the earlier thread:

[webmasterworld.com...]