Forum Moderators: coopster & phranque

Regex variable length lookbehind

         

csdude55

6:23 pm on Sep 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm using a negative lookbehind regex, like so:

if ($foo =~ /(?<!most\s)wanted/i) {
...
}


The goal is to match if $foo contains "wanted", but only if it is not preceded by "most\s".

The error I'm getting is:

Variable length lookbehind not implemented in regex m/(?<!most )wanted/

(Note, I removed the \s for testing and it didn't help)

I've found this being discussed as far back as 2013, and maybe it's a bug in Perl? I'm not sure:

[stackoverflow.com...]

Either way, can you suggest a fix, or a better way to do what I'm wanting?

I found that I could use \K for positive lookbehind, but not negative:

[regular-expressions.info...]

The only alternative I can think of is to copy $foo into another variable, remove "most wanted" from the copy, and then test the copy; e.g.:

($bar = $foo) =~ s/most wanted//gi;

if ($bar =~ /wanted/i) { ... }


But I've used this in several sections, which means several otherwise unnecessary placeholder variables. I try to avoid extra variables where I can to speed things up, so before I do that, is there a better way?
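For reference, a sketch of one documented workaround. The perldiag entry for this error notes that under /i the sequence "st" can also match the single ligature characters U+FB05 and U+FB06, which is what makes a lookbehind containing "most" count as variable-length, and it suggests breaking the sequence up with a bracketed character class. The helper name wanted_without_most is made up for illustration, and this has not been checked against 5.16.3 specifically.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains "wanted" that is not
# immediately preceded by "most" plus one whitespace character.
sub wanted_without_most {
    my ($text) = @_;
    # [sS] splits up the literal "st", so /i no longer sees a pair that
    # could fold to one ligature character; the lookbehind is a fixed
    # five characters wide.
    return $text =~ /(?<!mo[sS]t\s)wanted/i ? 1 : 0;
}

for my $text ("this is the most wanted thing ever", "help wanted") {
    my $answer = wanted_without_most($text) ? "yes" : "no";
    print "$answer\n";
}
```

perldiag also mentions the /aa modifier as an option when only ASCII matching matters, which would leave the original pattern text unchanged.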

lucy24

6:35 pm on Sep 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Quick test before getting complicated: replace \s with the literal strings that it represents; presumably there's a finite number of possibilities, so either "most " or "most[  \n\t]" or whatnot. Does it still raise a fuss?

The current release of SubEthaEdit has decided that " " (nbsp) isn't a space character, and hence also doesn't count as \W. Is it possible the current perl has taken \s to include \r\n (Windows line break), meaning that it can be either one character or two, which makes lookbehinds unhappy?
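The one-character-or-two question can be checked directly. As far as I know, \s always matches exactly one character, so a Windows line break needs two of them; the escape that treats \r\n as a single unit is \R (Perl 5.10 and later). A small sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $crlf = "\r\n";

# \A and \z anchor to the true start and end of the string
# (unlike $, \z does not allow a trailing newline to be skipped).
my $one_s = $crlf =~ /\A\s\z/   ? "yes" : "no";   # a single \s covering both?
my $two_s = $crlf =~ /\A\s\s\z/ ? "yes" : "no";   # two \s, one per character?
my $one_R = $crlf =~ /\A\R\z/   ? "yes" : "no";   # \R takes \r\n as one unit

print "one \\s: $one_s, two \\s: $two_s, one \\R: $one_R\n";
```

If that holds on the Perl in question, \s by itself should not be what makes the lookbehind variable-length.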

csdude55

6:48 pm on Sep 23, 2021 (gmt 0)




Unfortunately, I removed the whitespace entirely and it still throws the same error :-( You can paste this to [jdoodle.com...] if you want, that's what I'm using right now for testing (set to version 5.22):

#!/usr/bin/perl

$foo = "this is the most wanted thing ever";

# copy $foo into $bar and strip the phrase from the copy, leaving $foo intact
($bar = $foo) =~ s/most wanted//g;

# removed the whitespace, but also tested with . and putting the space outside of the ( )
if ($foo =~ /(?<!most)wanted/i) {
    print "yes";
}

else { print 'no'; }


I'm getting the same error on my server (using v. 5.16.3), so this is a relatively safe / faster way to test.
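When trying out several candidate patterns, it may help to probe them without the whole script dying. A minimal sketch, assuming the pattern is supplied as a string and compiled at runtime; pattern_error is a hypothetical helper name, and which patterns pass depends on the Perl version running it:

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef if the pattern compiles on the
# running Perl, otherwise the error text that would have stopped the script.
sub pattern_error {
    my ($source) = @_;
    # qr// with an interpolated string compiles at runtime, so a bad
    # pattern dies inside the eval block and can be caught.
    my $ok = eval { my $re = qr/$source/; 1 };
    return $ok ? undef : $@;
}

for my $source ('(?<!mo[sS]t)wanted', '(?<!a+)b') {
    my $error  = pattern_error($source);
    my $status = defined $error ? "fails to compile" : "compiles";
    print "$source $status\n";
}
```

The second pattern uses an unbounded quantifier inside the lookbehind, so it is expected to be rejected regardless of version.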

csdude55

8:50 pm on Sep 23, 2021 (gmt 0)




I thought this would work to eliminate the placeholder variable, but no luck. No error or anything, it just doesn't give the result I wanted:

$foo = "this is the most wanted thing ever";

if (($foo =~ s/most wanted//g) =~ /thing/i) {
    print 'yes';
}

else { print 'no'; }



I expected it to print "yes", and then if I changed "thing" to "wanted" then it would print "no". But I'm getting "no" in both cases :-( I'm guessing that the if() is seeing a "true" or "false" instead of the result? Oh well, that would have been an easy enough fix had it worked :-/
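That guess can be checked by looking at what the substitution hands back. Without /r, s/// changes $foo in place and returns the number of substitutions it made, so the outer match runs against that number. A short sketch using the same string ($count and $outer are names added for illustration):

```perl
use strict;
use warnings;

my $foo = "this is the most wanted thing ever";

# Without /r, s/// edits $foo in place and returns the number of
# substitutions made (an empty string when nothing was replaced).
my $count = ($foo =~ s/most wanted//g);

# The outer match is therefore tested against the count, not the text.
my $outer = $count =~ /thing/i ? 'yes' : 'no';

print "count: $count\n";   # count: 1
print "foo:   $foo\n";
print "outer: $outer\n";   # outer: no
```

Note that $foo itself has been modified by this point, which is a side effect the placeholder-variable version avoided.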

csdude55

12:27 am on Sep 24, 2021 (gmt 0)




Well, I might have come up with a workaround using the /r modifier, which returns the modified copy and leaves $foo untouched. I remember that it's a latecomer (added in v5.14, I believe), but applying it gives the result I expected:

$foo = "this is the most wanted thing ever";

if (($foo =~ s/most wanted//gr) =~ /thing/i) {
    print 'yes';
}

else { print 'no'; }


I don't love this, because running 2 regexes (regii?) has to be a lot slower than I wanted. And it's going to get messy if I have several of them in a single string. But until a better option is presented, it works.
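One possible single-regex alternative, offered as a sketch and not as the definitive fix: the backtracking control verbs (*SKIP) and (*FAIL), available since Perl 5.10, can discard the unwanted phrase inside the same pattern. The helper name stray_wanted is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains "wanted" outside "most wanted".
sub stray_wanted {
    my ($text) = @_;
    # First branch: consume "most wanted", then (*SKIP)(*FAIL) rejects it
    # and tells the engine not to retry inside the text it just consumed.
    # Second branch: any "wanted" that is left over.
    return $text =~ /most\swanted(*SKIP)(*FAIL)|wanted/i ? 1 : 0;
}

for my $text ("this is the most wanted thing ever", "most wanted, and also wanted") {
    my $answer = stray_wanted($text) ? "yes" : "no";
    print "$answer\n";
}
```

Because no lookbehind is involved, the variable-length restriction never comes into play, and further phrases to exclude could be added to the first branch as alternatives. Whether it is actually faster than two regexes would need benchmarking.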