

Stripping "utm_" code from submitted URL

         

csdude55

7:50 am on Jul 30, 2016 (gmt 0)




I want to remove any variable in a string that looks like "utm_whatever=whatever". As far as I can tell, there could be anywhere from 0 to 5 utm_whatever variables in the string, they could be delimited by & or &amp;, and I guess "whatever" could contain any type of character. So it could potentially look like:


?var1=true&utm_campaign=Cmpn+Name,%20Bad&utm_content=55a8aefe04&utm_medium=social&utm_source=facebook&var2=true


In which case, all I would want is:


?var1=true&var2=true
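For comparison, here is a sketch of a different technique from the one the post goes on to use: split the query on the delimiter, drop any pair whose name starts with utm_, and join what is left. The sub name strip_utm_split is hypothetical, and it assumes the string starts with "?", pairs are separated by & or &amp;, and a literal & inside a value would arrive encoded as %26.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on & or &amp;, drop utm_ pairs, rejoin.
# Assumes the string starts with "?" and no value contains a raw "&".
sub strip_utm_split {
    my ($query) = @_;
    (my $body = $query) =~ s/^\?//;                  # set the leading "?" aside
    my @kept = grep { $_ ne '' && $_ !~ /^utm_/i }   # skip empty fields and utm_ pairs
               split /&(?:amp;)?/, $body;
    return @kept ? '?' . join('&', @kept) : '';      # nothing left means no "?" either
}

print strip_utm_split('?var1=true&utm_campaign=Cmpn+Name,%20Bad&utm_content=55a8aefe04&utm_medium=social&utm_source=facebook&var2=true'), "\n";
# prints ?var1=true&var2=true
```

Rejoining with a plain & normalizes any &amp; delimiters, which may or may not be acceptable depending on where the result ends up.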


This is what I wrote:


# Remove utm_whatever
$uri =~ s#utm_\w+=.+(&(amp;)*)*##gi;

# Trim repeating &; maybe not necessary since I included (&(amp;)*)* above
$uri =~ s#&&+#&#;

# Trim trailing ? or &; am I right that I don't have to escape ? in brackets?
$uri =~ s#[?&]$##;


Do you guys see any reason why this would catch something other than utm_whatever, or cause any other problems?

Also, can you suggest a way to not require 3 separate commands?
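A minimal sketch of doing it with one substitution, under the same assumptions as the post (the string starts with "?", delimiters are & or &amp;, and no raw & appears inside a value). The sub name strip_utm_one_pass is made up for illustration. The idea is to match each run of consecutive utm_ pairs together with the delimiter before it (or the leading ?) and the delimiter after it, then put back at most one delimiter, so there is nothing left to trim afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper (name made up): remove every utm_* pair with a single s///.
# Assumes the string starts with "?", pairs are separated by "&" or "&amp;",
# and no value contains a raw "&" (it would arrive as %26).
sub strip_utm_one_pass {
    my ($uri) = @_;
    # One match = a run of consecutive utm_ pairs, plus the delimiter before it
    # (or the leading "?") and the delimiter after it, if any. Replacement:
    #   - run begins the query (group 1 set): keep "?" only if params follow
    #   - run is mid-string: keep just the delimiter that followed it
    $uri =~ s{
        (?: ^ (\?) | & (?:amp;)? )          # group 1: leading ?, else an & or &amp;
        utm_\w+ = [^&]*                     # first utm_ pair of the run
        (?: & (?:amp;)? utm_\w+ = [^&]* )*  # further utm_ pairs right after it
        ( (?: & (?:amp;)? )? )              # group 2: delimiter after the run, or empty
    }{ defined $1 ? (length $2 ? '?' : '') : $2 }gixe;
    return $uri;
}

print strip_utm_one_pass('?var1=true&utm_campaign=Cmpn+Name,%20Bad&utm_content=55a8aefe04&utm_medium=social&utm_source=facebook&var2=true'), "\n";
# prints ?var1=true&var2=true
```

Because a utm_ pair only matches right after ? or a delimiter, a name such as notutm_x is left alone, which the utm_\w+ pattern above does not guarantee. The /e flag is what lets the replacement choose between ?, the following delimiter, and nothing.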

phranque

1:07 pm on Jul 30, 2016 (gmt 0)




maybe not necessary since I included (&(amp;)*)* above

As far as I can tell, it's not necessary.

am I right that I don't have to escape ? in brackets?

Yes.

http://perldoc.perl.org/perlrecharclass.html#Special-Characters-Inside-a-Bracketed-Character-Class
Characters that may carry a special meaning inside a character class are: \, ^, -, [ and ], and are discussed below. They can be escaped with a backslash, although this is sometimes not needed, in which case the backslash may be omitted.
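A quick way to confirm this: inside brackets, ? is an ordinary character, so [?&] and [\?&] match exactly the same two characters. The sketch below (the sub name trim_trailing_sep is hypothetical) checks that and applies the trailing trim from the first post.

```perl
use strict;
use warnings;

# Hypothetical helper: drop one trailing "?" or "&", as in the first post.
# Inside brackets "?" is literal, so [?&] behaves exactly like [\?&].
sub trim_trailing_sep {
    my ($s) = @_;
    $s =~ s/[?&]$//;
    return $s;
}

# Both classes accept the same characters: ? and & match, anything else does not.
for my $ch ('?', '&', 'x') {
    my $plain   = ($ch =~ /^[?&]$/)  ? 'match' : 'no match';
    my $escaped = ($ch =~ /^[\?&]$/) ? 'match' : 'no match';
    print "$ch: [?&] $plain, [\\?&] $escaped\n";
}

print trim_trailing_sep('?var1=true&'), "\n";    # prints ?var1=true
```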

lucy24

4:32 am on Jul 31, 2016 (gmt 0)




$uri =~ s#utm_\w+=.+(&(amp;)*)*##gi;

Why doesn't the .+ part (immediately after the literal = sign) match, and hence delete, the entire remainder of the string? I'd feel safer with [^&] instead, or [^&\n] if required by context.
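A small demonstration of the concern about .+, using a shortened version of the example query (the shortening is just for readability). Nothing in the original pattern stops .+ at the next &, and (&(amp;)*)* is allowed to match zero times, so the match runs to the end of the string.

```perl
use strict;
use warnings;

my $uri = '?var1=true&utm_medium=social&var2=true';

# Pattern from the first post: .+ runs to the end of the string, and
# (&(amp;)*)* can match zero times, so nothing forces it to stop at the next &.
(my $greedy = $uri) =~ s#utm_\w+=.+(&(amp;)*)*##gi;

# Same pattern with .+ swapped for [^&]+, which cannot cross a delimiter.
(my $bounded = $uri) =~ s#utm_\w+=[^&]+(&(amp;)*)*##gi;

print "$greedy\n";     # prints ?var1=true&
print "$bounded\n";    # prints ?var1=true&var2=true
```

Note that [^&]+ requires at least one character, so an empty value such as utm_source= would survive; [^&]* covers that case.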
