

Can these 3 maps be compressed to one?

or, other common tracking IDs to remove?

         

csdude55

6:26 pm on Jul 3, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't used map very often, so this is more for my own education. Here's the code I'm using:

# I'm using URI::Find, so @_ is defined automagically
@_ = ('https://www.example.com/?fbclid=whatever#utm_campaign=foo&utm_source=bar',
'https://www.example.com/?fbclid=whatever#utm_campaign=foo&utm_source=bar');

# remove utm_whatever, ocid, trkid, gclid, fbclid, data-whatever, role, cite, itxt, and itxt-whatever
map s#(\?|&(amp;)?)(utm_\w+?|ocid|trkid|gclid|fbclid|data-[\w-]+?|role|cite|itxt[\w-]*?)=[^&]+#$1#gi, @_;

# remove repeating &
map s#(&(amp;)?&(amp;)?)+#&#, @_;

# remove trailing ? or &
map s#(\?|&(amp;)?)+$##, @_;


These obviously have to be done in order; remove the offending params, which might leave repeating &... then remove the repeating &, which might leave a trailing ? or &... then remove the trailing ? or &.

Is there a way to do all of this with one map, though, instead of using 3 different ones?

fishmonger

2:53 pm on Jul 4, 2019 (gmt 0)

5+ Year Member



What is your real goal? What do you want to end up with?

Is this what you're looking for? [metacpan.org ]

Or maybe this module: [metacpan.org ]

csdude55

6:26 pm on Jul 4, 2019 (gmt 0)




In the example, the end result would simply be:

@_ = ('https://www.example.com/',
'https://www.example.com/');


But for the sake of clarity, if the original URL had params that weren't part of the list then they would remain:

@_ = ('https://www.example.com/?fbclid=whatever#utm_campaign=foo&utm_source=bar&foo=bar',
'https://www.example.com/?fbclid=whatever#utm_campaign=foo&utm_source=bar&foo=bar');

# results in
@_ = ('https://www.example.com/?foo=bar',
'https://www.example.com/?foo=bar');


I see how I could use those modules to split up the query string, do a loop to strip them out, then join it back together. But I'm not sure if that's better (faster) than the maps?
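The split / loop / join idea can be sketched with core Perl only, without either module. This is a rough sketch under some assumptions: strip_tracking is a made-up helper name, the sample URLs keep everything in the query string (no # fragment), and any &amp; separators come back as a plain &. As far as I can tell it also avoids the "?&" that the regex chain leaves behind when the first parameter is the one removed.

```perl
use strict;
use warnings;

# Same names as the regex in the thread, anchored so only whole names match.
my $tracking = qr/^(?:utm_\w+|ocid|trkid|gclid|fbclid|data-[\w-]+|role|cite|itxt[\w-]*)$/i;

# strip_tracking is a hypothetical helper name, not part of any module.
sub strip_tracking {
    my ($url) = @_;

    # Split off the query string; a URL without "?" is returned unchanged.
    my ($base, $query) = split /\?/, $url, 2;
    return $url unless defined $query;

    # Keep only the non-empty pairs whose name is not on the tracking list.
    my @keep = grep {
        my ($name) = split /=/, $_, 2;
        length($_) && $name !~ $tracking;
    } split /&(?:amp;)?/, $query;

    # Put the survivors back together, or drop the "?" if nothing is left.
    return @keep ? $base . '?' . join('&', @keep) : $base;
}

# Sample URLs are assumptions for illustration.
my @urls = map { strip_tracking($_) } (
    'https://www.example.com/?fbclid=whatever&utm_campaign=foo&utm_source=bar',
    'https://www.example.com/?fbclid=whatever&utm_campaign=foo&foo=bar',
);
print "$_\n" for @urls;
```

Here map is used for its return value, which is what it is meant for.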

fishmonger

6:47 pm on Jul 4, 2019 (gmt 0)




Don't worry about the speed yet. If needed, you can profile it and optimize it later.

Follow the standard Unix rules:
1) make it work
2) make it work right
3) make it work faster

If/when you need to profile it, use: Devel::NYTProf - Powerful fast feature-rich Perl source code profiler
[metacpan.org...]

csdude55

8:16 pm on Jul 4, 2019 (gmt 0)




After some poking around, it does look like they can be mashed into a single statement:

map {
# remove tracking id
s#(\?|&(amp;)?)(utm_\w+?|ocid|trkid|gclid|fbclid|data-[\w-]+?|role|cite|itxt[\w-]*?)=[^&]+#$1#gi;

# remove repeating &
s#(&(amp;)?&(amp;)?)+#&#;

# remove trailing ? or &
s#(\?|&(amp;)?)+$##;
} @_;


I haven't tested, but I think this isn't EXACTLY the same. I'm pretty sure that my first code would run the first regex on $_[0], then $_[1], then the second regex on $_[0], then $_[1], and then the third regex on $_[0], then $_[1].

In comparison, this one would run all 3 regexes (I checked, that's the proper plural) on $_[0], then all 3 on $_[1].

In my case that's just fine, and since it's marginally smaller (8 whole bytes! LOL) the download time on the program would be slightly faster. But it might not be the same for all instances, if you need the first regex to run on each index of the array before moving on to the next.
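To make the ordering difference concrete, here is a small sketch that only logs which step touches which element. The step1/step2/step3 labels and the two-element list are placeholders for illustration, not the real regexes.

```perl
use strict;
use warnings;

my @data = ('a', 'b');

# Three separate maps: each step visits every element before the next step starts.
my @separate;
map { push @separate, "step1:$_" } @data;
map { push @separate, "step2:$_" } @data;
map { push @separate, "step3:$_" } @data;

# One map with three statements: all steps run on one element before moving on.
my @combined;
map {
    push @combined, "step1:$_";
    push @combined, "step2:$_";
    push @combined, "step3:$_";
} @data;

print "separate: @separate\n";   # step1:a step1:b step2:a step2:b step3:a step3:b
print "combined: @combined\n";   # step1:a step2:a step3:a step1:b step2:b step3:b
```

For substitutions that only look at the current element, both orders should give the same final strings; the order would only matter if a step depended on what had already happened to the other elements.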

fishmonger

8:48 pm on Jul 4, 2019 (gmt 0)




csdude55 wrote: "since it's marginally smaller (8 whole bytes! LOL) the download time on the program would be slightly faster"

Are you on a 56k dial-up connection? That would be the only reason why the 8 bytes would matter.

Why use map if you're going to throw away its return value? In this case the for loop would be more efficient.
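A minimal sketch of the for-loop form, with the three regexes copied unchanged from the thread. Like map, for aliases $_ to each element, so the substitutions still edit the array in place. The @urls array and its contents are assumptions for illustration; in the original code the list lives in @_.

```perl
use strict;
use warnings;

# Sample data is an assumption; the thread gets its list from URI::Find.
my @urls = (
    'https://www.example.com/?foo=bar&fbclid=whatever&utm_campaign=foo',
    'https://www.example.com/?gclid=abc',
);

# $_ is an alias to each element, so s### changes @urls directly.
for (@urls) {
    # remove tracking id
    s#(\?|&(amp;)?)(utm_\w+?|ocid|trkid|gclid|fbclid|data-[\w-]+?|role|cite|itxt[\w-]*?)=[^&]+#$1#gi;

    # remove repeating &
    s#(&(amp;)?&(amp;)?)+#&#;

    # remove trailing ? or &
    s#(\?|&(amp;)?)+$##;
}

print "$_\n" for @urls;
```

Nothing is returned or thrown away here, which makes it clearer that the loop is run only for its side effect on the array.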

Using the modules would add readability and clarity on what is being done, which is better for maintainability.

fishmonger

9:06 pm on Jul 4, 2019 (gmt 0)




IMO, instead of using multiple regexes, loading the query string into a hash and then taking a slice of it to extract what you want would be cleaner and more maintainable.
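One way the hash idea might look, as a sketch rather than a finished routine. Since the unwanted names are pattern-based (utm_whatever and so on), this version deletes a hash slice of the matching keys instead of slicing out the wanted ones. clean_query is a made-up name, the code needs Perl 5.10+ for the // operator, and a plain hash loses the original parameter order and collapses repeated names, so the keys are sorted to keep the output stable.

```perl
use strict;
use warnings;

my $tracking = qr/^(?:utm_\w+|ocid|trkid|gclid|fbclid|data-[\w-]+|role|cite|itxt[\w-]*)$/i;

# clean_query is a hypothetical helper name.
sub clean_query {
    my ($url) = @_;
    my ($base, $query) = split /\?/, $url, 2;
    return $url unless defined $query;

    # Load name => value pairs into a hash (a repeated name keeps its last value;
    # a name without "=" gets an empty value).
    my %param = map { my ($k, $v) = split /=/, $_, 2; ($k => $v // '') }
                grep { length $_ } split /&(?:amp;)?/, $query;

    # Delete a hash slice: every key that matches the tracking list.
    delete @param{ grep { $_ =~ $tracking } keys %param };

    # Hashes are unordered, so sort the keys to get a repeatable result.
    my $rest = join '&', map { "$_=$param{$_}" } sort keys %param;
    return length($rest) ? "$base?$rest" : $base;
}

print clean_query('https://www.example.com/?b=2&fbclid=x&a=1&utm_source=y'), "\n";
```

If parameter order or repeated names matter, an array of pairs would be a safer container than a hash.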

csdude55

10:46 pm on Jul 4, 2019 (gmt 0)




fishmonger wrote: "Are you on a 56k dial-up connection? That would be the only reason why the 8 bytes would matter."

I actually do have a lot of users that are on dial-up (rural US), so speed is always my #1 concern. I've learned that every fraction of a second that I can shave off of load times will result in more pages per session... which results in more ad revenue.

But this one was mainly for my own education because I rarely use map, so I'm trying to figure out the best ways to use it.

Have you seen any bench tests on whether it's faster or slower than a for loop? That would be interesting to see.

fishmonger

12:56 pm on Jul 5, 2019 (gmt 0)




You can test the difference with the Benchmark module. [metacpan.org ]
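A rough sketch of such a comparison using cmpthese from the core Benchmark module. The via_map and via_for names, the sample URL, and the list size of 100 are all assumptions for illustration. Each sub works on its own copy of the list, because the substitutions edit the strings in place and a second run over already-cleaned URLs would measure something different.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample data is an assumption: 100 copies of one URL with two tracking params.
my @sample = ('https://www.example.com/?foo=bar&fbclid=whatever&utm_campaign=foo') x 100;

# map in void context, as in the thread.
sub via_map {
    my @urls = @_;    # private copy
    map {
        s#(\?|&(amp;)?)(utm_\w+?|ocid|trkid|gclid|fbclid|data-[\w-]+?|role|cite|itxt[\w-]*?)=[^&]+#$1#gi;
        s#(&(amp;)?&(amp;)?)+#&#;
        s#(\?|&(amp;)?)+$##;
    } @urls;
    return @urls;
}

# The same three substitutions in a for loop.
sub via_for {
    my @urls = @_;    # private copy
    for (@urls) {
        s#(\?|&(amp;)?)(utm_\w+?|ocid|trkid|gclid|fbclid|data-[\w-]+?|role|cite|itxt[\w-]*?)=[^&]+#$1#gi;
        s#(&(amp;)?&(amp;)?)+#&#;
        s#(\?|&(amp;)?)+$##;
    }
    return @urls;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    'map' => sub { my @r = via_map(@sample) },
    'for' => sub { my @r = via_for(@sample) },
});
```

The numbers will vary by machine and Perl version, so it is worth running a few times before drawing any conclusion.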