R=301 Breaks htaccess RewriteRule


Kevio

2:00 am on Mar 28, 2012 (gmt 0)

10+ Year Member



Greetings, first time poster. I can usually figure these things out with my pal Google, but no luck this time.

In my htaccess, I have the following:

RewriteRule ^catalog/foo/([-a-zA-Z0-9]+)$ /catalog/foo/index.php?model=$1 [L]


Basically, I'm taking my long, database-generated URLs (domain.com/catalog/foo/index.php?model=123) and making them pretty (domain.com/catalog/foo/123). Works like a charm. However, when I add [R=301,L] at the end, it no longer does what it should - that is, the URL reverts back to the long ugly one, even if I manually enter the short, pretty one. This:

RewriteRule ^catalog/foo/([-a-zA-Z0-9]+)$ /catalog/foo/index.php?model=$1 [R=301,L]


...does not work - it shows the original, ugly URL.

I have other instances of 301 rewrites using R=301 in my htaccess file, and they work as they should.

Any idea what I'm missing? My goal is basically just to have the 301 in there to keep the Big Giant Heads at Google happy. Thanks a bunch for any insight!

lucy24

3:03 am on Mar 28, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In my htaccess, I have the following:

RewriteRule ^catalog/foo/([-a-zA-Z0-9]+)$ /catalog/foo/index.php?model=$1 [L]

Basically, I'm taking my long, database-generated URLs (domain.com/catalog/foo/index.php?model=123) and making them pretty (domain.com/catalog/foo/123).

Uhm... No, you're not. Your Rule as written takes a short pretty URL and serves content from a longer uglier URL. So far so good. But the moment you change your Rewrite ([L] alone) to a Redirect ([R=301,L]) you've sent the user right back to the long ugly URL. In other words, exactly the opposite of what you meant to do.

Read any random 10-12 threads in the Apache forum. At least three or four of them will be concerned with the identical question.

:: wandering off to compose boilerplate seeing as how this question has come up at least three times in the past 24 hours ::

lucy24

4:39 am on Mar 28, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, here we are again. First draft of boilerplate, after time out* to do laundry while narrowly avoiding short but ferocious hailstorm:

The Redirect-to-Rewrite Two-Step

Problem: Your dynamically generated pages have long, ugly, hard-to-memorize URLs, probably containing query strings. You want them to have short pretty URLs.

The Solution comes in two parts.

Part 1. Redirect
When a user asks for the long ugly URL, redirect to the short pretty URL. Basic pattern:

RewriteCond %{THE_REQUEST} \?
RewriteCond %{QUERY_STRING} queryname=([a-z]+)
RewriteRule longcomplicatedURL http://www.example.com/blahblah/%1? [R=301,L]


The %1 is captured from the original query string, and the final ? means that you now get rid of the query string. In real life it will usually be a little more complicated, but that's the basic process.

Example:
user asks for
www.example.com/directory/morestuff/index.php?model=volvo


They get bounced over to
www.example.com/cars/volvo


Part 2. Rewrite
You get an incoming request for a short pretty URL-- either from a new arrival or from someone who was redirected in Part 1. The server can't tell the difference.

RewriteRule blahblah/([a-z]+)$ longcomplicatedURL?queryname=$1 [L]


This time around, you're capturing part of the request and changing it into a query string.

Example:
user asks for
www.example.com/cars/volvo


They may think that's what they're getting-- it's what the browser's address bar says-- but behind the scenes the page content is really coming from
www.example.com/directory/morestuff/index.php?model=volvo


Now you see why Part 1 had to look at THE_REQUEST. It's for insurance. If something happens later on, your long complicated URL might pass through mod_rewrite again. If it does, you need to be sure it doesn't get re-redirected. Otherwise there will be an infinite loop.
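To see why the check on THE_REQUEST prevents a loop, here is a rough Python sketch of the two rules. It is only a model of the logic, not how Apache itself works internally; the function name `handle` and the paths (borrowed from the volvo example) are made up for illustration.

```python
import re

# Hypothetical long URL path, modeled on the volvo example above.
LONG_PATH = "/directory/morestuff/index.php"


def handle(the_request, path, query):
    """Model one pass through the two rules.

    the_request: the original request line sent by the browser (never changes)
    path, query: the URL currently being processed (may already be rewritten)
    """
    # Part 1: redirect only when the ORIGINAL request line carried a query string.
    model = re.search(r"model=([a-z]+)", query)
    if "?" in the_request and path == LONG_PATH and model:
        return ("redirect", "/cars/" + model.group(1))

    # Part 2: rewrite the short pretty URL to the long one, behind the scenes.
    pretty = re.fullmatch(r"/cars/([a-z]+)", path)
    if pretty:
        return ("rewrite", LONG_PATH + "?model=" + pretty.group(1))

    # Neither rule applies; the request passes through unchanged.
    return ("pass", path)


# An old bookmark asks for the long URL and gets redirected.
first = handle("GET /directory/morestuff/index.php?model=volvo HTTP/1.1",
               LONG_PATH, "model=volvo")

# The pretty URL is rewritten internally.
second = handle("GET /cars/volvo HTTP/1.1", "/cars/volvo", "")

# If the rewritten URL passes through the rules again, the original request
# line still has no "?", so Part 1 does not fire and there is no loop.
third = handle("GET /cars/volvo HTTP/1.1", LONG_PATH, "model=volvo")

print(first, second, third, sep="\n")
```

The third call is the interesting one: the path and query string look exactly like the ugly URL, but because the browser's own request line never contained a query string, the redirect is skipped.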

Now wait a minute! Does this mean that if someone starts out asking for "longcomplicatedURL", they go through this whole rigamarole and then they end up right back where they started?

Yup. But they don't know it. They only know what the browser's address bar tells them. Even robots-- yes, even google-- can't tell that they're being rewritten.

The Redirect part of the package-- Part 1-- is not technically necessary. The Rewrite-- Part 2-- will function without it. But redirecting everyone to the same URL means that everyone is now on the same page ... and it avoids nasty things like Duplicate Content.

But you're not done yet.

Part 0.
Before you do anything with Part 1 and Part 2, go over your current site carefully. Make sure that your own links point only to the short pretty URL. Requests for the long complicated URL should come only from outside-- from people with outdated bookmarks, or old links from other sites. Your own site will use only the pretty URLs.



* I'm just saying that so you won't think it took me an hour and a half to write this.

Kevio

1:10 pm on Mar 28, 2012 (gmt 0)

10+ Year Member



Thanks for the detailed info - I had basically come to a similar conclusion, that I only had part of what I needed. So, I have modified the directive:

RewriteCond %{THE_REQUEST} ^/gallery/process/getByModelID.php$
RewriteCond %{QUERY_STRING} ^model=([-a-zA-Z0-9]+)
RewriteRule (.*) http://www.mydomain.com/gallery/%1? [R=301,L]

RewriteRule ^gallery/([-a-zA-Z0-9]+)$ /gallery/process/getByModelID.php?model=$1 [L]


...and my clean URLs work as they should, whether I type them in or follow a link (I've changed all the links on the site to point to the clean URLs). However, when I do a header test on the "long" URL, I get a 200 response with no 301 redirect. So, it doesn't appear that the actual redirection is occurring, and I've tried quite a few different methods to get it to redirect.

Any suggestions? And thanks again for the primer!

g1smd

1:23 pm on Mar 28, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



THE_REQUEST doesn't begin with a slash. This is the literal request from the browser.

It begins with the method (GET, POST, HEAD, TRACE, etc.) and ends with HTTP/1.0 or HTTP/1.1.

The pattern you used is wrong, specifically the anchoring.

This question is a regular here, so pick any ten random threads and you'll soon find a post that has the right format. :)
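The anchoring problem can be checked outside Apache. The sketch below uses Python's `re` module, which behaves the same as mod_rewrite's regex engine for patterns this simple; the model value 123 is an assumed example, and the second pattern is just one possible shape for a fix, not the only one. In htaccess the spaces inside the pattern would need to be escaped as "\ ".

```python
import re

# What THE_REQUEST typically looks like for the long URL (model value assumed).
the_request = "GET /gallery/process/getByModelID.php?model=123 HTTP/1.1"

# The pattern from the post above: anchored as if the string were only the path.
anchored_as_path = re.compile(r"^/gallery/process/getByModelID.php$")

# One possible fix: allow for the method in front and the protocol behind.
whole_line = re.compile(
    r"^[A-Z]{3,9} /gallery/process/getByModelID\.php\?model=([-a-zA-Z0-9]+) HTTP/"
)

path_match = anchored_as_path.search(the_request)
line_match = whole_line.search(the_request)

print(path_match)            # no match: the line starts with "GET ", not "/"
print(line_match.group(1))   # the captured model value
```

Because the first pattern never matches, the Condition always fails, which fits the 200 response seen on the long URL.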

Kevio

11:54 pm on Mar 28, 2012 (gmt 0)

10+ Year Member



OK, I think I've got it now, and I wanted to follow up so hopefully my experience will help someone else. Thanks again for chiming in.

First of all, here's my final code:

RewriteCond %{THE_REQUEST} catalog/stuff/index.php
RewriteCond %{QUERY_STRING} model=([a-z0-9]+) [NC]
RewriteRule (.*) http://www.mydomain.com/catalog/stuff/%1? [R=301,L]

RewriteRule ^catalog/stuff/([-a-zA-Z0-9]+)$ /catalog/stuff/index.php?model=$1 [L]


Anyone see any issues with that? I've run the old URLs through several HTTP Header Testers, and they return a 301 and point to the "clean" URL as they should.

A dumb question - when I type in the "old" URL that contains the model=123 string in the browser, does Rewrite actually change it in the address bar, or is that just behind-the-scenes? That is, when I manually type in the old URL, the page loads, but the address bar retains the old URL. No biggie (I don't think), just curious.

Now, my biggest problem - other than lack of experience and some syntax - was that I was using various online tools to check my work. For example, the htaccess tester at madewithlove.be. Well, that's a nifty tool, but it doesn't support all the various flags and commands, and was returning kludgy data where there was none. Additionally, several HTTP Header Testers showed weird results - testing the same page on different testers returned any combination of 200, 301 & 302 responses, especially before I tweaked my syntax. Point is, I personally wouldn't rely on these tools to give *exactly* correct results - some seem to work really well, some, not so much. This combination of things caused me to just start chasing my tail, trying to fix things that actually appear to be working as they should. Just something to think about.

So, once again, thanks to you guys that chimed in and pointed me in the right (re) direction - and if you see any issues with my redirects, let me know.

g1smd

12:02 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use the Live HTTP Headers extension for Firefox and remember to clear the browser cache before each test.

You should have two RewriteRules in total. The one with the R flag redirects the user to a new URL and the browser address bar should change to reflect that. The other is a rewrite and it merely allows the browser to ask for a friendly URL but the server software then pulls content from a different place inside the server without revealing what that location is.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /catalog/stuff/index\.php\?model=([a-z0-9]+)\ HTTP/ [NC]
RewriteRule ^catalog/stuff/index\.php$ http://www.example.com/catalog/stuff/%1? [R=301,L]


RewriteRule ^catalog/stuff/([-a-z0-9]+)$ /catalog/stuff/index.php?model=$1 [NC,L]


This assumes a valid old URL has only one attached parameter. If there are more in your request the redirect will fail and the page will be served at the duplicate content URL with multiple parameters as requested.
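A quick way to see the single-parameter assumption is to run the same pattern against two request lines in Python. This is only a sketch: the second parameter `color=red` is invented for illustration, and `re.IGNORECASE` stands in for the [NC] flag.

```python
import re

# The THE_REQUEST pattern from above; spaces need no escaping in Python.
pattern = re.compile(
    r"^[A-Z]{3,9} /catalog/stuff/index\.php\?model=([a-z0-9]+) HTTP/",
    re.IGNORECASE,
)

single = "GET /catalog/stuff/index.php?model=123 HTTP/1.1"
extra = "GET /catalog/stuff/index.php?model=123&color=red HTTP/1.1"

single_match = pattern.search(single)
extra_match = pattern.search(extra)

print(single_match.group(1))  # captured model value, used as %1 in the redirect
print(extra_match)            # no match: "&" follows the value instead of a space
```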

Escape literal periods.

Use example.com in this forum.

Kevio

12:12 am on Mar 29, 2012 (gmt 0)

10+ Year Member



And to answer my own question:

A dumb question - when I type in the "old" URL that contains the model=123 string in the browser, does Rewrite actually change it in the address bar, or is that just behind-the-scenes? That is, when I manually type in the old URL, the page loads, but the address bar retains the old URL. No biggie (I don't think), just curious.


...I had so many pages cached from all my testing, my browser was evidently just showing the cached version (I guess). Clearing the cache now makes the browser show the clean URL in the address bar, even if I manually type in the ugly one.

Come get me, Googlebot! :)

lucy24

12:17 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This part
RewriteRule (.*) shortprettyURL etc.

Needs to be more narrowly constrained, or it will pick up everything: images, stylesheets etc. They'll be deflected by the Condition, but it saves time and resources if you never even send them that far. Remember*, mod_rewrite moves two steps forward, one back: First it looks at the Rule. If and only if the Rule might apply, it then goes back and looks at the Conditions, stopping as soon as it hits one that fails.

At a minimum, reword the rule so it ends in \.php. Or rather,
(/(index\.php)?)$
Don't cut and paste! This wording may not be exactly right for the circumstances.
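To put a rough number on the saving, here is a small Python sketch that counts how many requests get past the Rule's pattern and on to the Conditions. The request paths and the helper name `count_condition_checks` are made up for illustration, and the narrow pattern is just one possible wording.

```python
import re


def count_condition_checks(rule_pattern, paths):
    """Count the requests whose path matches the Rule pattern.

    Only these requests would go on to have their Conditions evaluated.
    """
    rule = re.compile(rule_pattern)
    return sum(1 for p in paths if rule.search(p))


# Hypothetical requests as seen in htaccess (no leading slash).
paths = [
    "catalog/stuff/index.php",
    "images/logo.png",
    "css/site.css",
    "catalog/stuff/abc123",
]

broad = count_condition_checks(r"(.*)", paths)
narrow = count_condition_checks(r"^catalog/stuff/index\.php$", paths)

print(broad, narrow)
```

With the broad pattern every request, images and stylesheets included, reaches the Conditions; with the narrow one only the request for index.php does.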

When testing, don't only test requests that are supposed to work. Try requests that are supposed to not work-- including garbage requests just to be safe.

Edit:
Clearing the cache now makes the browser show the clean URL in the address bar, even if I manually type in the ugly one.

And there in a nutshell is the miracle of rewriting.


* At least, "remember" as used by my now-adult son: "I'm sure I meant to tell you about this, I just forgot" ;)

Kevio

12:29 am on Mar 29, 2012 (gmt 0)

10+ Year Member



Thanks, Lucy - I'm on it. :)

On escaping the periods - makes perfect sense to do this, but I've had people tell me "Meh, it's not that important." Is it? I'm doing it anyway (I hadn't gotten that far when I pasted my snip - was trying to keep it as clean as possible for readability while testing). I'm just wondering if there are pros/cons?

g1smd

1:38 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



.
is any character.

\.
is a literal period.

If you forget to add the escaping there may be certain URL requests that will end up triggering an infinite loop inside the server.

lucy24

4:20 am on Mar 29, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Forgetting to escape a period generally means that you will pick up everything you meant to pick up-- but also a lot of things you didn't mean to pick up. This may or may not be significant depending on where the . is:

If your Rule says

blahblah.php$ dostuff


it will work on requests for blahblah.php -- but also on blahblahaphp, blahblah2php, blahblah/php and so on. No biggie, probably.
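The same comparison can be run in Python, whose `re` module treats an unescaped period the same way. This is only a sketch using the made-up names from the example above.

```python
import re

candidates = ["blahblah.php", "blahblahaphp", "blahblah2php", "blahblah/php"]

unescaped = re.compile(r"blahblah.php$")   # "." matches any character
escaped = re.compile(r"blahblah\.php$")    # "\." matches only a literal period

unescaped_hits = [c for c in candidates if unescaped.search(c)]
escaped_hits = [c for c in candidates if escaped.search(c)]

print(unescaped_hits)  # all four candidates
print(escaped_hits)    # only the real filename
```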

But in an IP like
1.23.1.23
especially if it isn't anchored at both ends, the difference between . and \. could be HUGE. You might end up locking out your supervisor, your grandmother and your Congressman when all you meant to block was a nasty robot from Hong Kong.
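Here is a rough Python sketch of the IP case, comparing three ways of writing the pattern. The test strings are invented for illustration; the second one is a different, perfectly valid address that merely contains the target as a substring.

```python
import re

patterns = {
    "unescaped, unanchored": r"1.23.1.23",
    "escaped, unanchored": r"1\.23\.1\.23",
    "escaped, anchored": r"^1\.23\.1\.23$",
}

candidates = ["1.23.1.23", "101.23.1.230", "1x23y1z23"]

# For each pattern, list the candidates it matches.
hits = {
    label: [c for c in candidates if re.search(p, c)]
    for label, p in patterns.items()
}

for label, matched in hits.items():
    print(label, matched)
```

Only the escaped and anchored version is limited to the one address. In this small sample the escaping removes the junk string and the anchors remove the other IP, so both are needed.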