redirect unknown addresses based on URL

         

Jason_85

7:12 am on Sep 24, 2010 (gmt 0)

10+ Year Member



Hi Everyone,

I have been doing a lot of tinkering with Apache lately, and while I'm learning a lot, there are some things that I simply don't understand and for which I can't find the documentation.

Here is my biggest problem: I want to redirect all users who request something like this:

www.mysite.com/some/unknown/url

to www.othersite.com/some/unknown/url

where some/unknown/url DOES NOT exist and would otherwise generate a 404 error. That is to say, I want to create a 301 redirect to a different site ONLY when the original URL could not be found. Is this something that is handled by ErrorDocument?

A follow-up question: how difficult would it be to hide the redirect, so that the URL still appears as "www.mysite.com/some/unknown/url" despite the content actually coming from "www.othersite.com/some/unknown/url"?

Thanks in advance for your time :) I'll make an effort to contribute to the discussions in the coming weeks.

Jason

P.S. Earlier I tried to say hello on the wall to introduce myself, but I kept getting this message:

"status: 10:You have reached a private forum or discussion. Please login to view threads in this forum.(jason_85)"

Then I'd log in and it would just take me back to the main forum.

jdMorgan

1:25 pm on Sep 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Redirecting non-existent resource requests will result in the client seeing a 301 or 302 server status for all such requests instead of the 404-Not Found or 410-Gone error response. This will make the URL-space of your domain appear to be infinite, and result in reduced spidering/crawling from search engines, and possibly a reduction in the rankings of your pages in search results (because the crawlers may not visit all your pages and credit all of your internal links).

Therefore, this approach is NOT recommended.

If you have server configuration access, you can reverse-proxy requests to another server, and thus 'hide' the fact that the content is being served from elsewhere. Without server config access, you could use a script to get content from the other server to serve in response to requests on this server, but such a script is well beyond the scope of this forum -- or any forum, really. Try a search for reverse-proxy scripts.
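As a rough illustration of the reverse-proxy idea, here is a minimal sketch for the main server configuration (these directives are not allowed in .htaccess). It assumes mod_proxy and mod_proxy_http are loaded; the /some/ path prefix and both hostnames are placeholders borrowed from the question, not a tested setup:

```apache
# Sketch only: place in the server or virtual-host configuration.
# Assumes mod_proxy and mod_proxy_http are enabled.
<VirtualHost *:80>
    ServerName www.mysite.com

    # Never act as an open forward proxy.
    ProxyRequests Off

    # Fetch anything under /some/ from the other server and serve it
    # under the original www.mysite.com URL.
    ProxyPass        /some/ http://www.othersite.com/some/

    # Rewrite Location headers in the backend's redirects so that
    # clients stay on www.mysite.com.
    ProxyPassReverse /some/ http://www.othersite.com/some/
</VirtualHost>
```

Note that this proxies a whole URL prefix, not only the URLs that would otherwise return 404; restricting it to missing resources would need additional conditions.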

The bottom line is that it's really not possible to recommend a solution if we don't know what basic problem you are attempting to solve.

"The Wall" is part of our paid-subscribers "Supporters" sub-site. You could post in our public Community Center [webmasterworld.com] instead.

Jim

Jason_85

2:22 pm on Sep 24, 2010 (gmt 0)

10+ Year Member



Thanks for the reply. I think the first option you suggested is more up my alley. I have a lot of random, not particularly targeted traffic coming into a couple of dozen domains that I don't want to park just yet. Most of the requests land on 404 pages, presumably left over from the previous owners' sites, and I simply want to redirect them to a parked domain that I have. Can you tell me what this technique is called? I don't have time to redirect these individually (i.e. adding rewrites by hand for the common 404s), so I just want to redirect all 404s to the parked domain. I searched for "apache htaccess redirecting non-existent resource requests" but couldn't find anything. Can you tell me where to start or, if it's simple, give me the .htaccess code?

Jason_85

9:56 am on Sep 25, 2010 (gmt 0)

10+ Year Member



I was trying to get started on a solution, so I added this to my .htaccess file:

ErrorDocument 404 http://www.externalsite.com


but this has absolutely no effect (I'm running a Drupal site). Here is the content of my .htaccess, if that helps:

# Protect files and directories from prying eyes.
<FilesMatch "\.(engine|inc|info|install|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl|svn-base)$|^(code-style\.pl|Entries.*|Repository|Root|Tag|Template|all-wcprops|entries|format)$">
Order allow,deny
</FilesMatch>

# Don't show directory listings for URLs which map to a directory.
Options -Indexes

# Follow symbolic links in this directory.
Options +FollowSymLinks

#try to get a custom 404
ErrorDocument 404 http://www.saidtopic.com

# Force simple error message for requests for non-existent favicon.ico.
<Files favicon.ico>
# There is no end quote below, for compatibility with Apache 1.3.
ErrorDocument 404 "The requested file favicon.ico was not found.
</Files>

# Set the default handler.
DirectoryIndex index.php

# Override PHP settings. More in sites/default/settings.php
# but the following cannot be changed at runtime.

# PHP 4, Apache 1.
<IfModule mod_php4.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_value mbstring.encoding_translation 0
</IfModule>

# PHP 4, Apache 2.
<IfModule sapi_apache2.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_value mbstring.encoding_translation 0
</IfModule>

# PHP 5, Apache 1 and 2.
<IfModule mod_php5.c>
php_value magic_quotes_gpc 0
php_value register_globals 0
php_value session.auto_start 0
php_value mbstring.http_input pass
php_value mbstring.http_output pass
php_value mbstring.encoding_translation 0
</IfModule>

# Requires mod_expires to be enabled.
<IfModule mod_expires.c>
# Enable expirations.
ExpiresActive On

# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600

# Do not cache dynamically generated pages.
ExpiresByType text/html A1
</IfModule>

# Various rewrite rules.
<IfModule mod_rewrite.c>
RewriteEngine on

RewriteBase /
RewriteRule ^en/reactivate/2839475362728397-ijh897$ http://saidtopic.com/en/reactivate/2839475362728397-ijh897 [R=301,L]

# Rewrite URLs of the form 'x' to the form 'index.php?q=x'.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>

# $Id: .htaccess,v 1.90.2.3 2008/12/10 20:04:08 goba Exp $


Note that it says:
# Override PHP settings. More in sites/default/settings.php
# but the following cannot be changed at runtime.


Is it possible that Drupal is somehow "overriding" the .htaccess settings using some PHP scripts? I didn't think this would be possible, but it doesn't hurt to ask... Thanks for the help :)

g1smd

10:06 am on Sep 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Using a full URL (scheme and hostname) in the ErrorDocument directive results in a 302 redirect to that URL whenever a resource does not exist at the originally requested URL.

That will definitely get your site flagged as having "low technical quality" in Google's algorithm and rapidly have a negative effect on your search results.
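To make the difference concrete, here is a small sketch of the two forms of the directive. The local path /errors/not-found.html is a hypothetical file name used only for illustration:

```apache
# Full URL: Apache answers with a 302 redirect to the other site,
# so the client never receives a 404 status.
# ErrorDocument 404 http://www.externalsite.com/

# Local URL-path: Apache serves this page itself and the response
# keeps its 404 status.
ErrorDocument 404 /errors/not-found.html
```

Only one ErrorDocument line per status code takes effect in a given scope, which is why the first form is commented out here.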

Jason_85

12:18 pm on Sep 25, 2010 (gmt 0)

10+ Year Member



Hi g1smd,

Our sites are not publicly accessible and we don't really care about SEO. Do you know how I can do this (i.e. redirect 404s to another location)?

g1smd

12:25 pm on Sep 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A 404 response can only be returned for the originally requested URL. By definition, a server redirect will return a 30x response instead.

On this occasion, I'd be tempted to add both a meta robots noindex directive, as well as a 3 second (or more) meta refresh directive to the error page.

This is one of the very few occasions when a meta refresh might be useful. I'm still not 100% happy with that, but I would certainly not use a 302 redirect to the other site.

Jason_85

12:56 pm on Sep 25, 2010 (gmt 0)

10+ Year Member



So how would I do this? My problem is that when I try to use the ErrorDocument directive, it just seems to get ignored...

jdMorgan

4:10 pm on Sep 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is a fundamental problem here, assuming that all of these old domains are pointed to the server where you are trying to install this code.

The problem is that your definition of "requested URLs which should be redirected to the parked domain" is precisely the same as the condition Drupal requires in order to function -- that is, "The requested URL does not resolve to an existing file."

If this is not clear, see the code provided by Drupal at the end of your RewriteRules. This code says, "If the requested URL does not resolve to a physically-existing file, and it does not resolve to a physically-existing directory, and it is not 'favicon.ico', then rewrite this request to Drupal's index.php script, passing the requested URL-path as the value of the query string parameter 'q'."

Therefore, if you attempt to base your redirection only on the URL-path not resolving to an existing file, you will "break" Drupal, and your site will stop working.

So you're going to need to give the server more information about these requests in order for it to be able to distinguish "non-existent" requests that should be redirected to the parked domain from those that should be passed to Drupal.

If possible, enumerate the hostnames (domain names) of the "previously-owned domains," and include them as a requirement for redirection to the parked domain. You would then put code like this above the Drupal rule:

RewriteCond %{HTTP_HOST} old-domain1\.com [NC,OR]
RewriteCond %{HTTP_HOST} old-domain2\.org [NC,OR]
RewriteCond %{HTTP_HOST} old-domain3\.co\.uk [NC,OR]
RewriteCond %{HTTP_HOST} old-domain1\.info [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ http://www.parked-domain.com/$1 [R=301,L]

That will redirect requests for URLs within the old domains which do not resolve to existing files or directories on this server to the parked domain, using the same URL-path as originally requested.

You may not even need the "exists" checks any more, depending on exactly what you want to do, and since they are server-resource-intensive functions, they should NOT be done unless absolutely required.
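For instance, a variant without the "exists" checks might look like the sketch below. It folds the hostnames from the example above into one anchored pattern and assumes requests arrive without a port number in the Host header; it also redirects every request on those hostnames, including requests for files that do exist:

```apache
# Sketch only: place above the Drupal rule, after "RewriteEngine on".
# One anchored condition replaces the four OR'ed hostname tests.
RewriteCond %{HTTP_HOST} ^(www\.)?(old-domain1\.com|old-domain2\.org|old-domain3\.co\.uk|old-domain1\.info)$ [NC]
# No -f / -d tests: everything on these hostnames is redirected,
# keeping the originally requested URL-path.
RewriteRule ^(.*)$ http://www.parked-domain.com/$1 [R=301,L]
```

Anchoring the pattern with ^ and $ also avoids accidental matches on longer hostnames that merely contain one of the old domain names.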

I doubt that this code will do exactly what you want. There is still the issue that although the old-domain requests will now be redirected to the parked domain, most or all of them are likely to throw a 404 on that parked domain as well. At the same time, you can't redirect *all* types of resource requests to a page on the parked domain, because, for example, you might get a request for robots.txt on the old domain, and redirecting that to a "landing page" on the parked domain would be a rather huge mistake...


I'd suggest that you think about the following "types" of old-domain URL requests, and decide what you want to do with them:

  • pages -- extensions of htm, html, shtm, shtml, php, and "none"
  • images -- gif, jpg, jpe, jpeg, png, ico, etc.
  • documents -- pdf, doc, etc.
  • multimedia -- swf, flv, avi, mp4, wmv, mp3, wma, etc.
  • server files -- robots.txt, labels.rdf, sitemap.xml, w3c/p3p.xml, etc.

    Then you can either handle the proper disposition of these requests on your main server using multiple redirection rules, or you can handle them on your parked-domain server using multiple internal rewrites.
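As one possible layout for the "multiple redirection rules" option, the sketch below treats each request type differently. The choices shown (leave server files alone, answer 410 Gone for media and documents, send pages to a single landing page) and the shortened hostname list are assumptions for illustration, not a recommendation for any particular site:

```apache
# Sketch only: place above the Drupal rule, after "RewriteEngine on".
# Requests for any other hostname skip the three old-domain rules below.
RewriteCond %{HTTP_HOST} !^(www\.)?(old-domain1\.com|old-domain2\.org)$ [NC]
RewriteRule ^ - [S=3]

# Server files: leave untouched so they are served or rejected locally.
RewriteRule ^(robots\.txt|labels\.rdf|sitemap\.xml|w3c/p3p\.xml)$ - [L]

# Images, documents and multimedia: answer 410 Gone.
RewriteRule \.(gif|jpe?g|jpe|png|ico|pdf|doc|swf|flv|avi|mp4|wmv|mp3|wma)$ - [NC,G]

# Everything else (pages and extensionless URLs): one landing page.
RewriteRule ^ http://www.parked-domain.com/ [R=301,L]
```

The skip count in [S=3] has to match the number of rules that follow it, so it needs updating whenever a rule is added to or removed from the group.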

    Jim