
.htaccess tracking 403/404 requests.

How can I see what page they were looking for?

         

Badger37

8:56 am on Aug 6, 2006 (gmt 0)




Hi,
I use the following code in my .htaccess file to redirect visitors looking for non-existent pages to my own error page:

ErrorDocument 404 [mysite.co.uk...]
ErrorDocument 403 [mysite.co.uk...]

This works fine. The only thing that's missing is the ability to see what address the visitor was actually trying to find. Unfortunately my host doesn't provide access to the raw logs, so I use the popular AXS script.

Does anyone know if it's possible to get this info without seeing the raw log?

Thanks.

gertrijs2

10:00 am on Aug 6, 2006 (gmt 0)




Make your 404 page a php script, for example error.php, and include something like this inside the body:


<?php
$url = $_SERVER["REQUEST_URI"];
// HTTP_REFERER is missing when the visitor typed the URL or the client hides it.
$referrer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";
if ($referrer == "")
    $referrer = "Unknown";
// REMOTE_HOST only exists when Apache has HostnameLookups On.
$host = isset($_SERVER["REMOTE_HOST"]) ? $_SERVER["REMOTE_HOST"] : "no lookup";
// Skip URLs that tools and bots request routinely (e.g. FrontPage's /_vti_bin/)
// so they don't flood your inbox.
if ($url != "/sitemap.rdf"
    && stristr($url, '/_vti_bin/') === FALSE
    && stristr($url, '/siteinfo.xml') === FALSE
    && stristr($url, '/MSOffice/clt') === FALSE
) {
    mail("badpage@youremail.com", "Page Not Found",
        "Requested Page: " . $url
        . "\r\nReferred By: " . $referrer
        . "\r\nRemote Addr: " . $_SERVER["REMOTE_ADDR"] . " (" . $host . ")"
        . "\r\nCookies: \r\n"
        . implode(",", $_COOKIE)
        . "\r\n",
        "From: badpage@youremail.com");
}
?>

That will send you an email for each missing page.
Obviously, the script also needs to output some HTML for the visitor.

Gert

jdMorgan

3:20 pm on Aug 6, 2006 (gmt 0)




Off-topic, but urgent:

Your ErrorDocument code is ill-formed, and will result in a 200-OK response for all Not Found and Forbidden resources. This can lead to serious problems with search ranking and results.

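The proper form for ErrorDocument specifies a local URL-path, not a full (canonical) URL. If a full URL is specified, Apache will produce a 302-Found redirect response, and the client will then receive a 200-OK when it re-requests the error document. Because the error page is then fetched as a brand-new request, it also no longer knows which URL was originally requested.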

You can easily confirm this behaviour using the "Live HTTP Headers" extension for Firefox/Mozilla, or by using an online server headers checker.
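The same check can be done from a script. Here is a minimal PHP sketch, assuming PHP 5 or later with allow_url_fopen enabled; the helper name status_codes() and the example URL are hypothetical. With PHP's default HTTP stream settings, get_headers() follows redirects, so its result contains one status line per response, which makes a 302 followed by a 200 easy to spot.

```php
<?php
// Hypothetical helper: collect every HTTP status code from a get_headers()
// result. Because redirects are followed, a misconfigured ErrorDocument
// shows up as e.g. array(302, 200) instead of array(404).
function status_codes(array $headers)
{
    $codes = array();
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found".
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Usage (the URL is a placeholder for a page that does not exist):
// print_r(status_codes(get_headers('http://www.example.com/no-such-page.htm')));
```

A correctly configured site should report only 404 for a missing page.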

Your code should read:


ErrorDocument 404 /error.htm
ErrorDocument 403 /error.htm

For more information, read the detailed notes in the documentation for the ErrorDocument directive (httpd.apache.org).

Jim

Badger37

6:04 pm on Aug 6, 2006 (gmt 0)




Thanks for the replies.

Gert:
I won't need an email, but I'm looking at your code.
Can you say what the sitemap.rdf, _vti_bin (MS FrontPage?), siteinfo.xml and MSOffice checks do?

Jim:
I had tried a relative address when I started setting up 404 pages for my sites some years ago. But then the only thing that seemed to work was using an absolute address. I've just tried /error.htm and it now seems to work?!

NB. I appreciate your post, but maybe the search engines are cleverer than 'we' think, as I use this syntax on over 80 domains and they all do very well on Google etc.

Thanks again - I'm still tinkering with both of your suggestions.

Badger37

12:33 pm on Aug 8, 2006 (gmt 0)




Having an email sent for each 404 isn't really what I was after.
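One alternative to the email is to have Gert's script append each miss to a text file. The PHP sketch below assumes ErrorDocument points at the script by local path (e.g. ErrorDocument 404 /error.php), so REQUEST_URI still holds the URL the visitor asked for and Apache sets REDIRECT_STATUS. The log location and the function name format_miss() are hypothetical.

```php
<?php
// Hypothetical log location: somewhere the web server can write to
// but visitors cannot browse.
define('NOT_FOUND_LOG', __DIR__ . '/../logs/404.log');

// Build one tab-separated log line from a $_SERVER-style array.
function format_miss(array $server, $timestamp)
{
    $url = isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '-';
    // REDIRECT_STATUS is set by Apache for ErrorDocument requests (404, 403...).
    $status = isset($server['REDIRECT_STATUS']) ? $server['REDIRECT_STATUS'] : '404';
    $referrer = isset($server['HTTP_REFERER']) && $server['HTTP_REFERER'] !== ''
        ? $server['HTTP_REFERER'] : 'Unknown';
    $addr = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '-';
    $fields = array(gmdate('Y-m-d H:i:s', $timestamp), $status, $url, $referrer, $addr);
    // Replace tabs and line breaks so a crafted URL cannot forge extra log lines.
    $fields = str_replace(array("\t", "\r", "\n"), ' ', $fields);
    return implode("\t", $fields) . "\n";
}

// Only write when serving a real request (not when run from the command line).
if (isset($_SERVER['REQUEST_URI'])) {
    file_put_contents(NOT_FOUND_LOG, format_miss($_SERVER, time()), FILE_APPEND | LOCK_EX);
}
```

The resulting file can be opened in a spreadsheet or sorted to find the most frequently requested missing pages. LOCK_EX keeps simultaneous requests from interleaving their lines.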

I've just realised I would have been better off posting this in the "Tracking and Logging" forum.

Is it possible for a mod to move it?