Since then we have gradually seen our ask.com traffic fade away.
I emailed them last week, but so far no reply.
Has anyone else seen this, or have an explanation for it?
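# Capture the leading run of valid URL-path characters as %1 when exactly one
# other character follows it (before any query string) in the client request line
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([/0-9a-z._\-]*)[^/0-9a-z._\-](\?[^\ ]*)?\ HTTP/ [NC]
# Continue only if the captured path exists as a file or a directory
RewriteCond %{DOCUMENT_ROOT}/%1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/%1 -d
# Send a 301 redirect to the cleaned-up URL
RewriteRule .* http://www.example.com/%1 [R=301,L]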
The original query-string attached to the URL (if any) is retained.
If you modify the [groups] in the pattern above, make sure that they match exactly -- with the obvious exception of the "^" negation operator in the second group.
Jim
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([/0-9a-z._\-]*)[^/0-9a-z._\-\?\ ][^?\ ]*(\?[^\ ]*)?\ HTTP/ [NC]
You'll also see this happening with a trailing period on the requested URL when the person posting the link puts a period at the end of it -- as in, "For more info, see http://example.com/widget.html." However, in this case the period is not hex-encoded, because it is a valid character to include in a URL, unlike a space.
Jim
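For the trailing-period case, something along these lines might work. This is only a sketch, not the rule posted above: it assumes the code goes in the .htaccess file in the document root (so the RewriteRule pattern sees the path without its leading slash), that the rewrite engine is already on, and that no legitimate URL on the site ends in a period. www.example.com is the same placeholder host as in the earlier rules.

```apache
# Sketch only: strip one or more trailing periods from the URL-path.
# $1 is the path without the trailing period(s); redirect only if that
# path exists as a file or directory, so real URLs are left alone.
RewriteCond %{DOCUMENT_ROOT}/$1 -f [OR]
RewriteCond %{DOCUMENT_ROOT}/$1 -d
RewriteRule ^(.*[^.])\.+$ http://www.example.com/$1 [R=301,L]
```

With this in place, a request for /widget.html. would be redirected to /widget.html if that file exists, and any query string is carried over unchanged.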
I once had a problem on a site (server-side configuration error on the part of the host) that put Slurp into an endless loop; I wrote Yahoo and actually got a human reply back that was very nice, and they did pass it on. I was afraid of getting banned because of it, but they apparently dealt with it, and no problems ensued.
If there's a problem, it never hurts to try to communicate.
Jim, is it the same type of solution for that type of thing happening?
Besides, this is happening for every page on the site and not just pages linked to via the message board.
I have filled in a feedback form but, like the OP, have had no reply.
jdmorgan - does that piece of code stop all uses of % translations?
I have a database look up which translates spaces such as:
/search_users.php?username=J%20Doe&widget=light%20blue
Would the script fail if I used your code?
[edited by: Frank_Rizzo at 12:47 pm (utc) on Dec. 29, 2007]
The original query-string attached to the URL (if any) is retained.
It will only remove %nn tails on the URL-path-part. A query string is not part of a URL, but rather, data attached to a URL to be passed to the resource at that URL.
Marcia:
You can use a different code snippet to remove spurious query strings - It's been posted elsewhere on WebmasterWorld several times. But, as highlighted by Frank's question above, you do need to be sure that it won't break your site. :)
Jim
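As a rough illustration of what a query-string-removing rule can look like (this is not the snippet referred to above, just a sketch): it assumes .htaccess context in the document root and, importantly, that static .html pages on the site never take query-string parameters. Scripts such as /search_users.php are not matched by the pattern, so their parameters are left alone.

```apache
# Sketch only: drop any query string from requests for static .html pages.
# Assumes .html pages on this site never use query-string parameters.
RewriteCond %{QUERY_STRING} .
# The trailing "?" on the substitution clears the original query string
RewriteRule ^(.*\.html)$ http://www.example.com/$1? [R=301,L]
```

If any .html page on the site does read parameters, the pattern would need to be narrowed before using something like this.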
We did experience a data error which caused us to crawl badly-formed urls from a small number of sites. We identified the issue and corrected it on Dec 29th. Thanks for flagging and please let us know if you see any further problems.
Best regards,
Vivek Pathak
Infrastructure Product Manager
Ask.com
regards!
Btw, gonna ask this in another subforum, but.. is that crawler: 78.137.163.133 coming from digiweb.ie with
"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20070308 Minefield/3.0a1"
by any chance a crawler from ask.com? I wonder, cause some
ask crawlers indeed come from .ie if I am correct, and always
use Minefield..
This 78.137.163.133 is all over the web in visible access logs
etc, ( ip-78-137-163-133.dedi.digiweb.ie ).
Since it does not identify itself, I now block it via .htaccess
but surely would not be doing that if I knew it was from
ask.com..
[edited by: Drreggae at 11:27 am (utc) on Jan. 4, 2008]
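The post does not say how the block is done; one common way, shown here only as a sketch, is a mod_rewrite rule keyed on the client address. It assumes the rewrite engine is already on and uses the single IP quoted above.

```apache
# Sketch only: answer every request from this one address with 403 Forbidden
RewriteCond %{REMOTE_ADDR} ^78\.137\.163\.133$
RewriteRule .* - [F]
```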
host 78.137.163.133
ip-78-137-163-133.dedi.digiweb.ie
whois -h whois.ripe.net 78.137.163.133
inetnum: 78.137.160.0 - 78.137.163.255
netname: DIGIWEB-HOSTING-NET
descr: Digiweb Hosting [3]
country: IE
Also, the same IP has been asking for robots.txt with a blank user agent.
Very nice.
Not.
[edited by: incrediBILL at 9:09 pm (utc) on Jan. 6, 2008]
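Requests with a blank user agent can be refused in a similar way. This is a sketch under the assumption that nothing you care about visits with an empty User-Agent header; some legitimate tools and monitors do, so check your logs first.

```apache
# Sketch only: refuse requests that send an empty or missing User-Agent header
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]
```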
For example if you have both of these on your server:
http://example.com/red/
http://example.com/red-widget/
When you link to http://example.com/red-widget/ it gets 301'd to http://example.com/red/
If you don't have http://example.com/red/ it works as expected.