Forum Moderators: phranque

How to redirect old urls with query string to 404 page?

         

born2run

1:28 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi, on my site (running on an Apache server) I have this URL with a query string:

https://www.example.com/?page=2835

Which is going to my site's homepage.

I want all query string URLs like the one above to go to a 404 page. Is there .htaccess code that can do this?

I've already tried the following:

==
RewriteCond %{QUERY_STRING} "post_type=" [NC]
==

The above code is incomplete as I don't know what the RewriteRule will be in this case. Thanks!

not2easy

1:47 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This is a WordPress site, right? You do not want to leave pages on your site with URLs that go to a 404 error page. Clearly you do not want to delete your home page so it should not really return a 404 error. Do you see visitors to these URLs? Have these alternate URLs somehow been indexed?

born2run

2:21 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi no it's a basic apache server website with no cms. Yes it seems to be an old URL I found in the google search results for my site. My other question is why this url format:

https://www.example.com/?page=2835

Is going to my site's home page and not a 404? Is this the way apache server works by default? Thanks!

NickMNS

2:42 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



why this url format: https://www.example.com/?page=2835 Is going to my site's home page

Because the page being requested is "https://www.example.com/" and the remainder, "?page=2835", is the query string. So it is telling your server to go to that page and then feed it the parameter. Your server says: OK, show the page for that URL, but I don't need the param, so I simply ignore it.

not2easy

2:45 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Apache can be set up to do different things; it depends on how it is configured, either in .htaccess or in the httpd.conf file.

You could test that by using a made up page number to see whether that goes to the homepage also. Maybe "https://www.example.com/?page=123456" might also go to the home page. Because this depends on how the domain is hosted and how the host server is configured, it might help to ask your host about the why/how.

On the other hand, if you wish to use a rule to remove 'all' query strings, that can cause "soft 404" errors in GSC. It helps to have a specific set of conditions that are causing problems you want to solve rather than solving for all possibilities. If it is to resolve that single URL query string, then it makes sense.
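As a rough illustration of a "specific set of conditions", here is a minimal sketch that targets only the single query string from the example URL and leaves every other query string alone. It assumes the rule lives in the .htaccess file at the document root and that mod_rewrite is enabled; treat it as a starting point rather than a tested configuration.

```apache
RewriteEngine On

# Match only the exact query string from the example URL.
RewriteCond %{QUERY_STRING} ^page=2835$
# In a root .htaccess, "^$" matches the homepage; "-" means no substitution.
# A non-3xx status on the R flag ends rewriting and returns that status.
RewriteRule ^$ - [R=404]
```

Requests such as "/?page=123456" or "/?fbclid=..." are not matched by this condition, so they keep behaving as before.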

NickMNS

2:51 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I should add:
This is normal and desired behavior. You can go to any URL and add "?anything=nothing", and as long as nothing on the server reads that parameter it will have no effect on the page rendered. FB and others leverage this to spy on their users: when someone clicks on a link from within FB, FB appends a query param to the URL, "?fbclid=someBase64EncodedString". This has no impact on your server, but FB can then track through its widgets exactly where users go after leaving the FB ecosystem.

w3dk

3:57 pm on Oct 13, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month




RewriteCond %{QUERY_STRING} "post_type=" [NC]


Note that this checks for a "post_type" URL parameter (anywhere in the query string), rather than "page" (at the start of the query string) as in your example URL.

To serve a 404 for any request to the homepage (only) that contains a query string starting with "page=", you could do the following:


RewriteCond %{QUERY_STRING} ^page=
RewriteRule ^$ - [R=404]


Or, to simply redirect to the homepage, removing the query string:


RewriteCond %{QUERY_STRING} ^page=
RewriteRule ^$ / [QSD,R=301,L]


(Assuming Apache 2.4+)

born2run

3:59 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ok guys thanks so much! I'll leave that url as is then. Looks like it's normal behavior.

w3dk

4:01 pm on Oct 13, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Looks like it's normal behavior.


It might be "expected", but that doesn't make it desirable. Unless you 404 the request or 301 redirect it (as I mentioned above), that URL will likely remain in the search results, which isn't necessarily a good thing for SEO.

lucy24

4:15 pm on Oct 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How to redirect old urls with query string to 404 page?
Without even reading the thread (although I did) the answer is simple: Do not redirect anything to the 404 page. Serving a 404 (or 410) response is not the same thing as redirecting to the 404 page, although it may look the same to a human user.
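
To make the distinction concrete, here is a small sketch of the two cases. The path /errors/not-found.html is a hypothetical placeholder, not something from this thread. With a local path, Apache sends the custom page as the body of a genuine 404 response. With a full URL, Apache instead sends the client a redirect to that URL, so the original 404 status is lost.

```apache
# Serving a 404: the custom page is the body of a real 404 response.
# (/errors/not-found.html is a hypothetical path.)
ErrorDocument 404 /errors/not-found.html

# Redirecting to the 404 page: a full URL here makes Apache issue a
# redirect, so the client never receives the 404 status. Avoid this form.
# ErrorDocument 404 https://www.example.com/errors/not-found.html
```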

If you used to have URLs with query strings, and no longer do, the solution involves one or more of three things:
--use the existing query string to issue a page-for-page redirect to a new queryless URL
--redirect from with-query URL to the identical URL minus query (this is less likely on a WP site)
--serve a blanket 410 to any and all requests containing a query
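
The three options above could be sketched roughly as follows. This assumes Apache 2.4+ (for the QSD flag) and a .htaccess file at the document root; "/new-page" is a hypothetical target path used only for illustration. The options are alternatives, so pick whichever applies rather than enabling all of them.

```apache
RewriteEngine On

# Option 1: page-for-page redirect keyed on the old query string.
# "/new-page" is a hypothetical target, not a path from this thread.
RewriteCond %{QUERY_STRING} ^page=2835$
RewriteRule ^$ /new-page [QSD,R=301,L]

# Option 2: redirect any URL with a query string to the same URL without it.
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ /$1 [QSD,R=301,L]

# Option 3: answer every request that carries a query string with 410 Gone.
# Use either Option 2 or Option 3, not both: if both are active,
# Option 2 matches first and this rule is never reached.
RewriteCond %{QUERY_STRING} .
RewriteRule ^ - [G]
```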

phranque

3:25 am on Oct 14, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I've already tried the following:

==
RewriteCond %{QUERY_STRING} "post_type=" [NC]
==

the url with a query string is different from the url without, so if you aren't serving different content, you should redirect the request.
assuming you use no query strings anywhere, try something like this:
# if the query string contains anything
RewriteCond %{QUERY_STRING} .
# redirect the requested path to the canonical hostname without a query string
RewriteRule (.*) https://www.example.com/$1? [R=301,L]
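
For comparison, on Apache 2.4+ the same redirect can be written with the QSD flag instead of the trailing "?". This is only a sketch of an equivalent form; the trailing "?" version also works on older Apache versions.

```apache
RewriteEngine On

# if the query string contains anything
RewriteCond %{QUERY_STRING} .
# redirect to the canonical hostname; QSD discards the query string (Apache 2.4+)
RewriteRule (.*) https://www.example.com/$1 [QSD,R=301,L]
```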

No5needinput

4:46 pm on Oct 14, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



I use

RewriteCond %{QUERY_STRING} !=""
RewriteCond %{REQUEST_URI} !^/folder1/file1.*
RewriteCond %{REQUEST_URI} !^/folder2/file2.*
RewriteRule ^(.*)$ /$1? [R=301,L]


Lines 2 and 3 exempt URIs that still need their query strings (admin.cgi?blahblah etc.). Everything else with a query redirects to the vanilla URI. Works perfectly in .htaccess - for me anyway.