Forum Moderators: phranque


Problem with the "URL contains a form with a GET method" warning

         

antonnb

11:33 pm on May 5, 2022 (gmt 0)

10+ Year Member



Hi,

The theme I use has a search form at the top of the header. The code is something like this:

<form role="search" method="get" class="search-form" action="<?php echo esc_url( home_url( '/' ) ); ?>">
<label>
<span class="screen-reader-text"><?php echo esc_html_x( 'Search for:', 'label', 'gridbox' ); ?></span>
<input type="search" class="search-field"
placeholder="<?php echo esc_attr_x( 'Search …', 'placeholder', 'gridbox' ); ?>"
value="<?php echo esc_html( get_search_query() ); ?>" name="s"
title="<?php echo esc_attr_x( 'Search for:', 'label', 'gridbox' ); ?>" />
</label>
<button type="submit" class="search-submit">
<?php echo gridbox_get_svg( 'search' ); ?>
<span class="screen-reader-text"><?php echo esc_html_x( 'Search', 'submit button', 'gridbox' ); ?></span>
</button>
</form>


I also have the Yoast schema graph (JSON-LD) enabled. Here is part of the generated schema code:

@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https://example.com/?s={search_term_string}"},"query-input":"required name=search_term_string"}]


I ran a site audit using Sitebulb desktop and this warning was triggered:

URL contains a form with a GET method

URLs that contain a form element with the method set to GET, which creates submission URLs with the form data in the query string. This presents a potential vulnerability for a large number of URLs to be created and/or cached, which could cause issues with crawl efficiency or index bloat
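As a rough illustration of what the warning describes, the sketch below builds the URL a browser would request when a method="get" form is submitted. It is written in Python only so that it runs on its own; get_submission_url is a hypothetical helper (not part of WordPress or Sitebulb) and the search terms are made up.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def get_submission_url(action: str, fields: dict) -> str:
    """Build the URL requested when a method="get" form is submitted."""
    scheme, netloc, path, _old_query, fragment = urlsplit(action)
    # On a GET submission the encoded form data replaces the query string
    # of the form's action URL.
    return urlunsplit((scheme, netloc, path, urlencode(fields), fragment))


# Hypothetical search terms: every distinct term yields a distinct URL.
for term in ["egg", "egg salad", "page 2"]:
    print(get_submission_url("https://example.com/", {"s": term}))
```

Because the term ends up in the query string, the number of possible submission URLs is effectively unlimited, which is the crawl-efficiency concern the warning raises.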


I have about 100 posts, but the excluded count in the index coverage report has reached 10K. Most of the excluded URLs have parameters. Sample URLs:

https://example.com/page/3?s={search_term_string}/page/7/page/10/page/2/page/10/page/10/page/10/page/3/page/10

https://example.com/tag/egg?s=search_term_string

https://example.com/?s={search_term}/page/10

https://example.com/page/8?s=/page/1


My questions are:

Is my excluded coverage caused by the search form and schema JSON above?

Is blocking robots with Disallow: /*?* the correct approach?

If I use Disallow: /*?*, what happens to URLs that have already been indexed, since robots will no longer be able to access them?

Should I modify the search form to something like this (adding rel="nofollow")?
<form role="search" method="get" class="search-form" rel="nofollow" action="<?php echo esc_url( home_url( '/' ) ); ?>">


Is there any alternative solution aside from using robots.txt?

My apologies for my English.

[edited by: phranque at 11:51 pm (utc) on May 5, 2022]
[edit reason] exemplified urls [/edit]

phranque

12:06 am on May 6, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Should I modify the search form to something like this (adding rel="nofollow")?

i would instead change the form element to use the POST method:
<form role="search" method="POST" class="search-form" action="<?php echo esc_url( home_url( '/' ) ); ?>">


that solves the problem that caused the site audit warning.

Is my excluded coverage caused by the search form and schema JSON above?

this is likely and changing to the POST method should also prevent additional urls with search query strings from being crawled.

Is blocking robots with Disallow: /*?* the correct approach?

probably not.
you might consider using a link rel canonical element to "solve" these urls.

the correct answer actually depends on the difference in content displayed between the url with search query string vs without a query string.

antonnb

5:00 pm on May 9, 2022 (gmt 0)




thank you @phranque

one last question

the correct answer actually depends on the difference in the content displayed between the url with a search query string vs without a query string.


https://example.com/tag/egg?s vs https://example.com/tag/egg?amp


Is this what you mean by "the difference in the content displayed"?

The URL with ?s ends up as a search results page with a noindex tag,

while the ?amp page is returned as a regular tag page with a canonical tag. If so, what is your suggestion for dealing with this kind of URL variation?

not2easy

5:26 pm on May 9, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You could disallow crawling of /?s= for search queries and/or /*?s= for search queries on archive pages. I'm assuming you aren't indexing tag pages, which are collections of posts and pages.
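To see which of the URLs from this thread those rules would cover, here is a minimal sketch of robots.txt path matching. It assumes the wildcard behaviour described in RFC 9309 and in Google's robots.txt documentation (* matches any run of characters, a trailing $ anchors the end of the URL) and it ignores percent-encoding. rule_matches is a hypothetical helper, not any crawler's actual implementation.

```python
import re
from urllib.parse import urlsplit


def rule_matches(pattern: str, url: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL.

    The pattern is compared against the path plus query string, starting
    at the first character of the path.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Every '*' becomes '.*'; all other characters are matched literally.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, target) is not None


# Sample URLs taken from the posts above.
urls = [
    "https://example.com/?s={search_term}/page/10",
    "https://example.com/page/8?s=/page/1",
    "https://example.com/tag/egg?s=search_term_string",
    "https://example.com/tag/egg?s",
    "https://example.com/tag/egg?amp",
]
for rule in ["/?s=", "/*?s=", "/*?*"]:
    for url in urls:
        verdict = "blocked" if rule_matches(rule, url) else "allowed"
        print(f"Disallow: {rule:6} {verdict:8} {url}")
```

Under these assumptions, /?s= only covers searches on the home page, /*?s= covers searches on any path, and the broader /*?* from the original question also covers non-search URLs such as ?amp. A bare ?s with no equals sign is not covered by either /?s= or /*?s=.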

antonnb

1:01 am on May 10, 2022 (gmt 0)




@not2easy

Thank you

phranque

1:18 am on May 10, 2022 (gmt 0)




https://example.com/tag/egg?amp

if there is a question mark in the url what follows is a query string.
this is the url without the query string:
https://example.com/tag/egg
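The split described above can be checked with a short sketch using Python's standard urllib.parse module. split_query is a hypothetical helper name chosen for this example.

```python
from urllib.parse import parse_qs, urlsplit


def split_query(url: str):
    """Return (url_without_query, parsed_query_parameters)."""
    parts = urlsplit(url)
    without_query = parts._replace(query="", fragment="").geturl()
    # keep_blank_values=True keeps bare parameters such as "?amp" or "?s".
    params = parse_qs(parts.query, keep_blank_values=True)
    return without_query, params


for url in [
    "https://example.com/tag/egg?amp",
    "https://example.com/tag/egg?s",
    "https://example.com/page/8?s=/page/1",
]:
    print(split_query(url))
```

Here ?amp and ?s are both query strings containing a single parameter with an empty value; the URL without the query string is the same /tag/egg in both cases.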


The URL with ?s ends up as a search results page with a noindex tag

noindex is good in this case.
it would have been cleaner to use a specific search url such as:
https://example.com/search?s=search-term

instead of:
https://example.com/whatever/this/url/is?s=search-term
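One way to picture the cleaner layout: since only the s parameter matters, every URL that carries it could be mapped to a single dedicated search URL. The sketch below does that mapping in Python. The /search path comes from the example URL above and is an assumption, not something WordPress provides by default; dedicated_search_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed dedicated search path, taken from the example URL in this post.
SEARCH_PATH = "/search"


def dedicated_search_url(url: str) -> Optional[str]:
    """Map any URL carrying an 's' parameter to one dedicated search URL.

    Returns None when the URL has no 's' parameter.
    """
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if "s" not in params:
        return None
    term = params["s"][0]
    # The original path is dropped: only the search term is carried over.
    query = urlencode({"s": term})
    return urlunsplit((parts.scheme, parts.netloc, SEARCH_PATH, query, ""))


for url in [
    "https://example.com/whatever/this/url/is?s=search-term",
    "https://example.com/tag/egg?s=search_term_string",
    "https://example.com/tag/egg?amp",
]:
    print(url, "->", dedicated_search_url(url))
```

With one search URL per term, a single noindex or robots.txt rule on that path would cover all search result pages.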


You could disallow crawling of /?s= for search queries ...

wouldn't this still count toward "excluded index coverage"?

not2easy

3:25 am on May 10, 2022 (gmt 0)




wouldn't this still count toward "excluded index coverage"?

You want that content to be excluded because it serves the same content as you have at other URLs. The site is a WordPress site from all indications, so the same content is served at multiple URLs if you permit indexing of all versions. The search results are not static, unique pages, especially with the /tag/ syntax. Tags are arbitrary terms that you can attach to your content (pages or posts); they help with search, but you don't want them indexed. In WP you select what you want to use for permalinks, and those are what should be on the sitemap. Google understands WP and has no problem ignoring /tag/ and /archive/ or even /category/ URLs.

phranque

5:08 am on May 10, 2022 (gmt 0)




You want that content to be excluded because it serves the same content as you have at other URLs.

it serves search results based on the value of the 's' query parameter, ignoring the preceding part of the url.
you want this content and url noindexed.

tag and archive pages are irrelevant to the questions asked here.