Forum Moderators: phranque

Sitemap Warning on Url blocked

Some URLs are reported as blocked, but I can't find a matching rule in robots.txt

         

Irfan Ansari

8:13 am on Feb 27, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



When I check Google Webmaster Tools, it shows that https://www.example.com/sitemap.xml has 50 warnings. When I click on those 50 warnings, it shows me a few URLs like https://www.example.com/bangalore/residential/example-gold-county/amenities
(Note: I can't find where these URLs are matched in my robots.txt file, or any connection between my robots.txt and these URLs.)


This is my current robots.txt: https://www.example.com/robots.txt

User-agent: *
Disallow:/cgi-bin/
Disallow:/backoffice/
Disallow:/beta/
Disallow:/old/
Disallow:/backup_4Dec2017/
Disallow:/blog/
Disallow:/backoffice/data_content/projects/example_17_bangalore/2d_floor_plan/floor_plan_06_24_2016_17_42
Disallow:/backoffice/data_content/projects/example_anandam_nagpur/2d_floor_plan/floor_plan_06_24_2016_18_11
Disallow:/backoffice/data_content/projects/example_101_gurgaon/2d_floor_plan/floor_plan_06_24_2016_17_40
Disallow:/backoffice/data_content/projects/example_platinum_mumbai/2d_floor_plan/floor_plan_12_01_2017_16_47
Disallow:/backoffice/data_content/projects/example_palm_grove_chennai/2d_floor_plan/floor_plan_06_24_2016_17_52
Disallow:/backoffice/data_content/projects/example_aria_gurgaon/2d_floor_plan/floor_plan_06_24_2016_18_15
Disallow:/backoffice/data_content/projects/example_azure_chennai/2d_floor_plan/floor_plan_11_09_2017_30_32
Disallow:/backoffice/data_content/projects/example_alpine_mangalore/2d_floor_plan/floor_plan_11_29_2017_36_36
Disallow:/backoffice/data_content/projects/example_bkc_mumbai/2d_floor_plan/floor_plan_06_24_2016_18_26
Disallow:/backoffice/data_content/projects/example_central_mumbai/2d_floor_plan/floor_plan_06_24_2016_18_28
Disallow:/backoffice/data_content/projects/example_bayview_mumbai/2d_floor_plan/floor_plan_06_24_2016_19_26
Disallow:/backoffice/data_content/projects/example_waldorf_mumbai/2d_floor_plan/floor_plan_06_24_2016_21_29
Disallow:/backoffice/data_content/projects/example_frontier_gurgaon/2d_floor_plan/floor_plan_06_24_2016_18_31
Disallow:/backoffice/data_content/projects/example_horizon_pune/2d_floor_plan/floor_plan_06_24_2016_18_32
Disallow:/backoffice/data_content/projects/example_icon_gurgaon/2d_floor_plan/floor_plan_06_24_2016_18_35
Disallow:/backoffice/data_content/projects/example_oasis_gurgaon/2d_floor_plan/floor_plan_06_24_2016_18_36
Disallow:/backoffice/data_content/projects/example_woodsman_estate_bangalore/2d_floor_plan/floor_plan_06_24_2016_19_41
Disallow:/backoffice/data_content/projects/example_eternia_chandigarh/2d_floor_plan/floor_plan_06_24_2016_19_16
Disallow:/backoffice/data_content/projects/example_platinum_bangalore/2d_floor_plan/floor_plan_06_24_2016_19_17
Disallow:/backoffice/data_content/projects/example_platinum_kolkata/2d_floor_plan/floor_plan_12_01_2016_16_38
Disallow:/backoffice/data_content/projects/example_prakriti_kolkata/2d_floor_plan/floor_plan_06_24_2016_19_23
Disallow:/backoffice/data_content/projects/example_serenity_mumbai/2d_floor_plan/floor_plan_06_24_2016_19_25
Disallow:/backoffice/data_content/projects/example_summit_gurgaon/2d_floor_plan/floor_plan_06_24_2016_19_26
Disallow:/backoffice/data_content/projects/planet_example_mumbai/2d_floor_plan/floor_plan_06_24_2016_21_15
Disallow:/backoffice/data_content/projects/example_garden_city_ahmedabad/2d_floor_plan/floor_plan_14_11_2016_19_41
Disallow:/search/searchform
Disallow:*old*
Disallow:*beta*

User-agent: Googlebot
Disallow: /backoffice/
Allow: /backoffice/data_content/projects/

User-agent: Googlebot-Image
Disallow: /backoffice/
Allow: /backoffice/data_content/projects/

Sitemap: https://www.example.com/sitemap.xml

[edited by: phranque at 12:58 pm (utc) on Feb 27, 2018]
[edit reason] exemplified domain [/edit]

keyplyr

5:02 am on Feb 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, Google will report disallowed files as errors or warnings. This is because Googlebot received a 4** server response when it attempted to retrieve these files. These files are "blocked" in your robots.txt.

So if you are blocking the file (or file path) in robots.txt, then don't ask Googlebot to crawl that same file (or file path) by including it in your sitemap.xml.
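
A quick way to spot such conflicts before submitting a sitemap is to check each sitemap URL against the robots.txt rules. The following is only a minimal Python sketch, not anything Google provides. The function name disallowed_sitemap_urls and the sample data are hypothetical, and Python's urllib.robotparser uses plain prefix matching with first-match ordering, so it will not reproduce Googlebot's handling of wildcards or Allow/Disallow precedence exactly.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def disallowed_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical sample data, not the poster's real files.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /blog/\n"
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blog/post-1</loc></url>
  <url><loc>https://www.example.com/bangalore/residential/example-gold-county/amenities</loc></url>
</urlset>"""

blocked = disallowed_sitemap_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS)
print(blocked)
```

Any URL this prints is listed in the sitemap and disallowed at the same time, so it should be removed from one of the two files.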

However, even if you remove that file (or file path) from your sitemap.xml, Googlebot will likely find it on its own and still report the error in Webmaster Tools (Google Search Console).

Google gives us the impression that any "error" is our fault, even if we are blocking a file intentionally.

Irfan Ansari

5:15 am on Feb 28, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



So can you help me with the actual robots.txt to use on my website, or any other alternative, keyplyr?

lucy24

7:31 am on Feb 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You haven't yet explained what you are trying to do.

Do you want to remove all "errors" and "warnings" from GSC? It can't be done. Sometimes you have to simply ignore what they tell you.

Do you want Google to stop complaining about files that are listed in the sitemap but disallowed in robots.txt? Then you have to either remove them from the sitemap, or remove them from robots.txt. Only you can decide which option is best for your site. But since the only real purpose of a sitemap (.txt or .xml) is to tell robots about URLs they might otherwise have overlooked, why mention the inaccessible files at all?

Incidentally: Your robots.txt file as quoted in the first post has a very brief section for Googlebot-Image, and another very brief section for Googlebot. All the other stuff applies only to robots that are not the Googlebot. So changing all those many lines will not affect your problem in any way.
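
To illustrate the group-selection point, here is a small sketch that runs Python's standard urllib.robotparser on a trimmed-down copy of the file from the first post. This is Python's parser, not Google's, so it only shows how a parser that picks the most specific User-agent group behaves. The bot name SomeOtherBot is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down copy of the robots.txt quoted in the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow:/blog/
Disallow:/old/

User-agent: Googlebot
Disallow: /backoffice/
Allow: /backoffice/data_content/projects/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so the rules in the "*" group are not consulted.
googlebot_blog = parser.can_fetch("Googlebot", "https://www.example.com/blog/")
# A bot without its own group falls back to the "*" group.
otherbot_blog = parser.can_fetch("SomeOtherBot", "https://www.example.com/blog/")
# Rules inside the Googlebot group do apply to Googlebot.
googlebot_backoffice = parser.can_fetch("Googlebot", "https://www.example.com/backoffice/")

print(googlebot_blog, otherbot_blog, googlebot_backoffice)
```

With this parser, /blog/ is open to Googlebot and closed to SomeOtherBot, because only one group is applied per crawler.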

keyplyr

9:44 am on Feb 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...All the other stuff applies only to robots that are not the Googlebot
Googlebot will treat any file matched by a "Disallow" rule as blocked, whether or not "Googlebot" is named elsewhere in robots.txt; I just tested it in GSC.

@Irfan Ansari - so lucy24 and I are both telling you roughly the same thing. Either ignore what GSC says are errors, or remove any file from sitemap.xml that you disallow in robots.txt.

phranque

12:32 pm on Feb 28, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



50 warning links its shows me few urls like https://www.example.com/bangalore/residential/example-gold-county/amenities

i don't see anything in your robots.txt that would disallow this url from being crawled.
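
One rule that might still be worth a second look is Disallow:*old* in the User-agent: * group. Under wildcard matching, where * stands for any run of characters, that pattern matches any path containing "old", and "example-gold-county" contains it. Whether that group is applied to Googlebot at all is a separate question, discussed elsewhere in this thread. Below is a rough Python sketch of that kind of matching. rule_matches is a hypothetical helper written for this example, not Google's actual matcher.

```python
import re


def rule_matches(pattern, path):
    """Check a robots.txt path pattern against a URL path.

    Assumed semantics: '*' matches any sequence of characters, a trailing
    '$' anchors the end of the path, and everything else is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal parts and join them with ".*" for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, giving prefix behaviour.
    return re.match(regex, path) is not None


PATH = "/bangalore/residential/example-gold-county/amenities"

print(rule_matches("*old*", PATH))   # "gold" contains "old"
print(rule_matches("/old/", PATH))   # plain prefix rule
print(rule_matches("*beta*", PATH))
```

If this is the cause, narrowing the pattern to something like the existing Disallow:/old/ line would avoid catching unrelated paths.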

Google will report disallowed files as errors or warnings. This is because Googlebot received a 4** server response when it attempted to retrieve these files. These files are "blocked" in your robots.txt.

crawlers are not actually "blocked" by robots.txt but are "disallowed" from crawling.
crawlers that are compliant with the robots exclusion protocol agree not to make requests for url paths for which they are disallowed from crawling.
thus googlebot would never get a response (4xx or otherwise) for such urls.
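
As a rough sketch of that order of operations (check the rules first, send a request only if allowed), here is a toy crawler loop in Python. It assumes urllib.robotparser as a stand-in for a compliant crawler's rule check. The names crawl, fake_fetch and ExampleBot are hypothetical, and fake_fetch makes no real network request.

```python
from urllib.robotparser import RobotFileParser


def crawl(urls, robots_txt, user_agent, fetch):
    """Fetch only the URLs that robots.txt allows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    responses = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            # Disallowed: no request is sent, so there is no status code.
            continue
        responses[url] = fetch(url)
    return responses


requested = []


def fake_fetch(url):
    """Stand-in for an HTTP request: records the URL and returns 200."""
    requested.append(url)
    return 200


ROBOTS_TXT = "User-agent: *\nDisallow: /old/\n"
URLS = [
    "https://www.example.com/old/page",
    "https://www.example.com/bangalore/residential/example-gold-county/amenities",
]

responses = crawl(URLS, ROBOTS_TXT, "ExampleBot", fake_fetch)
print(requested)
```

The disallowed URL never reaches fake_fetch, which is why no 4xx (or any other) response can be recorded for it.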

Either ignore what GSC says are errors, or remove any file from sitemap.xml that you disallow in robots.txt.

in other words, "errors" reported by GSC are only errors if you think they should be.
if i was asking googlebot to crawl a url with a sitemap.xml entry while simultaneously disallowing it from crawling through a robots.txt directive, i would want to be informed of that inconsistency.

keyplyr

7:15 pm on Feb 28, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



crawlers are not actually "blocked" by robots.txt but are "disallowed" from crawling
Yes phranque, that's why it was put in quotes.

phranque

1:58 am on Mar 1, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



that's why it was put in quotes.

but you also mentioned a "4**" status code - not in quotes...

it's this part that simply doesn't happen after compliant crawlers discover disallowed url paths:
when it attempted to retrieve these files

keyplyr

2:21 am on Mar 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You are absolutely correct.

At first, I unconsciously assumed the page did not exist and would receive a 4** server response. Guess I didn't consider that anyone would tell Googlebot about a file by listing it in sitemap.xml and then disallow Googlebot from crawling that same file in robots.txt.