MicrosoftPreview

robots.txt

         

dstiles

9:13 am on Sep 25, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't use robots.txt much as it's generally pretty useless except for a few bots, but I'm trying to persuade MicrosoftPreview to leave me alone. Trouble is, adding MicrosoftPreview to robots.txt seems to not work. Anyone know what does work, please?

phranque

10:22 am on Sep 25, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



have you tried the Bing Webmaster Tools robots.txt tester [bing.com] tool?
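For a quick local check alongside the online tester, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `is_allowed` helper and the example.com URLs are illustrative assumptions, not anyone's real configuration. It only shows how a parser reads the file; it says nothing about whether MicrosoftPreview actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group aimed at MicrosoftPreview, one catch-all.
ROBOTS_TXT = """\
User-agent: MicrosoftPreview
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if this robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("MicrosoftPreview", "SomeOtherBot"):
        verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/page.html")
        print(agent, "allowed" if verdict else "disallowed")
```

Pass the bare product token ("MicrosoftPreview"), not the full User-Agent header: this parser only compares against the part of the string before the first slash.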

lucy24

4:14 pm on Sep 25, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Things with “Preview” in the name tend to hold themselves exempt from robots.txt, because they’re triggered in some way by human action--whether or not this is actually true in any one specific case. You may have no choice but to block them by name: that is, [F] 403 as opposed to robots.txt Disallow.
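To illustrate the difference between asking (robots.txt) and refusing (403), here is a rough sketch of the "block by name" idea as a Python WSGI wrapper. This is a stand-in for the Apache [F] rule, not Apache configuration. The names `block_previews`, `hello_app` and `status_for` are made up for the example, and matching "Preview" case-insensitively is an assumption.

```python
import re
from wsgiref.util import setup_testing_defaults

# Assumption: any User-Agent containing "Preview" (any case) should be refused.
PREVIEW_UA = re.compile(r"Preview", re.IGNORECASE)


def block_previews(app):
    """Wrap a WSGI app so requests with a 'Preview' User-Agent get a 403."""
    def wrapper(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if PREVIEW_UA.search(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Trivial demo application standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent):
    """Send one fake request through the wrapped app and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []

    def start_response(status, headers):
        seen.append(status)

    list(block_previews(hello_app)(environ, start_response))
    return seen[0]


if __name__ == "__main__":
    print(status_for("MicrosoftPreview"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"))
```

Unlike a Disallow line, this does not depend on the bot's cooperation, though the refused requests will still show up in the access log.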

dstiles

9:23 am on Sep 27, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



phranque - testing is not the problem; I know the robots.txt files are valid.

Lucy - yes, agreed. I was just hoping to clean up the logs a bit. :(

tangor

12:21 am on Sep 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt is a strictly voluntary honor system ... and we know there's not that much honor out there these days.

Is this a large number of hits per month (hundreds to thousands) or fewer than a hundred? In what kind of context does it appear, human or bot?

dstiles

8:26 am on Sep 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've known robots.txt is useless for decades.

Not a lot of hits, just annoying, and from a bot.

engine

9:37 am on Sep 28, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



For reference, here's Microsoft's own page explaining MicrosoftPreview [bing.com...]

MicrosoftPreview: generates page snapshots for Microsoft products. Note that MicrosoftPreview has "desktop" and "mobile" variants.

If it's not taking notice of your robots.txt, I'd double-check the configuration, and if it's OK, I'd contact Microsoft (link on that Microsoft page).

I always consider that robots.txt can be an aid to the bad actors out there, and individual page control can often be more effective, even if it's a palaver.

lucy24

3:52 pm on Sep 28, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



dstiles wrote: "I've known robots.txt is useless for decades."
It sounds as if you’ve been expecting it to do something it was never meant to do, and by its nature can’t do. It is just what the extension says: a purely informational text file. Its dual purposes are, #1, to tell search engines and other legitimate entities to keep the ### out of certain areas, and, #2, to aid in deciding whether to authorize a new robot that would otherwise be blocked by default. (Sure, as robots get smarter, header-based access controls will go the way of checking for “Mozilla” at the beginning of the UA, but we’re not there yet.)

dstiles

7:52 am on Sep 29, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



engine - I've already read that, but it doesn't say what the robots.txt keyword is other than the obvious, which does not seem to work.

lucy - I know what it is and I know it's not what it should be. It can help good bots but is useless at shutting out bad ones, hence the complex coding in Apache and PHP to reject the baddies. That control should have been built in early in the internet's history.

engine

8:47 am on Sep 29, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



dstiles, did you use the report link to let them know of the problem?

blend27

1:15 pm on Oct 8, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



.. Things with “Preview” in the name

I smacked a RewriteCond into my .htaccess file to 403 all things Preview a decade ago, just to stop the Bing SERP sucking up bandwidth, so to speak. Real visitor traffic from Bing actually increased.

lucy24

5:08 pm on Oct 8, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It may warrant a separate thread, but does any search engine currently have a Preview function for human use? I remember G*** did it a few years ago, but I don't see any sign that it still exists.

engine

3:41 pm on Oct 9, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Not exactly sure what you mean by preview, Lucy?

not2easy

4:51 pm on Oct 9, 2023 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I believe that refers to some older UAs - this snippet from 2016 shows a BingPreview for example: "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b"

lucy24

5:27 pm on Oct 9, 2023 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



engine wrote: "Not exactly sure what you mean by preview"
For a while, each SERP included an option to see a thumbnail preview of the site. I don't think it lasted very long.

:: detour to old logs ::

Oh. I find it all the way back to early 2011 (the oldest logs I have). Didn't realize it went back that far. Google's for example is
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/a.b.c Safari/537.36
(where “a.b.c” reflects where Google started putting the most recent Chrome in most of their UAs). But I can't remember when I last saw the Preview option in a SERP.