Forum Moderators: phranque

How to prevent indexing of a pop-over

         

NickMNS

4:50 pm on May 11, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I want to add a pop-over to the top of pages to display a brief message to users. The pop-over will display for 5 to 7 seconds and then fade. Is there a way to ensure that the text is not indexed with the actual page content?

The technicals: the pop-over will be done using jQuery. I will prepend a div to the body tag and then delay for 5 secs and fade for 1.
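
A minimal sketch of that plan, assuming jQuery is already loaded on the page. The function name `showPopover` and the class name `site-popover` are placeholders, not anything from a real site.

```javascript
// Timings from the post: visible for about 5 seconds, then a 1 second fade.
const VISIBLE_MS = 5000;
const FADE_MS = 1000;

// `$` is the jQuery function; it is passed in so the sketch has no hidden globals.
// "site-popover" is a placeholder class name.
function showPopover($, message) {
  return $('<div>')
    .addClass('site-popover')
    .text(message)        // .text() escapes the message, unlike .html()
    .prependTo('body')    // insert as the first child of <body>
    .delay(VISIBLE_MS)    // pause the effects queue before the fade starts
    .fadeOut(FADE_MS);
}

// In a page that loads jQuery:
// jQuery(function () { showPopover(jQuery, 'This site is in beta testing.'); });
```

Note that the message text is part of the script served with the page, so nothing in this version hides it from a crawler that renders JavaScript.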

Need I worry about an interstitial penalty? This is not an interstitial, as it appears ahead of the content and the user can scroll down to the content. On desktop it has a height of less than 100px, but on mobile, with the current message, it takes up about half the page.

keyplyr

1:49 am on May 12, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Put the pop-over file in a roboted-out directory.

To ensure it doesn't get indexed, I not only disallow the directory in robots.txt but also use an htaccess in that directory with:
Header set X-Robots-Tag "noindex, noarchive"

phranque

3:56 am on May 12, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



a way to ensure that the text is not indexed with the actual page content

Put the pop-over file in a roboted-out directory.

To ensure it doesn't get indexed, I not only disallow the directory in robots.txt but also use an htaccess in that directory with:
Header set X-Robots-Tag "noindex, noarchive"

the X-Robots-Tag will only prevent indexing of the urls being requested from this directory (the popover text) rather than affecting the url of the actual page content.
also the bot will never see the X-Robots-Tag header if it has been excluded from crawling the popover text directory.

lucy24

4:16 am on May 12, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



but also use an htaccess in that directory with:

How does the header work if the contents of the directory exist only as included material within pages that live elsewhere?

keyplyr

4:33 am on May 12, 2017 (gmt 0)


Every time I post about additional insurance to robots.txt, everyone jumps in saying the bot will never see it if it's blocked in robots.txt.

That's the point... It's insurance :)

Just a fail safe in case the file is requested *prior* to the SE bot requesting robots.txt. It happens.

the X-Robots-Tag will only prevent indexing of the urls being requested from this directory (the popover text) rather than affecting the url of the actual page content
Yes

How does the header work if the contents of the directory exist only as included material within pages that live elsewhere?
It's a pop-over. The file is not writing to the page. The content of the pop-over file is native only to the directory/file.

That's how I understood NickMNS's OP.

phranque

8:59 am on May 12, 2017 (gmt 0)


Every time I post about additional insurance to robots.txt, everyone jumps in saying the bot will never see it if it's blocked in robots.txt.

That's the point... It's insurance

Just a fail safe in case the file is requested *prior* to the SE bot requesting robots.txt. It happens.

in that case it's important to point out the distinction between crawling (robots exclusion protocol) and indexing directives.

It's a pop-over. The file is not writing to the page. The content of the pop-over file is native only to the directory/file.

i'm assuming the request for the content in this directory is an ajax request to fill an element (the popover) in the originally requested document.

at this point i don't think google is respecting the noindex directive for the ajax content since it would be indexing this content for a different url.
i would be interested to see an authoritative answer on this specific question.

having discovered the "ajax url", if googlebot saw the noindex directive upon a subsequent standalone crawl of the "ajax url", it would then respect that and not index the "ajax url".

if on the other hand the ajax url was excluded from crawling by googlebot, the "ajax url" may get indexed with perhaps no meaningful title and the standard "no description available - robots.txt" snippet.
the content from the "ajax url" certainly wouldn't get crawled or indexed with the originally requested url containing the popover, so that may be an answer with the caveats above and below.
i'm not sure how google would consider an attempt to show googlebot requests different content for that page than presumably human user agent requests.
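
The cases above can be summarised as a toy model. It only restates the reasoning in this post (crawling rules are checked before any request, and the header is only visible in a response); it is not a description of Google's internals, and the function and property names are made up for illustration.

```javascript
// Toy model of the "ajax url" cases discussed above.
// opts.disallowedInRobotsTxt: the url is excluded from crawling.
// opts.sendsNoindexHeader: the response carries X-Robots-Tag: noindex.
function likelyOutcome(opts) {
  const disallowed = Boolean(opts.disallowedInRobotsTxt);
  const noindex = Boolean(opts.sendsNoindexHeader);

  if (disallowed) {
    // The url is never requested, so no response header is ever seen.
    return {
      crawled: false,
      headerSeen: false,
      result: 'URL may be indexed without content (no description snippet)'
    };
  }
  if (noindex) {
    return { crawled: true, headerSeen: true, result: 'URL is not indexed' };
  }
  return { crawled: true, headerSeen: false, result: 'URL and its content may be indexed' };
}
```

The first branch is why a robots.txt disallow and a noindex header do not stack: once crawling is excluded, the header is never read.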

phranque

9:28 am on May 12, 2017 (gmt 0)


this was discussed a couple of times previously here, including in a recent thread:

How to hide content from Google in Ajax Lightbox [webmasterworld.com]

noindex part of page? possible? [webmasterworld.com]

interesting discussions but not really an answer to the noindexed ajax content question.

NickMNS

2:38 pm on May 12, 2017 (gmt 0)


@keyplyr
Put the pop-over file in a roboted-out directory.

My implementation is to use an include to inject the jQuery code into the pages server side, so that the message appears as the page loads. I guess I could do this with an ajax call, but the problem still remains that the content will be displayed on the page at first render.

Now this message will appear on many pages so it is entirely possible and likely that Google will interpret this as boilerplate content at some point. The issue here is that I will be rolling out these pages gradually and at first with few pages indexed I doubt that Google will be able to flag this as boilerplate content.

So my options are:
- just go for it and don't over think it, it will sort itself out in the long run.
- ditch the pop-over completely
- use ajax but instead of loading the pop-over on page load, trigger it on scroll.

@phranque thanks for the links, they were informative for sure but they seem to arrive at the same non-conclusion.

NickMNS

11:33 pm on May 14, 2017 (gmt 0)


Just an update. I deployed the pages in question and I included the pop-over message to the users as an ajax call on scroll. When I fetch and render the page it does not come up. So all is good. But I just checked indexing with the site: command and Google has indexed the page called by the ajax. I guess I should have no-indexed it, as suggested by Keyplyr.

Note to self: pay attention to Keyplyr's advice he knows his stuff....

keyplyr

11:34 pm on May 14, 2017 (gmt 0)


Note to self: pay attention to Keyplyr's advice he knows his stuff....
Exactly what I tell myself :)

phranque

1:20 am on May 15, 2017 (gmt 0)


My implementation is to use an include to inject the jQuery code into the pages server side

there's nothing you can do for robots exclusion or indexing directives that would affect a server side include implementation.
the user agent has no way to determine if any specific markup or javascript was generated by a ssi or a script.

tangor

1:37 am on May 15, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the user can see it, most bots can, too. Dumb disappeared more than a few years ago.

NickMNS

1:55 am on May 15, 2017 (gmt 0)


@phranque, the server-side include adds the jQuery to the page. But the actual pop-over code, the html, is only requested by the ajax call, which in turn is triggered by a scroll event. So, based on the fetch and render, it appears to successfully prevent Googlebot from seeing the content. The problem is that the ajax call includes a url that Googlebot follows and then indexes.

If I click the link to the pop-over in the site: SERP, it displays only the pop-over.
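
For reference, a rough sketch of the setup described here, assuming jQuery is loaded. The fragment URL, the function name and the class name are hypothetical; the real ones are not given in this thread.

```javascript
// Hypothetical URL of the HTML fragment requested by the ajax call.
const POPOVER_URL = '/fragments/popover.html';

// `$` is jQuery and `scrollTarget` is normally `window`; both are passed in
// so the function has no hidden globals.
function loadPopoverOnFirstScroll($, scrollTarget, url) {
  // .one() removes the handler after the first scroll event fires.
  $(scrollTarget).one('scroll', function () {
    // Fetch the fragment only once the user has scrolled.
    $.get(url, function (html) {
      $('<div>')
        .addClass('site-popover')
        .html(html)
        .prependTo('body')
        .delay(5000)
        .fadeOut(1000);
    });
  });
}

// In a page that loads jQuery:
// loadPopoverOnFirstScroll(jQuery, window, POPOVER_URL);
```

The fragment URL appears in the script as an ordinary URL, so a crawler can discover and request it on its own even if it never fires the scroll event.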

@tangor in general I agree with your statement, but in this case it seems to be the result of a naive but voracious implementation where it simply crawls all the urls it finds.

tangor

2:26 am on May 15, 2017 (gmt 0)


@NickMNS, I respectfully say "I rest my case."

Bots are no longer dumb. Any script, any code, any page... all transparent to them.

I do have to ask, why do you want this pop over not visible to g? These days I can't think of any valid reason to keep stuff away from them OTHER than what I KEEP BEHIND A PASSWORD.

NickMNS

3:23 am on May 15, 2017 (gmt 0)


Currently it is a message to inform users that the site is in beta testing, i.e. not fully functional. I would hate to have Google index a page and then show a title followed by the description "this is in beta testing". Because once it is no longer in beta, how can I guarantee that that text will be removed?

Also, in the future it could be used to display short general messages to the users, such as Merry x-mas or something to that effect.

One more question: the page that was indexed consists of only a few lines of html, no head, no body. How am I supposed to add a meta "noindex" tag? I guess the only choice then is robots.txt.

phranque

3:38 pm on May 15, 2017 (gmt 0)


the server-side include adds the jQuery to the page. But the actual pop-over code, the html, is only requested by the ajax call, which in turn is triggered by a scroll event. So, based on the fetch and render, it appears to successfully prevent Googlebot from seeing the content. The problem is that the ajax call includes a url that Googlebot follows and then indexes.

i wouldn't count on googlebot fetch and render continuing to ignore any particular event-triggered ajax calls indefinitely.

the page that was indexed only consists of a few lines of html, no head, no body. How am I supposed to add a meta "noindex" tag?

you can send the X-Robots-Tag HTTP header with the ajax responses to noindex those urls.

Using the X-Robots-Tag HTTP header:
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag#using-the-x-robots-tag-http-header
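
As an illustration only, here is how a response for the pop-over fragment could carry that header if the fragment were served from Node. The path and handler name are hypothetical; on Apache the equivalent is the `Header set X-Robots-Tag` line quoted earlier in this thread. The url must stay crawlable (not disallowed in robots.txt), otherwise the header is never seen.

```javascript
// Hypothetical path of the pop-over fragment requested by the ajax call.
const FRAGMENT_PATH = '/fragments/popover.html';

// Request handler in the shape expected by Node's http.createServer().
function handleFragmentRequest(req, res) {
  if (req.url === FRAGMENT_PATH) {
    // A bare fragment has no <head> for a meta tag, so the directive
    // travels in the HTTP response header instead.
    res.setHeader('X-Robots-Tag', 'noindex');
    res.setHeader('Content-Type', 'text/html; charset=utf-8');
    res.end('<p>This site is in beta testing.</p>');
    return;
  }
  res.statusCode = 404;
  res.end('Not found');
}

// To try it locally:
// require('http').createServer(handleFragmentRequest).listen(8080);
```

Only the fragment response is marked noindex; the page that requests the fragment is unaffected, since its own response does not carry the header.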

NickMNS

1:42 pm on May 16, 2017 (gmt 0)


@phranque, thanks for the tip about the header. I'm taking a few days off; I'll try to implement this when I get back.