Forum Moderators: phranque

Testing website creating duplicate content


jameswilsonjw455

12:56 pm on Oct 19, 2021 (gmt 0)

Top Contributors Of The Month



Hello everybody,

How can we solve the issue of duplicate content (text, images) on our own testing sites? Our actual website is www.abc.com.
Before launching anything on the actual website, our web developers work on the testing sites www.def.com and www.ghi.com. These testing sites are indexed and appear in Google searches, which might affect my actual website's ranking.
Please help.

robzilla

1:03 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You could limit access to the test domains based on IP address, e.g. only the IP address(es) your company uses, or "block" access to crawlers only using robots.txt [developers.google.com].
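
As a rough illustration of the robots.txt option, here is a minimal Python sketch that checks what a compliant crawler would conclude from a "block everything" file. The file contents and the helper name `is_crawl_allowed` are assumptions made for this example, not something taken from the linked documentation. It uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test domain: asks every crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.def.com is the test domain named in the question.
    print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://www.def.com/any/page"))
```

This only models crawlers that choose to obey the file; it does nothing against clients that ignore it.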

jameswilsonjw455

1:23 pm on Oct 19, 2021 (gmt 0)

Top Contributors Of The Month



Hey robzilla,
Thanks for your reply.
I will block all search engine bots from crawling in the robots.txt file, and I can also add a meta robots tag with noindex and nofollow values.

not2easy

2:19 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



For testing sites, it really is best to limit access by means that keep out all robots, rather than asking compliant robots not to look. Scraper bots can still scrape test sites; they simply ignore robots.txt requests.

If you are set up on an Apache server, you can use either the server configuration or .htaccess to deny all IPs except those of your authorized visitors, and the server will turn away everyone who is not on your list.
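
The allow-list idea can be sketched in Python as follows. This is not Apache configuration, only an illustration of the "deny everyone except listed addresses" logic using the standard `ipaddress` module. The networks shown are documentation-only example ranges, and `ALLOWED_NETWORKS` and `is_allowed` are hypothetical names.

```python
import ipaddress

# Hypothetical allow-list: replace with the addresses your team actually uses.
# These ranges are reserved for documentation and are placeholders only.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # e.g. an office range
    ipaddress.ip_network("198.51.100.7/32"),  # e.g. a single remote developer
]


def is_allowed(client_ip: str) -> bool:
    """Default deny: only addresses inside an allowed network get through."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.42", "198.51.100.7", "192.0.2.1"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

The point of the default-deny shape is that a bot never gets a chance to decide whether to comply; it is refused like any other unlisted visitor.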

NickMNS

2:30 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also, to add to not2easy's post: testing usually implies that the work is not finished, which could mean that there are still security vulnerabilities that have not been addressed or patched. Beyond bots, you will want to keep out everyone other than the people who are working on the site and need access. A hacker could create a back door for themselves and then wait for you to go into production before exploiting it.

phranque

6:28 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



the best way to protect development and staging sites from being crawled is to use Basic HTTP Authentication [datatracker.ietf.org] or similar.

this has the effect of presenting a 401 status code to user agents sending requests that haven't sent sufficient credentials.
this challenge is used by the browser to request a username and password from the visitor before resending the request.
subsequent requests to the affected "realm" are then sent with the supplied credentials.

assuming apache, this describes the configuration:
Password protect a directory using basic authentication [cwiki.apache.org]
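
The challenge and response described above can be sketched in Python. This is a simplified model of the exchange, not the Apache implementation: the realm, username and password are placeholders, and `make_basic_credentials` and `check_request` are hypothetical helper names. A real deployment would store hashed passwords and run over HTTPS, since Basic credentials are only base64-encoded, not encrypted.

```python
import base64
import hmac

# Placeholder values for illustration only.
REALM = "staging"
USERNAME = "demo"
PASSWORD = "change-me"


def make_basic_credentials(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends after the prompt."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def check_request(authorization_header):
    """Return (status code, response headers) for a request to the realm."""
    expected = make_basic_credentials(USERNAME, PASSWORD).encode("utf-8")
    if authorization_header is not None:
        supplied = authorization_header.encode("utf-8")
        # Constant-time comparison avoids leaking how much of the value matched.
        if hmac.compare_digest(supplied, expected):
            return 200, {}
    # Missing or wrong credentials: answer 401 and name the realm.
    return 401, {"WWW-Authenticate": f'Basic realm="{REALM}"'}


if __name__ == "__main__":
    print(check_request(None))
    print(check_request(make_basic_credentials(USERNAME, PASSWORD)))
```

A crawler that sends no credentials only ever sees the 401 branch, so it has no page content to index.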

phranque

6:32 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



also, welcome to WebmasterWorld [webmasterworld.com], jameswilsonjw455!

robzilla

7:36 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I will block all search engine bots from crawling in the robots.txt file, and I can also add a meta robots tag with noindex and nofollow values.

It's pointless to do both.

phranque's suggestion is also a good one.
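
One likely reason, which the post does not spell out, is that a crawler obeying a robots.txt block never fetches the page and so never reads the meta robots tag on it. The Python sketch below models that interaction under that assumption. `ExampleBot`, the sample page and the function name are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def directives_seen_by_compliant_crawler(robots_txt, page_html, url,
                                         user_agent="ExampleBot"):
    """Return the meta robots directives the crawler reads, or None if blocked."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return None  # page is never fetched, so the meta tag is never read
    parser = MetaRobotsParser()
    parser.feed(page_html)
    return parser.directives


# Hypothetical test page carrying the tag mentioned in the quote.
PAGE = ('<html><head><meta name="robots" content="noindex, nofollow">'
        '</head><body>test</body></html>')

if __name__ == "__main__":
    blocked = "User-agent: *\nDisallow: /\n"
    print(directives_seen_by_compliant_crawler(blocked, PAGE, "https://www.def.com/"))
    print(directives_seen_by_compliant_crawler("", PAGE, "https://www.def.com/"))
```

In this model, the noindex value only has an effect when robots.txt lets the crawler in to see it.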

NickMNS

9:19 pm on Oct 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@phranque This is exactly what I need for a project I am working on now. I had been restricting access to only a short list of IPs, but now I'm at the stage of needing to provide demos to outside participants, so setting up password authentication is the perfect solution. It looks pretty simple to set up, too.

In case anyone is using Nginx, here is the how-to:
[docs.nginx.com...]