Forum Moderators: not2easy


Archive.today/.li/.is.

         

QuaterPan

8:35 am on Jun 12, 2018 (gmt 0)



I just discovered a site called Archive.today (it also exists under the extensions .li, .is, etc.). This site has scraped 5,000+ pages of my site. It claims to archive pages at the request of users, but that has nothing to do with private copying/use, since the pages are accessible to everybody, including robots! That is how I found out: some of my "good" pages are now showing up in Google from this site!

Also, all links are replaced by their own, so if you click on a link, instead of going to the page/site, it archives the destination and serves that archive!

The site mentions that they are not going to remove anything. How can it be legal? And why is Google indexing it?

keyplyr

9:00 am on Jun 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They will remove your pages if you ask them. However those pages will be right back in the archive unless you block their UA and IP ranges.

Why is Google indexing them? That's what Google does.

QuaterPan

9:09 am on Jun 12, 2018 (gmt 0)



They will remove your pages if you ask them.

I did write a nice email (no threats, no big words) asking if they would mind removing my pages, and they said "no way*". (They only remove #*$! and warez.)

However those pages will be right back in the archive unless you block their UA and IP ranges.

All my pages include the time and IP of the client accessing them. So I looked at their "archive" to find their IP and time, then went back to my logs. They use random UAs that mimic real web browsers, and as for their IPs, I can't find any range or pattern; they come from all around the world. I guess they use proxies, VPNs, etc.
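The log lookup described above can be sketched roughly as follows. This is only an illustration: it assumes the server writes the common "combined" access log format, and the sample lines, the `/page.html` path, and the five-second window are all made up (the IPs come from the reserved documentation ranges).

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed "combined" log format:
# ip - - [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    hit = m.groupdict()
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit

def hits_near(lines, path, snapshot_time, window_seconds=5):
    """Find requests for `path` within a few seconds of the snapshot time."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in lines:
        hit = parse_line(line)
        if hit and hit["path"] == path and abs(hit["time"] - snapshot_time) <= window:
            matches.append(hit)
    return matches

# Hypothetical sample data, not real log entries.
sample = [
    '192.0.2.10 - - [12/Jun/2018:08:35:02 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [12/Jun/2018:09:10:00 +0000] "GET /page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
snapshot = datetime(2018, 6, 12, 8, 35, 0, tzinfo=timezone.utc)

for hit in hits_near(sample, "/page.html", snapshot):
    print(hit["ip"], hit["ua"])
```

Collecting the matching IPs and user agents over many snapshots is what would show whether any range or pattern exists at all.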

* In fact, they didn't say "no way". They were polite, and said that since the archive was (supposedly) made at the request of an individual, they can't delete it, out of respect for the will of this "individual". But I still think it is fishy to make these archives publicly available and to "encourage" search engine robots to crawl and index them. If they were honest in their intellectual approach, they would block robots...

keyplyr

9:27 am on Jun 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Try to identify the company(s) that owns the ranges. Most server farms will have a lot of ranges, often in different locations.

I'm not suggesting you're wrong and they don't use VPNs, but it's more likely they use one or more cloud server nodes. Identify them, then you can block all those ranges.
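Once the ranges are identified, checking a visitor against them might look something like this. It is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders from the reserved documentation ranges, not the archive's real addresses, and `is_blocked` is a made-up helper name.

```python
import ipaddress

# Hypothetical ranges for illustration only (RFC 5737 documentation blocks).
RAW_RANGES = ["192.0.2.0/24", "198.51.100.0/25", "198.51.100.128/25"]

# Merge adjacent or overlapping ranges so the list stays short.
BLOCKED = list(
    ipaddress.collapse_addresses(ipaddress.ip_network(r) for r in RAW_RANGES)
)

def is_blocked(client_ip):
    """Return True if client_ip falls inside any blocked range."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # not a valid IP address, so nothing to match
    return any(addr in net for net in BLOCKED)

print(BLOCKED)
print(is_blocked("198.51.100.200"), is_blocked("203.0.113.5"))
```

In practice the same list would usually go into the firewall or server config instead of application code; the sketch only shows the range-matching logic.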

You could send them and their upstream provider a C&D notice to remove your property. Follow the format of the Digital Millennium Copyright Act (DMCA Pub. L. 105-304) and if they don't respond, you could ask Google to remove the offending pages from their index.

QuaterPan

2:33 pm on Jun 12, 2018 (gmt 0)



I shouldn't have written to them. Since I did, 500+ more pages have suddenly been archived today, and I doubt that this was done by individuals who all suddenly became interested in keeping a copy of these pages!

In the logs, I can guess which hits are theirs. They came from "anywhere" and "anything", with "fake" referrers and UAs that look plausible. I have to confess that this is "well" done.

They should use their skill and knowledge to build fairer services.

lucy24

4:05 pm on Jun 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



with "fake" referrers
Take a closer look. I've got a list as long as your arm of things that trigger the bad_ref flag. Sometimes they're painfully obvious.
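As a rough idea of what a referrer check could look like, here is a small sketch. The three rules, the host name, and the list of known pages are invented examples for illustration; they are not the actual list behind the bad_ref flag mentioned above.

```python
from urllib.parse import urlsplit

MY_HOST = "www.example.com"  # hypothetical site host
KNOWN_PATHS = {"/", "/index.html", "/articles/one.html"}  # hypothetical pages

def bad_ref(referer, requested_path):
    """Return a reason string if the referrer looks fake, else None."""
    if referer in ("", "-"):
        return None  # a missing referrer is not suspicious on its own
    parts = urlsplit(referer)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "malformed"
    if parts.hostname == MY_HOST:
        path = parts.path or "/"
        if path == requested_path:
            return "self-referral"  # page claims to have linked to itself
        if path not in KNOWN_PATHS:
            return "nonexistent on-site page"
    return None

print(bad_ref("http://www.example.com/articles/one.html", "/articles/one.html"))
```

Real rules would depend on the site's own link structure, for example an on-site referrer that never actually links to the requested page.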

not2easy

4:15 pm on Jun 12, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Much depends on the source and destination countries. The DMCA is a US law that applies if they are hosted in the USA, and may or may not apply in other jurisdictions. Look up their host and see where they are hosted, then you can find the appropriate laws for that country. "Safe Harbor" hosts comply with removal requests if procedure is followed. Their host should get a letter after the site has ignored or denied a request. Keep your records.

keyplyr

7:24 pm on Jun 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The DMCA is a US law that applies if they are hosted in the USA, and may or may not apply in other jurisdictions
However, this format (as I suggested) is what Google wants in order for them to process a removal request. Been there, done that.

not2easy

7:54 pm on Jun 12, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



@keyplyr - hope you didn't think I was disputing your advice. I know you've dealt with this kind of problem yourself and know what you're talking about. I was offering generic information about the DMCA. I have seen cases where it did not resolve the issue because of physical location, so I mentioned it.

There are two issues: requesting that the site (and then their host) remove your content, and requesting that Google remove the content from their index. Removal from Google comes after contacting the site and hosting provider. Sometimes the host handles the removal, which makes removal from Google much simpler. Years ago Google was very proactive in removals; today there are hoops to jump through.

Google offers support and tools for Removal Requests [support.google.com], with much more detailed information.

keyplyr

8:06 pm on Jun 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No problem not2easy, just pointing out that no matter where the offending machinery resides, Google itself wants a specific format. Thanks for posting that link; makes it simpler.

As for the host actually respecting the C&D: the Berne Convention, the Universal Copyright Convention (UCC), the Digital Millennium Copyright Act (DMCA), and the European Union's Intellectual Property Rights Enforcement Directive (IPRED) between them cover almost all countries, except some Asian regions.

However, almost all upstream net block owners will honor a properly formatted removal request.

QuaterPan

8:47 am on Jun 14, 2018 (gmt 0)



Thank you for your messages, but I worry about retaliation now...

Leosghost

11:49 am on Jun 14, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They appear to have Yahoo as their host..domain name resolves to a Yahoo server in USA ( for their domain name fishing in .is )..apparently their variant .li is currently for sale at what was their Swiss registrar ..Reading their blog ( the site's, not the registrar's) they are not very happy with the actions of the Swiss company..Yahoo ( owned now by Oath who are owned by Verizon ) who are based in California USA should comply with a DMCA request.

Good luck..

btw..as they are hosted in a Yahoo server farm, it is likely that they scrape from a Yahoo range or ranges ..you could just block all of Yahoo's ranges..of course you'll need a way to hear back from Yahoo corp in reply to your DMCA.

Leosghost

11:51 am on Jun 14, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Apologies for the typo..that should be
( for their domain name finishing in .is )
..