I had a previous post [webmasterworld.com] about this a month or two ago and thought the problem had been corrected, but to no avail.
Basically: we have a website which is small in size (7-10 pages max) but has a strong presence in a moderately popular market. The site has ranked (and still ranks) in the top 3 results for many highly competitive phrases for more than 5-7 years. We have a strong presence in the Google instant answer box and throughout the SERPs.
Recently, I started finding backlinks from randomly hacked pages on live domains and after reviewing them, found (assumed) that someone had copied our site from the source code (right-click, view source, copy) and pasted it into a blank html document, changed ONLY the canonical URL to their URL, and let it fly. They are pointing thousands of low quality links to the new pages, which are being indexed with our exact copy content. Again, the only change is the canonical URL, which bit us a month or two ago, when Google used their URL as the cache copy in serps.
Skip ahead a month or so, and we've found about 30 more copies of our homepage on new domains.
A few facts:
- ONLY the homepage is being copied. No subpages.
- They are naming the URL after our website domain, which is a close EMD for the market we're in.
-- Ex: Ours: thiswebsitedomain.tld
-- Ex: Theirs: hackeddomain.TLD/thiswebsitedomain.html
- The ONLY change they are making to the source code is the canonical URL, which they point to the hackeddomain.TLD/thiswebsitedomain.html page.
- I was able to get all previous versions shut down with the use of DMCA. Several of the unhappy webmasters made it clear they thought it was us hacking their sites for backlinks. /sigh
- They have a JS referrer script in the header (hosted on a Google property no less) that redirects any search visitors to the search engine itself.
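Since the canonical URL is the only thing that differs between our page and the copies, one way to confirm a suspected copy is to check where its canonical tag points. Below is a minimal Python sketch of that check, not a finished tool. The names `CanonicalFinder` and `foreign_canonicals` are made up for illustration, and it assumes you have already fetched the suspect page's HTML by whatever means you prefer.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.canonicals.append(attr["href"])


def foreign_canonicals(html, our_host):
    """Return canonical URLs in `html` whose host is not exactly `our_host`.

    Assumption: exact host match only, so a www. variant or a relative
    href is also reported as foreign in this sketch.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    return [
        url for url in finder.canonicals
        if (urlparse(url).hostname or "").lower() != our_host.lower()
    ]
```

Running `foreign_canonicals(page_html, "thiswebsitedomain.tld")` over each suspect page would return the hacked URL for a copy and an empty list for our own homepage.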
Some of the domains they previously used almost made it seem like a personal vendetta against the site and its rankings. Domains were named with phrases like "youllneverrankagainmywebsitename.tld", "thiswillknockyououtofserpsmydomainname.tld", etc. Now they are using obviously compromised domains to host the pages, which tells me it's more than a script kiddie at work. All domains they previously used were behind a privacy registration, so no info could be found about who the owners were.
As mentioned earlier, I assumed they were doing it by hand: copy our html source > paste html source > change canonical url > publish. After this recent round, I decided to try and learn more, since it's obviously more than that. The hack pages are updating as our content updates.
I updated our site yesterday after finding a new page again, and after viewing the hacked page a second time, the content updated itself to reflect what we currently have on our site. In other words, it's updating automatically. Chasing down the logs, I found that the referrer is the hacked domain, and the client is a php script (php 4.5, etc) which is likely just scraping the site and re-posting to the hacked page.
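For anyone wanting to do the same log chase, here is a rough Python sketch of how the homepage fetches could be tallied. It assumes the server writes the common "combined" access log format and that the scraper's user agent contains "php"; both are assumptions about the setup, and `suspect_fetches` is just an illustrative name.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def suspect_fetches(lines, agent_hint="php"):
    """Count homepage requests per (ip, referer) whose user agent
    contains `agent_hint` (case-insensitive). Unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        if match["path"] == "/" and agent_hint in match["agent"].lower():
            counts[(match["ip"], match["referer"])] += 1
    return counts
```

Passing an open log file to `suspect_fetches` gives a count per IP and referrer pair, which should make it easier to see whether the fetches come from one host or from each hacked domain separately.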
I've been around hosting/servers enough in the last 15+ years to spot hacks. I have checked my site for vulnerabilities and as far as I can tell, there are none. It's hosted on a shared VPS, and aside from a minimum standard load of WP plugins I have known and trusted for years, there appear to be no vulnerabilities. No changes to WP core files, nothing in the file system that looks out of place, etc.
I have a list of roughly 30 or so domains that currently have the copy in a hacked page. It's getting to be too much to rely on DMCA, which is just not scalable IMO.
Now... for questions and hopefully, good opinions.
1 - How do I quickly get our content off these sites?
I've considered temporarily setting my homepage to a simple 301 redirect, i.e.:
Header("HTTP/1.1 301 Moved Permanently");
Header("Location: http://mydomainname.tld/");
Then, visit each of the 30 pages, refresh them to grab the redirect and move along. This should work for a temporary fix and will only take 2-3 minutes to do all the pages. After the pages have indexed the fresh redirect, revert my page back to the original.
2 - How do I trap the script and stop it from copying the homepage going forward?
Any help is appreciated... We actually have this issue on 2 websites that are similar in nature, but hosted on different servers, etc. It's not WPMU, and there is no other connecting factor, other than that they are registered under our parent company LLC.
Thanks in advance.