Not sure why I'm posting this.
Perhaps I'm just confused by the capability and the lack of non-standard procedures.
This creature (for lack of a better word) offered snapshots of fewer than a dozen out of hundreds of pages. Each snapshot showed a reference date. I explored the most recent addition.
Their FAQ offers the following explanation:
"What software do you run and how data is stored ?
The archive runs Apache Hadoop and Apache Accumulo. All data is stored on HDFS, textual content is duplicated 3 times among servers in different datacenters and images are duplicated 2 times. All datacenters are in Europe."
Their FAQ also provides a brief explanation as to why they do not comply with robots.txt.
The FAQ also states that the tool used is a browser plug-in.
The initial visitor used my page to create a Wiki page, and Wikimedia was the next visitor (see #2 below).
The Wiki page does credit the active web page; unfortunately, I generally deny Wiki referrals, as well as the folks who create those Wiki pages (as I did in this instance; a MediaCom IP range).
1) A standard visit to the page using a standard browser ("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)")
2) Followed by these requests (note the trailing comma in lines #4 and #5, which would have resulted in a 404 even if the 403 didn't exist):
208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "GET /MyFolder/MyPage.html, HTTP/1.1" 403 573 "-" "LinkSaver/2.0"
208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "HEAD /SameFolder/SamePage.html, HTTP/1.1" 403 143 "-" "LinkParser/2.0"
144.76.45.19 - - [21/Sep/2013:09:18:54 -0600] "GET /SameFolder/SamePage HTTP/1.1" 403 644 "http://www.google.com/" "Mozilla/5.0 (compatible; Windows NT 5.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/535.19"
208.80.153.168 - - [21/Sep/2013:09:22:15 -0600] "HEAD /SameFolder/SamePage, HTTP/1.1" 403 143 "-" "COIParser/2.0"
69.208.90.zz - - [21/Sep/2013:13:24:19 -0600] "GET SameFolder/SamePage, HTTP/1.1" 403 644 "http://en.wikipedia.org/wiki/SimilarName" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"
69.208.90.zz - - [21/Sep/2013:13:24:20 -0600] "GET /favicon.ico HTTP/1.1" 200 419 "-" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"
The 69.208. range is an ATT/SBC PPPoX Pool, many of which I've had denied for more than a decade, due to repeated non-standard practices. I do make some exceptions for known associates.
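For anyone who wants to check entries like these programmatically, here is a rough Python sketch. It assumes the lines are in Apache's "combined" log format (which the samples above appear to be); the names `LOG_RE` and `parse_line` are my own, not from any tool mentioned here. It splits each line into fields, flags request paths ending in a comma, and lists the requests that were actually served with a 200.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else ""
    entry["path"] = parts[1] if len(parts) > 1 else ""
    entry["status"] = int(entry["status"])
    # A path ending in "," (as in several requests above) is almost
    # certainly a mangled link rather than a real file name.
    entry["trailing_comma"] = entry["path"].endswith(",")
    return entry

# Three of the lines quoted above, copied as-is.
sample = [
    '208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "GET /MyFolder/MyPage.html, HTTP/1.1" 403 573 "-" "LinkSaver/2.0"',
    '144.76.45.19 - - [21/Sep/2013:09:18:54 -0600] "GET /SameFolder/SamePage HTTP/1.1" 403 644 "http://www.google.com/" "Mozilla/5.0 (compatible; Windows NT 5.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/535.19"',
    '69.208.90.zz - - [21/Sep/2013:13:24:20 -0600] "GET /favicon.ico HTTP/1.1" 200 419 "-" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"',
]

entries = [parse_line(line) for line in sample]

# Requests that were actually served (status 200).
served = [e for e in entries if e is not None and e["status"] == 200]

for e in entries:
    note = "trailing comma" if e["trailing_comma"] else ""
    print(e["host"], e["method"], e["path"], e["status"], note)
```

Run against a full access log, filtering `served` by the page's path would show whether any request for that page was ever answered with a 200, and from which host and user agent.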
My question: if the THREE snapshot requests were denied, why is there no corresponding log entry for the initial page request that was mirrored in their archive?
For "my widgets", there is absolutely no benefit to having materials duplicated (even reworded materials) on a Wiki page. In fact, some "widget Wiki pages" draw traffic away from the existing "widget orgs' own pages".