archive.is

Not sure why I'm posting this.
Perhaps I'm just confused by the capability and the lack of non-standard procedures.

This creature (for lack of a better word) had offered snapshots of fewer than a dozen out of hundreds of pages. Each snapshot carried a reference date. I explored the most recent addition.

Their FAQ offers the following explanation:
"What software do you run and how data is stored ?
The archive runs Apache Hadoop and Apache Accumulo. All data is stored on HDFS, textual content is duplicated 3 times among servers in different datacenters and images are duplicated 2 times. All datacenters are in Europe."

Their FAQ also provides a brief explanation as to why they do not comply with robots.txt.

The FAQ also states that the tool used is a browser plug-in.

The initial visitor used my page to create a Wiki page, and Wikimedia was the next visitor (see #2 below).
The Wiki page does credit the active web page. Unfortunately, I generally deny Wiki referrals, as well as the folks who create those Wiki pages (as I did in this instance; a MediaCom IP range).

1) A standard visit to the page using a standard browser ("Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)")

2) Followed by these requests (note the trailing comma after the path in lines 1, 2, 4 and 5, which would have resulted in a 404 even if the 403 didn't exist):
208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "GET /MyFolder/MyPage.html, HTTP/1.1" 403 573 "-" "LinkSaver/2.0"
208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "HEAD /SameFolder/SamePage.html, HTTP/1.1" 403 143 "-" "LinkParser/2.0"
144.76.45.19 - - [21/Sep/2013:09:18:54 -0600] "GET /SameFolder/SamePage HTTP/1.1" 403 644 "http://www.google.com/" "Mozilla/5.0 (compatible; Windows NT 5.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/535.19"
208.80.153.168 - - [21/Sep/2013:09:22:15 -0600] "HEAD /SameFolder/SamePage, HTTP/1.1" 403 143 "-" "COIParser/2.0"
69.208.90.zz - - [21/Sep/2013:13:24:19 -0600] "GET SameFolder/SamePage, HTTP/1.1" 403 644 "http://en.wikipedia.org/wiki/SimilarName" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"
69.208.90.zz - - [21/Sep/2013:13:24:20 -0600] "GET /favicon.ico HTTP/1.1" 200 419 "-" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"
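For anyone who wants to check their own logs for the same pattern, here is a rough Python sketch that splits lines like the ones above into fields and flags request paths ending in a comma. It assumes the standard Apache "combined" log format; the names LOG_RE, parse_line and has_trailing_comma are my own for illustration, not part of any tool.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    parts = entry["request"].split(" ")
    # A well-formed request line is: METHOD PATH PROTOCOL
    if len(parts) == 3:
        entry["method"], entry["path"], entry["protocol"] = parts
    else:
        entry["method"] = entry["path"] = entry["protocol"] = None
    return entry

def has_trailing_comma(entry):
    """True when the requested path ends in a comma, as in the bot requests above."""
    return bool(entry["path"]) and entry["path"].endswith(",")

# Two of the lines from the log excerpt above
sample = [
    '208.80.153.167 - - [21/Sep/2013:09:06:43 -0600] "GET /MyFolder/MyPage.html, HTTP/1.1" 403 573 "-" "LinkSaver/2.0"',
    '69.208.90.zz - - [21/Sep/2013:13:24:20 -0600] "GET /favicon.ico HTTP/1.1" 200 419 "-" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"',
]

for raw in sample:
    e = parse_line(raw)
    print(e["host"], e["status"], e["path"], has_trailing_comma(e))
```

The regex is deliberately strict, so a line with embedded quotes in the user-agent would simply come back as None rather than being misparsed.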

The 69.208. range is an ATT/SBC PPPoX Pool, many of which I've had denied for more than a decade, due to repeated non-standard practices. I do make some exceptions for known associates.

My question: if all THREE snapshot requests were denied, why is there no corresponding log entry for the initial page request that was mirrored in their archives?
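One way to hunt for the missing request is to list every successful (2xx) fetch of the page, whoever made it. A minimal Python sketch, assuming combined-format log lines; REQUEST_RE and successful_fetches are hypothetical names, and the two lines fed in are copied from the excerpt above.

```python
import re

# Captures: client host, requested path, status code (GET and HEAD requests only)
REQUEST_RE = re.compile(r'^(\S+) .*?"(?:GET|HEAD) (\S+) [^"]*" (\d{3}) ')

def successful_fetches(lines, page_path):
    """Return (host, status) for each 2xx response served for page_path."""
    hits = []
    for line in lines:
        m = REQUEST_RE.match(line)
        if m and m.group(2) == page_path and m.group(3).startswith("2"):
            hits.append((m.group(1), m.group(3)))
    return hits

log_lines = [
    '144.76.45.19 - - [21/Sep/2013:09:18:54 -0600] "GET /SameFolder/SamePage HTTP/1.1" 403 644 "http://www.google.com/" "Mozilla/5.0 (compatible; Windows NT 5.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/535.19"',
    '69.208.90.zz - - [21/Sep/2013:13:24:20 -0600] "GET /favicon.ico HTTP/1.1" 200 419 "-" "Mozilla/5.0 (Windows NT 6.0; rv:23.0) Gecko/20100101 Firefox/23.0"',
]

print(successful_fetches(log_lines, "/SameFolder/SamePage"))
print(successful_fetches(log_lines, "/favicon.ico"))
```

Run against the full log for the snapshot's reference date, any host it returns for the page is a candidate for the fetch that ended up in the archive.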

For "my widgets", there is absolutely no benefit to having materials duplicated (even reworded materials) on a Wiki page. In fact, some "widget Wiki pages" draw traffic away from the existing "widget orgs' own pages".
