Forum Moderators: phranque


Private URL found by Crawler (security issue?)


explorador

5:21 am on Aug 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi webmasters, this is very strange. I created a Perl script that helps me with certain tasks in managing a website. The script is only FOR ME: nobody else has the URL, and the script is not included in the menu of my CMS (which I built myself).

The script sends some emails, and does so only when called. The thing is, I received many of these emails without having called the script.

Please take this very seriously:

  • It's a Perl script under cgi-bin/ and several nested folders... so you would only be able to call it by knowing the exact URL
  • NOBODY (I insist) NOBODY knows the existence of this script, nobody has the url, still it was called.
  • Please take this into consideration: the name is not a natural word, it is not public and it is not listed anywhere... and since it is a Perl script under the cgi-bin folder, it is not possible to get a listing of the folder to see its contents.

This is not a shared server; it is the physical server at the company, not in another building or country, and it is managed under a lot of security... no kidding.

I created a security routine to identify who called the script, IP and such... IT IS THE ALEXA CRAWLER. The thing is, there is no way on earth the script could be found.
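A caller-identification routine like the one described usually just reads the standard CGI environment variables the web server sets for each request. The poster's actual routine is in Perl and isn't shown; what follows is only a minimal Python sketch of the same idea, and the function names are hypothetical. It records the caller's IP, User-Agent, Referer (empty when none is sent) and query string, which is where a GET "password" would show up.

```python
import os
from datetime import datetime, timezone


def describe_caller(environ):
    """Collect the standard CGI variables that identify who made the request."""
    return {
        "ip": environ.get("REMOTE_ADDR", "unknown"),
        "user_agent": environ.get("HTTP_USER_AGENT", "unknown"),
        # Empty string when the client sent no Referer header
        "referer": environ.get("HTTP_REFERER", ""),
        # Includes any GET parameters, e.g. a mistyped password
        "query": environ.get("QUERY_STRING", ""),
    }


def format_report(info, now=None):
    """Render one log/e-mail line; '-' marks an empty referer or query string."""
    now = now or datetime.now(timezone.utc)
    return (
        f"{now.isoformat()} ip={info['ip']} ua={info['user_agent']!r} "
        f"referer={info['referer'] or '-'} query={info['query'] or '-'}"
    )


if __name__ == "__main__":
    # Under CGI, the web server passes the request details in the environment.
    print(format_report(describe_caller(os.environ)))
```

The script can send the resulting line by e-mail or append it to a private log. Keep in mind that every one of these values is supplied by the client and can be forged, except REMOTE_ADDR.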

I already talked to my sysadmin and we can't figure out this issue. Only the two of us have complete access, and the other users can only view their own files. A security breach has already been ruled out.

And... nobody has access to my computer.

How can you explain this? The only explanation I see is bookmarks... I created this script and bookmarked it in Firefox. We had this problem, so I renamed the script... and it was called again. I have now added the security validation routine, and it reveals that the Alexa crawler is calling the script... is there some buggy function in Firefox?

Does Firefox somehow report the URLs in the bookmarks to Alexa?

Creepy.....

kaled

7:08 am on Aug 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Any extension/toolbar on any browser may phone home and report the URLs you visit. Also, anti-virus software can include link-checking features - probably some firewalls do too.

For instance, Googlebot may crawl any URL that you visit if you have the Google toolbar installed with the PageRank feature enabled.

You need to set up a password.

Kaled.

explorador

2:50 pm on Aug 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Kaled. Zero toolbars here. We use ESET antivirus, and the bookmark was added in Firefox; no toolbars of any kind are installed, and zero spyware or adware is present.

Leosghost

6:37 pm on Aug 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



vanilla Firefox does "talk back" ..one obvious way is that if you start to type into the search box it "proposes" search terms ..i.e. you type "goo" and it will add the "gle" and present you with a drop-down of live-updated choices ..watch the data in/out flow while you are typing ..it's talking home to Google ..

if you think it may also be talking to Alexa (I've never looked), run a packet sniffer ..ESET does phone home if you ticked participation in the "threat" program when you installed it ..otherwise it only connects to check for updates to its tables..

MatthewHSE

6:19 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



NOBODY (I insist) NOBODY knows the existence of this script, nobody has the url, still it was called.

If it was called, and you didn't call it, then *somebody* knows about the script. ;) You should password-protect the file or add some sort of authentication to use it.

Rule #1 on the Internet is that if it's not protected, it's public! (Rule #2 is that it might become public even if it's protected, but that's a separate issue.)

physics

6:54 pm on Aug 11, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



.htpasswd is your friend ;)
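An .htpasswd file is consumed by Apache's own Basic authentication (AuthType Basic, AuthUserFile and Require valid-user in the directory config or an .htaccess file), so the script never even runs for an unauthenticated request. If the check has to live inside the script instead, it comes down to decoding the Authorization header. This is a minimal Python sketch only; EXPECTED_USER and EXPECTED_PASSWORD are hypothetical placeholders, and since Basic credentials are merely base64-encoded, not encrypted, this belongs behind HTTPS.

```python
import base64
import hmac

# Hypothetical credentials for illustration; keep real ones out of the script.
EXPECTED_USER = "admin"
EXPECTED_PASSWORD = "change-me"


def check_basic_auth(authorization_header):
    """Return True if an HTTP Basic Authorization header carries the expected credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through comparison timing
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok
```

Letting the server enforce .htpasswd is usually the simpler choice, because no request ever reaches the script without credentials.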

D_Blackwell

3:48 am on Aug 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need to set up a password.

Everyone is right. There is a broken link (LOL) in the chain - a leak that you haven't thought of yet. These things usually wake me up at about 3:00am and - BAM - I know what I knew.

There's a reason that we sometimes put password-protected directories within other password-protected directories, et cetera.
...........

Also - to state the obvious - passwords that do not change should be considered compromised after X amount of time. Period.
...........

I live in a secure building. Keypad combinations to every door. I have lived here for over five years; only ten units, but multiply that by turnover, friends, 'authorized' people who need access, and how long until everyone in town has access? With more than one of my lease renewals I STRONGLY advised, and documented, that I considered the building's security to be lacking and that my notification of these concerns could constitute evidence of extreme negligence if there were ever a problem. (They are good people - but they do tend toward fixing problems that should never have occurred. I've watched them put a torch to cash more than once this way.)

I had, nicely, suggested a few times that they really needed to at least change the front and back door combinations. In person and, worse, in correspondence.

A number of months ago there was an incident - police, public spectacle, the whole nine yards. Keypad codes were VERY swiftly updated for the first time in YEARS (probably ever). And they KNEW they had, by definition, a de facto security breach. They were just lucky no one was killed. If I had been interviewed, deposed, what-have-you - I have nearly every email and document ever sent or received. They really lucked out in not getting sued into oblivion. And they've got assets.

explorador

9:42 pm on Aug 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, take for example that I create the Perl script wcyDtodaY.pl. There is no way anybody can find it unless I give away the address. There is no way to browse a cgi-bin dir. There are no other users accessing my folders.

Try guessing the name of the script... impossible. The uppercase-lowercase combination, added to there being no way to stumble on the name, makes it impossible.
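The "impossible to guess" claim can be put in numbers. If a name is drawn uniformly at random, guessing it takes on the order of 2^bits tries, where bits = length × log2(alphabet size). A 9-letter mixed-case name such as wcyDtodaY gets at most about 51 bits, and less in practice because it embeds the word "today"; an 18-character letters-and-digits name like the one described later in the thread gets about 107 bits. This is a small illustrative sketch, and the function and constant names are hypothetical.

```python
import math
import string


def keyspace_bits(length, alphabet_size):
    """Entropy in bits of a name drawn uniformly at random from the alphabet."""
    return length * math.log2(alphabet_size)


MIXED_CASE = len(string.ascii_letters)                # 52 characters: a-z, A-Z
ALNUM = len(string.ascii_letters + string.digits)     # 62 characters

if __name__ == "__main__":
    # Upper bound for a 9-letter mixed-case name such as wcyDtodaY
    print(f"9 mixed-case letters: {keyspace_bits(9, MIXED_CASE):.1f} bits")
    # An 18-character random letters-and-digits name
    print(f"18 letters/digits:    {keyspace_bits(18, ALNUM):.1f} bits")
```

Either figure makes blind brute-forcing over HTTP impractical, which fits the thread's eventual conclusion: the URL leaked rather than being guessed. A hard-to-guess name is not a substitute for authentication once the name leaks.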

I just found out another script is being called by Alexa. Both of the scripts are in my bookmarks and noooooo, nobody has access to my computer; these are not public scripts, and there are no links to them anywhere.

I guess FX (Firefox) is calling Alexa... Will research some more.

D_Blackwell

10:45 pm on Aug 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a leak. I mean, Alexa isn't out hacking websites for scripts, right? Maybe you have a Google account and don't log out. If you remain logged into a Google account, it tracks every page you visit. I don't know what their policy is for what they do with that trail of breadcrumbs, but could/would they index such a page, however deeply buried?

You've already conceded that the 'bookmark' is a good place to start looking for the source.

I would work it both ways - look for where/how Alexa is finding the page - and look for where/how it is leaking out.

How many scripts are bookmarked? Are all now 'out there'?

If the leak is directly from the bookmark software, maybe you need a local directory/file(s) holding the links to these kinds of pages. Set up something that you can bring up quickly from a Run command. (Easier and faster than bookmarks if you have a lot that you use a lot - or have a lot and don't like the BM options. I know there are a ton, and I don't like nearly any of them. Too much research - too many links - another topic.)

MatthewHSE

1:25 am on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there an index file in the directory containing your script, or does it (or any directory above it) load up a default page showing a list of all the files it contains?

explorador

4:10 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes there is a leak.

The script was called from the IP 67.202.41.3 with no trace of a referrer. It identifies itself as ia_archiver (+http://www.alexa.com/site/help/webmasters, crawler@alexa.com). Port: 80
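The same evidence can be pulled out of the server's access log after the fact, which also answers "when, from where, and with which referrer" without changing the script. This is a minimal sketch, assuming Apache's standard "combined" log format (IP, identity, user, [time], "request line", status, size, "referer", "user agent"); the function name and the path fragment you pass in are hypothetical. In that format a referer of "-" means none was sent, and because the request path keeps its query string, a mistyped GET parameter shows up verbatim.

```python
import re

# Apache "combined" format; the referer/agent part is optional so plain
# "common" format lines still parse (they just have no agent to match).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def crawler_hits(lines, path_fragment, agent_fragment="ia_archiver"):
    """Yield (ip, time, path, referer) for requests to path_fragment made by agent_fragment."""
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or not m.group("agent"):
            continue  # unparseable line, or no User-Agent recorded
        if path_fragment in m.group("path") and agent_fragment in m.group("agent"):
            yield m.group("ip"), m.group("time"), m.group("path"), m.group("referer")
```

For example, `for hit in crawler_hits(open("access_log"), "secret.pl"): print(hit)` would list every matching crawler request, where "access_log" and "secret.pl" stand in for your own log file and script name.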

As said before there is nothing on the web pointing to this file, only my bookmark. I have ZERO toolbars of any kind.

encyclo

4:15 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One obvious question not yet asked: are your server access logs inadvertently publicly-accessible?

MatthewHSE

5:23 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A couple of pertinent points:

1.) You get emails when this script is called - so far you've only gotten those emails when you've used the script yourself, or when Alexa has found it.

2.) Alexa is a crawler, not a hacking tool.

3.) There are several ways Alexa could have found the page. Spyware or hacking techniques are not likely candidates.

4.) The file in question is unprotected, and therefore publicly accessible.

For these reasons, is it important where the "leak" is? I'd suggest you protect the file and move on - I don't see anything scary in this situation except that an important and possibly sensitive utility is not protected against unauthorized use.

explorador

6:30 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm amazed by the difference in how a thread goes in other parts of the forum... Not very interesting. I don't want others to "see" my point; the thing is, as others have mentioned, FX is calling "home".

Thanks for the replies. Over..

Leosghost

7:03 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What is the exact build of FF you run?

(I keep all the old installers in case I ever need them ..likewise IE, Opera ..)

I have some spare machines to experiment with "virgin" OS and browser installs ..and a stripped XP Pro SP3 (no IE) ..packet sniffer ..;)

D_Blackwell

8:15 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



.....I don't want others to "see" my point,.....FX is calling "home"......

Okay - I don't get it. Others have been suggesting potential points of leakage and options for fixing or avoiding the issue.

I missed the part where it definitively became FX calling home (much less was confirmed as such), though the BM tool is one pretty obvious thing to check out.

Going back over this thread simply doesn't leave me feeling the responses are arguing whether or not to "see" your point. They are sifting the options. That necessarily includes the idea of "Ask every question. Question every answer." (Especially when the early posts go on about how this, that, and the other are impossible - but here is this big problem.) Challenging an assumption drills a dry hole if the assumption stands up. However, challenging unexamined assumptions often roots out solutions pretty quickly.

Anyway, problem solved I reckon.

<edit>typ0</edit>

phranque

8:36 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



is it possible the url to your script appeared on an error page?

Brett_Tabke

9:13 pm on Aug 14, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Off the top of my head (most have been mentioned):

> How can you explain this?

#1 Browsers do leak referrals. If you are on a page and then go to the address bar for that page and type in something else - presto, you can "leak" a referrer. IE has done it randomly for years. Or FF, with its autocomplete, can send a referral to Google.

Once your browser has "leaked" a referral to another site, there are a myriad of ways that a search engine, or an analytics service like Alexa, can find it. Someone could surf out of those logs. Someone could leave their logs open. Someone could do any of the things below, such as visiting with a toolbar that sends the URL to the search engine for checking.

#2 Proxy cache at your ISP. The proxy cache at the largest ISP in the US is managed by a 3rd party. That third party shares its proxy logs with certain search engines. Other ISPs are in partnership with search engines and routinely sell their log data to 3rd parties - including search engines.

#3 Toolbars. You run the Google toolbar? Have the page rank feature turned on? You just handed Google the url to your 'private' server.

#4 (Phranque is right on), you have an open log file somewhere?

#5 You use an analytics service? You could have just handed your url to a search engine.

#6 Are you running antivirus on your desktop? What else is sitting between your browser and the net?

#7 Your browser. Do you have "safe surfing" turned on? It checks the URL to make sure you are not going into a 'bad neighborhood'.

#8 3rd party JavaScript? CSS? Any off-site page calls? If so, you just leaked a referral to all of those as well.

#9 Other inserts: any affiliate code on the page? Ad code? How about a link that you may click and leak a referral? Any hrefs on the page?
Alexa is owned by Amazon. It might point to "something" on the page managed by Amazon. Their cloud services are powering so many services today (ad programs, affiliate programs, script serving, gfx serving, and whole-site serving like Twitter) that it is easy to forget how much other code is being served from Amazon's infrastructure. Alexa *does* crawl out of Amazon logs.

#10 Browser add-ons: there are so many 3rd party extensions out there running under FF. A lot of them phone home with data.
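Several items on this list (#2, #8, #9) involve the URL escaping through referrers or shared caches. As a hedged sketch, a CGI script can send response headers that ask browsers, caches and compliant crawlers to hold back. The header names and values below are standard HTTP ones, but the header list and function name are illustrative choices, and Referrer-Policy in particular is only honoured by newer browsers. These are requests, not controls. They do nothing about a toolbar or add-on that reports the address you typed (#3, #10), a proxy still sees the URL it forwards, and none of them replaces a password.

```python
import sys

# Response headers that reduce (but do not eliminate) some of the leak paths above.
PRIVATE_HEADERS = [
    ("Content-Type", "text/html; charset=utf-8"),
    # Ask browsers not to send this page's URL as a Referer to other sites (#8, #9)
    ("Referrer-Policy", "no-referrer"),
    # Ask shared/ISP proxy caches and the browser not to store the response (#2)
    ("Cache-Control", "private, no-store"),
    # Ask compliant crawlers not to index the page or follow its links
    ("X-Robots-Tag", "noindex, nofollow"),
]


def cgi_header_block(headers):
    """Build the header section a CGI script prints before its body."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"


if __name__ == "__main__":
    sys.stdout.write(cgi_header_block(PRIVATE_HEADERS))
    sys.stdout.write("<p>private tool output</p>\n")
```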

-------

It is hard to search for, but this topic comes up about every six months. Usually it is Google getting accused of hacking someone's 'secret' URL.

JS_Harris

6:54 am on Aug 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



NOBODY (I insist) NOBODY knows the existence of this script, nobody has the url, still it was called.

Someone, maybe even you, visited the page with the Alexa toolbar active. The link was passed along to Alexa and their crawler eventually came looking. Google toolbar can do the same.

Not linking to a page isn't enough if security is an issue; most browsers record history, and many "features" aggregate that history.

I'd like to add that images uploaded to a folder are just as vulnerable even if they aren't displayed anywhere. I've tested this myself. I placed 30 images in a folder and created one page that included a link to each image (an href link; no images were ever displayed). Google image search began showing the images a week or so later, and when you click an image thumbnail in image search you end up on the page with lots of links to images; you don't see the images themselves, though they are indexed.

If it's online and not secured it will be discovered, guaranteed.

Marcia

2:14 pm on Aug 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a private URL showing up in Alexa that's just a single page used as a "Start Page" with links to frequently visited sites - and it is totally excluded in robots.txt.
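It helps to be clear about what a robots.txt exclusion does. It is an advisory file that compliant crawlers consult before fetching a URL they already know about. It does not hide anything (the file itself is public and lists the very paths you want kept quiet), and it says nothing about how a crawler learned the URL. A small illustration using Python's standard urllib.robotparser, with a hypothetical robots.txt and example.com URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /start/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for url in ("http://example.com/start/index.html", "http://example.com/about.html"):
        verdict = "allowed" if parser.can_fetch("ia_archiver", url) else "disallowed"
        print(url, verdict)
```

A crawler that runs this check stays out of /start/; one that doesn't (or that treats Disallow as "don't index" rather than "don't fetch") is not stopped by anything in the file.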

I believe it's been picked up by an FF add-on - specifically Search Status, which displays Google PageRank, Compete and Alexa.

jarhboosh

7:38 pm on Aug 16, 2009 (gmt 0)

10+ Year Member



Hello everyone!

I found this thread because the exact same issue happened on my website as well: the Alexa crawler called a service script I had just created. It also did something completely unexpected. Read on.
The setup:
1) The script was not supposed to be called by anyone, so its name was chosen as a random 18-character string (digits and letters).
2) Unlike the one in the original post, the script does not send emails to anyone; it just does some simple checks on the status of website files (no TCP/IP or other calls).
3) The script is not listed, mentioned, or linked from anywhere.
4) The script takes one parameter (GET request) that acts as a password.

Here is what happened: I was testing the freshly written script using my FF (3.0.13), with no Alexa toolbar but some non-Alexa add-ons installed.
Once during the tests I MISTYPED the password parameter to the script. On the next round, I realized the mistake and re-typed the call correctly. That happened Aug 9.
Yesterday, August 15, I found a record in the Apache logs of the Alexa crawler calling my script, MAKING THE SAME TYPO I had made earlier. Funny, that was the only call the crawler made. Some crawling...

The repetition of this typo is a clear evidence that the information was either leaked to alexa/amazon or it was eavesdropped by alexa/amazon.

No matter how actively or passively, directly or indirectly alexa/amazon obtained this information, their usage of it is unequivocally wrong ethically.

I plan to contact them and let them know what [expletive deleted] they are. Any other ideas on how to deal with them, maybe in a funny way, would be greatly appreciated.

Thanks in advance!

[edited by: phranque at 10:48 pm (utc) on Aug. 16, 2009]
[edit reason] expletive deleted [/edit]

D_Blackwell

7:59 pm on Aug 16, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No matter how actively or passively, directly or indirectly alexa/amazon obtained this information, their usage of it is unequivocally wrong ethically.

Not sure that I can buy into this. Data is sold and traded like any other commodity. You mention FF with 'some add-ons'. IF the source was an add-on phoning home and that information was 'passed on' - oh well. Seems to me this thread should get back to locking the barn door on URIs that you do not want accessed. Not all add-ons are 'free'; this isn't new news.

anallawalla

4:49 am on Aug 19, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



For a long time, Alexa search was a great way to find stuff you weren't supposed to take for free. I think it simply ignored robots.txt directives. Then I heard this changed and that robots.txt was honoured.

Several FF plugins warn you that they pass on data to Google just as though you used the Google toolbar.

Does that script URL show up in any search engine?

jarhboosh

5:25 am on Aug 19, 2009 (gmt 0)

10+ Year Member



anallawalla,

Thank you for the reply!

Alexa's crawler ignores robots.txt. It keeps banging on my password-protected directories that are clearly disallowed in robots.txt file.

I started uninstalling some FF plug-ins to find the "stool pigeon".

And no, so far the script URL does not show up in the 3 major SEs. No other crawler has been caught calling it either.

I also noticed that alexa crawler pokes around trying to guess (!) directory names. Now, a good crawler should not do that; it simply ain't no "crawling" but spying.

I'm tempted to ban it altogether.
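If you do decide to ban it, the polite route is a robots.txt record addressed to the crawler's user-agent token (User-agent: ia_archiver followed by Disallow: /), which only works for crawlers that honour it. A script can also refuse the request outright based on the User-Agent header. What follows is a minimal Python CGI sketch of that, with hypothetical names. User-Agent strings are self-reported and trivially spoofed, so this only turns away a crawler that announces itself honestly, and it is no substitute for a password.

```python
import os
import sys

# User-Agent substrings to refuse. "ia_archiver" is the token quoted earlier in
# this thread; user agents are self-reported, so this only stops honest crawlers.
BLOCKED_AGENTS = ("ia_archiver",)


def is_blocked(user_agent):
    """True if the (case-insensitive) User-Agent contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def respond(environ, out=sys.stdout):
    """Write a minimal CGI response; a 'Status:' header sets the HTTP status under CGI."""
    if is_blocked(environ.get("HTTP_USER_AGENT")):
        out.write("Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nForbidden\n")
    else:
        out.write("Content-Type: text/plain\r\n\r\nOK\n")


if __name__ == "__main__":
    respond(os.environ)
```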

AlexK

7:02 am on Aug 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Certainly an eye-opening thread. Very useful to be informed/reminded of the ways that (at least some) browsers & SEs currently work.

The world-weary "sophistication" displayed in some posts is disappointing.

Brett_Tabke

7:59 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> Alexa's crawler ignores robots.txt.
> It keeps banging on my password-protected directories

How is this any different from Google, Yahoo, Microsoft, or ASK? All of them spider content restricted by robots.txt. That does NOT mean they index it for public consumption.

> I also noticed that alexa crawler pokes around trying to guess (!) directory names

No, it does not try to 'guess' directories. It uses the same algo that Yahoo uses and causes an intentional 404 to get a reference page to compare to other pages. Once it matches your 404 template, it knows which pages are really 404s, regardless of an HTTP status that may or may not be accurate.
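The general technique described here, often called soft-404 detection, goes like this. Request a path that almost certainly does not exist, keep the body that comes back as the site's "not found" template, and then treat any page whose body closely matches that template as missing, whatever status code it was served with. This is a minimal Python sketch of that idea only. It is not Alexa's or Yahoo's actual algorithm, the function names are hypothetical, and the 0.9 similarity threshold is an arbitrary illustration value.

```python
import secrets
from difflib import SequenceMatcher


def random_probe_path():
    """A path that almost certainly does not exist on the target site."""
    return "/" + secrets.token_hex(12)


def looks_like_soft_404(page_body, not_found_body, threshold=0.9):
    """Treat a page as missing if its body closely matches the body the site
    returned for the known-missing probe path, whatever its HTTP status was."""
    similarity = SequenceMatcher(None, page_body, not_found_body).ratio()
    return similarity >= threshold
```

In use, a crawler would fetch random_probe_path() with any HTTP client once per site, store that body as not_found_body, and pass each later page body through looks_like_soft_404. A single random probe like this is different from requesting a series of plausible names such as /m, /mobi and /mobile, which is the behaviour disputed further down the thread.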

> The repetition of this typo is a clear evidence that the
> information was either leaked

Ya, by your ISP proxy cache that is serviced by Amazon/Alexa. No big whoop.

Next time, use it to your advantage and leave bait laying around for your competition to find... ;-)

jarhboosh

10:34 pm on Aug 29, 2009 (gmt 0)

10+ Year Member



Hello Brett,

thank you for your enthusiasm and cavalier spirit!

In reply to your remarks:
> ...Google, Yahoo, Microsoft, or ASK? ... spider content restricted by robots.txt

Well, NO, they do not do that on my websites. Some rogue unknown spiders are all over the place but the majors respect robots instructions on my websites. I don't claim I know about their behaviour on other sites but, on mine -- they are fine.

> No, it does not try to 'guess' directories...
Yes it does. The behaviour in your example has nothing to do with guessing, so it does not pertain to the discussion.
When a crawler goes fishing for names by probing name.com/m, name.com/mobile, name.com/mobil, name.com/mobi, name.com/mob and name.com/ipod to discover my site's mobile version, which is not advertised anywhere, they can call it whatever they like, but it ain't no stinking crawling.

> Ya, your ISP proxy cache that is serviced by a Amazon/Alexa.

Nice try. But NO. The answer is much simpler: the SearchStatus add-on for Firefox leaks to Google, Alexa and Compete.
When the SearchStatus add-on was uninstalled, the leaking stopped. That's all.

> No big whoop.

Maybe I'm too young to share this reaction, but here is my point of view:
The fact that "big guys" or everyone do something does not make "that something" right.
My mailman opening my letters, my ISP snooping on my passwords, or my bank appropriating my money would still be wrong despite the fact they have access to these resources.

Earlier in the thread, AlexK called it, tongue-in-cheek, "the world-weary sophistication".
I would just call this "complacency".

I want to close this thread with the material evidence:
in my case the "SearchStatus" add-on for Firefox reported all typed URLs to Google, Alexa and Compete. If you want to prevent that - uninstall the add-on.

D_Blackwell

11:41 pm on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I want to close this thread with the material evidence:
in my case the "SearchStatus" add-on for Firefox reported all typed URLs to Google, Alexa and Compete. If you want to prevent that - uninstall the add-on.

You ask for help. People take an interest in the topic and you can't be bothered to 'close the thread' earlier with 'the solution'?

People who participate in a thread and give it time or thought are entitled to see the conclusion (when/if there is one). Holding it back is not right. Your last post on this was 19 August. Here it is 29 August. You got more than you gave to this thread.

Took a quick look at the description of this extension. Ya think maybe this could have been suspected pretty easily before installation? I love my extensions, but they can come with a sharp double edge. This one as much as tells you what it is going to do. Just dropping out of the thread because the answer might be a little embarrassing is not straight-up or stand-up.

jarhboosh

2:16 am on Aug 30, 2009 (gmt 0)

10+ Year Member



Hello Blackwell,

Really sorry I did not report from Aug 19 through Aug 29.
I was busy uninstalling plugins and tracking logs.
Thank you for straightening me out!

I'm really in awe of your piercing analysis of my negative benefit-vs.-contribution balance on the thread. It's hard to imagine how much time and effort you devote to keeping tabs on everyone. That's great!
Can I get an extra point for identifying the actual culprit in my case?

[edited by: Brett_Tabke at 3:55 am (utc) on Aug. 30, 2009]