

developing web spider

web spider, language, platform

         

abeen

6:35 am on Apr 2, 2008 (gmt 0)



I am trying to develop a web spider, but I am new to web spider architecture.
Which programming language is best for developing a web spider?
I would like to work with Python. What do you say?

phranque

10:07 am on Apr 2, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld [webmasterworld.com], abeen!

I would find an existing script, package, or module in a language I am familiar with and modify it to suit if necessary, or at least use it to learn how a spider works before writing your own.

adwatson

5:25 pm on Apr 7, 2008 (gmt 0)

10+ Year Member



I don't know Python, but I'd say use whatever you're comfortable with, as long as it has good text handling (regular expressions and the like), since you'll need to find links within the text of the pages.
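
Since the original poster asked about Python, here is a rough sketch of the link-finding step using only the standard library. It uses the built-in HTML parser rather than regular expressions, which is a swapped-in choice and not something anyone in this thread prescribed. The names `LinkExtractor` and `extract_links` are made up for illustration, and fetching the pages is left out entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from the href attribute of <a> tags.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                absolute = urljoin(self.base_url, value)
                # Drop "#fragment" so the same page is not listed twice.
                absolute, _fragment = urldefrag(absolute)
                is_web_link = absolute.startswith(("http://", "https://"))
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique links found in html, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="news.html#top">News</a></p>'
    for link in extract_links(page, "http://example.com/dir/index.html"):
        print(link)
```

A spider would call something like `extract_links` on each downloaded page and add any URLs it has not seen before to its queue of pages to visit.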

IanKelley

9:51 pm on Apr 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



More important than which language to use is "do you have the massive server farm and bandwidth required to spider even a tiny portion of the web?"

chorny

10:10 pm on May 8, 2008 (gmt 0)

10+ Year Member



abeen, use Perl with WWW::Mechanize (for complex cases) or LWP.