I am trying to develop a web spider, but I am new to web spider architecture. Which programming language is best for developing a web spider? I would like to work with Python; what do you say?
I would find an existing script/package/module in a language you are familiar with and modify it to suit your needs if necessary, or at least use it to learn how it works before you write your own.
adwatson
5:25 pm on Apr 7, 2008 (gmt 0)
I don't know Python. I'd say use whatever you're comfortable with, as long as it has good text handling, regexps, or whatever, since you'll need to find the links within the text of each page.
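For the Python route, here is a minimal sketch of the "find the links in the page text" step using only the standard library. It swaps in an HTML parser (`html.parser`) instead of regular expressions, since regexps tend to be brittle on real-world HTML. The names `LinkExtractor` and `extract_links` are hypothetical, and the example.com URLs are illustrative only; fetching pages, politeness, and deduplication are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL of the page being parsed
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for a bare attribute such as <a href>, so skip it.
            if name == "href" and value:
                # Resolve relative links against the page URL, then drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute link targets found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="https://example.org/x#top">X</a></p>'
    print(extract_links(page, "https://example.com/index.html"))
```

Run as a script, this should print `['https://example.com/about', 'https://example.org/x']`. A spider would feed each extracted URL back into its queue of pages still to visit.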
IanKelley
9:51 pm on Apr 20, 2008 (gmt 0)
More important than which language to use is this: do you have the massive server farm and bandwidth required to spider even a tiny portion of the web?
chorny
10:10 pm on May 8, 2008 (gmt 0)
abeen, use Perl with WWW::Mechanize (for complex cases) or LWP.