I'm looking for a tool, but I'm not quite sure where to find it. What I need is basically a focused web crawler. I want a script, preferably in PHP, which I can give a particular seed URL and a particular term to search for. Starting from that seed URL, it should traverse the site looking for off-site links where the linked page contains the search term somewhere. Then it should repeat the process on each of those sites, and so on. The goal is to pick up sites that the major search engines might miss, while still maintaining a high signal-to-noise ratio (i.e., not pulling down spammy ad sites).
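To make the desired behavior concrete, the loop described above could be sketched roughly as follows. This is only an illustration under assumptions, not an existing tool: it is written in Python rather than PHP, the names `focused_crawl`, `extract_links`, and `LinkParser` are made up for the example, and the page fetcher is passed in as a `fetch` callable so the logic can be tried without touching the network. It does nothing about robots.txt, rate limiting, or spam filtering beyond the simple term check.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def focused_crawl(seed_url, term, fetch, max_pages=200):
    """Return the off-site hosts reached from seed_url whose page contains term.

    fetch is a hypothetical callable: URL in, HTML string (or None) out.
    """
    term = term.lower()
    allowed_hosts = {urlparse(seed_url).netloc}  # hosts we traverse fully
    matched_hosts = set()
    seen = {seed_url}
    queue = deque([seed_url])
    fetched = 0

    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue

        host = urlparse(url).netloc
        if host not in allowed_hosts:
            # Off-site page: keep it only if it mentions the search term.
            if term not in html.lower():
                continue
            matched_hosts.add(host)
            allowed_hosts.add(host)  # repeat the process on this site too

        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return matched_hosts
```

In a real run, `fetch` could be a small wrapper around something like `urllib.request.urlopen` that returns `None` on errors. The `max_pages` cap is there because an unbounded crawl of this kind can grow very quickly.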
Anyone know of anything which behaves like this? I know I could sit down and try to write my own from the example snippets I can find of various spiders, but frankly I'm just not feeling that ambitious at this precise moment. I'd prefer something prefabbed that will do what I want out of the box.