algorithm - Best approach for building a custom web-crawler for finding sites with some arbitrary text in the URL?


I would like to find all sites that have the keyword 'surfing wave' somewhere in their address. That sounds easy enough, but I want to do it without using any search engine, which means writing a pure web crawler.
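To pin down what "keyword somewhere in the address" could mean, here is a rough sketch of the URL test. The function name `url_contains_keyword` and the set of separators it tolerates are assumptions for illustration only; a two-word keyword rarely appears in a URL with a literal space, so the sketch also accepts forms such as `surfing-wave`, `surfing_wave`, `surfingwave`, and `surfing%20wave`.

```python
import re
from urllib.parse import unquote


def url_contains_keyword(url: str, keyword: str) -> bool:
    """Return True if the words of `keyword` appear in `url`, in order,
    joined either directly or by common separators (space, -, _, +, .)."""
    words = [re.escape(word) for word in keyword.lower().split()]
    if not words:
        return False
    # Allow zero or more separator characters between consecutive words.
    pattern = r"[\s\-_+.]*".join(words)
    # Decode percent-escapes first so "surfing%20wave" becomes "surfing wave".
    return re.search(pattern, unquote(url).lower()) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.surfing-wave.example/",
        "http://example.com/surfing%20wave",
        "http://example.com/wave-surfing",
    ):
        print(candidate, url_contains_keyword(candidate, "surfing wave"))
```

Whether words in the wrong order or split across path segments should count is a judgment call; this sketch rejects both.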

Problems, I think, I have to face:

  1. It will obviously
  2. It will keep running until it has found the first 2000 sites ...

Am I right? In other words, should I even attempt this? I do not want to use search engines because they limit the number of results.

What makes you think search engines limit the results? Finding things is exactly what they are for, and you should use one. Even if you do write your own crawler, it will need some starting points (seed URLs) to begin crawling. You could use Google search results as the seeds, but then you will not end up with a better result: after a much longer time, you will arrive at the same URLs the search engine had already given you.
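To make the point about starting points concrete, here is a minimal sketch of such a crawler using only the Python standard library. Everything in it is an assumption for illustration: the names `crawl`, `fetch_html`, and `LinkExtractor`, the breadth-first order, the `max_pages` safety cap, and the plain substring test against the URL. The `max_matches` default of 2000 mirrors the figure in the question. Note that `seed_urls` has to come from somewhere, which is the problem described above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url, timeout=10):
    """Download a page and return its text, or "" on any failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(seed_urls, keyword, max_matches=2000, max_pages=10000, fetch=fetch_html):
    """Breadth-first crawl from seed_urls; return visited URLs containing keyword."""
    keyword = keyword.lower()
    queue = deque(seed_urls)
    seen = set(seed_urls)  # never enqueue the same URL twice
    matches = []
    pages_fetched = 0

    # Stop when the frontier is empty or either limit is reached.
    while queue and len(matches) < max_matches and pages_fetched < max_pages:
        url = queue.popleft()
        if keyword in url.lower():
            matches.append(url)
        pages_fetched += 1

        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return matches
```

A call might look like `crawl(["http://example.com/"], "surfing-wave", max_matches=2000)`. The crawler can only ever reach pages linked, directly or indirectly, from its seeds, so the quality of the result depends entirely on the seed list. A real crawler would also need to respect robots.txt and rate-limit its requests.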

