PHP Browser-Based Website Crawler

I figured out a way to create a php website crawler that can be run via web browser instead of command line. You can use this to harvest links from a website for use in a database or search engine…or to see how easily a spider or bot can creep your site. Try it here!

<html>
<head><title>PHP Website Crawler</title></head>
<body>
<font face="verdana" color=#66ccff">
<form id="crawl" method="post" action="">

<label>URL:
<input name="url" type="text" id="url" value="<?php $url; ?>http://website.com" size="70" maxlength="255" />
</label>
<br />
<br />
<label>
<input type="submit" name="Submit" value="Crawl!" />
</label>
<br />
</form>
</body>
</html>
<?php
if (isset($_POST['url'])) {
$url = $_POST['url'];
$f = @fopen($url,"r");
while( $buf = fgets($f,1024) )
{
$buf = fgets($f, 4096);
preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU",$buf,$words);
for( $i = 0; $words[$i]; $i++ )
{
for( $j = 0; $words[$i][$j]; $j++ )
{
$cur_word = strtolower($words[$i][$j]);
print "$cur_word<br>";
}
}
}
}
?>

Be Sociable, Share!

Leave a Reply

Your email address will not be published. Required fields are marked *