PHP Browser-Based Website Crawler

I figured out a way to create a PHP website crawler that runs in a web browser instead of from the command line. You can use it to harvest the links on a page for use in a database or search engine, or to see how easily a spider or bot can crawl your site. Try it here!

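<head><title>PHP Website Crawler</title></head>
<font face="verdana" color="#66ccff">
<form id="crawl" method="post" action="">

<input name="url" type="text" id="url" value="<?php echo htmlspecialchars(isset($_POST['url']) ? $_POST['url'] : ''); ?>" size="70" maxlength="255" />
<br />
<br />
<input type="submit" name="Submit" value="Crawl!" />
<br />
</form>
<?php
if (isset($_POST['url'])) {
    $url = trim($_POST['url']);
    // Only fetch http(s) URLs so the form cannot be used to read local files.
    // Remote fopen() also needs allow_url_fopen enabled in php.ini.
    $f = preg_match('#^https?://#i', $url) ? @fopen($url, "r") : false;
    if ($f === false) {
        print "Could not open that URL.<br>";
    } else {
        // Read the page one line at a time and pull the href out of every <a> tag.
        while (($buf = fgets($f)) !== false) {
            preg_match_all("/<\s*a\s+[^>]*href\s*=\s*[\"']?([^\"' >]+)[\"' >]/isU", $buf, $words);
            // $words[1] holds the captured href values ($words[0] is the whole match).
            foreach ($words[1] as $link) {
                $cur_word = strtolower($link);
                // Escape the link before echoing it back into the page.
                print htmlspecialchars($cur_word) . "<br>";
            }
        }
        fclose($f);
    }
}
?>
</font>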
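
If you want the links in an array rather than printed straight to the page (say, to store them in a database), one option is to wrap the same pattern in a function. This is only a sketch: the name extract_links() and the sample markup are made up for illustration and are not part of the script above. It also skips the strtolower() call, on the assumption that you want the links exactly as they appear in the page.

```php
<?php
// Hypothetical helper (not part of the original script): returns the unique
// href values found in a string of HTML, using the same pattern as the crawler.
function extract_links($html)
{
    $pattern = '/<\s*a\s+[^>]*href\s*=\s*["\']?([^"\' >]+)["\' >]/isU';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the captured href values; drop duplicates and
    // re-index so the result is a plain list.
    return array_values(array_unique($matches[1]));
}

// Sample markup, made up for illustration.
$html = '<p><a href="/about">About</a> <a class="x" href=\'http://example.com/\'>Home</a> '
      . '<a href="/about">About again</a></p>';

foreach (extract_links($html) as $link) {
    echo htmlspecialchars($link), "\n";
}
```

Because the function takes a string, you can feed it a whole page from file_get_contents() instead of reading line by line, which also catches tags that are split across lines.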
