_benbenben_ wrote in php

Follow-up to site scraping...



So I found a pretty good script to learn from and have been hacking it up to fit my needs, but I've been having trouble with one line:

$hrefs = $xpath->evaluate("/html/body/a");

So I searched Google for evaluate() and haven't found much information about it.
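The `evaluate()` here is `DOMXPath::evaluate()` from PHP's DOM extension: it runs an XPath expression against the parsed document and returns a `DOMNodeList` when the expression selects nodes, or a plain number, string or boolean when it computes a value. The minimal sketch below runs against a made-up HTML string rather than the real page. It also shows why `/html/body/a` can miss links: that path matches only `<a>` elements that are direct children of `<body>`, while `//a` matches them at any depth.

```php
<?php
// Sketch: what DOMXPath::evaluate() returns for different XPath expressions.
// The HTML below is a hypothetical sample, not the scraped page.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$html = '<html><body>'
      . '<a href="http://a.example/">plain link</a>'
      . '<p><a href="http://b.example/" rel="friend met">nested link</a></p>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Location paths return a DOMNodeList.
$direct = $xpath->evaluate('/html/body/a'); // only <a> that are direct children of <body>
$all    = $xpath->evaluate('//a');          // <a> at any depth

// Function expressions return scalars instead of node lists.
$count = $xpath->evaluate('count(//a)');       // XPath numbers come back as float
$first = $xpath->evaluate('string(//a/@href)'); // value of the first href in document order

echo $direct->length, "\n"; // 1
echo $all->length, "\n";    // 2
echo $count, "\n";          // 2
echo $first, "\n";          // http://a.example/
```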

What I need the above line to do is match not just a (anchor) tags, but only anchors that also have a rel attribute (as in rel="contact"), so that I get only XFN links.
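In XPath, conditions go in square-bracket predicates after the step they filter. A rough sketch, assuming a small hypothetical HTML sample: `//a[@rel]` keeps any link that has a `rel` attribute at all. Because `rel` can hold several space-separated values (for example `rel="friend met"`), matching one specific value such as `contact` is safer with the `concat`/`normalize-space` token idiom than with a plain `@rel="contact"` comparison.

```php
<?php
// Sketch: keep only links that carry a rel attribute, or one specific rel token.
// The sample HTML is hypothetical.
libxml_use_internal_errors(true);

$html = '<html><body>'
      . '<a href="http://one.example/">no rel</a>'
      . '<a href="http://two.example/" rel="contact">contact</a>'
      . '<div><a href="http://three.example/" rel="friend met">friend</a></div>'
      . '<a href="http://four.example/" rel="contactless">not XFN</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Any <a>, at any depth, that has a rel attribute.
$withRel = $xpath->evaluate('//a[@rel]');

// Only <a> whose space-separated rel list contains the whole token "contact".
// Padding both sides with spaces stops "contactless" from matching.
$contact = $xpath->evaluate(
    "//a[contains(concat(' ', normalize-space(@rel), ' '), ' contact ')]"
);

foreach ($withRel as $a) {
    echo $a->getAttribute('href'), ' [', $a->getAttribute('rel'), "]\n";
}
echo $contact->length, "\n"; // 1
```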

How can I add more conditions to match on in the evaluate() call?
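Several conditions can go into the same `evaluate()` string: `and` and `or` combine tests inside one predicate, and the `|` operator unions the results of separate paths into one node list. A minimal sketch on hypothetical markup; note that `@rel="friend"` is an exact string comparison, so a multi-valued `rel="friend met"` would not match it. The `<link>` element is included only to illustrate the union.

```php
<?php
// Sketch: several conditions in one evaluate() call.
// The sample markup is hypothetical.
libxml_use_internal_errors(true);

$html = '<html><head><link rel="me" href="http://me.example/"></head><body>'
      . '<a href="http://a.example/" rel="friend">a</a>'
      . '<a href="http://b.example/" rel="co-worker">b</a>'
      . '<a href="http://c.example/" rel="nofollow">c</a>'
      . '<a rel="contact">no href</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "and" inside a predicate: the link must have both an href and a rel.
$both = $xpath->evaluate('//a[@href and @rel]');

// "or" inside a predicate: rel is exactly "friend" or exactly "co-worker".
$either = $xpath->evaluate('//a[@rel="friend" or @rel="co-worker"]');

// "|" merges two separate paths into one node list, in document order.
$union = $xpath->evaluate('//a[@rel] | //link[@rel]');

echo $both->length, ' ', $either->length, ' ', $union->length, "\n"; // 3 2 5
```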



<code>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>scrape</title>
<meta name="keywords" content="" />
<meta name="description" content="" />
</head>
<body>
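<?php
// Page to scrape and the user agent string to send with the request.
$target_url = "http://livejournal.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// Fetch the page with cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as an error
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($ch, CURLOPT_AUTOREFERER, true);    // set Referer when following redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number: " . curl_errno($ch);
    echo "<br />cURL error: " . htmlspecialchars(curl_error($ch));
    exit;
}

// Parse the HTML, collecting parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Run an XPath expression over the parsed document.
// "/html/body/a" matches only <a> elements that are direct children of <body>.
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body/a");
foreach ($hrefs as $href) {
    $url = $href->getAttribute('href');
    $xfn = $href->getAttribute('rel'); // empty string if the link has no rel
    // Escape scraped values before writing them into the page.
    echo htmlspecialchars($url) . " " . htmlspecialchars($xfn) . "<br />";
}
?>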
</body>
</html>
</code>
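Putting the pieces together, one possible way to keep only XFN links is to select `<a>` elements that have both `href` and `rel`, split `rel` into tokens, and intersect them with the XFN 1.1 value list. This is a sketch, not the original script: it works on an HTML string instead of fetching with cURL, and `extractXfnLinks()` and `XFN_VALUES` are hypothetical names. In the script above it could be called as `extractXfnLinks($html)` right after the `curl_exec()` check.

```php
<?php
// Sketch: keep only links whose rel contains at least one XFN 1.1 value.
// extractXfnLinks() and XFN_VALUES are hypothetical names, not PHP built-ins.

// XFN 1.1 relationship values.
const XFN_VALUES = [
    'contact', 'acquaintance', 'friend', 'met', 'co-worker', 'colleague',
    'co-resident', 'neighbor', 'child', 'parent', 'sibling', 'spouse', 'kin',
    'muse', 'crush', 'date', 'sweetheart', 'me',
];

/**
 * @return array<int, array{href: string, xfn: string[]}>
 */
function extractXfnLinks(string $html): array
{
    libxml_use_internal_errors(true); // tolerate real-world markup without warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $links = [];
    // Every <a>, at any depth, that has both an href and a rel.
    foreach ($xpath->evaluate('//a[@href and @rel]') as $a) {
        // rel is a space-separated list; compare lowercase tokens.
        $tokens = preg_split('/\s+/', strtolower(trim($a->getAttribute('rel'))), -1, PREG_SPLIT_NO_EMPTY);
        $xfn = array_values(array_intersect($tokens, XFN_VALUES));
        if ($xfn !== []) {
            $links[] = ['href' => $a->getAttribute('href'), 'xfn' => $xfn];
        }
    }
    return $links;
}

$sample = '<html><body>'
        . '<a href="http://friend.example/" rel="Friend met">x</a>'
        . '<a href="http://ad.example/" rel="nofollow">y</a>'
        . '</body></html>';

foreach (extractXfnLinks($sample) as $link) {
    echo $link['href'], ' ', implode(',', $link['xfn']), "\n";
}
```

With the sample input this prints `http://friend.example/ friend,met`; the `nofollow` link is dropped because none of its tokens is an XFN value.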