A Handy Data Scraping Tool: PHP Simple HTML DOM Parser
2014-09-19 08:25:46 PHP Simple HTML DOM Parser (a simple HTML DOM parser implemented in PHP).
Description, Requirement & Features
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
Download & Documents
- Download the latest version from SourceForge.
- Read the online documentation.
Quick Start
- How to get HTML elements?
[php]
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
[/php]
- How to modify HTML elements?
[php]
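// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
// Set the class attribute of the second div (index 1)
$html->find('div', 1)->class = 'bar';
// Replace the inner text of the div whose id is "hello"
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>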
[/php]
- Extract contents from HTML
[php]
// Dump contents (without tags) from HTML
echo file_get_html('http://www.google.com/')->plaintext;
[/php]
- Scraping Slashdot!
[php]
// Create DOM from URL
$html = file_get_html('http://slashdot.org/');
// Collect one entry per article block
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item = array();
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}
print_r($articles);
[/php]
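One thing to watch in the scraping example: `find($selector, 0)` returns `null` when nothing matches, so reading `->plaintext` on the result fails as soon as an article block lacks one of the expected children. Below is a minimal sketch of a guard for that case. The helper name `first_text`, the default strings, the sample markup, and the location of `simple_html_dom.php` are all assumptions made for illustration; none of them come from the library itself.

```php
<?php
// Hypothetical helper (not part of the library): returns the trimmed text of
// the first element matching $selector under $node, or $default when there is
// no match. Relies on find($selector, 0) returning null for a missing element.
function first_text($node, string $selector, string $default = ''): string
{
    $match = $node->find($selector, 0);
    return $match === null ? $default : trim($match->plaintext);
}

// Usage sketch. The include path is an assumption: adjust it to wherever
// simple_html_dom.php lives in your project.
$library = __DIR__ . '/simple_html_dom.php';
if (is_file($library)) {
    require_once $library;

    // Inline sample markup (made up for this sketch) instead of a live URL;
    // note that the "intro" block is deliberately missing.
    $html = str_get_html(
        '<div class="article">'
        . '<div class="title"> Hello </div>'
        . '<div class="details">Some details</div>'
        . '</div>'
    );

    $articles = array();
    foreach ($html->find('div.article') as $article) {
        $articles[] = array(
            'title'   => first_text($article, 'div.title'),
            'intro'   => first_text($article, 'div.intro', '(no intro)'),
            'details' => first_text($article, 'div.details'),
        );
    }
    print_r($articles);
}
```

The helper only needs an object with a `find()` method, so it works the same way on the whole document as on a single article node.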