前段时间看有人在做关于美女图片的采集 ,随自己随手写了个 ,发现正则真的是薄弱了,哎 不知道为什么正则的规则一直记不住 why ?
腾讯微博代码采集Demo非API接口
$str = "比基尼"; $url = "http://k.t.qq.com/k/" . urlencode($str); $html_list = get_html($url); preg_match("/\<ul id=\"talkList\" class=\"LC\"\>(.+?)\<\/li\>\<\/ul\>/ms", $html_list, $preg_list); $str_arr = $preg_list[0]; preg_match_all("/\<div class=\"msgCnt\"\>(.+?)\<\/div\>(.+?)\<div class=\"picBox \"\>\<a href=\"(.+?)\" target=\"_blank/ms", $str_arr, $new_arr); for($i = 0;$i < count($new_arr[1]);$i++) { echo strip_tags($new_arr[1][$i]) . "<img src='" . $new_arr[3][$i] . "'><br>"; } function get_html($url) { while (true) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_TIMEOUT, 3); curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); curl_setopt($ch, CURLOPT_COOKIE, "TIEBAPB=OLDPB"); $html = curl_exec($ch); curl_close($ch); if (!empty($html)) { return $html; } sleep(1); } }
我也是记不住正则表达式的。。
要好好加强啊 哎