Posted 2018-09-27 | Updated 2019-01-14 | 12 minutes read (About 1777 words)

Instagram content scraping

1. Login information is required: every scraping request must carry the session cookie, and a User-Agent header is needed as well.

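2. Both the data API and the downloads are rate-limited. Firing requests back to back (a few hundred resources) gets you throttled; once throttled, sleep for a while and then resume.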
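Points 1 and 2 can be sketched in Python with only the standard library. This is a minimal sketch, not a tested Instagram client: the cookie and User-Agent strings are placeholders you would copy from a logged-in browser, the 2-second pause between requests and the 60-second sleep after throttling are arbitrary guesses, and it assumes throttling shows up as HTTP 429. The names build_request, fetch_with_backoff and fetch_all are hypothetical.

```python
import time
import urllib.error
import urllib.request

# Placeholders (assumptions): copy real values from a logged-in browser session.
COOKIE = "PASTE_YOUR_COOKIE_HERE"
USER_AGENT = "Mozilla/5.0 (placeholder)"


def build_request(url, cookie, user_agent):
    """Attach the login cookie and a User-Agent, both required per point 1."""
    return urllib.request.Request(url, headers={"Cookie": cookie, "User-Agent": user_agent})


def fetch_with_backoff(url, cookie, user_agent, opener=urllib.request.urlopen,
                       throttle_codes=(429,), sleep_seconds=60, max_retries=5,
                       sleep=time.sleep):
    """Fetch one URL; when throttled, sleep and retry (assumes throttling = HTTP 429)."""
    for attempt in range(max_retries + 1):
        try:
            with opener(build_request(url, cookie, user_agent)) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not throttling, or once retries are exhausted.
            if err.code not in throttle_codes or attempt == max_retries:
                raise
            sleep(sleep_seconds)


def fetch_all(urls, cookie, user_agent, delay_seconds=2.0, sleep=time.sleep, **kwargs):
    """Fetch URLs one by one with a fixed pause in between, so requests are never back to back."""
    results = []
    for i, url in enumerate(urls):
        if i:
            sleep(delay_seconds)
        results.append(fetch_with_backoff(url, cookie, user_agent, sleep=sleep, **kwargs))
    return results
```

The opener and sleep parameters only exist so the functions can be exercised without touching the network; in real use, leave them at their defaults and call fetch_all(urls, COOKIE, USER_AGENT).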

3. Content can be scraped through two entry points.

The two entry points send different cookies and request different URLs.

4. Scraping steps:

Posted 2018-09-26 | Updated 2019-01-14 | 10 minutes read (About 1453 words)

coub.com content scraping

1. There are 17 categories in total.

2. Fetching data

URL: https://coub.com/api/v2/timeline/hot/movies/half?per_page=25
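Notes: movies is the category. per_page is the number of items returned per page, in the range [1, 25]. For the first request, passing page=1 returns the first page. Each subsequent request passes an anchor field set to the next value returned by the previous request.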
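The anchor-based paging described above can be sketched as follows. It is a minimal sketch under assumptions the post does not spell out: the response is JSON with the cursor in a top-level next field, and follow-up requests send anchor instead of page rather than alongside it. The names page_url, fetch_json and iter_pages are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://coub.com/api/v2/timeline/hot/{category}/half"


def page_url(category, per_page=25, anchor=None):
    """Build one timeline URL: page=1 on the first call, anchor=<previous next> afterwards."""
    if not 1 <= per_page <= 25:
        raise ValueError("per_page must be in [1, 25]")
    params = {"per_page": per_page}
    if anchor is None:
        params["page"] = 1
    else:
        params["anchor"] = anchor  # assumption: anchor replaces page on later requests
    return BASE.format(category=category) + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_pages(category, per_page=25, max_pages=None, fetch=fetch_json):
    """Yield decoded pages, following 'next' until it is empty or max_pages is reached."""
    anchor = None
    count = 0
    while max_pages is None or count < max_pages:
        page = fetch(page_url(category, per_page, anchor))
        yield page
        count += 1
        anchor = page.get("next")  # assumption: cursor is a top-level "next" field
        if not anchor:
            break
```

For example, page_url("movies") builds the URL from the post with page=1 appended; iterating iter_pages("movies", max_pages=3) would fetch at most three pages. Combine it with the pauses from point 2 of the Instagram post if coub throttles as well.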

Posted 2018-09-25 | Updated 2019-01-14 | 8 minutes read (About 1193 words)

1. There are 52 categories in total.

2. Fetching data

URL: https://9gag.com/v1/group-posts/group/cute/type/hot?c=10
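Notes: cute is the category. For the first request, passing c=10 returns the first ten items. Each subsequent request just appends the nextCursor parameter returned by the previous request. Every request returns 10 items.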
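The cursor-based paging can be sketched the same way. This is a sketch under assumptions the post does not state: the response JSON is taken to be shaped like {"data": {"nextCursor": ...}}, and nextCursor is treated as a ready-made query string appended after "?". The names first_url, next_url, default_cursor and iter_batches are hypothetical; pass a different get_cursor if the real layout differs.

```python
import json
import urllib.request

BASE = "https://9gag.com/v1/group-posts/group/{group}/type/hot"


def first_url(group):
    """First request: c=10 returns the first ten posts."""
    return BASE.format(group=group) + "?c=10"


def next_url(group, next_cursor):
    """Assumption: nextCursor is an already-encoded query string appended as-is."""
    return BASE.format(group=group) + "?" + next_cursor


def default_cursor(body):
    """Assumed response layout: {"data": {..., "nextCursor": "..."}}."""
    return body.get("data", {}).get("nextCursor")


def fetch_json(url):
    """Download a URL and decode the body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_batches(group, max_batches=3, fetch=fetch_json, get_cursor=default_cursor):
    """Yield up to max_batches responses (10 posts each), following nextCursor."""
    url = first_url(group)
    for _ in range(max_batches):
        body = fetch(url)
        yield body
        cursor = get_cursor(body)
        if not cursor:
            break
        url = next_url(group, cursor)
```

Keeping max_batches small by default is deliberate: with only 10 items per request, an unbounded loop over a large group would hit the rate limits quickly.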

3. Attributes of each resource:

Posted 2016-08-02 | Updated 2018-12-10 | 2 minutes read (About 321 words)

use

target URL: