Weibo Scraping: Getting a Weibo Cookie Without an Account

This post focuses on the underlying principle rather than a complete program. The tool used is Postman, and there are three main steps:

Step 1 (get the tid):

  1. URL: https://passport.weibo.com/visitor/genvisitor
  2. Method: POST
  3. Parameters:
     - cb: gen_callback (fixed)
     - fp: {"os":"1","browser":"Chrome70,0,3538,25","fonts":"undefined","screenInfo":"1920*1080*24","plugins":"Portable Document Format::internal-pdf-viewer::Chromium PDF Plugin|::mhjfbmdgcfjbbpaeojofohoefgiehjai::Chromium PDF Viewer|::gbkeegbaiigmenfmjfclcdgdpimamgkj::Google文档、表格及幻灯片的Office编辑扩展程序|::internal-nacl-plugin::Native Client"} (use your browser's actual values)
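For readers who want to script step 1 instead of using Postman, here is a minimal Python sketch using only the standard library. It builds the form-encoded POST request but does not send it. The function name `build_genvisitor_request` is hypothetical, and the fingerprint below is a placeholder with the plugin list left empty, so substitute your own browser's values.

```python
import json
import urllib.parse
import urllib.request

GENVISITOR_URL = "https://passport.weibo.com/visitor/genvisitor"


def build_genvisitor_request(fingerprint: dict) -> urllib.request.Request:
    """Build (but do not send) the step 1 POST request that asks for a tid."""
    form = {
        "cb": "gen_callback",  # fixed callback name
        # fp is sent as a compact JSON string describing the browser
        "fp": json.dumps(fingerprint, ensure_ascii=False, separators=(",", ":")),
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        GENVISITOR_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder fingerprint (hypothetical); replace with your browser's real values.
example_fp = {
    "os": "1",
    "browser": "Chrome70,0,3538,25",
    "fonts": "undefined",
    "screenInfo": "1920*1080*24",
    "plugins": "",  # plugin list omitted in this sketch
}

req = build_genvisitor_request(example_fp)
# To actually send it: text = urllib.request.urlopen(req).read().decode("utf-8")
```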

Response:

window.gen_callback && gen_callback({"retcode":20000000,"msg":"succ","data":{"tid":"t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=","new_tid":true}});
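The response is JSONP rather than plain JSON: the JSON object is wrapped in a call to the `gen_callback` function named by `cb`. Below is a minimal Python sketch that unwraps it and reads `tid` and `new_tid`. The regex approach and the name `parse_jsonp` are my own choices, and treating `retcode` 20000000 as success is an assumption drawn only from the sample response above.

```python
import json
import re

# Matches "name({...})" inside the JSONP wrapper and captures the JSON object.
_JSONP_RE = re.compile(r"\w+\((\{.*\})\)", re.DOTALL)


def parse_jsonp(text: str) -> dict:
    """Extract the JSON payload from a response such as
    'window.gen_callback && gen_callback({...});'."""
    match = _JSONP_RE.search(text)
    if match is None:
        raise ValueError("no JSONP payload found")
    return json.loads(match.group(1))


sample = ('window.gen_callback && gen_callback({"retcode":20000000,"msg":"succ",'
          '"data":{"tid":"t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=","new_tid":true}});')

payload = parse_jsonp(sample)
# Assumption: 20000000 signals success, as in the sample response.
if payload.get("retcode") != 20000000:
    raise RuntimeError(f"genvisitor failed: {payload.get('msg')}")
tid = payload["data"]["tid"]
new_tid = payload["data"]["new_tid"]
```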


Step 2 (get sub and subp):

  1. URL: https://passport.weibo.com/visitor/visitor
  2. Method: GET
  3. Parameters:
     - a: incarnate (fixed)
     - t: UhIQHACePHlmNiYcsClsQk4FcWAJx8dnTtn7lSkeql8 (the tid obtained above)
     - w: 3 (3 if new_tid above is true, otherwise 2)
     - c: 100 (if the data above contains this value, use it; otherwise default to 100)
     - cb: cross_domain (fixed)
     - from: weibo (fixed)
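The step 2 parameters can be derived from the step 1 `data` object. Below is a minimal Python sketch, assuming the optional value for `c` arrives under the key `confidence` (the post does not name that key); `build_incarnate_url` is a hypothetical helper name. The sample tid contains `+` and `=`, so it is URL-encoded here rather than pasted into the query string raw, where a bare `+` would be read as a space.

```python
import urllib.parse

VISITOR_URL = "https://passport.weibo.com/visitor/visitor"


def build_incarnate_url(data: dict) -> str:
    """Build the step 2 GET URL from the `data` object returned in step 1."""
    params = {
        "a": "incarnate",  # fixed
        "t": data["tid"],  # tid from step 1
        "w": 3 if data.get("new_tid") else 2,  # 3 for a new tid, otherwise 2
        # Assumption: if present, the value for c is stored under "confidence".
        "c": data.get("confidence", 100),
        "cb": "cross_domain",  # fixed
        "from": "weibo",  # fixed
    }
    # urlencode percent-escapes characters such as '+', '/' and '=' in the tid.
    return VISITOR_URL + "?" + urllib.parse.urlencode(params)


url = build_incarnate_url(
    {"tid": "t4vkYDYI5yHEIXBRL+VFdoXnXPqE9389EuMYk4HojIE=", "new_tid": True}
)
```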

Response:

window.cross_domain && cross_domain({"retcode":20000000,"msg":"succ","data":{"sub":"_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs","subp":"0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m"}});
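The step 2 response is wrapped in a `cross_domain(...)` call in the same JSONP style. Below is a minimal Python sketch that slices out the JSON object and returns `sub` and `subp`. The helper name `parse_cross_domain` is hypothetical, and treating `retcode` 20000000 as success is inferred only from the sample above.

```python
import json


def parse_cross_domain(text: str) -> tuple[str, str]:
    """Return (sub, subp) from the step 2 JSONP response."""
    # The JSON object starts right after "(" and ends before the final ")".
    start = text.index("({") + 1
    end = text.rindex(")")
    payload = json.loads(text[start:end])
    # Assumption: 20000000 signals success, as in the sample response.
    if payload.get("retcode") != 20000000:
        raise RuntimeError(f"incarnate failed: {payload.get('msg')}")
    return payload["data"]["sub"], payload["data"]["subp"]


sample = ('window.cross_domain && cross_domain({"retcode":20000000,"msg":"succ","data":'
          '{"sub":"_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs",'
          '"subp":"0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m"}});')

sub, subp = parse_cross_domain(sample)
```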


Step 3 (combine sub and subp into a Cookie and scrape data):

  1. URL: https://d.weibo.com/1087030002_2975_2017_0
  2. Method: GET
  3. Headers:
     - Cookie: SUB=_2AkMr-Uitf8NxqwJRmP4Vym7lZIt2wwDEieKdpbl2JRMxHRl-yT83qhAytRB6AHlmQiE0cGNJVvYskBmcaMuDeBtcMDoK; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WW7Ds97Ql.cFbVqMIoBZMpe
       (SUB and SUBP are the values returned by the previous request)
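Step 3 only needs the two values joined into a single `Cookie` header. Below is a minimal Python sketch with hypothetical helper names. It builds the request without sending it, and the commented-out fetch line assumes the page is UTF-8 encoded.

```python
import urllib.request

TARGET_URL = "https://d.weibo.com/1087030002_2975_2017_0"


def build_cookie(sub: str, subp: str) -> str:
    """Join the step 2 values into a Cookie header value."""
    return f"SUB={sub}; SUBP={subp}"


def build_page_request(sub: str, subp: str) -> urllib.request.Request:
    """Build (but do not send) the step 3 GET request carrying the visitor cookie."""
    return urllib.request.Request(
        TARGET_URL,
        headers={"Cookie": build_cookie(sub, subp)},
        method="GET",
    )


req = build_page_request(
    "_2AkMr-VWef8NxqwJRmfoQzGvgbYh1yAvEieKdpaRFJRMxHRl-yT83qmMMtRB6AHl7cF8_VEgmhI22z4tOrHKOgCxqTZfs",
    "0033WrSXqPxfM72-Ws9jqgMF55529P9D9W5bD_b5wVspSuGXLY-FIm9m",
)
# To fetch the page (assuming UTF-8): html = urllib.request.urlopen(req).read().decode("utf-8")
```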

