Python3爬虫之模拟post登陆及get登陆

一、模拟登陆需要账号,密码的网址

一些不需要登陆的网址操作已经试过了,这次来用Python尝试需要登陆的网址,来利用cookie模拟登陆

由于我们教务系统有验证码偏困难一点,故挑了个软柿子捏,赛氪,https://www.saikr.com

我用的是火狐浏览器自带的F12开发者工具,打开网址输入账号,密码,登陆,如图

可以看到捕捉到很多post和get请求,第一个post请求就是我们提交账号和密码的,

点击post请求的参数选项可以看到我们提交的参数在bian表单数据里,name为账户名,pass为加密后的密码,remember为是否记住密码,0为不记住密码。

我们再来看看headers,即消息头

我们把这些请求头加到post请求的headers后对网页进行模拟登陆,

Cookie为必填项,否则会报错:

{"code":403,"message":"访问超时,请重试,多次出现此提示请联系QQ:1409765583","data":[]}

便可以创建一个带有cookie的opener,在第一次访问登录的URL时,将登录后的cookie保存下来,然后利用带有这个cookie的opener来访问该网址的其他版块,查看登录之后才能看到的信息。

比如我是登陆https://www.saikr.com/login后模拟登陆了“我的竞赛”版块https://www.saikr.com/u/5598522

代码如下:

  1. import urllib
  2. from urllib import request
  3. from http import cookiejar
  4. login_url = "https://www.saikr.com/login"
  5. postdata ={
  6. "name": "your account","pass": "your password(加密后)"
  7. }
  8. header = {
  9. "Accept":"application/json, text/javascript, */*; q=0.01",
  10. "Accept-Language":"zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
  11. "Connection":"keep-alive",
  12. "Host":"www.saikr.com",
  13. "Referer":"https://www.saikr.com/login",
  14. "Cookie":"your cookie",
  15. "Content-Type":"application/x-www-form-urlencoded; charset=UTF-8",
  16. "TE":"Trailers","X-Requested-With":"XMLHttpRequest"
  17. }
  18. postdata = urllib.parse.urlencode(postdata).encode('utf8')
  19. #req = requests.post(url,postdata,header)
  20. #声明一个CookieJar对象实例来保存cookie
  21. cookie = cookiejar.CookieJar()
  22. #利用urllib.request库的HTTPCookieProcessor对象来创建cookie处理器,也就CookieHandler
  23. cookie_support = request.HTTPCookieProcessor(cookie)
  24. #通过CookieHandler创建opener
  25. opener = request.build_opener(cookie_support)
  26. #创建Request对象
  27. my_url="https://www.saikr.com/u/5598522"
  28. req1 = request.Request(url=login_url, data=postdata, headers=header)#post请求
  29. req2 = request.Request(url=my_url)#利用构造的opener不需要cookie即可登陆,get请求
  30. response1 = opener.open(req1)
  31. response2 = opener.open(req2)
  32. print(response1.read().decode('utf8'))
  33. print(response2.read().decode('utf8'))

到此就告一段落了: 

ps:有点小插曲,当在headers里加入

Accept-Encoding

gzip, deflate, br

时,最后在 print(response1.read().decode('utf8'))时便会报错

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

原因:在请求header中设置了'Accept-Encoding': 'gzip, deflate'

参考链接:https://www.cnblogs.com/chyu/p/4558782.html

解决方法:去掉Accept-Encoding后就正常了

二、模拟登陆网址常用方法总结

1.通过urllib库的request库的函数进行请求

  1. from urllib import request
  2. #get请求
  3. ------------------------------------------------------
  4. #不加headers
  5. response=request.urlopen(url)
  6. page_source = response.read().decode('utf-8')

  7. #加headers,由于urllib.request.urlopen() 函数不接受headers参数,所以需要构建一个urllib.request.Request对象来实现请求头的设置
  8. req= request.Request(url=url,headers=headers)
  9. response=request.urlopen(req)
  10. page_source = response.read().decode('utf-8')

  11. #post请求
  12. -------------------------------------------------------
  13. postdata = urllib.parse.urlencode(data).encode('utf-8')#必须进行重编码
  14. req= request.Request(url=url,data=postdata,headers=headers)
  15. response=request.urlopen(req)
  16. page_source = response.read().decode('utf-8')
  17. #使用cookie访问其他版块
  18. #声明一个CookieJar对象实例来保存cookie
  19. cookie = cookiejar.CookieJar()
  20. #利用urllib.request库的HTTPCookieProcessor对象来创建cookie处理器,也就CookieHandler
  21. cookie_support = request.HTTPCookieProcessor(cookie)
  22. #通过CookieHandler创建opener
  23. opener = request.build_opener(cookie_support)
  24. # 将Opener安装位全局,覆盖urlopen函数,也可以临时使用opener.open()函数
  25. #urllib.request.install_opener(opener)
  26. #创建Request对象
  27. my_url="https://www.saikr.com/u/5598522"
  28. req2 = request.Request(url=my_url)
  29. response1 = opener.open(req1)
  30. response2 = opener.open(req2)
  31. #或者直接response2=opener.open(my_url)
  32. print(response1.read().decode('utf8'))
  33. print(response2.read().decode('utf8'))

 

2.通过requests库的get和post函数

  1. import requests
  2. import urllib
  3. import json
  4. #get请求
  5. -----------------------------------------------------------
  6. #method1
  7. url="https://www.saikr.com/"
  8. params={ 'key1': 'value1','key2': 'value2' }
  9. real_url = base_url + urllib.parse.urlencode(params)
  10. #real_url="https://www.saikr.com/key1=value1&key2=value2"
  11. response=requests.get(real_url)
  12. #method2
  13. response=requests.get(url,params)
  14. print(response.text)#<class 'str'>
  15. print(response.content)# <class 'bytes'>

  16. #post请求
  17. login_url = "https://www.saikr.com/login"
  18. postdata ={
  19. "name": "1324802616@qq.com","pass": "my password",
  20. }
  21. header = {
  22. "Accept":"application/json, text/javascript, */*; q=0.01",
  23. "Accept-Language":"zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
  24. "Connection":"keep-alive",
  25. "Host":"www.saikr.com",
  26. "Referer":"https://www.saikr.com/login",
  27. "Cookie":"mycookie",
  28. "Content-Type":"application/x-www-form-urlencoded; charset=UTF-8",
  29. "TE":"Trailers","X-Requested-With":"XMLHttpRequest"
  30. }
  31. #requests中的post中传入的data可以不进行重编码
  32. #login_postdata = urllib.parse.urlencode(postdata).encode('utf8')
  33. response=requests.post(url=login_url,data=postdata,headers=header)#<class 'requests.models.Response'>
  34. #以下三种都可以解析结果
  35. json1 = response1.json()#<class 'dict'>
  36. json2= json.loads(response1.text)#<class 'dict'>
  37. json_str = response2.content.decode('utf-8')#<class 'str'>

  38. #利用session维持会话访问其他版块
  39. --------------------------------------------------------------------
  40. login_url = "https://www.saikr.com/login"
  41. postdata ={
  42. "name": "1324802616@qq.com","pass": "my password",
  43. }
  44. header = {
  45. "Accept":"application/json, text/javascript, */*; q=0.01",
  46. "Connection":"keep-alive",
  47. "Referer":"https://www.saikr.com/login",
  48. "Cookie":"mycookie",
  49. }
  50. session = requests.session()
  51. response = session.post(url=url, data=data, headers=headers)
  52. my_url="https://www.saikr.com/u/5598522"
  53. response1 = session.get(url=my_url, headers=headers)
  54. print(response1.json())

 

(0)

相关推荐