How to keep a Selenium scraper from being blocked by browser-fingerprint anti-bot detection

hygge
2023-01-29 / 0 comments / 869 views

Preface

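While scraping a website I ran into Cloudflare (CF) blocking. I tried adding the browser options suggested by a Baidu search, but still could not get past it.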

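from selenium import webdriver
from selenium.webdriver.edge.options import Options
from selenium.webdriver.edge.service import Service

service = Service('msedgedriver.exe')
options = Options()
# Drop the enable-automation switch that the driver adds by default
options.add_experimental_option('excludeSwitches', ['enable-automation'])
# Disable the Blink AutomationControlled feature
options.add_argument('--disable-blink-features=AutomationControlled')
# Pass the options to the driver; without options=options they are never applied
driver = webdriver.Edge(service=service, options=options)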
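
One quick way to see whether options like these took effect is to read `navigator.webdriver` from inside the page, since that is the property the `AutomationControlled` flag is meant to hide. The following is only a minimal sketch: the helper names `webdriver_flag` and `looks_automated` are made up for illustration, and a clean result here does not mean Cloudflare will let the browser through, because anti-bot services look at many other signals.

```python
def webdriver_flag(driver):
    """Return navigator.webdriver as page scripts would see it."""
    # execute_script is the standard Selenium method for running JavaScript
    # in the current page and returning its result to Python.
    return driver.execute_script("return navigator.webdriver")


def looks_automated(driver) -> bool:
    """Hypothetical helper: True if the page can see the automation flag."""
    # A truthy value means the flag is exposed; None (undefined) or False
    # means this particular signal is hidden.
    return bool(webdriver_flag(driver))
```

Calling `looks_automated(driver)` after `driver.get(...)` gives a rough before/after comparison when toggling the options above.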

undetected-chromedriver

Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. Automatically downloads the driver binary and patches it.

  • Tested until current chrome beta versions
  • Works also on Brave Browser and many other Chromium based browsers, some tweaking
  • Python 3.6+

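I mainly use Edge. I understood the description to mean that Chrome would be downloaded automatically, but that did not happen for me (the README only mentions downloading the driver binary), so I installed the Chrome browser myself.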
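
Before launching, it can help to confirm that a Chrome or Chromium executable is actually discoverable on the machine. Below is a minimal sketch using only the standard library. The function name `find_browser` and the list of candidate names are assumptions for illustration; on Windows in particular Chrome is often not on `PATH`, so the list would need adjusting.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names to try; adjust for your platform and install.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def find_browser(
    names: Iterable[str] = CANDIDATE_NAMES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, else None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None
```

If this returns `None`, the browser is either missing or installed somewhere outside `PATH`. As far as I know, recent versions of undetected-chromedriver accept a `browser_executable_path` argument on `uc.Chrome`, so a located path could be passed there, but check the version you have installed.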

The code is almost the same as the earlier Selenium version. It solved the problem, and the Cloudflare block has not appeared since.

from pyquery import PyQuery as pq
import re
import time
from undetected_chromedriver import ChromeOptions
import undetected_chromedriver as uc

# Run Chrome headless with GPU acceleration disabled
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
driver = uc.Chrome(options=options)

# Load the page and hand the rendered HTML to PyQuery
driver.get('http://...')
html_source = driver.page_source
doc = pq(html_source)
# Select the target elements ('tag' stands in for the real selector)
titles = doc.find('tag')
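
To tell whether `page_source` holds the real page or a Cloudflare interstitial, one rough option is to inspect the page title. This is a heuristic sketch only: the marker strings are assumptions based on commonly seen challenge pages, not a documented interface, and Cloudflare can change them at any time. The names `page_title` and `looks_like_challenge` are hypothetical.

```python
import re

# Assumed title fragments of Cloudflare challenge/block pages (lowercase).
CHALLENGE_TITLES = ("just a moment", "attention required")


def page_title(html: str) -> str:
    """Extract the text of the first <title> element, or '' if none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def looks_like_challenge(html: str) -> bool:
    """Heuristic: True if the title matches a known challenge marker."""
    title = page_title(html).lower()
    return any(marker in title for marker in CHALLENGE_TITLES)
```

In the scraper this would be called as `looks_like_challenge(driver.page_source)` right after `driver.get(...)`, before handing the HTML to the parser.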

References

1. ultrafunkamsterdam/undetected-chromedriver: https://github.com/ultrafunkamsterdam/undetected-chromedriver

2. Chrome Headless Detection (Round II): https://intoli.com/blog/not-possible-to-block-chrome-headless/chrome-headless-test.html

3. How to keep a Selenium scraper from being blocked by browser-fingerprint anti-bot detection: undetected_chromedriver is here: https://blog.csdn.net/wywinstonwy/article/details/118479162
