导航：首页 > 开发技术 >

Python爬虫的必备技巧有哪些

发表于：2025-02-19 作者：千家信息网编辑

千家信息网最后更新 2025年02月19日，这篇文章主要介绍"Python爬虫的必备技巧有哪些"，在日常操作中，相信很多人在Python爬虫的必备技巧有哪些问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答"Pytho

千家信息网最后更新 2025年02月19日Python爬虫的必备技巧有哪些

这篇文章主要介绍"Python爬虫的必备技巧有哪些"，在日常操作中，相信很多人在Python爬虫的必备技巧有哪些问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答"Python爬虫的必备技巧有哪些"的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

自定义函数

import requestsfrom bs4 import BeautifulSoupheaders={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'}def baidu(company):    url = 'https://www.baidu.com/s?rtt=4&tn=news&word=' + company    print(url)    html = requests.get(url, headers=headers).text    s = BeautifulSoup(html, 'html.parser')    title=s.select('.news-title_1YtI1 a')    for i in title:        print(i.text)# 批量调用函数companies = ['腾讯', '阿里巴巴', '百度集团']for i in companies:    baidu(i)

批量输出多个搜索结果的标题

结果保存为文本文件

import requestsfrom bs4 import BeautifulSoupheaders={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'}def baidu(company):    url = 'https://www.baidu.com/s?rtt=4&tn=news&word=' + company    print(url)    html = requests.get(url, headers=headers).text    s = BeautifulSoup(html, 'html.parser')    title=s.select('.news-title_1YtI1 a')    fl=open('test.text','a', encoding='utf-8')    for i in title:        fl.write(i.text + '\n')# 批量调用函数companies = ['腾讯', '阿里巴巴', '百度集团']for i in companies:    baidu(i)

写入代码

fl=open('test.text','a', encoding='utf-8')for i in title:        fl.write(i.text + '\n')

异常处理

for i in companies:    try:        baidu(i)        print('运行成功')    except:        print('运行失败')

写在循环中不会让程序停止运行而会输出运行失败

休眠时间

import timefor i in companies:    try:        baidu(i)        print('运行成功')    except:        print('运行失败')time.sleep(5)

time.sleep(5)

括号里的单位是秒

放在什么位置则在什么位置休眠（暂停）

爬取多页内容

百度搜索腾讯

切换到第二页

去掉多多余的

https://www.baidu.com/s?wd=腾讯&pn=10

分析出

https://www.baidu.com/s?wd=腾讯&pn=0 为第一页
https://www.baidu.com/s?wd=腾讯&pn=10 为第二页
https://www.baidu.com/s?wd=腾讯&pn=20 为第三页
https://www.baidu.com/s?wd=腾讯&pn=30 为第四页
..........

代码

from bs4 import BeautifulSoupimport timeheaders={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0'}def baidu(c):    url = 'https://www.baidu.com/s?wd=腾讯&pn=' + str(c)+'0'    print(url)    html = requests.get(url, headers=headers).text    s = BeautifulSoup(html, 'html.parser')    title=s.select('.t a')    for i in title:        print(i.text)for i in range(10):    baidu(i)    time.sleep(2)

到此，关于"Python爬虫的必备技巧有哪些"的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注网站，小编会继续努力为大家带来更多实用的文章！

很赞哦！