千家信息网

How to Scrape Shop Names from Baidu Maps with Python

Published: 2024-09-21  Author: 千家信息网 editors
Last updated on 千家信息网: 2024-09-21

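This article explains how to scrape shop names, along with addresses and phone numbers, from Baidu Maps using Python. The method is short and practical: one function queries the map search page and parses the response with regular expressions, and a small driver loop calls it for each city and keyword.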

The code is as follows:

import requests
import re
import csv
import time


def BusinessFromBaiduDitu(citycode='287', key_word='筛网', pageno=0):
    # Query parameters sent to the Baidu Maps search page
    parameter = {
        "newmap": "1",
        "reqflag": "pcmap",
        "biz": "1",
        "from": "webmap",
        "da_par": "direct",
        "pcevaname": "pc4.1",
        "qt": "con",
        "c": citycode,        # city code
        "wd": key_word,       # search keyword
        "wd2": "",
        "pn": pageno,         # page number
        "nn": pageno * 10,
        "db": "0",
        "sug": "0",
        "addr": "0",
        "da_src": "pcmappg.poi.page",
        "on_gel": "1",
        "src": "7",
        "gr": "3",
        "l": "12",
        "tn": "B_NORMAL_MAP",
        # "u_loc": "12621219.536556,2630747.285024",
        "ie": "utf-8",
        # "b": "(11845157.18,3047692.2;11922085.18,3073932.2)",  # probably map coordinates; can be ignored
        "t": "1468896652886"}
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    url = 'http://map.baidu.com/'
    htm = requests.get(url, params=parameter, headers=headers)
    htm = htm.text.encode('latin-1').decode('unicode_escape')  # decode \uXXXX escapes
    pattern = r'(?<=\baddress_norm":"\[).+?(?="ty":)'
    htm = re.findall(pattern, htm)  # split the response into one segment per shop
    for r in htm:
        pattern = r'(?<=\b"\},"name":").+?(?=")'
        name = re.findall(pattern, r)
        # if not name:
        pattern = r'(?<=\b,"name":").+?(?=")'
        name = re.findall(pattern, r)
        # print(name[0])  # shop name
        pattern = r'.+?(?=")'
        adr = re.findall(pattern, r)
        pattern = r'\(.+?\['
        address = re.sub(pattern, ' ', adr[0])
        pattern = r'\(.+?\]'
        address = re.sub(pattern, ' ', address)
        # print(address)  # address
        pattern = r'(?<="phone":").+?(?=")'
        phone = re.findall(pattern, r)
        try:
            if phone[0] and '",' != phone[0]:
                phone_list = phone[0].split(sep=',')
                for number in phone_list:
                    if re.match('1', number):  # keep mobile numbers only (they start with 1)
                        print(citycode + name[0] + ',' + address + ',' + number)
                        # writer is the module-level csv writer created by the calling script
                        writer.writerow((name[0], address, number))
        except IndexError:
            # no phone number or no name in this segment: skip it
            continue
    print(citycode + ' ' + key_word + ' ' + str(pageno))
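To see what the decoding step and the look-around patterns do without sending a request to Baidu, here is a minimal sketch that applies them to a made-up response fragment. The sample string `raw` and the helper `extract_phones` are assumptions for illustration only; the layout of the real response may differ.

```python
import re

# Hypothetical fragment shaped like the text the regexes expect;
# the real Baidu response may look different.
raw = r'{"name":"\u7b5b\u7f51\u5382","phone":"13800000000,0318-1234567"}'

# Same trick as in the function: turn literal \uXXXX escapes into characters.
decoded = raw.encode('latin-1').decode('unicode_escape')


def extract_phones(segment):
    """Return the numbers starting with '1' (mobile numbers) found in one segment."""
    found = re.findall(r'(?<="phone":").+?(?=")', segment)
    if not found:
        return []
    return [n for n in found[0].split(',') if re.match('1', n)]


names = re.findall(r'(?<="name":").+?(?=")', decoded)
phones = extract_phones(decoded)
print(names, phones)
```

Returning an empty list when no phone field is found avoids the IndexError that indexing an empty match list would raise.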

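Next comes the code that searches for my two keywords, "丝网" (wire mesh) and "筛网" (sieve mesh), passed as key_word, to collect the data I want. The city code (citycode) has to change for each city as well; the codes come from the city code file (link).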

# citynumlist is the list of Baidu Maps city codes
citynumlist = ['33', '34', '35',
               # ... (the city codes in between are omitted here)
               '370', '371', '372']
keywordlist = ['丝网', '筛网']
start = time.time()
num = 1

# Create the CSV file that stores the results
csvFile = open(r'/Users/apple888/PycharmProjects/百度地图/Data/%s.csv' % 'CityData', 'a+', newline='', encoding='utf-8')
writer = csv.writer(csvFile)
writer.writerow(('name', 'address', 'number'))

for citycode in citynumlist:
    for kw in keywordlist:
        for page in range(10):
            BusinessFromBaiduDitu(citycode=citycode, key_word=kw, pageno=page)
            # Throttle the requests so Baidu does not block us
            time.sleep(1)
            if num % 20 == 0:
                time.sleep(2)
            if num % 100 == 0:
                time.sleep(3)
            if num % 200 == 0:
                time.sleep(7)
            num = num + 1

csvFile.close()  # flush the buffered rows to disk
end = time.time()
lasttime = int(end - start)
print('Elapsed: ' + str(lasttime) + 's')
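The pause schedule can also be written as a small function, which makes it easy to check how long the script waits after a given request. This is only a sketch: `pause_for` is a hypothetical helper that is not part of the script, and it assumes thresholds of every 20th, 100th, and 200th request.

```python
def pause_for(num):
    """Seconds to sleep after request number num (counting from 1)."""
    seconds = 1              # base delay after every request
    if num % 20 == 0:
        seconds += 2         # extra pause every 20 requests
    if num % 100 == 0:
        seconds += 3         # extra pause every 100 requests
    if num % 200 == 0:
        seconds += 7         # extra pause every 200 requests
    return seconds


# Total time spent sleeping over the first 200 requests
total = sum(pause_for(n) for n in range(1, 201))
print(total)
```

In the driver loop, the chain of `time.sleep` calls could then be replaced by a single `time.sleep(pause_for(num))`.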

The program ran for about three hours and collected 1,085 useful records.

That covers how to scrape shop names from Baidu Maps with Python. The best way to understand it is to run the code yourself.
