导航：首页 > 开发技术 >

怎么利用nodejs爬取并下载一万多张图片

发表于：2024-11-23 作者：千家信息网编辑

千家信息网最后更新 2024年11月23日，这篇文章主要为大家展示了"怎么利用nodejs爬取并下载一万多张图片"，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究并学习一下"怎么利用nodejs爬取并下载一万多张图

千家信息网最后更新 2024年11月23日怎么利用nodejs爬取并下载一万多张图片

这篇文章主要为大家展示了"怎么利用nodejs爬取并下载一万多张图片"，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究并学习一下"怎么利用nodejs爬取并下载一万多张图片"这篇文章吧。

爬取图片

首先初始化项目，并且安装 axios 和 cheerio

npm init -y && npm i axios cheerio

axios 用于爬取网页内容，cheerio 是服务端的 jquery api, 我们用它来获取 dom 中的图片地址；

const axios = require('axios')const cheerio = require('cheerio')function getImageUrl(target_url, containerEelment) {  let result_list = []  const res = await axios.get(target_url)  const html = res.data  const $ = cheerio.load(html)  const result_list = []  $(containerEelment).each((element) => {    result_list.push($(element).find('img').attr('src'))  })  return result_list}

这样就可以获取到页面中的图片 url 了。接下来需要根据 url 下载图片。

如何使用 nodejs 下载文件

方式一：使用内置模块 'https' 和 'fs'

使用 nodejs 下载文件可以使用内置包或第三方库完成。

GET 方法用于 HTTPS 来获取要下载的文件。 createWriteStream() 是一个用于创建可写流的方法，它只接收一个参数，即文件保存的位置。Pipe()是从可读流中读取数据并将其写入可写流的方法。

const fs = require('fs')const https = require('https')// URL of the imageconst url = 'GFG.jpeg'https.get(url, (res) => {  // Image will be stored at this path  const path = `${__dirname}/files/img.jpeg`  const filePath = fs.createWriteStream(path)  res.pipe(filePath)  filePath.on('finish', () => {    filePath.close()    console.log('Download Completed')  })})

方式二：DownloadHelper

npm install node-downloader-helper

下面是从网站下载图片的代码。一个对象 dl 是由类 DownloadHelper 创建的，它接收两个参数:

将要下载的图像。
下载后必须保存图像的路径。

File 变量包含将要下载的图像的 URL，filePath 变量包含将要保存文件的路径。

const { DownloaderHelper } = require('node-downloader-helper')// URL of the imageconst file = 'GFG.jpeg'// Path at which image will be downloadedconst filePath = `${__dirname}/files`const dl = new DownloaderHelper(file, filePath)dl.on('end', () => console.log('Download Completed'))dl.start()

方法三：使用 download

是 npm 大神 sindresorhus 写的，非常好用

npm install download

下面是从网站下载图片的代码。下载函数接收文件和文件路径。

const download = require('download')// Url of the imageconst file = 'GFG.jpeg'// Path at which image will get downloadedconst filePath = `${__dirname}/files`download(file, filePath).then(() => {  console.log('Download Completed')})

最终代码

本来想去爬百度壁纸，但是清晰度不太够，而且还有水印等，后来，群里有个小伙伴找到了一个 api，估计是某个手机 APP 上的高清壁纸，可以直接获得下载的 url，我就直接用了。

下面是完整代码

const download = require('download')const axios = require('axios')let headers = {  'User-Agent':    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',}function sleep(time) {  return new Promise((reslove) => setTimeout(reslove, time))}async function load(skip = 0) {  const data = await axios    .get(      'http://service.picasso.adesk.com/v1/vertical/category/4e4d610cdf714d2966000000/vertical',      {        headers,        params: {          limit: 30, // 每页固定返回30条          skip: skip,          first: 0,          order: 'hot',        },      }    )    .then((res) => {      return res.data.res.vertical    })    .catch((err) => {      console.log(err)    })  await downloadFile(data)  await sleep(3000)  if (skip < 1000) {    load(skip + 30)  } else {    console.log('下载完成')  }}async function downloadFile(data) {  for (let index = 0; index < data.length; index++) {    const item = data[index]    // Path at which image will get downloaded    const filePath = `${__dirname}/美女`    await download(item.wp, filePath, {      filename: item.id + '.jpeg',      headers,    }).then(() => {      console.log(`Download ${item.id} Completed`)      return    })  }}load()

上面代码中先要设置 User-Agent 并且设置 3s 延迟，这样可以防止服务端阻止爬虫，直接返回 403。

直接 node index.js 就会自动下载图片了。

、