Batch Full-Page Screenshots from a URL List


Command-Line Version

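This approach calls the browser's built-in headless screenshot mode from the command line. It cannot detect the page height dynamically: the height has to be set by hand through --window-size, so it suits batches of pages whose heights do not differ much. Create a new text file in Notepad, paste in the command below, change the file extension to .bat, and double-click it to run.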

"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --enable-logging --headless --disable-gpu --screenshot=d:\chrome7.jpg --hide-scrollbars --window-size=1920,5000 https://www.abc.com

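The command above captures a single URL. To cover a list, the same call can be wrapped in a loop. The following is a minimal sketch, assuming the Chrome path and flags shown above; the function names `build_command` and `shoot_all` and the `d:\<name>.png` output pattern are hypothetical choices for illustration.

```python
import subprocess

# Path taken from the command above; adjust it to the local install.
CHROME = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"


def build_command(url, out_file, width=1920, height=5000):
    """Return the argument list for one headless Chrome screenshot."""
    return [
        CHROME,
        "--headless",
        "--disable-gpu",
        "--hide-scrollbars",
        "--screenshot={}".format(out_file),
        "--window-size={},{}".format(width, height),
        url,
    ]


def shoot_all(pairs):
    """Run Chrome once per (name, url) pair; the height stays fixed."""
    for name, url in pairs:
        # Hypothetical output location: d:\<name>.png
        out_file = "d:\\{}.png".format(name)
        subprocess.run(build_command(url, out_file), check=False)
```

Calling `shoot_all([("abc", "https://www.abc.com")])` would run the command once for that URL. The window height is still a fixed guess, which is the limitation the Python version below works around.
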
Python Version

This follows Hopetree's method: Python's selenium library combined with PhantomJS to take screenshots in batch.

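First install Python, pip, selenium, and PhantomJS. If Python is not installed yet, download the installer from the official Python site; the latest release at the time of writing, Python 3.8, lets you install pip during setup. Once that succeeds, run pip install selenium to install selenium.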

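Next, download PhantomJS from the official PhantomJS site (GitHub or a cloud-drive mirror also works). Extract the archive and copy phantomjs.exe from the bin directory into the Scripts folder of the Python installation.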
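
Copying the executable into Scripts only helps if that folder is on PATH, which depends on how Python was installed. A quick check, sketched here with the standard library only, is to ask where the name resolves; `find_driver` and `report` are hypothetical helper names.

```python
import shutil


def find_driver(name="phantomjs"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def report(names=("phantomjs",)):
    """Map each driver name to its resolved path (None if it is missing)."""
    return {name: find_driver(name) for name in names}
```

If `find_driver()` returns None, selenium will most likely fail to start the browser as well, so it is worth fixing PATH before running the script.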

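If PhantomJS no longer works or the screenshots fail, switch to ChromeDriver: install it following the official guide, then likewise place chromedriver.exe in the Scripts folder of the Python installation.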
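
For the ChromeDriver route, one possible shape of the capture step is sketched below. It is not the original script: it assumes selenium with a working chromedriver on PATH, resizes the headless window to the measured page height instead of scrolling, and caps that height with `MAX_HEIGHT`, an arbitrary value picked for this sketch. The names `clamp_height` and `chrome_fullpage_shot` are hypothetical.

```python
MIN_HEIGHT = 768     # assumed lower bound for very short pages
MAX_HEIGHT = 20000   # assumed cap chosen for this sketch, not a Chrome limit


def clamp_height(height, lo=MIN_HEIGHT, hi=MAX_HEIGHT):
    """Keep the requested window height inside [lo, hi]."""
    return max(lo, min(int(height), hi))


def chrome_fullpage_shot(url, out_file, width=1920):
    """Save a screenshot of url with headless Chrome sized to the page height."""
    # Imported lazily so this module loads even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--hide-scrollbars")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        height = driver.execute_script(
            "return document.documentElement.scrollHeight")
        # Resize the window so the whole page fits into a single capture.
        driver.set_window_size(width, clamp_height(height))
        driver.save_screenshot(out_file)
    finally:
        # Always close the browser, even if loading or saving failed.
        driver.quit()
```

Pages that load content only while scrolling may still need a scroll pass before the height is measured.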

Finally, run Screenshot.py.

Directory layout:

* folder\pics\
* folder\ghostdriver.log
* folder\Screenshot.py
* folder\urls.txt

Here urls.txt holds the URLs to capture, one per line, in the form file name, an ASCII comma, then the URL, for example:

20,http://jandan.net/ooxx/page-20
21,http://jandan.net/ooxx/page-21
22,http://jandan.net/ooxx/page-22
23,http://jandan.net/ooxx/page-23
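
To make the line format concrete, here is a small parsing sketch. Splitting on the first comma only is a choice made here so that URLs which themselves contain commas survive; `parse_line` and `parse_lines` are hypothetical names.

```python
def parse_line(line):
    """Split 'name,url' on the first comma; return None for malformed lines."""
    name, sep, url = line.strip().partition(",")
    if not sep or not name or not url:
        return None
    return name, url


def parse_lines(lines):
    """Keep only the well-formed (name, url) pairs, in order."""
    return [pair for pair in map(parse_line, lines) if pair is not None]
```

With the sample file above, `parse_line("20,http://jandan.net/ooxx/page-20")` yields the pair `("20", "http://jandan.net/ooxx/page-20")`, and blank or incomplete lines are skipped.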

The source of the Python script is:

# -*- coding: utf-8 -*-

from selenium import webdriver
import time
import os.path
import multiprocessing as mp



def readtxt():
    '''Read urls.txt and return a list of (name, url) tuples; each line is the image file name, an ASCII comma, then the page URL.'''
    with open('urls.txt','r') as f:
        lines = f.readlines()
    urls = []
    for line in lines:
        # Split on the first comma only, so URLs that contain commas stay intact.
        thelist = line.strip().split(",", 1)
        if len(thelist) == 2 and thelist[0] and thelist[1]:
            urls.append((thelist[0],thelist[1]))
    return urls

def get_dir():
    '''Check whether the output folder exists and create it if it does not.'''
    filename = "pics"
    if not os.path.isdir(filename):
        os.makedirs(filename)
    return filename

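def webshot(tup):
    # Note: webdriver.PhantomJS was removed in Selenium 4, so this call needs an older selenium release.
    driver = webdriver.PhantomJS()
    driver.maximize_window()
    # JavaScript that returns the page height
    js_height = "return document.body.clientHeight"
    picname = str(tup[0])
    link = tup[1]
    try:
        driver.get(link)
        k = 1
        height = driver.execute_script(js_height)
        # Scroll down in 500px steps so lazy-loaded content renders;
        # re-read the height after each step because the page may grow.
        while k * 500 < height:
            js_move = "window.scrollTo(0,{})".format(k * 500)
            driver.execute_script(js_move)
            time.sleep(0.2)
            height = driver.execute_script(js_height)
            k += 1
        driver.save_screenshot(os.path.join('pics', picname + '.png'))
        print("Process {} get one pic !!!".format(os.getpid()))
        time.sleep(0.1)
    except Exception as e:
        print(picname,e)
    finally:
        # Always shut the browser down; otherwise every URL leaves a PhantomJS process behind.
        driver.quit()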

if __name__ == '__main__':
    t = time.time()
    get_dir()
    urls = readtxt()
    # By default the pool starts one worker per CPU core; each call opens its own browser.
    pool = mp.Pool()
    pool.map_async(func=webshot,iterable=urls)
    pool.close()
    pool.join()
    print("Done, elapsed: {:.2f}s".format(time.time()-t))
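
The scrolling loop in webshot is the part that handles pages whose height grows while loading. Isolated as a pure function it can be tried without a browser. This is an illustrative sketch: `scroll_offsets` and its `get_height` callback are hypothetical stand-ins for the driver calls.

```python
def scroll_offsets(get_height, step=500):
    """Return the offsets scrolled to, re-reading the height after each step."""
    offsets = []
    k = 1
    height = get_height()
    while k * step < height:
        offsets.append(k * step)
        # Lazy-loaded content may have made the page taller in the meantime.
        height = get_height()
        k += 1
    return offsets
```

For a page that stays 1200px tall the loop stops after offsets 500 and 1000; if the page grows to 1800px partway through, it keeps going to 1500 before stopping.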

Archived package

batch-screenshot.zip
