千家信息网

python3下multiprocessing、threading和gevent性能对比以及进程池、线程池和协程池性能对比

发表于:2024-11-25 作者:千家信息网编辑
千家信息网最后更新 2024年11月25日,python3下multiprocessing、threading和gevent性能对比以及进程池、线程池和协程池性能对比,很多新手对此不是很清楚,为了帮助大家解决这个难题,下面小编将为大家详细讲解,
千家信息网最后更新 2024年11月25日python3下multiprocessing、threading和gevent性能对比以及进程池、线程池和协程池性能对比

python3下multiprocessing、threading和gevent性能对比以及进程池、线程池和协程池性能对比,很多新手对此不是很清楚,为了帮助大家解决这个难题,下面小编将为大家详细讲解,有这方面需求的人可以来学习下,希望你能有所收获。

目前计算机程序一般会遇到两类I/O:硬盘I/O和网络I/O。我就针对网络I/O的场景分析下python3下进程、线程、协程效率的对比。进程采用multiprocessing.Pool进程池,线程是自己封装的进程池,协程采用gevent的库。用python3自带的urlllib.request和开源的requests做对比。代码如下:

import urllib.requestimport requestsimport timeimport multiprocessingimport threadingimport queuedef startTimer():    return time.time()def ticT(startTime):    useTime = time.time() - startTime    return round(useTime, 3)#def tic(startTime, name):#    useTime = time.time() - startTime#    print('[%s] use time: %1.3f' % (name, useTime))def download_urllib(url):    req = urllib.request.Request(url,            headers={'user-agent': 'Mozilla/5.0'})    res = urllib.request.urlopen(req)    data = res.read()    try:        data = data.decode('gbk')    except UnicodeDecodeError:        data = data.decode('utf8', 'ignore')    return res.status, datadef download_requests(url):    req = requests.get(url,            headers={'user-agent': 'Mozilla/5.0'})    return req.status_code, req.textclass threadPoolManager:        def __init__(self,urls, workNum=10000,threadNum=20):                self.workQueue=queue.Queue()                self.threadPool=[]                self.__initWorkQueue(urls)                self.__initThreadPool(threadNum)        def __initWorkQueue(self,urls):                for i in urls:                        self.workQueue.put((download_requests,i))        def __initThreadPool(self,threadNum):                for i in range(threadNum):                        self.threadPool.append(work(self.workQueue))        def waitAllComplete(self):                for i in self.threadPool:                        if i.isAlive():                                i.join()class work(threading.Thread):        def __init__(self,workQueue):                threading.Thread.__init__(self)                self.workQueue=workQueue                self.start()        def run(self):                while True:                        if self.workQueue.qsize():                                do,args=self.workQueue.get(block=False)                                do(args)                                self.workQueue.task_done()                        else:                                breakurls = ['http://www.ustchacker.com'] * 10urllibL = []requestsL = []multiPool = []threadPool = []N = 20PoolNum = 100for i in range(N):    print('start %d try' % i)    urllibT = startTimer()    jobs = [download_urllib(url) for url in urls]    #for status, data in jobs:    #    print(status, data[:10])    #tic(urllibT, 'urllib.request')    urllibL.append(ticT(urllibT))    print('1')        requestsT = startTimer()    jobs = [download_requests(url) for url in urls]    #for status, data in jobs:    #    print(status, data[:10])    #tic(requestsT, 'requests')    requestsL.append(ticT(requestsT))    print('2')        requestsT = startTimer()    pool = multiprocessing.Pool(PoolNum)    data = pool.map(download_requests, urls)    pool.close()    pool.join()    multiPool.append(ticT(requestsT))    print('3')    requestsT = startTimer()    pool = threadPoolManager(urls, threadNum=PoolNum)    pool.waitAllComplete()    threadPool.append(ticT(requestsT))    print('4')import matplotlib.pyplot as pltx = list(range(1, N+1))plt.plot(x, urllibL, label='urllib')plt.plot(x, requestsL, label='requests')plt.plot(x, multiPool, label='requests MultiPool')plt.plot(x, threadPool, label='requests threadPool')plt.xlabel('test number')plt.ylabel('time(s)')plt.legend()plt.show()

运行结果如下:

从上图可以看出,python3自带的urllib.request效率还是不如开源的requests,multiprocessing进程池效率明显提升,但还低于自己封装的线程池,有一部分原因是创建、调度进程的开销比创建线程高(测试程序中我把创建的代价也包括在里面)。

下面是gevent的测试代码:

import urllib.requestimport requestsimport timeimport gevent.poolimport gevent.monkeygevent.monkey.patch_all()def startTimer():    return time.time()def ticT(startTime):    useTime = time.time() - startTime    return round(useTime, 3)#def tic(startTime, name):#    useTime = time.time() - startTime#    print('[%s] use time: %1.3f' % (name, useTime))def download_urllib(url):    req = urllib.request.Request(url,            headers={'user-agent': 'Mozilla/5.0'})    res = urllib.request.urlopen(req)    data = res.read()    try:        data = data.decode('gbk')    except UnicodeDecodeError:        data = data.decode('utf8', 'ignore')    return res.status, datadef download_requests(url):    req = requests.get(url,            headers={'user-agent': 'Mozilla/5.0'})    return req.status_code, req.texturls = ['http://www.ustchacker.com'] * 10urllibL = []requestsL = []reqPool = []reqSpawn = []N = 20PoolNum = 100for i in range(N):    print('start %d try' % i)    urllibT = startTimer()    jobs = [download_urllib(url) for url in urls]    #for status, data in jobs:    #    print(status, data[:10])    #tic(urllibT, 'urllib.request')    urllibL.append(ticT(urllibT))    print('1')        requestsT = startTimer()    jobs = [download_requests(url) for url in urls]    #for status, data in jobs:    #    print(status, data[:10])    #tic(requestsT, 'requests')    requestsL.append(ticT(requestsT))    print('2')        requestsT = startTimer()    pool = gevent.pool.Pool(PoolNum)    data = pool.map(download_requests, urls)    #for status, text in data:    #    print(status, text[:10])    #tic(requestsT, 'requests with gevent.pool')    reqPool.append(ticT(requestsT))    print('3')        requestsT = startTimer()    jobs = [gevent.spawn(download_requests, url) for url in urls]    gevent.joinall(jobs)    #for i in jobs:    #    print(i.value[0], i.value[1][:10])    #tic(requestsT, 'requests with gevent.spawn')    reqSpawn.append(ticT(requestsT))    print('4')    import matplotlib.pyplot as pltx = list(range(1, N+1))plt.plot(x, urllibL, label='urllib')plt.plot(x, requestsL, label='requests')plt.plot(x, reqPool, label='requests geventPool')plt.plot(x, reqSpawn, label='requests Spawn')plt.xlabel('test number')plt.ylabel('time(s)')plt.legend()plt.show()

运行结果如下:

从上图可以看到,对于I/O密集型任务,gevent还是能对性能做很大提升的,由于协程的创建、调度开销都比线程小的多,所以可以看到不论使用gevent的Spawn模式还是Pool模式,性能差距不大。

因为在gevent中需要使用monkey补丁,会提高gevent的性能,但会影响multiprocessing的运行,如果要同时使用,需要如下代码:

gevent.monkey.patch_all(thread=False, socket=False, select=False)

可是这样就不能充分发挥gevent的优势,所以不能把multiprocessing Pool、threading Pool、gevent Pool在一个程序中对比。不过比较两图可以得出结论,线程池和gevent的性能最优的,其次是进程池。附带得出个结论,requests库比urllib.request库性能要好一些哈:-)

看完上述内容是否对您有帮助呢?如果还想对相关知识有进一步的了解或阅读更多相关文章,请关注行业资讯频道,感谢您对的支持。

0