2023-06-25 09:52:50 Source: 博客園 (cnblogs)
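The requests module is one of the most frequently used modules in Python scripts. For many people it is the first module they ever use, because it makes writing web crawlers easy. Beyond crawlers, it is just as indispensable in everyday development: calling backend APIs, uploading files, querying databases, and so on. This article describes how to use requests in detail.

requests is a third-party HTTP library written in Python and released under the Apache 2.0 license. It is built on top of urllib3, which is itself a third-party package rather than part of the standard library. It is more convenient than using urllib3 directly and saves a lot of time.

The rest of this article covers the requests module from six angles.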
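There is no need to read the whole article; jump straight to the feature you need:

Uploading files: Request parameters -> files
Calling an authenticated API: Request parameters -> header
Calling a JSON API: Request parameters -> json
Calling a form API: Request parameters -> data

Getting started with requests

requests is a third-party library and must be installed before use. The install command is: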
pip3 install requests -i https://pypi.tuna.tsinghua.edu.cn/simple
The simplest request: send a GET request and print the returned object.

import requests

res = requests.get("http://www.baidu.com")
print(res)
>>>
<Response [200]>
As the code above shows, sending a request with requests takes a single line. This matches the library's design philosophy: **Requests** is an elegant and simple HTTP library for Python, built for human beings.

Because parameters and features differ slightly between versions, note that this article uses requests 2.31.0.

requests supports most HTTP request methods: GET, POST, PUT, DELETE, HEAD, PATCH and OPTIONS.

Each method is demonstrated below. The examples run against a backend service started locally; change the url if you want to try them yourself.
GET request: fetch records

import requests

url = "http://127.0.0.1:8090/demos"
res = requests.get(url)
print(res.json())  # the dict obtained by deserializing the JSON body
>>>
{"result": [{"age": 0, "create_at": "Mon, 29 May 2023 22:05:40 GMT", "id": 2, "name": "string", "status": 0, "update_at": "Mon, 29 May 2023 22:05:40 GMT", "user_id": 0}, {"age": 100, "create_at": "Sun, 11 Jun 2023 10:38:28 GMT", "id": 3, "name": "ljk", "status": 0, "update_at": "Sun, 11 Jun 2023 10:38:28 GMT", "user_id": 223}], "total": 2}
POST request: create a record

import requests

url = "http://127.0.0.1:8090/demo"
payload = {
    "age": 18,
    "desc": "post_demo",
    "name": "post_method",
    "user_id": 102,
}
# the body is serialized to JSON automatically
res = requests.post(url, json=payload)
print(res.json())
>>>
{"age": 18, "create_at": "Sun, 11 Jun 2023 16:14:40 GMT", "id": 4, "name": "post_method", "status": 0, "update_at": "Sun, 11 Jun 2023 16:14:40 GMT", "user_id": 102}
PUT request: update a record

import requests

url = "http://127.0.0.1:8090/demo/4"
payload = {
    "age": 20,
    "user_id": 1001,
}
res = requests.put(url, json=payload)
print(res.json())
>>>
{"msg": "success"}

DELETE request: delete a record

import requests

url = "http://127.0.0.1:8090/demo/4"
res = requests.delete(url)
print(res.json())
>>>
{"msg": "success"}
HEAD request: fetch the headers only

import requests

url = "http://127.0.0.1:8090/demos"
res = requests.head(url)
print(res.ok)
print(res.headers)
>>>
True
{"Server": "Werkzeug/2.3.6 Python/3.9.6", "Date": "Sat, 17 Jun 2023 06:34:44 GMT", "Content-Type": "application/json", "Content-Length": "702", "Connection": "close"}

The returned headers include the data type, "Content-Type": "application/json". This means the body is JSON-encoded, so it has to be deserialized before use.
PATCH request: update part of a record

import requests

url = "http://127.0.0.1:8090/demo/4"
payload = {
    "age": 200,
}
res = requests.patch(url, json=payload)
print(res.json())
>>>
{"msg": "success"}
OPTIONS request: inspect what the endpoint allows

import requests

url = "http://127.0.0.1:8090/demo/4"
headers = {
    "Access-Control-Request-Method": "GET",
    "Origin": "*",
    "Access-Control-Request-Headers": "Authorization",
}
res = requests.options(url, headers=headers)
print(res.ok)
print(res.headers)
>>>
True
{"Server": "Werkzeug/2.3.6 Python/3.9.6", "Date": "Sat, 17 Jun 2023 06:38:21 GMT", "Content-Type": "text/html; charset=utf-8", "Allow": "OPTIONS, DELETE, PUT, PATCH, HEAD, GET", "Content-Length": "0", "Connection": "close"}

The returned headers list the methods this endpoint allows: "Allow": "OPTIONS, DELETE, PUT, PATCH, HEAD, GET". The endpoint can be called with any of these methods; a method that is not listed cannot be used on it.
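Request parameters

The signature of the request function (shown here as the Session.request method) is given below; requests supports a large number of parameters. As of version 2.31.0 there are 16 in total.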
def request(
    self,
    method,
    url,
    params=None,
    data=None,
    headers=None,
    cookies=None,
    files=None,
    auth=None,
    timeout=None,
    allow_redirects=True,
    proxies=None,
    hooks=None,
    stream=None,
    verify=None,
    cert=None,
    json=None,
):
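Parameter descriptions:

params usage example

Purpose: build the query string of the request URL.

A GET request often carries query parameters, for example for a paginated query:

http://127.0.0.1:8090/demos?offset=10&limit=10

There are two ways to write the query part. The first is to concatenate it into the URL by hand, as above. The second is to use the params argument: define the query parameters as a dict and pass it to params.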
import requests

url = "http://127.0.0.1:8090/demos"
res = requests.get(url, params={"offset": 1, "limit": 10})
print(res.json())
print(res.url)  # the URL that was actually requested
>>>
{"result": [{"age": 200, "create_at": "Sun, 11 Jun 2023 10:38:28 GMT", "id": 3, "name": "ljk", "status": 0, "update_at": "Sun, 11 Jun 2023 10:38:28 GMT", "user_id": 1002}], "total": 2}
http://127.0.0.1:8090/demos?offset=1&limit=10

The response object has a url attribute that shows the final URL of the request. You can see that params appended the dict to the URL as a query string.
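A minimal sketch of how params becomes a query string, without sending anything: it builds a `requests.Request` and calls `prepare()`, which applies the same URL encoding as `requests.get`. The `tag` and `keyword` parameters are hypothetical and are not part of the demo backend.

```python
import requests

url = "http://127.0.0.1:8090/demos"

# prepare() builds the final request without touching the network
simple = requests.Request("GET", url, params={"offset": 1, "limit": 10}).prepare()
print(simple.url)

# Assumed extra parameters: a list becomes a repeated key, a None value is dropped
extra = requests.Request(
    "GET",
    url,
    params={"tag": ["a", "b"], "keyword": None, "limit": 10},
).prepare()
print(extra.url)
```

Inspecting the prepared URL this way is a cheap check that the query string looks as intended before pointing the code at a real service.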
data usage example

Purpose: carry the request body; it can also be used to upload file content.

To send a request with a JSON body through data, first set the data format to JSON in the headers, then serialize the body with json.dumps.

import json
import requests

url = "http://127.0.0.1:8090/demo"
payload = {
    "age": 18,
    "desc": "post_demo",
    "name": "post_method",
    "user_id": 102,
}
headers = {"Content-Type": "application/json"}
res = requests.post(url, data=json.dumps(payload), headers=headers)
print(res.json())
Background: the Content-Type field

The request headers contain a Content-Type field, which the client uses to tell the server the actual type of the data being sent: a file, plain text, a JSON string, a form, and so on. Five types are commonly used with requests:

application/x-www-form-urlencoded: form data encoded as key/value pairs. This is what requests sends by default when a dict is passed to data.
multipart/form-data: can carry files as well as ordinary fields.
text/xml: XML. The body must be XML.
application/json: JSON. The body must be JSON.
text/plain: plain text.

APIs that accept form-data

Some APIs require data of type multipart/form-data. There are two ways to send it:

1. Build the form-data body by hand and set the headers yourself.
2. Pass the fields through the files parameter. This is the recommended way.

Building form-data by hand
import requests

url = "http://www.demo.com/"
# Every part is: boundary line, part headers, blank line, value.
# All line breaks inside a multipart body must be CRLF (\r\n).
payload = (
    "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\n"
    'Content-Disposition: form-data; name="phone"\r\n\r\n{}\r\n'
    "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\n"
    'Content-Disposition: form-data; name="idnum"\r\n\r\n{}\r\n'
    "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\n"
    'Content-Disposition: form-data; name="name"\r\n\r\n{}\r\n'
    "------WebKitFormBoundary7MA4YWxkTrZu0gW\r\n"
    'Content-Disposition: form-data; name="products"\r\n\r\n{}\r\n'
    "------WebKitFormBoundary7MA4YWxkTrZu0gW--"
).format(12374658756, 23, "demo", [201])
headers = {
    "content-type": "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW"
}
# verify=False turns off TLS certificate checks (it has no effect on a plain http URL)
resp = requests.post(url, data=payload, verify=False, headers=headers)
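Passing the fields through files

import requests

url = "http://www.demo.com/"  # placeholder; the original snippet left url undefined
PATH_DIR = "."                # placeholder; the original snippet left PATH_DIR undefined

# A (None, value) tuple is sent as a plain form field, without a filename
with open("%s/resource/upload_images/image.jpg" % PATH_DIR, "rb") as image:
    files = {
        "schoolId": (None, "-1"),
        "schoolName": (None, ""),
        "reward": (None, "5"),
        "publishText": (None, "測(cè)試測(cè)試"),
        "tags": (None, "1"),
        "image": ("image.jpg", image, "application/octet-stream"),
    }
    response = requests.post(url, files=files)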
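To see what the files parameter generates, here is a small sketch that prepares a multipart request without sending it. The in-memory `fake_image` stands in for a real file and the field values are made up for illustration.

```python
import io
import requests

# In-memory stand-in for an image file, so the sketch needs no file on disk
fake_image = io.BytesIO(b"fake image bytes")

files = {
    "publishText": (None, "hello"),  # plain field: the filename slot is None
    "image": ("image.jpg", fake_image, "application/octet-stream"),
}

# prepare() encodes the body and picks a boundary; nothing is sent
prepared = requests.Request("POST", "http://www.demo.com/", files=files).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)
print(prepared.headers["Content-Length"], "bytes in the encoded body")
```

requests chooses the boundary and writes it into the Content-Type header itself, which is why the files route is less error-prone than assembling the body by hand.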
json usage example

Purpose: carry the request body and serialize it to JSON.

When the backend accepts JSON, the json parameter is a more convenient choice than serializing the body with json.dumps yourself. It serializes the dict automatically and adds the JSON Content-Type header.

import requests

url = "http://127.0.0.1:8090/demo"
payload = {
    "age": 18,
    "desc": "post_demo",
    "name": "post_method",
    "user_id": 102,
}
res = requests.post(url, json=payload)
print(res.json())
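The difference between data and json can be checked offline. The sketch below prepares three requests without sending them and compares the Content-Type header and body that requests would put on the wire; the payload is a shortened version of the one above.

```python
import json
import requests

url = "http://127.0.0.1:8090/demo"
payload = {"age": 18, "name": "post_method"}

# data=dict -> form-encoded body
form_req = requests.Request("POST", url, data=payload).prepare()
# json=dict -> JSON body, Content-Type set automatically
json_req = requests.Request("POST", url, json=payload).prepare()
# data=str -> sent as-is, so the header has to be set by hand
manual_req = requests.Request(
    "POST",
    url,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
).prepare()

for name, req in [("data=dict", form_req), ("json=dict", json_req), ("data=str", manual_req)]:
    print(name, "|", req.headers.get("Content-Type"), "|", req.body)
```

Passing a dict to data therefore produces a form body, not JSON. A backend that expects JSON needs either the json parameter or a pre-serialized string plus the header.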
header usage example

Purpose: carry header information, for example to present the script as a browser or to send credentials.

Many public sites check the request headers as an anti-crawler measure and reject requests that do not look like they come from a browser. Sending suitable headers lets a script pass as a browser. APIs also commonly validate credentials, so the request has to carry a token, and the token is set in headers.

import requests

url = "http://127.0.0.1:8090/demo"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "mtk": "xxxxx",
}
res = requests.get(url, headers=headers)
print(res.json())
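When every call needs the same headers, they can be set once on a Session. A sketch under assumed values: the `demo-client/1.0` User-Agent and the `X-Trace-Id` header are hypothetical, and `prepare_request` is used so that nothing is sent.

```python
import requests

session = requests.Session()
# Headers set on the session go out with every request made through it
session.headers.update({"User-Agent": "demo-client/1.0", "mtk": "xxxxx"})

# Per-request headers are merged on top of the session defaults
request = requests.Request(
    "GET",
    "http://127.0.0.1:8090/demo",
    headers={"X-Trace-Id": "abc123"},
)
prepared = session.prepare_request(request)
print(dict(prepared.headers))
```

A per-request header with the same name as a session header overrides the session value for that request only.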
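files usage example

Purpose: upload files.

To upload a file, open it to get a file handle and pass the handle in files. One or several files can be uploaded in a single request. Open files in binary mode: requests may try to set the Content-Length header for you from the number of bytes in the file, and a file opened in text mode can make that value wrong.

import requests

url = "http://127.0.0.1:8090/demo"
# the with block closes both files once the upload finishes
with open("a.txt", "rb") as filea, open("b.txt", "rb") as fileb:
    res = requests.post(url, files={"file_a": filea, "file_b": fileb})
print(res.json())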
timeout usage example

Purpose: set the timeout of a request.

A timeout has two parts, the connect timeout and the read timeout.

Set them separately: timeout=(connect timeout, read timeout)
Set both to the same value: timeout=seconds

import requests

url = "http://127.0.0.1:8090/demo/10"
res = requests.get(url, timeout=(3, 10))  # 3 s to connect, 10 s to read
print(res.json())
hooks usage example

Purpose: register hook functions.

A hook is a custom function that is run as a side step of some other process. requests supports a single hook, response, which runs your function when a response comes back. It can be used to print information, check the response, or attach extra information to the response object.

import requests

def verify_res(res, *args, **kwargs):
    # attach a custom attribute to the response object
    res.status = "PASS" if res.status_code == 200 else "FAIL"
    print(res.status)

url = "http://www.baiu.com"
response = requests.get(url, hooks={"response": verify_res})
print("result_url " + response.url)

Besides a hook for one particular request, a hook can also be registered for every request made through a session.
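# Create a session and register a hook on it; the hook then runs for every
# response the session receives (a typical use is raising on error responses)
import requests

session = requests.Session()

def hook_func(response, *args, **kwargs):
    pass

session.hooks["response"] = [hook_func]
session.get("xxx")  # "xxx" is the placeholder URL from the original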
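The following sketch shows a session-level hook firing for every request. So that it runs offline, the session is given a stand-in transport adapter, `CannedAdapter`, that returns a fixed status code. The adapter and the `.test` host names are assumptions made for illustration, not part of requests or of the demo backend.

```python
import io
import requests
from requests.adapters import BaseAdapter


class CannedAdapter(BaseAdapter):
    """Stand-in transport (an assumption for this sketch): answers every
    request with a fixed status code instead of using the network."""

    def __init__(self, status_code):
        super().__init__()
        self.status_code = status_code

    def send(self, request, **kwargs):
        response = requests.Response()
        response.status_code = self.status_code
        response.url = request.url
        response.request = request
        response.raw = io.BytesIO(b'{"msg": "canned"}')
        return response

    def close(self):
        pass


seen = []  # filled in by the hook


def record_status(response, *args, **kwargs):
    # A response hook receives the Response object as its first argument
    seen.append((response.url, response.status_code))


session = requests.Session()
session.hooks["response"].append(record_status)
session.mount("http://ok.test/", CannedAdapter(200))
session.mount("http://missing.test/", CannedAdapter(404))

session.get("http://ok.test/demo")
session.get("http://missing.test/demo")
print(seen)
```

With a real backend the two `mount` calls are simply left out; the hook registration is unchanged.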
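The response object

Every request needs a detailed and accurate result. A requests call returns a Response object, which has a rich set of attributes and methods.

The difference between content, text and json()

content returns the raw body as bytes, text returns the body decoded to a string, and json() returns the object obtained by deserializing the JSON body.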
import requests

url = "http://127.0.0.1:8090/demo/5"
res = requests.get(url)
print(f"content -> type: {type(res.content)}\n content: {res.content}")
print(f"text -> type: {type(res.text)}\n content: {res.text}")
print(f"json() -> type: {type(res.json())}\n content: {res.json()}")
>>>
content -> type: <class 'bytes'>
 content: b'{\n "age": 18,\n "id": 5,\n "name": "post_method",\n "status": 0,\n "user_id": 102\n}\n'
text -> type: <class 'str'>
 content: {
 "age": 18,
 "id": 5,
 "name": "post_method",
 "status": 0,
 "user_id": 102
}

json() -> type: <class 'dict'>
 content: {"age": 18, "id": 5, "name": "post_method", "status": 0, "user_id": 102}

The types in the output make the difference between the three clear. JSON responses are usually the easiest to work with. Recommendations:

If you know for certain that the API returns a JSON string, use response.json().
If you do not know the format of the returned data, use response.text.

status_code and ok

status_code is the standard HTTP response code of the call, and ok tells whether the request succeeded. What counts as success is defined in the docstring of the ok property.
@property
def ok(self):
    """Returns True if :attr:`status_code` is less than 400, False if not."""
    ...  # body not shown in the original excerpt

import requests

url = "http://127.0.0.1:8090/demo/5"
res = requests.get(url)
print(f"status code: {res.status_code}, ok: {res.ok}")

url = "http://127.0.0.1:8090/demo/10"
res = requests.get(url)
print(f"status code: {res.status_code}, ok: {res.ok}")
>>>
status code: 200, ok: True
status code: 404, ok: False
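As the docstring says, ok means "status code below 400", not "status code equals 200". A small sketch makes the boundary visible. The Response objects are built by hand (`make_response` is a hypothetical helper) so that no server is needed.

```python
import requests


def make_response(status_code):
    # Hand-built stand-in; a real Response comes back from requests.get()
    response = requests.Response()
    response.status_code = status_code
    return response


# Map each status code to the value of its .ok property
ok_flags = {code: make_response(code).ok for code in (200, 204, 301, 404, 500)}
for code, ok in ok_flags.items():
    print(code, "ok" if ok else "not ok")
```

Redirect codes such as 301 count as ok, so code that must see exactly 200 should compare status_code directly.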
Standard response code classes:

Informational responses (100–199)
Successful responses (200–299)
Redirection messages (300–399)
Client error responses (400–499)
Server error responses (500–599)

reason: a short description of the result

reason returns a brief textual description of the status. For 200 it is OK; every other status code has its own short phrase.
import requests

url = "http://127.0.0.1:8090/demo/5"
res = requests.get(url)
print(f"status code: {res.status_code}, reason: {res.reason}")
>>>
status code: 404, reason: NOT FOUND

url = "http://127.0.0.1:8090/demo/5"
res = requests.get(url)
print(f"status code: {res.status_code}, reason: {res.reason}")
>>>
status code: 500, reason: INTERNAL SERVER ERROR
Reading headers and cookies

Calling an API that requires login may need the cookies issued after authentication, or certain special header fields. These can be read from the headers and cookies attributes of a response.

import requests

url = "http://127.0.0.1:8090/demo/5"
res = requests.get(url)
print(f"header: {res.headers}")
print(f"cookies: {res.cookies}")
>>>
header: {"Server": "Werkzeug/2.3.6 Python/3.9.6", "Date": "Tue, 13 Jun 2023 13:27:13 GMT", "Content-Type": "application/json", "Content-Length": "85", "Connection": "close"}
cookies: <RequestsCookieJar[]>
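Catching exceptions

Network requests can fail in many ways, all the more so when an HTTP request sits in front of a complex backend API. Catching errors is therefore important, and handling exceptions sensibly makes code considerably more robust. requests defines a range of exception classes: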
class RequestException(IOError): pass
class InvalidJSONError(RequestException): pass
class JSONDecodeError(InvalidJSONError, CompatJSONDecodeError): pass
class HTTPError(RequestException): pass
class ConnectionError(RequestException): pass
class ProxyError(ConnectionError): pass
class SSLError(ConnectionError): pass
class Timeout(RequestException): pass
class ConnectTimeout(ConnectionError, Timeout): pass
class ReadTimeout(Timeout): pass
class URLRequired(RequestException): pass
class TooManyRedirects(RequestException): pass
class MissingSchema(RequestException, ValueError): pass
class InvalidSchema(RequestException, ValueError): pass
class InvalidURL(RequestException, ValueError): pass
class InvalidHeader(RequestException, ValueError): pass
class InvalidProxyURL(InvalidURL): pass
class ChunkedEncodingError(RequestException): pass
class ContentDecodingError(RequestException, BaseHTTPError): pass
class StreamConsumedError(RequestException, TypeError): pass
class RetryError(RequestException): pass
class UnrewindableBodyError(RequestException): pass
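Because these classes form a hierarchy, the order of checks matters: a handler for ConnectionError also catches ConnectTimeout. The sketch below checks the relationships directly and orders the tests from most to least specific. `describe` is a hypothetical helper written for this illustration.

```python
import requests.exceptions as rex


def describe(exc):
    """Label an exception, testing the most specific classes first."""
    if isinstance(exc, rex.ConnectTimeout):
        return "connect timeout"
    if isinstance(exc, rex.ReadTimeout):
        return "read timeout"
    if isinstance(exc, rex.ConnectionError):
        return "connection error"
    if isinstance(exc, rex.RequestException):
        return "other requests error"
    return "not a requests error"


# ConnectTimeout inherits from both ConnectionError and Timeout
print(issubclass(rex.ConnectTimeout, rex.ConnectionError))
print(issubclass(rex.ConnectTimeout, rex.Timeout))
# ReadTimeout is a Timeout but not a ConnectionError
print(issubclass(rex.ReadTimeout, rex.ConnectionError))

for exc in (rex.ConnectTimeout(), rex.ReadTimeout(), rex.SSLError(), rex.HTTPError()):
    print(type(exc).__name__, "->", describe(exc))
```

The same ordering applies to except clauses: put ConnectTimeout before ConnectionError, and RequestException last.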
The most commonly used exceptions are described below.

Uncaught exceptions

If an exception is not caught, the program exits abnormally when it occurs.
import requests

url = "http://127.0.0.1:8090/demo/10"
res = requests.get(url)
>>>
Traceback (most recent call last):
  File "/Users/ljk/Documents/python_env/dev/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/ljk/Documents/python_env/dev/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/Users/ljk/Documents/python_env/dev/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused
RequestException

RequestException catches every exception raised by requests; it is the coarsest-grained exception.

import requests

url = "http://127.0.0.1:8090/demo/10"
try:
    res = requests.get(url)
except requests.exceptions.RequestException as e:
    print("something error:")
    print(e)
else:
    print(f"status code: {res.status_code}, ok: {res.ok}")
finally:
    print("request end")
>>>
something error:
HTTPConnectionPool(host="127.0.0.1", port=8090): Max retries exceeded with url: /demo/10 (Caused by NewConnectionError(": Failed to establish a new connection: [Errno 61] Connection refused"))
request end
ConnectionError

ConnectionError catches network-level errors in a request, such as an unreachable network or a refused connection. Here it catches a refused connection.

import requests

url = "http://127.0.0.1:8090/demo/10"
try:
    res = requests.get(url, timeout=1)
except requests.exceptions.ConnectionError as e:
    print("something error:")
    print(e)
else:
    print(f"status code: {res.status_code}, ok: {res.ok}")
finally:
    print("request end")
>>>
something error:
HTTPConnectionPool(host="127.0.0.1", port=8090): Max retries exceeded with url: /demo/10 (Caused by NewConnectionError(": Failed to establish a new connection: [Errno 61] Connection refused"))
request end
ConnectTimeout

A refused connection means the peer was reached but rejected the connection. ConnectTimeout means that no connection to the peer could be established within the time limit.

import requests

url = "http://www.facebook.com"
try:
    res = requests.get(url, timeout=10)
except requests.exceptions.ConnectTimeout as e:
    print("something error:")
    print(e)
else:
    print(f"status code: {res.status_code}, ok: {res.ok}")
finally:
    print("request end")
>>>
something error:
HTTPConnectionPool(host="www.facebook.com", port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, "Connection to www.facebook.com timed out. (connect timeout=10)"))
request end
ReadTimeout

ReadTimeout means a connection to the peer was established, but the response did not arrive in time. To produce a read timeout on purpose, the backend handler sleeps for 5 s while the client waits for only 1 s.

# Backend fragment: MethodView, swag_from, DemoTable and json_response
# come from the author's backend project and are not defined here
class Demo(MethodView):
    @swag_from("./apidocs/get.yml")
    def get(self, demo_id):
        """Fetch a single demo record"""
        # Querying the database directly also works; wrapping it in a function allows caching
        import time
        time.sleep(5)
        demo = DemoTable.get_by_demo_id(demo_id)
        return json_response(data=demo.to_dict())

import requests

url = "http://127.0.0.1:8090/demo/10"
try:
    res = requests.get(url, timeout=1)
except requests.exceptions.ReadTimeout as e:
    print("something error:")
    print(e)
else:
    print(f"status code: {res.status_code}, ok: {res.ok}")
finally:
    print("request end")
>>>
something error:
HTTPConnectionPool(host="127.0.0.1", port=8090): Read timed out. (read timeout=1)
request end
Handling API errors

requests does not raise an exception when the API itself returns an error. Responses such as 404, 500 and 502 do not raise on their own; the error is reported through the status code.

import requests

url = "http://127.0.0.1:8090/demo/10"
try:
    res = requests.get(url, timeout=10)
except requests.exceptions.RequestException as e:
    print("something error:")
    print(e)
else:
    print(f"status code: {res.status_code}, ok: {res.ok}")
finally:
    print("request end")
>>>
status code: 404, ok: False
request end

status code: 502, ok: False
request end
Even the broadest exception handler does not catch API-level errors, so they have to be detected through status_code. There are many status codes; if you would rather not write a long chain of if/else checks, call response.raise_for_status(). It works like an assertion: if the status code is a 4xx or 5xx error it raises an HTTPError, otherwise it does nothing.

import requests

url = "http://127.0.0.1:8090/demo/10"
res = requests.get(url, timeout=5)
res.raise_for_status()
print(res.json())
>>>
Traceback (most recent call last):
  File "/Users/ljk/Documents/code/daily_dev/requests_demo/method_demo.py", line 166, in <module>
    res.raise_for_status()
  File "/Users/ljk/Documents/python_env/dev/lib/python3.9/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: http://127.0.0.1:8090/demo/10
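In practice raise_for_status() is usually wrapped in try/except so that the failed response can still be inspected. A sketch under stated assumptions: `fake_response` builds a Response by hand so the code runs without a server, and `json_or_error` is a hypothetical helper, not part of requests.

```python
import io
import requests


def fake_response(status_code, reason, body):
    """Hand-built stand-in; a real Response comes back from requests.get()."""
    response = requests.Response()
    response.status_code = status_code
    response.reason = reason
    response.url = "http://127.0.0.1:8090/demo/10"
    response.raw = io.BytesIO(body)
    return response


def json_or_error(response):
    """Return (data, None) on success or (None, message) on an HTTP error."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # The failed Response is attached to the exception as e.response
        return None, f"{e.response.status_code}: {e}"
    return response.json(), None


print(json_or_error(fake_response(200, "OK", b'{"msg": "success"}')))
print(json_or_error(fake_response(404, "NOT FOUND", b"")))
```

Since HTTPError is a subclass of RequestException, an existing `except requests.exceptions.RequestException` block also catches it once raise_for_status() is called inside the try.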
Ways to improve request throughput

Multithreading

Inefficient requests: with a large number of requests to make, looping over them one by one in a for loop is very inefficient. The most time-consuming part of network I/O is waiting for the response, and a for loop runs sequentially: the next request starts only after the previous one has returned, so most of the time is spent waiting on the network.

Multithreaded version: threads can cut the total request time significantly. A Python thread gives up the interpreter while it waits on a network request, so when many request threads are running, a thread that is waiting lets the others proceed instead of blocking them. As a result a large number of requests can be in flight at the same time. A comparison of the for loop and threads:
import time
import threading
import requests

# for loop
start = time.time()
for i in range(10):
    res = requests.get("https://www.csdn.net/", timeout=3)
end = time.time()
print(f"total elapsed: {end-start}")

# threads
def get_request():
    res = requests.get("https://www.csdn.net/", timeout=3)

start = time.time()
t_list = []
for i in range(10):
    t = threading.Thread(target=get_request)
    t_list.append(t)
    t.start()
for t in t_list:
    t.join()
end = time.time()
print(f"total elapsed: {end-start}")
>>>
total elapsed: 6.254332065582275
total elapsed: 0.740969181060791

The threaded run takes roughly one-eighth of the time of the for loop (0.74 s versus 6.25 s), close to an order of magnitude less. When more than about 10 threads are needed, a thread pool is the better choice, because it avoids the cost of creating a new thread for every task.
import requests
from concurrent.futures import ThreadPoolExecutor

def get_request():
    res = requests.get("https://www.csdn.net/", timeout=3)

# the pool keeps 2 worker threads and reuses them for all 10 tasks
with ThreadPoolExecutor(max_workers=2) as pool:
    for i in range(10):
        pool.submit(get_request)
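The pool example above discards both the results and any exception raised inside a worker. A sketch of collecting them with `as_completed` follows. `fake_request` is a stand-in that sleeps instead of calling requests.get, and the failure of task 3 is simulated. Both are assumptions made so that the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_request(i):
    """Stand-in for requests.get: sleeps to mimic network latency."""
    if i == 3:
        raise TimeoutError(f"task {i} timed out")  # simulated failure
    time.sleep(0.05)
    return i * 10


results, errors = {}, {}
with ThreadPoolExecutor(max_workers=5) as pool:
    # Map each future back to the task number it belongs to
    futures = {pool.submit(fake_request, i): i for i in range(10)}
    for future in as_completed(futures):
        i = futures[future]
        try:
            results[i] = future.result()  # re-raises the worker's exception
        except Exception as exc:
            errors[i] = exc

print("succeeded:", sorted(results))
print("failed:", {i: str(e) for i, e in errors.items()})
```

Without the call to `future.result()`, an exception raised in a worker thread would pass unnoticed.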
Reusing TCP connections

Each call to a module-level requests function sets up a new TCP connection between the local machine and the target server, and the HTTP data is then sent over it. With a large number of requests, opening that many TCP connections not only makes the code slower but also puts network load on the target server. A session lets multiple requests share a TCP connection, which reduces both the running time and the load on the server. Using one is simple; the only extra step is creating a Session object:

import requests

# Create a session; it keeps a connection pool, so requests to the
# same host can reuse an existing TCP connection
s = requests.Session()
for i in range(100):
    res = s.get(f"https://www.target.com/{i}")
    print(res.text)

# Alternative: the with block closes the session automatically
with requests.Session() as s:
    s.get("https://httpbin.org/get")
Timing comparison between plain requests and requests that reuse TCP connections:

import time
import threading
import requests

# plain requests
def get_request():
    res = requests.get("https://www.csdn.net/", timeout=3)

start = time.time()
t_list = []
for i in range(10):
    t = threading.Thread(target=get_request)
    t_list.append(t)
    t.start()
for t in t_list:
    t.join()
end = time.time()
print(f"total elapsed: {end-start}")

# reusing TCP connections
def get_request_session(s):
    res = s.get("https://www.csdn.net/", timeout=3)

start = time.time()
t_list = []
with requests.Session() as s:
    for i in range(10):
        t = threading.Thread(target=get_request_session, args=(s,))
        t_list.append(t)
        t.start()
    for t in t_list:
        t.join()
    end = time.time()
    print(f"total elapsed: {end-start}")
>>>
total elapsed: 0.9967081546783447
total elapsed: 0.7688210010528564

Reusing TCP connections gives a further speed-up.
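Connection reuse can also be observed directly instead of inferred from timings. The sketch below starts a throwaway HTTP/1.1 server on localhost and records the client port of every request: a new port means a new TCP connection. The server, the `new_session` helper and the `trust_env = False` setting (which keeps localhost traffic away from any configured proxy) are assumptions for this illustration. A fresh Session per request is what the module-level requests.get does internally.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

seen_ports = []  # client port of every request the server handles


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps the connection open

    def do_GET(self):
        seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"


def new_session():
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables
    return session


# A fresh session per request: every request opens its own connection
for _ in range(3):
    with new_session() as s:
        s.get(url, timeout=3)
fresh_ports = set(seen_ports)
seen_ports.clear()

# One shared session: the same connection is reused
with new_session() as s:
    for _ in range(3):
        s.get(url, timeout=3)
shared_ports = set(seen_ports)

server.shutdown()
server.server_close()
print(f"fresh sessions used {len(fresh_ports)} connections, "
      f"shared session used {len(shared_ports)}")
```

Reuse depends on the server keeping connections alive; a server that closes the connection after each response forces a reconnect even when a Session is used.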
Retry mechanism

A request that times out is usually retried a few times. Retry logic is often implemented with a counter, which leads to code like this:

import requests

url = "http://127.0.0.1:8090/demo/10"
i = 0
while i < 3:
    try:
        res = requests.get(url, timeout=5)
        break
    except requests.exceptions.Timeout:
        i += 1

requests already provides retries. It offers transport adapters, which can implement features such as retries and heartbeat checks. Whenever a Session is initialized, default adapters are attached to it: one for HTTP and one for HTTPS. requests lets users create and mount their own transport adapters to get the special behaviour they need. An example:
import time
import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Mount adapters on the session; mounting only the one that matches the URL scheme is enough
s.mount("http://", HTTPAdapter(max_retries=3))
s.mount("https://", HTTPAdapter(max_retries=3))

start = time.time()
try:
    res = s.get("http://www.facebook.com", timeout=5)
    print(res.text)
except requests.exceptions.Timeout as e:
    print(e)
end = time.time()
print(end-start)
>>>
HTTPConnectionPool(host="www.facebook.com", port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, "Connection to www.facebook.com timed out. (connect timeout=5)"))
20.0400869846344

Note: the code above runs for 20 s in total and then raises. That is one normal attempt plus three retries, each timing out after 5 s, hence 20 s. After the third retry the request still times out, so a Timeout exception is raised and caught.
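max_retries also accepts a urllib3 `Retry` object, which gives finer control than a plain integer. A minimal sketch, assuming a back-off factor of 0.5 and a set of 5xx status codes chosen purely for illustration; nothing is sent, the session is only configured and inspected.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Assumed policy: up to 3 retries, growing pauses between attempts,
# and a retry when the server answers with one of these status codes
retry_policy = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[500, 502, 503, 504],
)
adapter = HTTPAdapter(max_retries=retry_policy)

session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# The adapter chosen for a URL carries the retry policy
chosen = session.get_adapter("https://example.com/")
print(chosen.max_retries.total, chosen.max_retries.backoff_factor)
```

By default urllib3 applies status-based retries only to idempotent methods such as GET, so a POST is not re-sent just because the server answered 503.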
Appendix: the core code of requests

def send(
    self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None
):
    """Sends PreparedRequest object. Returns Response object.

    :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
    :param stream: (optional) Whether to stream the request content.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple or urllib3 Timeout object
    :param verify: (optional) Either a boolean, in which case it controls whether
        we verify the server's TLS certificate, or a string, in which case it
        must be a path to a CA bundle to use
    :param cert: (optional) Any user-provided SSL certificate to be trusted.
    :param proxies: (optional) The proxies dictionary to apply to the request.
    :rtype: requests.Response
    """

    try:
        conn = self.get_connection(request.url, proxies)
    except LocationValueError as e:
        raise InvalidURL(e, request=request)

    self.cert_verify(conn, request.url, verify, cert)
    url = self.request_url(request, proxies)
    self.add_headers(
        request,
        stream=stream,
        timeout=timeout,
        verify=verify,
        cert=cert,
        proxies=proxies,
    )

    chunked = not (request.body is None or "Content-Length" in request.headers)

    if isinstance(timeout, tuple):
        try:
            connect, read = timeout
            timeout = TimeoutSauce(connect=connect, read=read)
        except ValueError:
            raise ValueError(
                f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, "
                f"or a single float to set both timeouts to the same value."
            )
    elif isinstance(timeout, TimeoutSauce):
        pass
    else:
        timeout = TimeoutSauce(connect=timeout, read=timeout)

    try:
        resp = conn.urlopen(
            method=request.method,
            url=url,
            body=request.body,
            headers=request.headers,
            redirect=False,
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            retries=self.max_retries,
            timeout=timeout,
            chunked=chunked,
        )

    except (ProtocolError, OSError) as err:
        raise ConnectionError(err, request=request)

    except MaxRetryError as e:
        if isinstance(e.reason, ConnectTimeoutError):
            # TODO: Remove this in 3.0.0: see #2811
            if not isinstance(e.reason, NewConnectionError):
                raise ConnectTimeout(e, request=request)

        if isinstance(e.reason, ResponseError):
            raise RetryError(e, request=request)

        if isinstance(e.reason, _ProxyError):
            raise ProxyError(e, request=request)

        if isinstance(e.reason, _SSLError):
            # This branch is for urllib3 v1.22 and later.
            raise SSLError(e, request=request)

        raise ConnectionError(e, request=request)

    except ClosedPoolError as e:
        raise ConnectionError(e, request=request)

    except _ProxyError as e:
        raise ProxyError(e)

    except (_SSLError, _HTTPError) as e:
        if isinstance(e, _SSLError):
            # This branch is for urllib3 versions earlier than v1.22
            raise SSLError(e, request=request)
        elif isinstance(e, ReadTimeoutError):
            raise ReadTimeout(e, request=request)
        elif isinstance(e, _InvalidHeader):
            raise InvalidHeader(e, request=request)
        else:
            raise

    return self.build_response(request, resp)