find and find_all in Python's BeautifulSoup

woff · Posted on 2022-4-4 17:28:24

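Today I was learning to write a web scraper, practicing on http://www.tshopping.com.tw/forum-266-1.html and trying to extract the post titles. Along the way I ran into a problem. Here is the code: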

import urllib.request as req
import bs4

url = "http://www.tshopping.com.tw/forum-266-1.html"
request = req.Request(url, headers={
    "content-type": "application/json",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
})
with req.urlopen(request) as response:
    data = response.read().decode("utf-8")

# Parse the HTML source and extract the title of each post
root = bs4.BeautifulSoup(data, "html.parser")  # let BeautifulSoup parse the HTML
# print(root.title.string)
ccls = root.find_all("div", class_="c cl")  # find all <div> tags whose class is "c cl"
print(ccls)
titles = ccls.find_all("a", class_="z")  # intended: find all <a> tags with class "z"; this line raises the error
for title in titles:
    print(title)


Running it produces:

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
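The error can be reproduced without touching the network. The following is only a minimal sketch: the HTML string is hypothetical markup made up for illustration and is not taken from the real forum page.

```python
import bs4

# Hypothetical markup standing in for the forum page (an assumption, not the real HTML).
html = '<div class="c cl"><a class="z" href="#">Post</a></div>'
root = bs4.BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (a list of Tag objects), not a single Tag.
ccls = root.find_all("div", class_="c cl")

try:
    ccls.find_all("a", class_="z")  # a list has no find_all method
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Catching the exception here is only for demonstration; in a real script the fix is to call find_all on a Tag rather than on the list.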

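I was stuck on this for a long time. After reading around, I found that the problem was that I did not understand the find and find_all functions well enough. The official documentation (https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/) says: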

find(name, attrs, recursive, text, **kwargs)

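The find_all() method returns every tag in the document that matches, even though sometimes we only want a single result. For example, if the document has only one <body> tag, using find_all() to look for it is not a good fit. Rather than calling find_all() with limit=1, it is better to call find() directly. The following two lines are nearly equivalent: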

soup.find_all('title', limit=1)
# [<title>The Dormouse's story</title>]

soup.find('title')
# <title>The Dormouse's story</title>

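The only difference is that find_all() returns a list containing just that one element, while find() returns the result directly.

When find_all() finds nothing it returns an empty list; when find() finds nothing it returns None.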
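The return types described above can be checked with a short script. This is a sketch under the assumption that a tiny hand-written document is enough to show the behaviour; the HTML string and the variable names are made up for illustration.

```python
import bs4

# Hypothetical sample document, used only for illustration.
html = "<html><head><title>The Dormouse's story</title></head><body><p>text</p></body></html>"
soup = bs4.BeautifulSoup(html, "html.parser")

found_all = soup.find_all("title", limit=1)  # a list-like ResultSet holding one Tag
found_one = soup.find("title")               # the Tag itself

missing_all = soup.find_all("table")         # no match: empty ResultSet
missing_one = soup.find("table")             # no match: None

print(type(found_all).__name__, len(found_all))
print(type(found_one).__name__, found_one.string)
print(len(missing_all), missing_one)
```

Because find() can return None, it is worth checking the result before chaining another call onto it.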

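The return types are the key point. They explain why calling find_all on the result of find_all raises an error: find_all returns a list, so before calling find_all again you have to pick a single element out of that list. The line therefore becomes: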

ccls = root.find_all("div", class_="c cl")[0]
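Indexing with [0] only looks inside the first matching <div>, and it raises IndexError when nothing matches. If every matching <div> should be searched, one possible variation is to loop over the list. This is a sketch, not the only way to do it, and the HTML below is hypothetical markup that merely imitates the structure assumed in the post.

```python
import bs4

# Hypothetical markup imitating the forum page; the real page may differ.
html = """
<div class="c cl"><a class="z" href="/thread-1.html">First post</a></div>
<div class="c cl"><a class="z" href="/thread-2.html">Second post</a></div>
<div class="other"><a class="z" href="/x.html">Ignored</a></div>
"""
root = bs4.BeautifulSoup(html, "html.parser")

titles = []
# find_all returns a list of Tags, so call find_all on each Tag in turn.
for div in root.find_all("div", class_="c cl"):
    for link in div.find_all("a", class_="z"):
        titles.append(link.get_text(strip=True))

# find() returns one Tag or None, so guard before chaining another search.
first_div = root.find("div", class_="c cl")
first_links = first_div.find_all("a", class_="z") if first_div is not None else []

print(titles)
print(first_links)
```

The loop collects links from all matching blocks, while the find() variant is the safer equivalent of taking element [0].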
