How to Scrape Web Pages with BeautifulSoup and requests

lewis (2024-04-19) #Programming

First, install the BeautifulSoup and requests libraries with the following commands:

pip install beautifulsoup4
pip install requests
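To check that both packages can be imported by the Python interpreter you will run the scraper with, a quick sanity check like the following should print their installed versions. The version numbers will vary by environment.

```python
import bs4
import requests

# Both packages expose a __version__ string
print('beautifulsoup4', bs4.__version__)
print('requests', requests.__version__)
```

If either import fails, the package was most likely installed into a different environment than the one running the script.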

Next, use the following code to fetch and parse a web page:

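import requests
from bs4 import BeautifulSoup

# Send a GET request (with a timeout so the script cannot hang indefinitely)
url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx status codes

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find the element you need; find() returns None if nothing matches
element = soup.find('div', class_='content')
if element is not None:
    print(element.get_text(strip=True))
else:
    print('No <div class="content"> found on the page')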

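In the code above, a GET request first retrieves the page, and BeautifulSoup then parses the returned HTML. The find method locates the first element that matches the given tag name and class (it returns None when nothing matches), and that element's text is printed. Adjust the URL, tag name, and class to fit the structure of the target page and the content you want to extract.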
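When a page contains several matching elements, or when the target is easier to describe with a CSS selector, find_all and select are the usual alternatives to find. The sketch below parses a small hard-coded HTML string (a hypothetical stand-in for response.text, so it runs without network access) and compares the three methods. The markup, the "content" class, and the "links" id are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for response.text
html = """
<html>
  <body>
    <div class="content">First block</div>
    <div class="content">Second block</div>
    <ul id="links">
      <li><a href="/a">Link A</a></li>
      <li><a href="/b">Link B</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first match (or None if there is none)
first = soup.find('div', class_='content')
print(first.get_text(strip=True))  # First block

# find_all() returns a list of every match
all_blocks = [div.get_text(strip=True) for div in soup.find_all('div', class_='content')]
print(all_blocks)  # ['First block', 'Second block']

# select() takes a CSS selector; here, every <a> inside the element with id="links"
links = [(a.get_text(strip=True), a['href']) for a in soup.select('#links a')]
print(links)  # [('Link A', '/a'), ('Link B', '/b')]
```

select relies on the soupsieve package, which recent beautifulsoup4 releases install automatically as a dependency.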