KevsRobots Learning Platform
By Kevin McAleer, 2 Minutes
In this lesson, we will introduce the concept of web scraping, which is a method of extracting information from websites. Python offers several libraries for web scraping, including Beautiful Soup and requests.
Scraping a page takes three steps: making an HTTP request to the page, downloading its HTML content, and parsing that content to extract the information you need.
requests
The requests library allows you to send HTTP requests using Python. You can use it to download web pages.
```python
import requests

# Make a request to a web page
response = requests.get('https://www.example.com')

# Print the status code (200 means success)
print(response.status_code)

# Print the first 500 characters of the HTML content
print(response.text[:500])
```
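A request can also be slow or fail outright, so it may help to guard against both. The sketch below is one way to do that; the helper name `fetch_html` and the 10-second default timeout are illustrative assumptions, not part of the lesson.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text.

    fetch_html and the 10-second default are illustrative assumptions.
    """
    # Without a timeout, requests may wait indefinitely for a slow server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://www.example.com')` should return the page's HTML, or raise a subclass of `requests.RequestException` if the request times out, cannot connect, or returns an error status.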
Beautiful Soup
Beautiful Soup is a library for parsing HTML and XML documents. It provides methods and Pythonic idioms for iterating, searching, and modifying the parse tree.
```python
from bs4 import BeautifulSoup
import requests

# Make a request to a web page
response = requests.get('https://www.example.com')

# Create a Beautiful Soup object
soup = BeautifulSoup(response.text, 'html.parser')

# Find the title tag
title_tag = soup.find('title')

# Print the text of the title tag
print(title_tag.text)
```
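Searching the parse tree is not limited to a single tag. As a minimal sketch, the example below parses a small made-up HTML string (so it runs without a network request) and uses `find_all` to collect every link; the HTML content and variable names are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document so the example runs without a network request
html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <h1>Links</h1>
    <a href="https://www.example.com/one">First</a>
    <a href="https://www.example.com/two">Second</a>
    <a>No href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first match, or None when nothing matches
title_tag = soup.find('title')
title = title_tag.get_text(strip=True) if title_tag is not None else None

# find_all() returns every match; href=True keeps only <a> tags that have an href
links = [(a.get_text(strip=True), a['href']) for a in soup.find_all('a', href=True)]

print(title)
print(links)
```

Checking for `None` before reading the text avoids an `AttributeError` on pages that lack the tag you are looking for.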
In this lesson, you’ve learned about web scraping with Python. We’ve covered how to use the requests library to download web pages and the Beautiful Soup library to parse HTML and extract information. Web scraping is a powerful tool for gathering data from the internet.