Learn how to install BeautifulSoup, a Python library used for web scraping and parsing HTML and XML documents, on Ubuntu 22.04, 20.04, or any other version of this Linux distribution.
Python’s ecosystem offers many libraries that make developers' work easier, and BeautifulSoup is one of them. Its name may not sound geeky, but it is quite useful for anyone coding in Python. It provides a convenient way to extract data from web pages, search for specific elements, and retrieve their contents or attributes, which makes it valuable for data analysis and web scraping tasks.
Its key features: it parses HTML and XML documents; lets you search for specific elements in the parse tree; lets you navigate the parse tree by traversing parents, siblings, and children; can detect the encoding of a document and convert it to Unicode; and integrates with other libraries, such as pandas or CSV libraries, for storing extracted data.
In this article, we go through the commands to install Beautiful Soup on Ubuntu Linux.
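To make the features above more concrete, here is a minimal sketch that assumes Beautiful Soup is already installed (the steps below cover that). The sample markup, the "prices" id, and the data-price attribute are invented for this illustration. The CSV part uses Python's built-in csv module and writes to an in-memory buffer rather than a real file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration
html_doc = """
<html><body>
<h1>Fruit prices</h1>
<ul id="prices">
<li class="item" data-price="1.20">Apple</li>
<li class="item" data-price="0.50">Banana</li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Search the parse tree for specific elements
heading = soup.find("h1").get_text()
items = soup.find_all("li", class_="item")

# Navigate the tree: go up to the parent, then across to the next sibling
first = items[0]
parent_id = first.parent["id"]
next_name = first.find_next_sibling("li").get_text()

# Read contents and attributes, then store them as CSV rows
rows = [(li.get_text(), li["data-price"]) for li in items]
buffer = io.StringIO()  # in-memory stand-in for a real file
writer = csv.writer(buffer)
writer.writerow(["name", "price"])
writer.writerows(rows)

print(heading, parent_id, next_name)
print(buffer.getvalue())
```

Swapping the buffer for a file opened with open("prices.csv", "w", newline="") would write the same rows to disk; the file name is only an example.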
Step 1: Install Python PIP, if not already installed
The easiest way to install a Python library is to use Python's package manager, PIP. If you already have it installed, you can skip this step.
To check whether it is available, use:
pip3 --version
If the output shows the version details, then PIP is there on your system.
However, if you don’t have PIP (the Python package installer) installed, here are the commands to get it:
sudo apt update -y
sudo apt install python3-pip
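If you prefer to check from Python itself, the following is a small sketch using only the standard library. The variable names are arbitrary, and the result depends on which interpreter you run it with.

```python
import importlib.util
import shutil

# True when the pip module can be imported by this interpreter
pip_module_available = importlib.util.find_spec("pip") is not None

# Path of a pip executable found on PATH, or None if there is none
pip_executable = shutil.which("pip3") or shutil.which("pip")

print("pip module importable:", pip_module_available)
print("pip executable:", pip_executable)
```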
Step 2: Install BeautifulSoup on Ubuntu
Once you have confirmed that PIP is available on your Ubuntu Linux system, you can use it to install various Python libraries, including BeautifulSoup. In your Ubuntu command terminal, run:
pip install beautifulsoup4 --user
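Alternatively, those who don’t want to use PIP can go for the APT package manager. This route also helps on newer Ubuntu releases (23.04 and later), where PIP by default refuses to install packages outside a virtual environment. Here is the command: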
sudo apt install python3-bs4
Step 3: Check BeautifulSoup Version
To confirm that BeautifulSoup was actually installed by the previous step and is available to your Python scripts, use the following command. It shows the version of the specified library and where it has been installed.
python3 -m pip show beautifulsoup4
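The version can also be read from inside Python. This is a minimal sketch that relies on the standard importlib.metadata module (Python 3.8 or newer); installed_version is a hypothetical helper name chosen for this example, not part of Beautiful Soup.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


bs4_version = installed_version("beautifulsoup4")
if bs4_version is None:
    print("beautifulsoup4 is not installed for this interpreter")
else:
    print("beautifulsoup4 version:", bs4_version)
```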
Step 4: Run a simple test script
Let’s create a simple script file in which we import BeautifulSoup to parse some simple HTML code.
Use any text editor to create a new Python file, for example test_beautifulsoup.py. Here we are using nano:
nano test_beautifulsoup.py
Add the following code to the file:
from bs4 import BeautifulSoup

html_doc = """
<html>
<head>
<title>Example HTML Page</title>
</head>
<body>
<h1>Welcome to Beautiful Soup</h1>
<p>This is an example paragraph.</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
Save the file by pressing Ctrl+X, then type Y and hit the Enter key.
Run the created Python file:
python3 test_beautifulsoup.py
You will see the parsed and prettified HTML in the output on your Ubuntu terminal.
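Once the test runs, a natural next step is to pull individual values out of the same document instead of printing all of it. The sketch below reuses the HTML from the test script; the variable names are arbitrary.

```python
from bs4 import BeautifulSoup

# Same sample document as in the test script
html_doc = """
<html>
<head>
<title>Example HTML Page</title>
</head>
<body>
<h1>Welcome to Beautiful Soup</h1>
<p>This is an example paragraph.</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tag names can be used as attributes to reach the first matching element
title_text = soup.title.string
heading_text = soup.h1.get_text()

# find() returns the first element that matches the given tag name
paragraph_text = soup.find("p").get_text()

print(title_text)
print(heading_text)
print(paragraph_text)
```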
How to Upgrade or Uninstall
If a new version of the BeautifulSoup library becomes available in the future, upgrade to it with:
pip install --upgrade beautifulsoup4 --user
If at some point you no longer need the BeautifulSoup library on your Ubuntu Linux system, you can again use PIP to remove it. Here is the command:
pip uninstall beautifulsoup4
Beautiful Soup is a very useful Python library for web scraping, and as you have seen, installing it on Ubuntu Linux is quite simple. You can also use the PIP commands given in this article on other operating systems to install the Beautiful Soup module.