Retrieving anchor tags using BeautifulSoup

Naivedh Shah
2 min read · Nov 7, 2019

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching and modifying the parse tree. It commonly saves programmers hours or days of work.

The code in this post targets Python 3, since urllib.request does not exist in Python 2.

You need to install Beautiful Soup first. It is published on PyPI as beautifulsoup4, so `pip install beautifulsoup4` is enough.

We will import urllib, ssl and BeautifulSoup (from the bs4 package).

urllib is a package that collects several modules for working with URLs:
urllib.request for opening and reading URLs
urllib.error containing the exceptions raised by urllib.request
urllib.parse for parsing URLs
urllib.robotparser for parsing robots.txt files
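As a small illustration of urllib.parse, the sketch below splits a URL into its parts with `urlparse`; the example URL is made up for illustration.

```python
from urllib.parse import urlparse

# urlparse splits a URL into scheme, network location, path, query, etc.
parts = urlparse("https://www.example.com/docs/page.html?lang=en")

print(parts.scheme)  # https
print(parts.netloc)  # www.example.com
print(parts.path)    # /docs/page.html
print(parts.query)   # lang=en
```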

Secure Sockets Layer (SSL) is a networking protocol designed for securing connections between web clients and web servers over an insecure network, such as the internet. After being formally introduced in 1995, SSL made it possible for a web server to securely enable online transactions between consumers and businesses.

Note: Don't worry about the SSL part and the three lines that set it up. They simply tell Python to ignore SSL certificate errors, so the script still works on sites whose certificates fail verification.
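A minimal sketch of what such a certificate-ignoring setup usually looks like with the standard-library ssl module is shown below; the name `ctx` is our own choice. Skipping verification removes protection against man-in-the-middle attacks, so it is only suitable for experiments.

```python
import ssl

# Start from Python's default, secure context...
ctx = ssl.create_default_context()
# ...then relax it. check_hostname must be disabled before CERT_NONE
# can be set, otherwise Python raises ValueError.
ctx.check_hostname = False
# Do not verify the server certificate at all (insecure, testing only).
ctx.verify_mode = ssl.CERT_NONE
```

The context is later passed to `urllib.request.urlopen(url, context=ctx)`.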

So, let’s start.

Now we ask the user to enter a URL. We use urllib.request.urlopen(url).read() to download the page, which returns its raw bytes. We then pass these bytes to Beautiful Soup, which copes with messy, badly formed HTML and decodes the bytes (for example UTF-8) into a Unicode string, since Python 3 strings are Unicode.
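A minimal sketch of this download-and-parse step follows. The helper name `fetch_soup` is hypothetical, and we assume the built-in "html.parser" backend so no extra parser has to be installed.

```python
import urllib.request
from bs4 import BeautifulSoup


def fetch_soup(url, ctx=None):
    """Download url and return it parsed by Beautiful Soup.

    ctx is an optional ssl.SSLContext (for example one that skips
    certificate checks); None keeps Python's default verification.
    """
    # urlopen(...).read() returns the raw bytes of the page
    html = urllib.request.urlopen(url, context=ctx).read()
    # Beautiful Soup detects the encoding (e.g. UTF-8) and decodes to str
    return BeautifulSoup(html, "html.parser")
```

In the interactive script, the URL would come from the user, e.g. `url = input("Enter URL: ")` followed by `soup = fetch_soup(url, ctx)`.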

Next, we retrieve all the anchor (`<a>`) tags in the page.
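One way to collect the anchors is `find_all("a")`; calling the soup object directly, as in `soup("a")`, is Beautiful Soup's shorthand for the same search. The sketch below uses a small inline HTML string (made up for illustration) instead of a downloaded page.

```python
from bs4 import BeautifulSoup

html = """
<p>Links: <a href="https://www.python.org/">Python</a>,
<a href="/docs/">Docs</a> and <a name="top">an anchor without href</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns a list-like ResultSet with every <a> tag
tags = soup.find_all("a")
# soup("a") is a shorthand for soup.find_all("a")
same_tags = soup("a")

print(len(tags))  # 3
```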

Then we loop through the tags and pull out the value of each tag's href attribute.
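A sketch of that loop, again on a small made-up HTML snippet, is below. It uses `tag.get("href", None)` rather than `tag["href"]` so that an anchor without an href yields None instead of raising KeyError.

```python
from bs4 import BeautifulSoup

html = """
<a href="https://www.python.org/">Python</a>
<a href="/docs/">Docs</a>
<a name="top">no href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for tag in soup.find_all("a"):
    # .get() returns None when the attribute is missing
    href = tag.get("href", None)
    print(href)
    hrefs.append(href)
```

If anchors without an href should be skipped entirely, `soup.find_all("a", href=True)` returns only the tags that have the attribute.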

So, let’s try it out.

OUTPUT

This is how we retrieve all the hrefs.
