Irrelevant content getting scraped

The web content scraped from the URL provided in "01-defining-data-science" includes irrelevant information such as navigation, random articles, and references, which causes errors when extracting insights and building the word cloud.

I would like a solution that extracts only the necessary and relevant content for further processing.

We can use BeautifulSoup instead of HTMLParser and use its features to extract only the relevant content.
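
A minimal sketch of this idea follows. It is not the lesson's actual code. It assumes the article body lives in a container with the id `mw-content-text` (the usual main-content id on Wikipedia pages, but treat it as an assumption for any other site), and the helper name `extract_relevant_text` and the inline `SAMPLE_HTML` are hypothetical, used here only so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the downloaded HTML.
SAMPLE_HTML = """
<html><body>
  <nav>Main page | Random article</nav>
  <div id="mw-content-text">
    <p>Data science is an interdisciplinary field.</p>
    <ol class="references"><li>Some citation</li></ol>
    <p>It uses statistics and computing.</p>
  </div>
  <footer>Privacy policy</footer>
</body></html>
"""


def extract_relevant_text(html, content_id="mw-content-text"):
    """Return paragraph text from the main content container only.

    content_id is an assumption about the page layout; adjust it for
    the page actually being scraped.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=content_id)
    if container is None:
        # Fall back to the whole document if the assumed container is missing.
        container = soup
    # Drop elements that are unlikely to carry article prose.
    for tag in container.find_all(["nav", "footer", "script", "style", "table"]):
        tag.decompose()
    for tag in container.find_all(class_="references"):
        tag.decompose()
    # Keep only non-empty paragraph text.
    paragraphs = (p.get_text(" ", strip=True) for p in container.find_all("p"))
    return "\n".join(text for text in paragraphs if text)


text = extract_relevant_text(SAMPLE_HTML)
print(text)
```

The returned string can then be passed to the existing keyword-extraction and word cloud steps in place of the text produced by HTMLParser. Restricting the output to `<p>` elements is a design choice that trades completeness (headings and list items are skipped) for cleaner input.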

Irrelevant Content:
![irrelevant](https://github.com/microsoft/Data-Science-For-Beginners/assets/115731296/01eda6b6-4a87-45e2-ad6c-ae8ed2cc6864)
Relevant Content:
![relevant](https://github.com/microsoft/Data-Science-For-Beginners/assets/115731296/5e417081-6289-4af4-b305-65359ea8b778)

Irrelevant content getting scraped #538
