Description
The web content scraped from the URL provided in "01-defining-data-science" includes irrelevant information such as navigation links, random articles, and references. This noise introduces errors when extracting insights and building the word cloud.
A clear and concise description of what you want to happen.
I would like to implement a solution that extracts only the necessary, relevant content for further processing.
We could use BeautifulSoup instead of HTMLParser and use its features, such as CSS selectors and element removal, to keep only the relevant content.
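A minimal sketch of the idea follows. It assumes the page is a Wikipedia article whose body sits in `div.mw-parser-output`, with reference lists and navigation boxes in `div.reflist`, `ol.references`, and `div.navbox`. These selectors and the helper name `extract_article_text` are illustrative assumptions and are not part of the lesson's existing code. The sketch parses the HTML with BeautifulSoup, narrows it to the article body, removes noisy elements, and keeps only paragraph text:

```python
from bs4 import BeautifulSoup

# Assumption: Wikipedia-style markup. Adjust these selectors for other sites.
NOISE_SELECTORS = [
    "div.navbox",     # navigation boxes at the bottom of articles
    "div.reflist",    # reference list containers
    "ol.references",  # footnote lists
    "sup.reference",  # inline citation markers such as [1]
    "table",          # infoboxes and other tables
    "style",
    "script",
]


def extract_article_text(html: str) -> str:
    """Hypothetical helper: return the article's paragraph text while skipping
    navigation, sidebars, and reference sections."""
    soup = BeautifulSoup(html, "html.parser")
    # Narrow to the article body; fall back to the whole document if it is missing.
    body = soup.select_one("div.mw-parser-output") or soup
    # Remove noisy elements from the tree before collecting text.
    for selector in NOISE_SELECTORS:
        for tag in body.select(selector):
            tag.decompose()
    # Keep non-empty paragraph text only, one paragraph per line.
    paragraphs = (p.get_text(" ", strip=True) for p in body.find_all("p"))
    return "\n".join(p for p in paragraphs if p)
```

The returned string could replace the HTMLParser output as input to the keyword extraction and word cloud steps. The selectors are tied to the page's current markup and may need updating if the source page changes.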