# First Scraping Exercise


To begin, we import our dependencies, the Python libraries that help us carry out the scraping tasks.

In [None]:
# import dependencies
from bs4 import BeautifulSoup
import requests

Let us get the url of the webpage we want to scrape and store it in a variable called **url**.

In [None]:
# get url
url = 'http://quotes.toscrape.com/'

Use the **requests** library to send a GET request for the webpage and store the returned response object, which holds the page's source content, in a variable called **response**.

In [None]:
response = requests.get(url)

In [None]:
print(response)

<Response [200]>
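The `200` in the output above is the HTTP status code, and it means the request succeeded. As a minimal sketch, the fetch step could be wrapped so that a failed request stops the program before any parsing happens. The helper name `fetch_html` and the 10-second `timeout` are assumptions made for illustration; they are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the raw bytes of a page, raising an exception if the request fails.

    `fetch_html` and the default timeout are hypothetical choices for this sketch.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.exceptions.HTTPError for 4xx/5xx codes
    response.raise_for_status()
    return response.content
```

Calling `fetch_html(url)` with the **url** variable defined earlier would return the same bytes as `response.content`, provided the site responds successfully.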


Parse the content of the **response** using the **BeautifulSoup** library. BeautifulSoup needs a parser; we chose *html.parser*, which ships with Python's standard library and so needs no extra installation.

In [None]:
soup = BeautifulSoup(response.content, 'html.parser')
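To see what parsing gives us, here is a small sketch that parses a hand-written HTML string instead of a live page. The names `sample_html` and `sample_soup` are made up for this illustration. Once parsed, elements can be reached by name rather than by searching the raw text.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page (hypothetical sample, not fetched from the site)
sample_html = (
    "<html><head><title>Quotes to Scrape</title></head>"
    "<body><h1>Hello</h1></body></html>"
)

# Parse the string into a tree of tags we can navigate
sample_soup = BeautifulSoup(sample_html, "html.parser")

page_title = sample_soup.title.text          # text inside <title>
first_heading = sample_soup.find("h1").text  # text inside the first <h1>

print(page_title)
print(first_heading)
```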

In [None]:
print(soup)

<!DOCTYPE html>

<html lang="en">
<head>
<meta charset="utf-8"/>
<title>Quotes to Scrape</title>
<link href="/static/bootstrap.min.css" rel="stylesheet"/>
<link href="/static/main.css" rel="stylesheet"/>
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>
<a href="/login">Login</a>
</p>
</div>
</div>
<div class="row">
<div class="col-md-8">
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
<a class="

We can display the content of the webpage in a more readable, indented form using the `prettify()` method of the soup object we created.

In [None]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Quotes to Scrape
  </title>
  <link href="/static/bootstrap.min.css" rel="stylesheet"/>
  <link href="/static/main.css" rel="stylesheet"/>
 </head>
 <body>
  <div class="container">
   <div class="row header-box">
    <div class="col-md-8">
     <h1>
      <a href="/" style="text-decoration: none">
       Quotes to Scrape
      </a>
     </h1>
    </div>
    <div class="col-md-4">
     <p>
      <a href="/login">
       Login
      </a>
     </p>
    </div>
   </div>
   <div class="row">
    <div class="col-md-8">
     <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
      <span class="text" itemprop="text">
       “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
      </span>
      <span>
       by
       <small class="author" itemprop="author">
        Albert Einstein
       </small>
       <a href="/author/Albert
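The difference between the two outputs is easier to see on a very small document. The sketch below uses a hypothetical one-line snippet (`sample_html`) and compares the plain string form of the soup with the prettified form, which places each tag and each piece of text on its own indented line.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line snippet used only for this comparison
sample_html = '<div><a href="/login">Login</a></div>'
sample_soup = BeautifulSoup(sample_html, "html.parser")

compact = str(sample_soup)        # everything on a single line
pretty = sample_soup.prettify()   # one tag or text node per line, indented

print(compact)
print(pretty)
```

Because `prettify()` adds line breaks and indentation, it is best used for reading the page structure rather than for extracting text.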

Let us now search for all the `<a>` elements that have the class **tag** on our webpage. Notice the trailing underscore in the parameter `class_`: `class` is a reserved keyword in Python, so BeautifulSoup uses `class_` when searching by CSS class. Store the result in a variable, **tags**.

You will often see people display a variable without the ```print()``` function, as in the cell below, where **tags** is written instead of ```print(tags)```. This works in notebook environments such as Colab and Jupyter, which display the value of the last expression in a cell, but it shows nothing when the code is run as a regular Python script.

In [None]:
tags = soup.find_all('a', class_='tag')
tags

[<a class="tag" href="/tag/change/page/1/">change</a>,
 <a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>,
 <a class="tag" href="/tag/thinking/page/1/">thinking</a>,
 <a class="tag" href="/tag/world/page/1/">world</a>,
 <a class="tag" href="/tag/abilities/page/1/">abilities</a>,
 <a class="tag" href="/tag/choices/page/1/">choices</a>,
 <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>,
 <a class="tag" href="/tag/life/page/1/">life</a>,
 <a class="tag" href="/tag/live/page/1/">live</a>,
 <a class="tag" href="/tag/miracle/page/1/">miracle</a>,
 <a class="tag" href="/tag/miracles/page/1/">miracles</a>,
 <a class="tag" href="/tag/aliteracy/page/1/">aliteracy</a>,
 <a class="tag" href="/tag/books/page/1/">books</a>,
 <a class="tag" href="/tag/classic/page/1/">classic</a>,
 <a class="tag" href="/tag/humor/page/1/">humor</a>,
 <a class="tag" href="/tag/be-yourself/page/1/">be-yourself</a>,
 <a class="tag" href="/tag/inspirational/page/1/">inspirational</a>,
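The `class_` keyword is one of several ways to filter by CSS class. The sketch below, which uses a hypothetical hand-written snippet (`sample_html`), shows two alternatives that should return the same elements: passing an `attrs` dictionary to `find_all`, and using a CSS selector with `select`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: two links with class "tag" and one without
sample_html = """
<div class="tags">
  <a class="tag" href="/tag/change/page/1/">change</a>
  <a class="tag" href="/tag/world/page/1/">world</a>
  <a href="/login">Login</a>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

by_keyword = sample_soup.find_all("a", class_="tag")          # keyword argument
by_attrs = sample_soup.find_all("a", attrs={"class": "tag"})  # attribute dictionary
by_css = sample_soup.select("a.tag")                          # CSS selector

print([t.text for t in by_keyword])
print([t.text for t in by_attrs])
print([t.text for t in by_css])
```

All three calls skip the Login link because it has no `tag` class.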
 

We can use a technique called **list comprehension** to extract the text of each tag we obtained.

In [None]:
tag_description = [tag.text for tag in tags]
tag_description

['change',
 'deep-thoughts',
 'thinking',
 'world',
 'abilities',
 'choices',
 'inspirational',
 'life',
 'live',
 'miracle',
 'miracles',
 'aliteracy',
 'books',
 'classic',
 'humor',
 'be-yourself',
 'inspirational',
 'adulthood',
 'success',
 'value',
 'life',
 'love',
 'edison',
 'failure',
 'inspirational',
 'paraphrased',
 'misattributed-eleanor-roosevelt',
 'humor',
 'obvious',
 'simile',
 'love',
 'inspirational',
 'life',
 'humor',
 'books',
 'reading',
 'friendship',
 'friends',
 'truth',
 'simile']
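A list comprehension is a compact way of writing a loop that builds a list. The sketch below uses plain strings (the `raw_tags` list is made up for illustration) to show that the loop form and the comprehension form produce the same result.

```python
# Hypothetical input: tag names with stray whitespace
raw_tags = ["  change ", "deep-thoughts", " world"]

# Loop form: start with an empty list and append one cleaned item at a time
cleaned_loop = []
for tag in raw_tags:
    cleaned_loop.append(tag.strip())

# Comprehension form: the same logic written as a single expression
cleaned = [tag.strip() for tag in raw_tags]

print(cleaned_loop)
print(cleaned)
```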

We can also extract the links from the tags using list comprehension. Each link is stored in the tag's `href` attribute.

In [None]:
links = [tag['href'] for tag in tags]
links

['/tag/change/page/1/',
 '/tag/deep-thoughts/page/1/',
 '/tag/thinking/page/1/',
 '/tag/world/page/1/',
 '/tag/abilities/page/1/',
 '/tag/choices/page/1/',
 '/tag/inspirational/page/1/',
 '/tag/life/page/1/',
 '/tag/live/page/1/',
 '/tag/miracle/page/1/',
 '/tag/miracles/page/1/',
 '/tag/aliteracy/page/1/',
 '/tag/books/page/1/',
 '/tag/classic/page/1/',
 '/tag/humor/page/1/',
 '/tag/be-yourself/page/1/',
 '/tag/inspirational/page/1/',
 '/tag/adulthood/page/1/',
 '/tag/success/page/1/',
 '/tag/value/page/1/',
 '/tag/life/page/1/',
 '/tag/love/page/1/',
 '/tag/edison/page/1/',
 '/tag/failure/page/1/',
 '/tag/inspirational/page/1/',
 '/tag/paraphrased/page/1/',
 '/tag/misattributed-eleanor-roosevelt/page/1/',
 '/tag/humor/page/1/',
 '/tag/obvious/page/1/',
 '/tag/simile/page/1/',
 '/tag/love/',
 '/tag/inspirational/',
 '/tag/life/',
 '/tag/humor/',
 '/tag/books/',
 '/tag/reading/',
 '/tag/friendship/',
 '/tag/friends/',
 '/tag/truth/',
 '/tag/simile/']
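Indexing a tag with `['href']` works here because every tag on this page has an `href` attribute. On messier pages some elements may lack it, and the square-bracket form then raises a `KeyError`. The sketch below, built on a hypothetical snippet (`sample_html`), shows two ways to handle that case: `get()`, which returns `None` for a missing attribute, and `has_attr()`, which lets us skip such elements.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second link has no href attribute
sample_html = '<a class="tag" href="/tag/love/">love</a><a class="tag">no-link</a>'
sample_soup = BeautifulSoup(sample_html, "html.parser")
anchors = sample_soup.find_all("a", class_="tag")

# get() returns None instead of raising KeyError when the attribute is missing
hrefs_or_none = [a.get("href") for a in anchors]

# has_attr() filters out elements without the attribute before indexing
hrefs_present = [a["href"] for a in anchors if a.has_attr("href")]

print(hrefs_or_none)
print(hrefs_present)
```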

Let us add each link extracted above to the base url so that we have the full url of each tag. Recall that at the beginning we created a variable called **url**; this is our base url.

The base url already ends with '/', so we first remove the '/' at the beginning of each link to avoid a double '//' in the final url.

In [None]:
links = [link.lstrip('/') for link in links]
links

['tag/change/page/1/',
 'tag/deep-thoughts/page/1/',
 'tag/thinking/page/1/',
 'tag/world/page/1/',
 'tag/abilities/page/1/',
 'tag/choices/page/1/',
 'tag/inspirational/page/1/',
 'tag/life/page/1/',
 'tag/live/page/1/',
 'tag/miracle/page/1/',
 'tag/miracles/page/1/',
 'tag/aliteracy/page/1/',
 'tag/books/page/1/',
 'tag/classic/page/1/',
 'tag/humor/page/1/',
 'tag/be-yourself/page/1/',
 'tag/inspirational/page/1/',
 'tag/adulthood/page/1/',
 'tag/success/page/1/',
 'tag/value/page/1/',
 'tag/life/page/1/',
 'tag/love/page/1/',
 'tag/edison/page/1/',
 'tag/failure/page/1/',
 'tag/inspirational/page/1/',
 'tag/paraphrased/page/1/',
 'tag/misattributed-eleanor-roosevelt/page/1/',
 'tag/humor/page/1/',
 'tag/obvious/page/1/',
 'tag/simile/page/1/',
 'tag/love/',
 'tag/inspirational/',
 'tag/life/',
 'tag/humor/',
 'tag/books/',
 'tag/reading/',
 'tag/friendship/',
 'tag/friends/',
 'tag/truth/',
 'tag/simile/']

In [None]:
tag_url = [url + link for link in links]
tag_url

['http://quotes.toscrape.com/tag/change/page/1/',
 'http://quotes.toscrape.com/tag/deep-thoughts/page/1/',
 'http://quotes.toscrape.com/tag/thinking/page/1/',
 'http://quotes.toscrape.com/tag/world/page/1/',
 'http://quotes.toscrape.com/tag/abilities/page/1/',
 'http://quotes.toscrape.com/tag/choices/page/1/',
 'http://quotes.toscrape.com/tag/inspirational/page/1/',
 'http://quotes.toscrape.com/tag/life/page/1/',
 'http://quotes.toscrape.com/tag/live/page/1/',
 'http://quotes.toscrape.com/tag/miracle/page/1/',
 'http://quotes.toscrape.com/tag/miracles/page/1/',
 'http://quotes.toscrape.com/tag/aliteracy/page/1/',
 'http://quotes.toscrape.com/tag/books/page/1/',
 'http://quotes.toscrape.com/tag/classic/page/1/',
 'http://quotes.toscrape.com/tag/humor/page/1/',
 'http://quotes.toscrape.com/tag/be-yourself/page/1/',
 'http://quotes.toscrape.com/tag/inspirational/page/1/',
 'http://quotes.toscrape.com/tag/adulthood/page/1/',
 'http://quotes.toscrape.com/tag/success/page/1/',
 'http://quote
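Stripping the leading '/' and concatenating works well for this site. As an alternative sketch, Python's standard library offers `urllib.parse.urljoin`, which combines a base url with a relative link and handles the slashes itself. The variable names `base_url`, `relative_links`, and `full_urls` are chosen for this illustration only.

```python
from urllib.parse import urljoin

base_url = "http://quotes.toscrape.com/"
# Relative links as they appear in the href attributes, leading '/' included
relative_links = ["/tag/change/page/1/", "/tag/love/"]

# urljoin resolves each relative link against the base url
full_urls = [urljoin(base_url, link) for link in relative_links]

print(full_urls)
```

With `urljoin`, the separate `lstrip('/')` step is not needed.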

We can now store both the tags and the corresponding links in a table using a pandas DataFrame.

In [108]:
import pandas as pd

# Create a dictionary to contain the tags and links we created 
# and call it any name you like --> apple..lolz 😄
apple = {
    'Tag': tag_description,
    'Link': tag_url
}

In [107]:
# Create a table from the dictionary using pd.DataFrame
quote = pd.DataFrame(apple)
quote

|   | Tag | Link |
|---|---|---|
| 0 | change | http://quotes.toscrape.com/tag/change/page/1/ |
| 1 | deep-thoughts | http://quotes.toscrape.com/tag/deep-thoughts/p... |
| 2 | thinking | http://quotes.toscrape.com/tag/thinking/page/1/ |
| 3 | world | http://quotes.toscrape.com/tag/world/page/1/ |
| 4 | abilities | http://quotes.toscrape.com/tag/abilities/page/1/ |
| 5 | choices | http://quotes.toscrape.com/tag/choices/page/1/ |
| 6 | inspirational | http://quotes.toscrape.com/tag/inspirational/p... |
| 7 | life | http://quotes.toscrape.com/tag/life/page/1/ |
| 8 | live | http://quotes.toscrape.com/tag/live/page/1/ |
| 9 | miracle | http://quotes.toscrape.com/tag/miracle/page/1/ |
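When a DataFrame is built from a dictionary of lists, each key becomes a column and the lists must all have the same length. The sketch below uses two short, hand-written lists (`sample_tags` and `sample_links`, made up for illustration) to show the resulting shape, and what happens when the lengths do not match.

```python
import pandas as pd

# Hypothetical two-row sample in the same layout as the notebook's dictionary
sample_tags = ["change", "world"]
sample_links = [
    "http://quotes.toscrape.com/tag/change/page/1/",
    "http://quotes.toscrape.com/tag/world/page/1/",
]

# Each dictionary key becomes a column; each list supplies that column's values
sample_table = pd.DataFrame({"Tag": sample_tags, "Link": sample_links})
print(sample_table.shape)  # (rows, columns)

# Lists of different lengths cannot form a table, so pandas raises ValueError
try:
    pd.DataFrame({"Tag": sample_tags, "Link": sample_links[:1]})
    mismatch_error = None
except ValueError as error:
    mismatch_error = str(error)
print(mismatch_error)
```

This is why the list of tag names and the list of tag urls are built from the same **tags** list: it guarantees they line up row by row.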


# Conclusion
We can do much more with the basic idea we just learnt, depending on the content we want to extract. Feel free to explore and try your hand at different things.