We will use three libraries in this class. **`requests`** downloads the full HTML of a web page. **`BeautifulSoup`** picks out the important parts. **`pandas`** turns the data we pick out into a proper data set. Let's import these. Be careful: you need to import BeautifulSoup **from** the `bs4` library (see previous lesson). We give it the short alias `bs` to save typing.

In [1]:
import requests
import pandas
from bs4 import BeautifulSoup as bs

Go to the page https://www.nbim.no/en/responsibility/our-voting-records/ and run a search using "Meeting date search". We now see the data we want to scrape. Note the URL. Did it change?<br>
It did! It is now `https://www.nbim.no/en/the-fund/responsible-investment/our-voting-records/bydate/?from=01%2F07%2F2013&to=29%2F03%2F2019`<br>
Put the new URL in the variable `url`. Remember, this is a string, so you need to use quotes.

In [2]:
url = "https://www.nbim.no/en/the-fund/responsible-investment/our-voting-records/bydate/?from=01%2F07%2F2013&to=29%2F03%2F2019"

We will download the website with requests. Create a variable `r` and write **`requests.get(url).text`**. The `.text` at the end gives us the page's HTML as one long string.

In [3]:
r = requests.get(url).text

In [4]:
r

'\n<!DOCTYPE html>\n<html lang="en-GB" class="old-css-loaded">\n<head>\n<meta charset="utf-8" />\n<meta http-equiv="X-UA-Compatible" content="IE=edge" />\n<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=10.0" />\n<meta name="og:site_name" content="Norges Bank Investment Management" />\n<meta name="format-detection" content="telephone=no">\n<title>Our voting records</title>\n<script src="/Static/bundles/jqueryjs"></script>\n<link href="/Static/bundles/css?v=TRS_IZIyv7hpDlv2j_SQtopPaJZjhfiI_QJih2IWN5E1" rel="stylesheet" type="text/css" media="screen" />\n<link href="/Static/bundles/printcss?v=zTwjgUgzz6kW5VMzB_CEg1m4qMLh1ZTWfmMW60o7tj81" rel="stylesheet" type="text/css" media="print" />\n<script type="text/javascript" src="/WebResource.axd?d=k_66XIX6qHAa4rXz1-INcsTytCe4rKfg_b3urJT4QzydQ7ejh9lv-WNCU-eRCaFnkM4B0WasQbXBBJEY7HfqTwvDE-vF4NzuTsQ6pdKA3GkFRgPLdb0d2MeTk9NyPUPgCsO_nz6K59u2eBhKOb4I4kE4guBGrJqEZXesnNMhag1072mjQeBMj2m0u_n4TOLoiT3uNBi5WqBCsdks4XO6xQ

Now let's pick out the interesting parts with BeautifulSoup. Which are the interesting parts?

## Inspect element

Now we will play **detectives** and **butchers**. We see what we want, but we need to instruct BeautifulSoup how to dissect the website for us. A computer sees different things in a website than we do.

![image.png](attachment:image.png)

Let's see [how tables are made](https://www.w3schools.com/html/tryit.asp?filename=tryhtml_table)

To dissect the website, we will use a couple of BeautifulSoup methods:
* **`.find()`**: find the first matching element on a website
* **`.find_all()`**: find all the matching elements on a website (returns a list)
* **`.get()`**: get the value of a certain attribute, such as the `href` of a link
* **`.text`**: return only the text, no tags
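
To make these four concrete, here is a minimal sketch on a made-up HTML snippet. The table contents and links are invented for illustration and are not taken from the NBIM page. It uses Python's built-in `"html.parser"` so that it runs without installing an extra parser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this example only
html = """
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td><a href="/meeting/1">Acme Corp</a></td><td>Norway</td></tr>
  <tr><td><a href="/meeting/2">Globex</a></td><td>Sweden</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

first_row = soup.find("tr")            # only the first <tr> (the header row)
all_rows = soup.find_all("tr")         # every <tr>, in a list-like result
first_link = soup.find("a")            # the first <a> tag on the page
link_target = first_link.get("href")   # value of the href attribute
link_text = first_link.text            # text inside the tag, without markup

print(len(all_rows))   # 3
print(link_target)     # /meeting/1
print(link_text)       # Acme Corp
```

Note that `.find()` hands back a single element, while `.find_all()` hands back a collection you can loop over or index.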

## The recipe


  1. create an empty list, called _`data`_ for example
  2. read the website (we downloaded it with requests before!) with BeautifulSoup: **`bs(r, "lxml")`**
  3. find the `table` element: **`.find()`**
  4. find all the `tr` elements: **`.find_all()`**
  5. `for` each `tr` element, find all `td` elements
  6. save each `td` element you want in a different variable, e.g. `company_name`, `meeting_url`, etc.
  7. make a list of these variables and `append` it to `data`
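
Before trying the recipe on the live page, it may help to see it run on a tiny table. The sketch below is an assumption-laden example: the HTML is invented, the function name `scrape_table` is hypothetical, and it uses the built-in `"html.parser"` instead of `"lxml"` so it works offline without extra installs.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, invented for this example only
html = """
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td><a href="/meeting/1">Acme Corp</a></td><td>Norway</td></tr>
  <tr><td><a href="/meeting/2">Globex</a></td><td>Sweden</td></tr>
</table>
"""

def scrape_table(html):
    """Follow the recipe on an HTML string and return a list of rows."""
    data = []                                    # empty list for the results
    soup = BeautifulSoup(html, "html.parser")    # read the HTML
    table = soup.find("table")                   # find the table element
    for row in table.find_all("tr")[1:]:         # every row, header skipped
        cells = row.find_all("td")               # all cells in this row
        company_name = cells[0].text.strip()     # visible text of cell 1
        meeting_url = cells[0].find("a").get("href")  # link inside cell 1
        country = cells[1].text.strip()          # visible text of cell 2
        data.append([company_name, meeting_url, country])
    return data

rows = scrape_table(html)
print(rows)
```

The `[1:]` slice skips the header row, which contains `th` cells rather than `td` cells and would otherwise give an empty `cells` list.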

In [29]:
59283 / len(data)

1185.66

In [30]:
import time

In [31]:
data = []

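for page in range(1,5):  # pages 1 to 4
    
    time.sleep(0.5)  # short pause between requests, to be polite to the server
    
    # build each page's address from the base url; reassigning `url` itself
    # would pile up "&p=1&p=2..." on later iterations
    page_url = url + "&p=" + str(page)

    r = requests.get(page_url).text
    soup = bs(r, "lxml")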
    table = soup.find("table", class_ = "styled-autocolor")
        
    for row in table.find_all("tr")[1:]:
        cell = row.find_all("td")

        company = cell[0].text.strip()
        ticker = cell[1].text.strip()
        date = cell[2].text.strip()
        mtype = cell[3].text.strip()
        country = cell[4].text.strip()

        new_row = [company, ticker, date, mtype, country]
        #print(new_row)
        data.append(new_row)

In [32]:
len(data)

200
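
When looping over result pages, each request needs its own address with exactly one page number in it. One possible way to keep that tidy is a small helper that builds the address from scratch every time. This is only a sketch: the function name `page_url` and its default dates are assumptions chosen to match the search used here, and the `p` parameter is taken from the loop in this lesson.

```python
from urllib.parse import urlencode

# Address of the search page without any query string
BASE = "https://www.nbim.no/en/the-fund/responsible-investment/our-voting-records/bydate/"

def page_url(page, date_from="01/07/2013", date_to="29/03/2019"):
    """Return the search address for one results page (hypothetical helper)."""
    # urlencode escapes the slashes in the dates as %2F for us
    query = urlencode({"from": date_from, "to": date_to, "p": page})
    return BASE + "?" + query

for page in range(1, 4):
    print(page_url(page))
```

Because the address is rebuilt on every call, an earlier page number can never leak into a later request.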

In [33]:
pandas.DataFrame(data).to_csv("data.csv")

Finally, use pandas to export your table to a CSV file, as in the cell above. And you are done!
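
A CSV is easier to read later if the columns have names. The sketch below shows one way to do that; the two rows and the column names are made up for illustration, in the same shape as the `data` list built in this lesson.

```python
import pandas

# Two made-up rows in the same shape as the scraped `data` list
data = [
    ["Acme Corp", "ACME", "01/07/2013", "Annual", "Norway"],
    ["Globex", "GLX", "02/07/2013", "Special", "Sweden"],
]

# Hypothetical column names, one per value in each row
columns = ["company", "ticker", "date", "meeting_type", "country"]

df = pandas.DataFrame(data, columns=columns)

# Without a file name, to_csv returns the CSV as a string;
# index=False leaves out the automatic row numbers
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead of a string, pass a file name as the first argument, for example `df.to_csv("data.csv", index=False)`.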