# Extract search results with BeautifulSoup: PBS.org - part 01
Now that you have a basic understanding of extracting data from a single page, we will move on to search results, which let us collect multiple pages at once. In this Notebook we will take a look at:
1. Extracting multiple elements
2. Saving multiple pages locally for later data extraction

### 1. Looking at the search results
In this case we are interested in "artificial intelligence" and will search PBS NewsHour for articles that contain this phrase.

In [15]:
import requests
from bs4 import BeautifulSoup

# %22 is the URL-encoded double quote (") and %20 the encoded space;
# the quotes make the search match the exact phrase "artificial intelligence"
url = 'https://www.pbs.org/newshour/search-results?q=%22artificial%20intelligence%22'

# get url
page = requests.get(url)

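# transform to soup; 'html.parser' is Python's built-in parser, named
# explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(page.content, 'html.parser')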
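
The `%22` and `%20` in the URL are the percent-encoded forms of a double quote and a space. As a minimal sketch, the same query string can be built from the plain search phrase with `urllib.parse.quote` from the standard library. The variable names `phrase`, `query` and `search_url` are illustrative and not part of the original notebook.

```python
from urllib.parse import quote

# the phrase we want to match exactly, wrapped in double quotes
phrase = '"artificial intelligence"'

# quote() percent-encodes the double quotes (%22) and the space (%20)
query = quote(phrase)

search_url = 'https://www.pbs.org/newshour/search-results?q=' + query
print(search_url)
```

Building the query this way makes it easier to swap in a different search phrase without encoding it by hand.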


### 2. Extracting multiple elements
We are interested in the URLs that link to the individual articles, which we want to save later. In the previous Notebook you extracted single items; here we need multiple items. Luckily, they are easy to find: a quick scan of the page source reveals that each result title sits in `<h4 class="search-result__title"> ... </h4>`, and this element contains the link to the article. BS4 is flexible: you can search again within a snippet of HTML that you have already extracted.

First we use `.find_all()` to collect all elements with this class. It returns a list-like collection of snippets (tags) extracted from the HTML.

In [16]:
results = soup.find_all(class_='search-result__title')

for res in results:
  print(res)
  
  # each result is itself searchable: look for the <a> tag inside it
  title = res.find('a').get_text()
  print(title)

<h4 class="search-result__title"><a href="https://www.pbs.org/newshour/economy/google-ceo-calls-for-regulation-of-artificial-intelligence">Google CEO calls for regulation of <b class="search-highlight">artificial</b> <b class="search-highlight">intelligence</b></a></h4>
Google CEO calls for regulation of artificial intelligence
<h4 class="search-result__title"><a href="https://www.pbs.org/newshour/health/are-health-care-claims-overblown-about-artificial-intelligence">Are health care claims overblown about <b class="search-highlight">artificial</b> <b class="search-highlight">intelligence</b>?</a></h4>
Are health care claims overblown about artificial intelligence?
<h4 class="search-result__title"><a href="https://www.pbs.org/newshour/science/how-artificial-intelligence-spotted-every-solar-panel-in-the-u-s">How <b class="search-highlight">artificial</b> <b class="search-highlight">intelligence</b> spotted every solar panel in the U.S.</a></h4>
How artificial intelligence spotted every s
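
As an alternative sketch, the two steps (find the `<h4>`, then the `<a>` inside it) can be combined in one CSS selector with BeautifulSoup's `.select()`. To keep the example runnable without a network request, it parses a string that reuses two result snippets from the output above; the names `sample_html`, `sample_soup`, `links`, `titles` and `hrefs` are illustrative only.

```python
from bs4 import BeautifulSoup

# two result snippets copied from the output above, so no request is needed
sample_html = '''
<h4 class="search-result__title"><a href="https://www.pbs.org/newshour/economy/google-ceo-calls-for-regulation-of-artificial-intelligence">Google CEO calls for regulation of <b class="search-highlight">artificial</b> <b class="search-highlight">intelligence</b></a></h4>
<h4 class="search-result__title"><a href="https://www.pbs.org/newshour/health/are-health-care-claims-overblown-about-artificial-intelligence">Are health care claims overblown about <b class="search-highlight">artificial</b> <b class="search-highlight">intelligence</b>?</a></h4>
'''

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# CSS selector: every <a> inside an <h4> with class search-result__title
links = sample_soup.select('h4.search-result__title a')

# get_text() drops the nested <b> highlight tags and keeps only the text
titles = [a.get_text() for a in links]
hrefs = [a['href'] for a in links]

print(titles)
print(hrefs)
```

Whether `.find_all()` plus `.find()` or a single `.select()` reads better is mostly a matter of taste; both return the same tags here.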

##### 2a. Extract the URLs and save them in a list

In [None]:
url_list = []

# collect the link target of every search result
for res in results:
  # the href attribute of the <a> tag holds the article URL
  url = res.find('a')['href']
  url_list.append(url)

url_list


### 3. Save the articles in the data folder
While retrieving and extracting at the same time is an option, it is a less practical one. Saving each page to disk first has the following advantages:
1. You request each page only once, so you do not put unnecessary load on the website
2. If extraction fails (for instance because a page is structured slightly differently), you can easily redo it without downloading everything again
3. You have an archive
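
Advantage 2 only pays off if a rerun does not fetch pages that are already on disk. A minimal sketch of such a check follows; the helper `needs_download` is hypothetical (not part of the original notebook), and the demonstration uses a temporary folder rather than the real data folder.

```python
import os
import tempfile

def needs_download(destination):
    """Return True if no non-empty file exists at destination yet."""
    return not (os.path.exists(destination) and os.path.getsize(destination) > 0)

# demonstration in a temporary folder instead of ./data/
with tempfile.TemporaryDirectory() as folder:
    destination = os.path.join(folder, 'example.html')
    before = needs_download(destination)   # nothing saved yet

    with open(destination, 'w', encoding='utf-8') as f:
        f.write('<html></html>')

    after = needs_download(destination)    # the file is now on disk

print(before, after)
```

In a download loop, a check like this could wrap the request so that only missing pages are fetched.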

In [None]:
# we will use time.sleep() to make the script wait
import os
import time

# make sure the data folder exists before writing to it
os.makedirs('./data', exist_ok=True)

# now we iterate through the url list and save the individual pages
for url in url_list:
  print("Retrieving", url)

  # get url
  page = requests.get(url)

  # create a sensible filename
  filename = url.replace('https://www.pbs.org/newshour/', '').replace('/', '-') + '.html'
  destination = './data/' + filename

  # write as UTF-8 so non-ASCII characters do not depend on the platform default
  with open(destination, 'w', encoding='utf-8') as f:
    f.write(page.text)

  # wait two seconds not to overuse the server
  time.sleep(2)
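
The string replacement used for the filename works as long as every URL starts with exactly `https://www.pbs.org/newshour/` and carries no query string. As a hedged sketch of a slightly more defensive variant, the hypothetical helper `url_to_filename` below (not part of the original notebook) parses the URL with `urllib.parse.urlparse` and uses only its path.

```python
from urllib.parse import urlparse

def url_to_filename(url, prefix='/newshour/'):
    """Turn an article URL into a flat filename ending in .html."""
    path = urlparse(url).path            # e.g. /newshour/economy/some-article
    if path.startswith(prefix):
        path = path[len(prefix):]
    # slashes would be read as subfolders, so replace them
    return path.strip('/').replace('/', '-') + '.html'

example = 'https://www.pbs.org/newshour/economy/google-ceo-calls-for-regulation-of-artificial-intelligence'
print(url_to_filename(example))
```

For the URLs shown in the search results, this produces the same filenames as the replacement in the loop.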