# Lab Question 2

Python's `for` loop lets us iterate over a collection of items and act on each one in turn.
The following code creates a list of dictionaries; each dictionary represents one character and holds the URL of their Wikipedia page and the name of the file where their scraped data should be saved.
Loop through the list, extract the same information from each page as you did from Harry Potter's, and save it to the corresponding file.

```python
characters = [
    {'url': 'https://en.wikipedia.org/wiki/Prince_Caspian_(character)', 'filename': 'caspian.txt'},
    {'url': 'https://en.wikipedia.org/wiki/Oliver_Twist_(character)', 'filename': 'oliver_twist.txt'},
    {'url': 'https://en.wikipedia.org/wiki/Jay_Gatsby', 'filename': 'gatsby.txt'},
]
```

To get you started, your loop might begin like this:

```python
for character in characters:
    url = character['url']
    filename = character['filename']
    ...
```
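
Before adding any scraping code, it can help to confirm that the loop visits every entry as expected. The sketch below is a dry run that only builds a message per character and downloads nothing; `example_characters` and `planned` are illustrative names, not part of the lab.

```python
# Illustrative data in the same shape as the lab's `characters` list.
example_characters = [
    {'url': 'https://en.wikipedia.org/wiki/Jay_Gatsby', 'filename': 'gatsby.txt'},
    {'url': 'https://en.wikipedia.org/wiki/Oliver_Twist_(character)', 'filename': 'oliver_twist.txt'},
]

planned = []  # hypothetical name: messages describing what the real loop would do
for character in example_characters:
    url = character['url']
    filename = character['filename']
    planned.append(f'Would scrape {url} into {filename}')

# Show the plan, one line per character.
for message in planned:
    print(message)
```

Once the messages look right, the `planned.append(...)` line can be swapped for the real fetching, parsing, and saving steps.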

In Jupyter, a loop and its entire body must be defined in a single cell, so you may need to combine code that you previously spread across several cells.

## Solution

Again, most of this code can come from our in-class example -- but now we need to put it all inside the body of this `for` loop, so we can execute all of it for each of the characters.

In [1]:
# Always good to start by importing libraries we know we'll need.
import requests
from bs4 import BeautifulSoup

In [2]:
characters = [
    {'url': 'https://en.wikipedia.org/wiki/Prince_Caspian_(character)', 'filename': 'caspian.txt'},
    {'url': 'https://en.wikipedia.org/wiki/Oliver_Twist_(character)', 'filename': 'oliver_twist.txt'},
    {'url': 'https://en.wikipedia.org/wiki/Jay_Gatsby', 'filename': 'gatsby.txt'},
]

In [3]:
for character in characters:
    url = character['url']
    filename = character['filename']
    # Fetch and parse the HTML
    response = requests.get(url)
    html = response.content
    bs = BeautifulSoup(html, 'html.parser')
    # Get the title
    title = bs.title.string
    # Get the image URL
    infobox = bs.find(name='table', class_='infobox')
    image = infobox.img
    image_url = image['src']
    # Get the page text
    all_paragraphs = bs.find(id='mw-content-text').find_all('p')
    p_text = ''
    for paragraph in all_paragraphs: # Notice that loops can be defined within other loops!
        p_text = p_text + ''.join(paragraph.strings)
        p_text = p_text + '\n'
    # Save the file with our scraped info (UTF-8, so non-ASCII characters are written safely)
    with open(filename, 'w', encoding='utf-8') as f:
        f.write('Title: ')
        f.write(title)
        f.write('\n')
        f.write('Image URL: ')
        f.write(image_url)
        f.write('\n')
        f.write(p_text)
    # Print a message so we can see when we've finished each iteration of the loop.
    print(f'Finished scraping {url}. Details written to {filename}')

Finished scraping https://en.wikipedia.org/wiki/Prince_Caspian_(character). Details written to caspian.txt
Finished scraping https://en.wikipedia.org/wiki/Oliver_Twist_(character). Details written to oliver_twist.txt
Finished scraping https://en.wikipedia.org/wiki/Jay_Gatsby. Details written to gatsby.txt
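
The `src` attribute of an image on Wikipedia is often protocol-relative, meaning it starts with `//` rather than `https://`, so the value saved as `Image URL` may not open directly in a browser. One possible fix, sketched here under that assumption, is to resolve it against the page URL with the standard library's `urllib.parse.urljoin`. The helper name `absolute_image_url` and the sample `src` value are made up for illustration.

```python
from urllib.parse import urljoin

def absolute_image_url(page_url, src):
    """Resolve an image `src` (possibly protocol-relative) against its page URL."""
    return urljoin(page_url, src)

page_url = 'https://en.wikipedia.org/wiki/Jay_Gatsby'
# Made-up example value; a real page's `src` will differ.
src = '//upload.wikimedia.org/example/image.jpg'
print(absolute_image_url(page_url, src))
# → https://upload.wikimedia.org/example/image.jpg
```

Inside the loop, this would mean writing `image_url = absolute_image_url(url, image['src'])` instead of `image_url = image['src']`. A `src` that is already absolute passes through unchanged.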


## Bonus Challenge

If you're familiar with writing functions, try defining a function `scrape_to_file` so that your loop reduces to the following:

```python
for character in characters:
    scrape_to_file(character['url'], character['filename'])
```

### Solution

This is pretty straightforward if we've already done the above -- just move the body of the loop into a function that takes the URL and filename as arguments.

In [4]:
def scrape_to_file(url, filename):
    """
    Scrape a character's Wikipedia page and store the details in a given file.
    """
    # Fetch and parse the HTML
    response = requests.get(url)
    html = response.content
    bs = BeautifulSoup(html, 'html.parser')
    # Get the title
    title = bs.title.string
    # Get the image URL
    infobox = bs.find(name='table', class_='infobox')
    image = infobox.img
    image_url = image['src']
    # Get the page text
    all_paragraphs = bs.find(id='mw-content-text').find_all('p')
    p_text = ''
    for paragraph in all_paragraphs:  # Join each paragraph's text fragments into one line
        p_text = p_text + ''.join(paragraph.strings)
        p_text = p_text + '\n'
    # Save the file with our scraped info (UTF-8, so non-ASCII characters are written safely)
    with open(filename, 'w', encoding='utf-8') as f:
        f.write('Title: ')
        f.write(title)
        f.write('\n')
        f.write('Image URL: ')
        f.write(image_url)
        f.write('\n')
        f.write(p_text)
    # Print a message so we can see when each page has been scraped.
    print(f'Finished scraping {url}. Details written to {filename}')

In [5]:
for character in characters:
    scrape_to_file(character['url'], character['filename'])

Finished scraping https://en.wikipedia.org/wiki/Prince_Caspian_(character). Details written to caspian.txt
Finished scraping https://en.wikipedia.org/wiki/Oliver_Twist_(character). Details written to oliver_twist.txt
Finished scraping https://en.wikipedia.org/wiki/Jay_Gatsby. Details written to gatsby.txt
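
Because the dictionary keys (`url`, `filename`) happen to match the parameter names of `scrape_to_file`, the call could arguably be shortened further with keyword-argument unpacking. The sketch below shows the idea with a stand-in function, `fake_scrape_to_file`, which is hypothetical and only records its arguments rather than fetching anything.

```python
calls = []  # records the arguments each call received

def fake_scrape_to_file(url, filename):
    """Stand-in for scrape_to_file: records arguments, does no network or file I/O."""
    calls.append((url, filename))

# Illustrative data in the same shape as the lab's `characters` list.
example_characters = [
    {'url': 'https://en.wikipedia.org/wiki/Jay_Gatsby', 'filename': 'gatsby.txt'},
]

for character in example_characters:
    # Same as fake_scrape_to_file(url=character['url'], filename=character['filename'])
    fake_scrape_to_file(**character)

print(calls)
```

This only works while every key in the dictionary corresponds to a parameter of the function; an extra key would raise a `TypeError`, so the explicit form used in the lab is the safer default.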
