# Cleaning Quiz
It's your turn! The [E.T. The Extra-terrestrial page](https://www.rottentomatoes.com/m/et_the_extraterrestrial) has a lot of information on this movie.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each actor or actress listing on the page:
1. The actor/actress name - e.g. "Henry Thomas"
2. The role - e.g. "Elliott"

**Note: All solution notebooks can be found by clicking on the Jupyter icon on the top left of this workspace.**

### Step 1: Get text from the movie web page
You can use the `requests` library to do this.
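
As a minimal sketch of the fetch step, the helper below wraps `requests.get` with a timeout and an HTTP status check. The function name `fetch_html`, the 10-second default, and the optional `session` parameter are assumptions made for illustration; they are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the body of `url` as text, raising on HTTP error codes.

    `session` is a hypothetical hook: pass a requests.Session (or any
    object with a compatible .get method) to reuse connections.
    """
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of
    # silently returning an error page.
    response.raise_for_status()
    return response.text


# Example call (needs network access), kept as a comment:
# html = fetch_html("https://www.rottentomatoes.com/m/the_batman")
# print(html[:300])  # show only the start of the page
```

Checking the status before parsing makes it obvious when the site returned an error page rather than the movie page, and printing a slice such as `html[:300]` keeps the notebook output short.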

Printing all of the JavaScript, CSS, and text may overload the space available to load this notebook, so only the beginning of the output is shown below.

In [1]:
# import statements
import pandas as pd
import numpy as np
import requests
import re
from bs4 import BeautifulSoup 

In [46]:
# fetch web page (this run uses The Batman page instead of the E.T. page linked above)
r = requests.get('https://www.rottentomatoes.com/m/the_batman')
print(r.text)

<!DOCTYPE html>
<html lang="en"
      dir="ltr"
      xmlns:fb="http://www.facebook.com/2008/fbml"
      xmlns:og="http://opengraphprotocol.org/schema/">

    <head prefix="og: http://ogp.me/ns# flixstertomatoes: http://ogp.me/ns/apps/flixstertomatoes#">
        

        
            <script src="/assets/pizza-pie/javascripts/bundles/roma/rt-common.js?single"></script>
        
        <!-- salt=lay-def-02-juRm -->
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <meta http-equiv="x-ua-compatible" content="ie=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <title>The Batman - Rotten Tomatoes</title>
        <meta name="description" content="Batman ventures into Gotham City's underworld when a sadistic killer leaves behind a trail of cryptic clues. As the evidence begins to lead closer to home and the scale of the perpetrator's plans become clear, he must forge new relationships, unmask the culprit and br

### Step 2: Use BeautifulSoup to remove HTML tags
Use `"lxml"` rather than `"html5lib"`.

Again, printing this entire result may overload the space available to load this notebook, so only the beginning of the output is shown below.

In [47]:
soup = BeautifulSoup(r.text, "lxml")
print(soup.text)

The Batman - Rotten Tomatoes

Home

Top Box Office

Tickets & Showtimes

DVD & Streaming

TV

News

What's the Tomatometer®?
Critics

SIGN UP | LOG IN

Cancel

Movies / TV

Celebrity

No Results Found

View All

Movies

Movies in Theaters

Opening This Week

Coming Soon to Theaters

Certified Fresh Movies

Movies at Home

Vudu

Netflix Streaming

iTunes

Amazon and Amazon Prime

Most Popular S
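
`soup.text` leaves the whitespace between tags in place, so page text arrives padded with blank lines and indentation. A minimal sketch of a tidier extraction follows, using `get_text` with a separator and `strip=True`. The sample HTML is a hypothetical stand-in for the fetched page, and `"html.parser"` is used here only so the sketch runs without `lxml` installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched page
SAMPLE_HTML = """
<html>
  <head>
    <title>The Batman - Rotten Tomatoes</title>
    <script>var tracking = true;</script>
    <style>a { color: red; }</style>
  </head>
  <body>
    <a href="/">
        Home
    </a>


    <a href="/browse">
        Top Box Office
    </a>
  </body>
</html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Remove elements whose contents are code rather than visible text
for tag in soup.find_all(["script", "style"]):
    tag.decompose()

# strip=True trims each text fragment and skips the empty ones;
# separator="\n" puts each remaining fragment on its own line
text = soup.get_text(separator="\n", strip=True)
lines = text.split("\n")
print(lines)
```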

### Step 3: Find all cast and crew members
Use BeautifulSoup's `find_all` method to select elements by tag type and class name. Just like in the video, you can right-click an item on the web page and click "Inspect" to view its HTML.

Hint: searching by `class` may not give you all the information; try matching on a different attribute instead.

In [48]:
# Find all cast crew summaries
# (a multi-word class_ string matches that exact class attribute value)
crew = soup.find_all('div', class_='cast-item media inlineBlock')
print('Number of actors in the cast crew:', len(crew))

Number of actors in the cast crew: 6
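
The hint about trying something other than `class` can be sketched as follows. Passing a multi-word string to `class_` matches only that exact attribute value, while matching on the `data-qa` attribute does not depend on class order. The two-item sample below is hypothetical: the second `div` has its classes reordered purely to show the difference, and `"html.parser"` is used so the sketch runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: same data-qa value, different class order
SAMPLE_HTML = """
<div class="cast-item media inlineBlock" data-qa="cast-crew-item">
  <span title="Robert Pattinson">Robert Pattinson</span>
</div>
<div class="media cast-item inlineBlock" data-qa="cast-crew-item">
  <span title="Paul Dano">Paul Dano</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact string match on the class attribute: misses the reordered div
by_class = soup.find_all("div", class_="cast-item media inlineBlock")

# Attribute match: finds every div tagged as a cast/crew item
by_attr = soup.find_all("div", attrs={"data-qa": "cast-crew-item"})

print("matched by class string:", len(by_class))
print("matched by data-qa:", len(by_attr))
```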


### Step 4: Inspect the first crew member to find tags for the member's name and role
Tip: `.prettify()` is a super helpful method BeautifulSoup provides to output HTML in a nicely indented form! Make sure to use `print()` to ensure whitespace is displayed properly.

In [49]:
print(crew[0].prettify())

<div class="cast-item media inlineBlock" data-qa="cast-crew-item">
 <div class="pull-left">
  <a data-qa="cast-crew-item-img-link" href="/celebrity/robert_pattinson">
   <img class="js-lazyLoad actorThumb medium media-object" data-src="https://resizing.flixster.com/QkgyoFtFMA2bCcI_kt24eUIoCrk=/100x120/v2/https://flxt.tmsimg.com/assets/487714_v9_bb.jpg"/>
  </a>
 </div>
 <div class="media-body">
  <a class="unstyled articleLink" data-qa="cast-crew-item-link" href=" /celebrity/robert_pattinson ">
   <span title="Robert Pattinson">
    Robert Pattinson
   </span>
  </a>
  <span class="characters subtle smaller" title="Robert Pattinson">
   <br/>
   Bruce Wayne, 
                
                    The Batman
   <br/>
  </span>
 </div>
</div>



Look for tags that contain the actor's or actress's name and the role that you want to extract. Then, use the `find_all` method on the crew member to pull out the HTML with those tags. Afterwards, don't forget to do some extra cleaning to isolate the text (strip the surrounding HTML and whitespace), as you saw in the last video.

In [50]:
# Extract name
name = crew[0].find_all('span')[0].get_text().strip()
print(name)

Robert Pattinson


In [51]:
# Extract role
role = crew[0].find_all('span')[1].get_text().strip()
print(role)

Bruce Wayne, 
                
                    The Batman
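
The role text still carries the line breaks and indentation from the page source. The sketch below picks out the name and role by attribute and class rather than by position, then collapses whitespace runs. The HTML is a trimmed copy of the listing printed by `prettify()`; the helper name `collapse_whitespace` and the use of `"html.parser"` are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first cast listing
ITEM_HTML = """
<div class="cast-item media inlineBlock" data-qa="cast-crew-item">
 <div class="media-body">
  <a class="unstyled articleLink" data-qa="cast-crew-item-link" href=" /celebrity/robert_pattinson ">
   <span title="Robert Pattinson">
    Robert Pattinson
   </span>
  </a>
  <span class="characters subtle smaller" title="Robert Pattinson">
   <br/>
   Bruce Wayne,

                    The Batman
   <br/>
  </span>
 </div>
</div>
"""


def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


item = BeautifulSoup(ITEM_HTML, "html.parser")

# The name sits inside the link; the role sits in the "characters" span
name_tag = item.find("a", attrs={"data-qa": "cast-crew-item-link"})
role_tag = item.find("span", class_="characters")

name = collapse_whitespace(name_tag.get_text())
role = collapse_whitespace(role_tag.get_text())
print(name)
print(role)
```

Selecting by attribute avoids relying on the name always being the first `span` and the role the second.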


### Step 5: Collect names and roles of ALL member listings
Reuse your code from the previous step, but now in a loop to extract the name and role from every crew summary in `crew`!

In [62]:
name_roles = []

for crew_member in crew:
    # the first <span> holds the name, the second holds the role(s)
    spans = crew_member.find_all('span')
    name = spans[0].get_text().strip()
    # splitting on \W+ collapses whitespace runs but also drops punctuation
    name = " ".join(re.split(r'\W+', name))
    role = spans[1].get_text().strip()
    role = " ".join(re.split(r'\W+', role))
    name_roles.append(f'{name} as: {role}')

# display results
print(len(name_roles), "actors found in cast crew:")
name_roles

6 actors found in cast crew:


['Robert Pattinson as: Bruce Wayne The Batman',
 'Zoë Kravitz as: Selina Kyle',
 'Jeffrey Wright as: Lt James Gordon',
 'Colin Farrell as: Oz The Penguin',
 'Paul Dano as: The Riddler',
 'John Turturro as: Carmine Falcone']
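
Splitting on `\W+` removes punctuation as well as whitespace, which is why the results read "Lt James Gordon" and "Bruce Wayne The Batman". One possible alternative, sketched below, collapses whitespace only and stores each listing as a dictionary. The function names `parse_cast` and `collapse_whitespace` are hypothetical, the markup of the second listing is assumed to mirror the first, and `"html.parser"` is used so the sketch runs without `lxml`.

```python
import re

from bs4 import BeautifulSoup

# Two listings; the second one's markup is assumed to mirror the first
CAST_HTML = """
<div data-qa="cast-crew-item">
  <a data-qa="cast-crew-item-link"><span title="Robert Pattinson"> Robert Pattinson </span></a>
  <span class="characters subtle smaller"><br/> Bruce Wayne,
      The Batman <br/></span>
</div>
<div data-qa="cast-crew-item">
  <a data-qa="cast-crew-item-link"><span title="Paul Dano"> Paul Dano </span></a>
  <span class="characters subtle smaller"><br/> The Riddler <br/></span>
</div>
"""


def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    return " ".join(text.split())


def parse_cast(html):
    """Return one {"name", "role"} dict per cast listing in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.find_all("div", attrs={"data-qa": "cast-crew-item"}):
        name_tag = item.find("a", attrs={"data-qa": "cast-crew-item-link"})
        role_tag = item.find("span", class_="characters")
        if name_tag is None:
            continue  # skip listings without a name link
        records.append({
            "name": collapse_whitespace(name_tag.get_text()),
            "role": collapse_whitespace(role_tag.get_text()) if role_tag else "",
        })
    return records


records = parse_cast(CAST_HTML)
for record in records:
    print(record)

# For comparison: splitting on \W+ also removes the comma
print(" ".join(re.split(r"\W+", "Bruce Wayne, The Batman")))
```

A list of dictionaries like this can be passed straight to `pd.DataFrame(records)` if a table is wanted later.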