# Cleaning Quiz
It's your turn! The [E.T. the Extra-Terrestrial page](https://www.rottentomatoes.com/m/et_the_extraterrestrial) on Rotten Tomatoes has a lot of information on this movie.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each actor or actress listing on the page:
1. The actor/actress name - e.g. "Henry Thomas"
2. The role - e.g. "Elliott"

**Note: All solution notebooks can be found by clicking on the Jupyter icon on the top left of this workspace.**

### Step 1: Get text from the movie web page
You can use the `requests` library to do this.

Outputting all the javascript, CSS, and text may overload the space available to load this notebook, so we omit a print statement here.

In [2]:
# import statements
import requests
from bs4 import BeautifulSoup

In [3]:
# fetch web page
r = requests.get("https://www.rottentomatoes.com/m/et_the_extraterrestrial")
r.text

'<!DOCTYPE html>\n<html lang="en" dir="ltr" xmlns="http://www.w3.org/1999/xhtml" prefix="fb: http://www.facebook.com/2008/fbml og: http://opengraphprotocol.org/schema/">\n    <head prefix="og: http://ogp.me/ns# flixstertomatoes: http://ogp.me/ns/apps/flixstertomatoes#">\n        \n            \n            \n            \n            \n                <script\n                    charset="UTF-8"\n                    crossorigin="anonymous"\n                    data-domain-script="7e979733-6841-4fce-9182-515fac69187f"\n                    integrity="sha384-WEHwEli88wqOiQd913F1utFZiwisa8XhCkbjLnbKEpFa/WbFcPKeGg7h4fdsv0Z/"\n                    src="https://cdn.cookielaw.org/consent/7e979733-6841-4fce-9182-515fac69187f/otSDKStub.js"\n                    type="text/javascript"\n                >\n                </script>\n                <script type="text/javascript">\n                    function OptanonWrapper() { }\n                </script>\n            \n\n            \n             
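
A slightly more defensive fetch can be sketched as follows. The helper name `fetch_html`, its injectable `get` parameter, and the 10-second timeout are illustrative assumptions rather than part of the original notebook; `timeout` and `raise_for_status()` are standard `requests` features.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Return the body of `url` as text, raising on HTTP error codes.

    `fetch_html` is a hypothetical helper. `get` is injectable so the
    function can be exercised without network access.
    """
    response = get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of silently
    # parsing an error page.
    response.raise_for_status()
    return response.text


# Usage (requires network access):
# html = fetch_html("https://www.rottentomatoes.com/m/et_the_extraterrestrial")
```

Checking the status before parsing makes it easier to tell a blocked or failed request apart from a page whose layout has changed.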

### Step 2: Use BeautifulSoup to remove HTML tags
Use `"lxml"` rather than `"html5lib"`.

Again, printing this entire result may overload the space available to load this notebook, so we omit a print statement here.

In [4]:
soup = BeautifulSoup(r.text, "lxml")
soup.get_text()

'\n\n\n\n\n                    function OptanonWrapper() { }\n                \n\n\n\n\n\n\nE.T. the Extra-Terrestrial - Rotten Tomatoes\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n                {"@context":"http://schema.org","@type":"Movie","url":"https://www.rottentomatoes.com/m/et_the_extraterrestrial","name":"E.T. the Extra-Terrestrial","description":"After a gentle alien becomes stranded on Earth, the being is discovered and befriended by a young boy named Elliott (Henry Thomas). Bringing the extraterrestrial into his suburban California house, Elliott introduces E.T., as the alien is dubbed, to his brother and his little sister, Gertie (Drew Barrymore), and the children decide to keep its existence a secret. Soon, however, E.T. falls ill, resulting in government intervention and a dire situation for both Elliott and the alien.","dateCreated":"1982-06-11","duration":"1 hr. 55 min.","genre":["Kids & Family","Sci-Fi","Adventure"],"image":"https://resizing.flixster.com/-XZAfH
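
The output above still contains JavaScript and JSON from `<script>` tags. One possible way to drop that noise, sketched here on a small hypothetical stand-in for the real page, is to remove `<script>` and `<style>` elements before extracting text. The markup below is invented for illustration, and `"html.parser"` is used only so the sketch runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the real page (hypothetical markup for illustration).
html = """
<html>
  <head>
    <title>E.T. the Extra-Terrestrial - Rotten Tomatoes</title>
    <script type="text/javascript">function OptanonWrapper() { }</script>
    <style>p { color: red; }</style>
  </head>
  <body><p>Henry Thomas</p><p>Elliott</p></body>
</html>
"""

# "html.parser" ships with Python; the notebook itself uses "lxml".
soup = BeautifulSoup(html, "html.parser")

# Remove <script> and <style> elements so their contents cannot
# show up in the extracted text.
for tag in soup(["script", "style"]):
    tag.decompose()

# Keep one line per non-empty text fragment, with whitespace stripped.
text = "\n".join(soup.stripped_strings)
print(text)
```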

### Step 3: Find all cast and crew members
Use BeautifulSoup's `find_all` method to select elements by tag type and class name. Just like in the video, you can right-click an item on the web page and click "Inspect" to view its HTML.

Hint: selecting by `class` may not return all the information you need; try selecting by a different attribute instead.

- Compare selecting by class with selecting by the attribute `{'data-qa':'cast-crew-item'}` to fetch the cast and crew entries.

In [8]:
# Find all cast and crew summaries by class name
crew = soup.find_all('div', class_='cast-wrap')
print('Number of actors in the cast crew:', len(crew))

Number of actors in the cast crew: 1


In [9]:
# Find all cast and crew summaries by the data-qa attribute
crew = soup.find_all('div', {'data-qa':'cast-crew-item'})
print('Number of actors in the cast crew:', len(crew))

Number of actors in the cast crew: 22
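
The difference between the two selections can be reproduced on a small example. The markup below is a simplified, hypothetical stand-in (the real page may nest these elements differently); it only illustrates that a class on a wrapper matches once, while an attribute on each entry matches every entry.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup: one wrapper div containing two entries.
html = """
<div class="cast-wrap">
  <div class="cast-and-crew-item " data-qa="cast-crew-item"><p>Henry Thomas</p></div>
  <div class="cast-and-crew-item " data-qa="cast-crew-item"><p>Drew Barrymore</p></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Selecting by class matches only the wrapper element.
wrappers = soup.find_all("div", class_="cast-wrap")

# Selecting by the data-qa attribute matches each individual entry.
items = soup.find_all("div", attrs={"data-qa": "cast-crew-item"})

print(len(wrappers), len(items))
```

Passing a dictionary as the second positional argument, as the notebook does, is equivalent to the `attrs=` keyword used here.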


### Step 4: Inspect the first crew member to find tags for the member's name and role
Tip: `.prettify()` is a super helpful method BeautifulSoup provides to output HTML in a nicely indented form! Make sure to use `print()` to ensure whitespace is displayed properly.

In [12]:
# print the first summary in crew
print(crew[0].prettify())

<div class="cast-and-crew-item " data-qa="cast-crew-item">
 <a data-qa="cast-crew-item-img-link" href="/celebrity/henry_thomas">
  <img alt="Henry Thomas" class="" loading="lazy" src="https://resizing.flixster.com/vdXpX-hTKGBvkDP2Te6hRlS-s18=/100x120/v2/https://resizing.flixster.com/-XZAfHZM39UwaGJIFWKAE8fS0ak=/v3/t/assets/26487_v9_bb.jpg"/>
 </a>
 <div class="metadata">
  <a data-qa="cast-crew-item-link" href=" /celebrity/henry_thomas ">
   <p>
    Henry Thomas
   </p>
  </a>
  <p class="p--small">
   Elliott
  </p>
 </div>
</div>



Look for the tags that contain the actor's or actress's name and role. Then use the `find_all` method on the crew item to pull out the HTML with those tags. Afterwards, don't forget to do some extra cleaning to isolate the text (strip the surrounding HTML and whitespace), as you saw in the last video.

In [19]:
crew[0]

<div class="cast-and-crew-item " data-qa="cast-crew-item">
<a data-qa="cast-crew-item-img-link" href="/celebrity/henry_thomas">
<img alt="Henry Thomas" class="" loading="lazy" src="https://resizing.flixster.com/vdXpX-hTKGBvkDP2Te6hRlS-s18=/100x120/v2/https://resizing.flixster.com/-XZAfHZM39UwaGJIFWKAE8fS0ak=/v3/t/assets/26487_v9_bb.jpg"/>
</a>
<div class="metadata">
<a data-qa="cast-crew-item-link" href=" /celebrity/henry_thomas ">
<p>Henry Thomas</p>
</a>
<p class="p--small">
            
                
                    Elliott
                
            
            
                
                    
                
            
        </p>
</div>
</div>

In [18]:
# Extract name
crew[0].find_all('p')[0].get_text().strip()

'Henry Thomas'

In [23]:
# Extract role
crew[0].find_all('p')[1].get_text().strip()

'Elliott'
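
Indexing into `find_all('p')` depends on the order of the paragraphs. As an alternative sketch, the same values can be selected by the attributes visible in the prettified output: the `data-qa="cast-crew-item-link"` anchor for the name and the `p--small` class for the role. The snippet below uses a trimmed copy of that markup and `"html.parser"` so that it runs on its own; whether these attributes are stable across the whole page is an assumption.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first entry printed above.
html = """
<div class="cast-and-crew-item " data-qa="cast-crew-item">
 <div class="metadata">
  <a data-qa="cast-crew-item-link" href=" /celebrity/henry_thomas ">
   <p>Henry Thomas</p>
  </a>
  <p class="p--small">
   Elliott
  </p>
 </div>
</div>
"""
item = BeautifulSoup(html, "html.parser").find(
    "div", attrs={"data-qa": "cast-crew-item"}
)

# The name sits inside the link marked data-qa="cast-crew-item-link".
name = item.find("a", attrs={"data-qa": "cast-crew-item-link"}).get_text(strip=True)

# The role sits in the paragraph with class "p--small".
role = item.find("p", class_="p--small").get_text(strip=True)

print(name, "-", role)
```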

### Step 5: Collect names and roles of ALL member listings
Reuse your code from the previous step, but now in a loop to extract the name and role from every crew summary in `crew`!

In [None]:
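name_role = []
for summary in crew:
    # append name and role of each summary to name_role list
    paragraphs = summary.find_all('p')
    name = paragraphs[0].get_text().strip()
    role = paragraphs[1].get_text().strip()
    name_role.append((name, role))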

In [None]:
# display results
print(len(name_role), "actors found in cast crew. Sample:")
name_role[:5]
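
If some listings lack a role paragraph, indexing `find_all('p')[1]` would raise an `IndexError`. A minimal sketch of a more tolerant loop is shown below. The helper `extract_name_role` and the sample markup (including the "Example Person" entry with no role) are hypothetical and serve only to illustrate the idea.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the third entry has no role paragraph.
html = """
<div data-qa="cast-crew-item"><p>Henry Thomas</p><p class="p--small"> Elliott </p></div>
<div data-qa="cast-crew-item"><p>Drew Barrymore</p><p class="p--small"> Gertie </p></div>
<div data-qa="cast-crew-item"><p>Example Person</p></div>
"""


def extract_name_role(item):
    """Return (name, role); a missing paragraph yields None for that field."""
    paragraphs = item.find_all("p")
    name = paragraphs[0].get_text().strip() if paragraphs else None
    role = paragraphs[1].get_text().strip() if len(paragraphs) > 1 else None
    return name, role


crew = BeautifulSoup(html, "html.parser").find_all(
    "div", attrs={"data-qa": "cast-crew-item"}
)
name_role = [extract_name_role(item) for item in crew]
print(name_role)
```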