## Homework 3: Scraping
This homework asks you to scrape from three different sources. Please follow the instructions and do the best you can. With the exception of the first cell, which imports the requests and Beautiful Soup libraries, I have not pre-written any code for you. Do not be afraid: you may look at the tutorial for examples, as well as the Beautiful Soup documentation and any other Python resource (such as Stack Overflow) if need be.

This homework is intended to be challenging. If you only get 70% of it done, that's great!

In [2]:
import requests
from bs4 import BeautifulSoup

## Shakespeare 
At http://floatingmedia.com/columbia/tempest.html I've posted some relatively simple HTML of the first act of William Shakespeare's The Tempest. Please use that URL to download the HTML and put it through Beautiful Soup. (It may also be very helpful to open that page in Chrome and inspect elements while you do this.)
**Please note: there are 2 scenes in Act 1**



In the cells below, write two lines: one variable that requests and reads the HTML from the URL (http://floatingmedia.com/columbia/tempest.html), and a second variable that passes the raw HTML into Beautiful Soup. You will use that Beautiful Soup variable to search the HTML.

In [2]:
raw_text = requests.get('http://floatingmedia.com/columbia/tempest.html').content
soup = BeautifulSoup(raw_text, "html.parser")

Get the title of the play:



In [3]:
soup.title.text
# Alternative: soup.title.string gives the same text here, because <title> has a single text child

'The Tempest'
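
On the `.string` versus `.text` question: both are current Beautiful Soup attributes, they just behave differently. The sketch below uses a small made-up HTML snippet (not the actual Tempest page) to show that `.string` only returns a value when a tag has a single child to follow, while `.text` joins every piece of text inside the tag.

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration; not taken from the Tempest page
html = ('<html><head><title>The Tempest</title></head>'
        '<body><p><strong>Master. </strong>Boatswain!</p></body></html>')
demo = BeautifulSoup(html, 'html.parser')

title_string = demo.title.string  # single text child, so .string works
title_text = demo.title.text      # same result here
p_string = demo.p.string          # <p> has two children, so this is None
p_text = demo.p.text              # all text inside <p>, concatenated

print(repr(title_string), repr(title_text))
print(repr(p_string), repr(p_text))
```

For a simple tag like `<title>` either attribute is fine; `.text` (or `get_text()`) is the safer choice once a tag mixes text with nested tags.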

Get the HTML that contains the setting of Act One Scene 1:

In [4]:
soup.find(string='Act I, Scene 1').parent.parent.select('.stagedir')[0]

<p class="stagedir"><strong>
On a ship at sea</strong></p>

Get the setting of Act One Scene 1 (without HTML tags):

In [5]:
soup.find(string='Act I, Scene 1').parent.parent.select('.stagedir')[0].string

# or
# soup.div.strong.string

'\nOn a ship at sea'
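
The result above still carries a leading newline, and the chained `.parent.parent` depends on exactly how deep the heading sits. A minimal sketch of a more forgiving variant follows, assuming a hypothetical markup in which the scene heading and the stage direction share an enclosing `<div>` (the real page may be structured differently). It uses `find_parent()`, `select_one()` and `get_text(strip=True)`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the output shown above
html = '''
<div>
  <h3>Act I, Scene 1</h3>
  <p class="stagedir"><strong>
On a ship at sea</strong></p>
</div>
'''
demo = BeautifulSoup(html, 'html.parser')

heading = demo.find(string='Act I, Scene 1')
scene = heading.find_parent('div')  # nearest enclosing <div>
setting = scene.select_one('.stagedir').get_text(strip=True)
print(repr(setting))
```

`get_text(strip=True)` trims the whitespace around each piece of text, so no separate `.strip()` call is needed.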

Get the setting of scene 2

In [6]:
soup.find(string='Act I, Scene 2').parent.parent.select('.stagedir')[0].string

'The island. Before PROSPERO’S cell.'

Get the name of the first character in Scene 1

In [7]:
soup.select('div')[1].select('ul li.playtext strong')[0].string

'Master. '

Get a list of all of the characters in Scene 1 (repeats are fine, you need a loop here)

In [8]:
character_list = [char.string
                  for char in soup.select('div')[1].select('ul li.playtext strong')]
character_list


['Master. ',
 'Boatswain. ',
 'Master. ',
 'Boatswain. ',
 'Alonso. ',
 'Boatswain. ',
 'Antonio. ',
 'Boatswain. ',
 'Gonzalo. ',
 'Boatswain. ',
 'Gonzalo. ',
 'Boatswain. ',
 'Gonzalo. ',
 'Boatswain. ',
 'Sebastian. ',
 'Boatswain. ',
 'Antonio. ',
 'Gonzalo. ',
 'Boatswain. ',
 'Mariners. ',
 'Boatswain. ',
 'Gonzalo. ',
 'Sebastian. ',
 'Antonio. ',
 'Gonzalo. ',
 'Antonio. ',
 'Sebastian. ',
 'Gonzalo. ']

In [9]:
cleaned_character_list = set(character_list)
cleaned_character_list

{'Alonso. ',
 'Antonio. ',
 'Boatswain. ',
 'Gonzalo. ',
 'Mariners. ',
 'Master. ',
 'Sebastian. '}
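
The set removes repeats but loses the order of appearance and keeps the trailing `'. '` on each name. A small sketch of one way to tidy this up, using only a few entries copied from the list above, is shown here; `dict.fromkeys` is used because dictionaries keep insertion order.

```python
# A few entries copied from the character list above
character_list = ['Master. ', 'Boatswain. ', 'Master. ', 'Boatswain. ', 'Alonso. ']

# Drop surrounding whitespace and the trailing period from each name
cleaned = [name.strip().rstrip('.') for name in character_list]

# dict.fromkeys keeps the first occurrence of each key, in order
unique_in_order = list(dict.fromkeys(cleaned))
print(unique_in_order)
```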

Display every stage direction in _scene 2_

In [10]:
for stagedir in soup.select('div')[3].select('.stagedir'):
  print(stagedir.string)

[Enter PROSPERO and MIRANDA]
[Enter ARIEL]
[Exit]
[Enter CALIBAN]
[Draws, and is charmed from moving]
[Exeunt]


Get the HTML containing Miranda's first speech in Scene 2

In [11]:
soup.select('div')[3].find(string='Miranda. ').parent.parent

<li class="playtext"><strong>Miranda. </strong>If by your art, my dearest father, you have
 <span class="playlinenum">85</span><br/>Put the wild waters in this roar, allay them.
<br/>The sky, it seems, would pour down stinking pitch,
<br/>But that the sea, mounting to the welkin's cheek,
<br/>Dashes the fire out. O, I have suffered
<br/>With those that I saw suffer: a brave vessel,
 <span class="playlinenum">90</span><br/>Who had, no doubt, some noble creature in her,
<br/>Dash'd all to pieces. O, the cry did knock
<br/>Against my very heart. Poor souls, they perish'd.
<br/>Had I been any god of power, I would
<br/>Have sunk the sea within the earth or ere
 <span class="playlinenum">95</span><br/>It should the good ship so have swallow'd and
<br/>The fraughting souls within her.
</li>

Now display those **same lines without the speaker's name, and no HTML.**

This is a bit tricky, so try to get as close to this result as you can. (Note: I didn't get the numbers out of there, but it's fine if you want to get them out too):

`If by your art, my dearest father, you have
 
85
Put the wild waters in this roar, allay them.

The sky, it seems, would pour down stinking pitch,

But that the sea, mounting to the welkin's cheek,

Dashes the fire out. O, I have suffered

With those that I saw suffer: a brave vessel,
 
90
Who had, no doubt, some noble creature in her,

Dash'd all to pieces. O, the cry did knock

Against my very heart. Poor souls, they perish'd.

Had I been any god of power, I would

Have sunk the sea within the earth or ere
 
95
It should the good ship so have swallow'd and

The fraughting souls within her.`

In [12]:
import bs4

for sibling in soup.select('div')[3].find(string='Miranda. ').parent.next_siblings:
  # Keep only bare text nodes; this skips the <br/> and line-number <span> tags
  if isinstance(sibling, bs4.element.NavigableString):
    print(sibling.string.strip())


If by your art, my dearest father, you have
Put the wild waters in this roar, allay them.
The sky, it seems, would pour down stinking pitch,
But that the sea, mounting to the welkin's cheek,
Dashes the fire out. O, I have suffered
With those that I saw suffer: a brave vessel,
Who had, no doubt, some noble creature in her,
Dash'd all to pieces. O, the cry did knock
Against my very heart. Poor souls, they perish'd.
Had I been any god of power, I would
Have sunk the sea within the earth or ere
It should the good ship so have swallow'd and
The fraughting souls within her.


In [None]:
# there is also the possibility to use 
# soup.find_all(string=True)
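
Another possible route, sketched here on a shortened copy of the `<li>` shown earlier, is to delete the unwanted tags from the parsed tree with `decompose()` and then read the remaining text with `stripped_strings`. Note that this modifies the parsed tree, so it is best done on a freshly parsed copy.

```python
from bs4 import BeautifulSoup

# Shortened copy of the <li> shown earlier in this section
html = ('<li class="playtext"><strong>Miranda. </strong>'
        'If by your art, my dearest father, you have\n'
        ' <span class="playlinenum">85</span><br/>'
        'Put the wild waters in this roar, allay them.\n'
        '<br/>The sky, it seems, would pour down stinking pitch,\n</li>')
speech = BeautifulSoup(html, 'html.parser').li

# Remove the speaker name and the line numbers from the parsed tree
for tag in speech.select('strong, span.playlinenum'):
  tag.decompose()

speech_lines = list(speech.stripped_strings)
for line in speech_lines:
  print(line)
```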

Get the HTML containing the speech after Miranda's

In [13]:
soup.select('div')[3].find(string='Miranda. ').parent.parent.parent.next_sibling.next_sibling

<ul><li class="playtext"><strong>Prospero. </strong>Be collected:
<br/>No more amazement: tell your piteous heart
<br/>There's no harm done.
 <span class="playlinenum">100</span></li></ul>
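
The doubled `.next_sibling` is needed because the first sibling of a tag is often just the whitespace between tags. A minimal sketch, assuming a hypothetical two-speech snippet in the same `<ul><li>` shape as the output above, shows how `find_next_sibling()` skips those text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical two-speech snippet in the same <ul><li> shape as the output above
html = ('<ul><li class="playtext"><strong>Miranda. </strong>If by your art</li></ul>\n'
        '<ul><li class="playtext"><strong>Prospero. </strong>Be collected:</li></ul>')
demo = BeautifulSoup(html, 'html.parser')

miranda_ul = demo.find(string='Miranda. ').find_parent('ul')
whitespace_node = miranda_ul.next_sibling         # the '\n' between the lists
next_speech = miranda_ul.find_next_sibling('ul')  # skips text nodes
print(repr(whitespace_node))
print(next_speech.strong.string)
```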

## Supreme Court Decisions 2017 
Okay, now it's time to scrape from the real world. The Supreme Court posts its decisions in a format that is not particularly data-friendly: a simple HTML table with some information about each decision, including a link to a PDF that contains the written decision. We won't mess with those PDFs this week, but we do want to transform the tables into something useful to us.

We will be scraping this page: 
https://www.supremecourt.gov/opinions/slipopinion/17

*Note:* While you won't see all of the tables for all the months when you go to the page, they are all there in the HTML that you will download and in the HTML source (which is the same thing). Definitely do a view source, and study the structure of the HTML tables before you start coding.

You eventually want to end up with a list of lists (rows and then columns) for every decision from 2017. Follow the process, and see how far you get.


Once again, write the lines that use requests to get the page, plus a second variable that passes the raw HTML into Beautiful Soup for parsing. Include a third line that prints the HTML using prettify().

In [3]:
supremecourt_raw_data = requests.get(
  'https://www.supremecourt.gov/opinions/slipopinion/17').content
soup = BeautifulSoup(supremecourt_raw_data, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head id="ctl00_ctl00_Head1">
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="txt/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <script src="/js/jquery-3.1.0.min.js" type="text/javascript">
  </script>
  <script src="/js/bootstrap.js" type="text/javascript">
  </script>
  <link href="/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
  <link href="/css/bootstrap.min.css" rel="Stylesheet" type="text/css"/>
  <link href="/css/bootstrap-theme.min.css" rel="Stylesheet" type="text/css"/>
  <link href="/styles/newBootStrap2.css" rel="stylesheet" type="text/css"/>
  <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
          <script src="/js/html5shiv.js"></script>
          <script src="/js/respond.min.js"></script>
        <![endif]-->
  <!--[if lt IE 8]>
       

Isolate the HTML table row that holds the information for the case Azar v. Garza

In [4]:
# find_parent('tr') walks up the tree and returns the nearest enclosing <tr>
case_row = soup.find(string='Azar v. Garza').find_parent('tr')

case_row

<tr>
<td style="text-align: center;">46</td>
<td style="text-align: center;">6/04/18</td>
<td style="text-align: center;">17-654</td>
<td><a href="/opinions/17pdf/17-654_5j3b.pdf" target="_blank" title="The D. C. Circuit’s judgment is vacated, and the case is remanded with instructions to dismiss the individual claim for injunctive relief as moot.">Azar v. Garza</a></td>
<td style="text-align: center;"> </td>
<td style="text-align: center;">PC</td>
<td style="text-align: center;">584/2</td>
</tr>

Print out each cell of information from that first row. Your output should look like this:


```
46
6/04/18
17-654
Azar v. Garza
 
PC
584/2
```

In [5]:
# Named text rather than str, so the built-in str type is not shadowed
for text in case_row.stripped_strings:
  print(text)

46
6/04/18
17-654
Azar v. Garza
PC
584/2
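
The output above has six lines rather than seven because `stripped_strings` drops the cell that holds only a non-breaking space. A short sketch on a trimmed copy of the row shows the difference between walking the strings and walking the `<td>` cells, which keeps one entry per column.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row shown above; the empty cell holds a non-breaking space
html = ('<table><tr><td>46</td><td>6/04/18</td><td>17-654</td>'
        '<td><a href="/opinions/17pdf/17-654_5j3b.pdf">Azar v. Garza</a></td>'
        '<td>&nbsp;</td><td>PC</td><td>584/2</td></tr></table>')
row = BeautifulSoup(html, 'html.parser').tr

by_strings = list(row.stripped_strings)                  # blank cell is dropped
by_cells = [td.get_text() for td in row.find_all('td')]  # one entry per cell

print(len(by_strings), len(by_cells))
for cell_text in by_cells:
  print(cell_text)
```

Keeping one entry per cell matters later, when every row needs the same number of columns.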


But wait, there is more information hidden inside the tags! Really important information. Find it and print it out like this (still just for this first row):

```
/opinions/17pdf/17-654_5j3b.pdf 
 The D. C. Circuit’s judgment is vacated, and the case is 
remanded with instructions to dismiss the individual claim 
for injunctive relief as moot.
 ```

In [6]:
case_link = soup.find(string='Azar v. Garza').parent
print(case_link.attrs['href'])
print(case_link.attrs['title'])

/opinions/17pdf/17-654_5j3b.pdf
The D. C. Circuit’s judgment is vacated, and the case is remanded with instructions to dismiss the individual claim for injunctive relief as moot.
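
The `href` printed above is relative to the site root. If a full link to the PDF is wanted, the standard library's `urljoin` can combine it with the page URL, as in this small sketch (the variable names are only for illustration).

```python
from urllib.parse import urljoin

page_url = 'https://www.supremecourt.gov/opinions/slipopinion/17'
href = '/opinions/17pdf/17-654_5j3b.pdf'

# A path starting with '/' replaces the whole path of the base URL
pdf_url = urljoin(page_url, href)
print(pdf_url)
```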


Great! Now go through all of the rows in that first table (but not the header), and build a list of lists with the information for every case in that table. Your output should look like this:

```
[['46',
  '6/04/18',
  '17-654',
  'Azar v. Garza',
  '\xa0',
  'PC',
  '584/2',
  '/opinions/17pdf/17-654_5j3b.pdf',
  'The D. C. Circuit’s judgment is vacated, and the case is remanded with instructions to dismiss the individual claim for injunctive relief as moot.'],
 ['45',
  '6/04/18',
  '16-1215',
  'Lamar, Archer & Cofrin, LLP v. Appling',
  '\xa0',
  'SS',
  '584/2',
  '/opinions/17pdf/16-1215_gdhk.pdf',
  'Single-asset statements qualify as “statement[s] respecting the debtor’s . . . financial condition” for purposes of Bankruptcy Code §523(a)(2)’s exceptions to discharge; where, as here, a single-asset statement is not in writing, the associated debt may be discharged.'],
 ['44',
  '6/04/18',
  '17-5716',
  'Koons v. United States',
  '\xa0',
  'A',
  '584/2',
  '/opinions/17pdf/17-5716_jhek.pdf',
  'Petitioners do not qualify for sentence reductions under 18 U. S. C. §3582(c)(2) because their sentences were not “based on” their lowered Federal Sentencing Guidelines ranges but, instead, were “based on” their mandatory minimums and their substantial assistance to the Government.'],
 ['43',
  '6/04/18',
  '17-155',
  'Hughes v. United States',
  '6/05/18',
  'K',
  '584/2',
  '/opinions/17pdf/17-155_new_4f15.pdf',
  'A Federal Rule of Criminal Procedure 11(c)(1)(C) plea agreement is “based on” the defendant’s Federal Sentencing Guidelines range so long as that range was part of the framework the district court relied on in imposing the sentence or accepting the agreement; thus, Hughes may seek a sentencing reduction under 18 U. S. C. §3582(c)(2).'],
 ['42',
  '6/04/18',
  '16-111',
  'Masterpiece Cakeshop, Ltd. v. Colorado Civil Rights Comm’n',
  '6/04/18',
  'K',
  '584/2',
  '/opinions/17pdf/16-111_new_d1of.pdf',
  'The Colorado Civil Rights Commission’s actions in assessing a cakeshop owner’s reasons for declining to make a cake for a same-sex couple’s wedding celebration violated the Free Exercise Clause.']]
```

In [7]:
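def get_case_list_by_table(table):
  """Return one list per case row: the cell texts, then the link's href and title."""
  table_data = []
  for row in table.find_all('tr'):
    row_data = [cell.string.strip() for cell in row.find_all('td')]
    # Rows without <td> cells (such as the header row) are skipped
    if len(row_data) < 1:
      continue
    # The case name cell holds the <a> tag with the PDF path and the summary
    link = row.find('a')
    row_data.append(link.attrs['href'])
    row_data.append(link.attrs['title'])
    table_data.append(row_data)

  return table_data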


get_case_list_by_table(soup.find(id='pagemaindiv').find('table'))


[['50',
  '6/11/18',
  '17-269',
  'Washington v. United States',
  '',
  'PC',
  '584/2',
  '/opinions/17pdf/17-269_3eb4.pdf',
  ' Judgment affirmed by an equally divided Court.'],
 ['49',
  '6/11/18',
  '16-1432',
  'Sveen v. Melin',
  '',
  'EK',
  '584/2',
  '/opinions/17pdf/16-1432_7j8b.pdf',
  ' The retroactive application of Minnesota’s revocation-on-divorce statute—which automatically nullifies an ex-spouse’s beneficiary designation on a life-insurance policy or other will substitute—does not violate the Contracts Clause.'],
 ['48',
  '6/11/18',
  '16-980',
  'Husted v. A. Philip Randolph Institute',
  '',
  'A',
  '584/2',
  '/opinions/17pdf/16-980_f2q3.pdf',
  ' The process that Ohio uses to remove voters on change-of-residence grounds does not violate the National Voter Registration Act.'],
 ['47',
  '6/11/18',
  '17-432',
  'China Agritech, Inc. v. Resh',
  '',
  'G',
  '584/2',
  '/opinions/17pdf/17-432_08m1.pdf',
  ' Upon denial of class certification, a putative class me
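
The function above assumes every data row has a link and that every cell holds a single piece of text; `cell.string` is `None` for a cell with several children, and `row.find('a')` is `None` for a row without a link. A more defensive sketch follows. The name `get_case_rows_safely` and the tiny table are made up for illustration and are not part of the assignment.

```python
from bs4 import BeautifulSoup

def get_case_rows_safely(table):
  """Hypothetical variant of get_case_list_by_table that tolerates odd rows."""
  table_data = []
  for row in table.find_all('tr'):
    cells = row.find_all('td')
    if not cells:
      continue  # no <td> cells, e.g. a header row
    row_data = [cell.get_text().strip() for cell in cells]
    link = row.find('a')
    # .get() returns None for a missing attribute instead of raising KeyError
    row_data.append(link.get('href') if link else None)
    row_data.append(link.get('title') if link else None)
    table_data.append(row_data)
  return table_data

# Made-up table: a header row, a normal row, and a row without a link
html = ('<table><tr><th>R-</th><th>Name</th></tr>'
        '<tr><td>2</td><td><a href="/b.pdf" title="Second">B v. C</a></td></tr>'
        '<tr><td>1</td><td>A v. B</td></tr></table>')
rows = get_case_rows_safely(BeautifulSoup(html, 'html.parser').table)
print(rows)
```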

Finally, go through EVERY table and get every row, without the headers, so that you have the information for all of the 2017 decisions (from 46 down to 1) in list-of-lists format.

In [19]:
allthedata = []

for table in soup.find(id='pagemaindiv').find_all('table'):
  # extend() adds each table's rows to the one combined list
  allthedata.extend(get_case_list_by_table(table))

allthedata

[['50',
  '6/11/18',
  '17-269',
  'Washington v. United States',
  '',
  'PC',
  '584/2',
  '/opinions/17pdf/17-269_3eb4.pdf',
  ' Judgment affirmed by an equally divided Court.'],
 ['49',
  '6/11/18',
  '16-1432',
  'Sveen v. Melin',
  '',
  'EK',
  '584/2',
  '/opinions/17pdf/16-1432_7j8b.pdf',
  ' The retroactive application of Minnesota’s revocation-on-divorce statute—which automatically nullifies an ex-spouse’s beneficiary designation on a life-insurance policy or other will substitute—does not violate the Contracts Clause.'],
 ['48',
  '6/11/18',
  '16-980',
  'Husted v. A. Philip Randolph Institute',
  '',
  'A',
  '584/2',
  '/opinions/17pdf/16-980_f2q3.pdf',
  ' The process that Ohio uses to remove voters on change-of-residence grounds does not violate the National Voter Registration Act.'],
 ['47',
  '6/11/18',
  '17-432',
  'China Agritech, Inc. v. Resh',
  '',
  'G',
  '584/2',
  '/opinions/17pdf/17-432_08m1.pdf',
  ' Upon denial of class certification, a putative class me
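
A list of lists like this maps directly onto a CSV file. The sketch below writes two rows copied from the output above with the standard `csv` module. The column names are my own guesses at reasonable labels, not the headers used on the Supreme Court page, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical column labels; the page's own header text may differ
column_names = ['number', 'date', 'docket', 'name', 'revised',
                'justice', 'citation', 'pdf_path', 'summary']

# Two rows copied from the output above
allthedata = [
  ['50', '6/11/18', '17-269', 'Washington v. United States', '', 'PC', '584/2',
   '/opinions/17pdf/17-269_3eb4.pdf',
   ' Judgment affirmed by an equally divided Court.'],
  ['48', '6/11/18', '16-980', 'Husted v. A. Philip Randolph Institute', '', 'A', '584/2',
   '/opinions/17pdf/16-980_f2q3.pdf',
   ' The process that Ohio uses to remove voters on change-of-residence grounds '
   'does not violate the National Voter Registration Act.'],
]

buffer = io.StringIO()  # stands in for open('decisions.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(column_names)
writer.writerows(allthedata)

csv_text = buffer.getvalue()
print(csv_text)
```

The writer quotes any field that contains a comma, so case names such as "China Agritech, Inc. v. Resh" survive the round trip.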

## Real Shakespeare: Extra Credit
The Folger Shakespeare Library has HTML versions of their Shakespeare publicly available, but in terrible HTML format. If you want to challenge yourself, try pulling out the first 100 lines of Twelfth Night, available here:

http://floatingmedia.com/columbia/FolgerShakes/TN.html

The final output should resemble what you see below. Each of these lines contains three elements:

1) a code for act.scene.line, along with whether it is a stage direction
2) the speaker or the last person who spoke prior to the stage direction
3) a line or stage direction.

`
line-SD 1.1.0	NOSPEAKER	Enter Orsino, Duke of Illyria, Curio, and other Lords,
line-SD 1.1.0	NOSPEAKER	with
line-SD 1.1.0	NOSPEAKER	 Musicians playing.
line-1.1.1	ORSINO	If music be the food of love, play on.
line-1.1.2	ORSINO	Give me excess of it, that, surfeiting,
line-1.1.3	ORSINO	The appetite may sicken and so die.
line-1.1.4	ORSINO	That strain again! It had a dying fall.
line-1.1.5	ORSINO	O, it came o’er my ear like the sweet sound
line-1.1.6	ORSINO	That breathes upon a bank of violets,
line-1.1.7	ORSINO	Stealing and giving odor. Enough; no more.
line-1.1.8	ORSINO	’Tis not so sweet now as it was before.
line-1.1.9	ORSINO	O spirit of love, how quick and fresh art thou,
line-1.1.10	ORSINO	That, notwithstanding thy capacity
line-1.1.11	ORSINO	Receiveth as the sea, naught enters there,
line-1.1.12	ORSINO	Of what validity and pitch soe’er,
line-1.1.13	ORSINO	But falls into abatement and low price
line-1.1.14	ORSINO	Even in a minute. So full of shapes is fancy
line-1.1.15	ORSINO	That it alone is high fantastical.
line-1.1.16	CURIO	Will you go hunt, my lord?
line-1.1.17	ORSINO	What, Curio?
line-1.1.18	CURIO	The hart.
line-1.1.19	ORSINO	Why, so I do, the noblest that I have.
line-1.1.20	ORSINO	O, when mine eyes did see Olivia first,
line-1.1.21	ORSINO	Methought she purged the air of pestilence.
line-1.1.22	ORSINO	That instant was I turned into a hart,
line-1.1.23	ORSINO	And my desires, like fell and cruel hounds,
line-1.1.24	ORSINO	E’er since pursue me.
line-SD 1.1.24.1	ORSINO	Enter Valentine.
line-1.1.25	ORSINO	How now, what news from her?
line-1.1.26	VALENTINE	So please my lord, I might not be admitted,
line-1.1.27	VALENTINE	But from her handmaid do return this answer:
line-1.1.28	VALENTINE	The element itself, till seven years’ heat,
line-1.1.29	VALENTINE	Shall not behold her face at ample view,
line-1.1.30	VALENTINE	But like a cloistress she will veilèd walk,
line-1.1.31	VALENTINE	And water once a day her chamber round
line-1.1.32	VALENTINE	With eye-offending brine—all this to season
line-1.1.33	VALENTINE	A brother’s dead love, which she would keep fresh
line-1.1.34	VALENTINE	And lasting in her sad remembrance.
line-1.1.35	ORSINO	O, she that hath a heart of that fine frame
line-1.1.36	ORSINO	To pay this debt of love but to a brother,
line-1.1.37	ORSINO	How will she love when the rich golden shaft
line-1.1.38	ORSINO	Hath killed the flock of all affections else
line-1.1.39	ORSINO	That live in her; when liver, brain, and heart,
line-1.1.40	ORSINO	These sovereign thrones, are all supplied, and filled
line-1.1.41	ORSINO	Her sweet perfections with one self king!
line-1.1.42	ORSINO	Away before me to sweet beds of flowers!
line-1.1.43	ORSINO	Love thoughts lie rich when canopied with bowers.
line-SD 1.1.43.1	ORSINO	They exit.
line-SD 1.2.0	ORSINO	Enter Viola, a Captain, and Sailors.
line-1.2.1	VIOLA	What country, friends, is this?
line-1.2.2	CAPTAIN	This is Illyria, lady.
line-1.2.3	VIOLA	And what should I do in Illyria?
line-1.2.4	VIOLA	My brother he is in Elysium.
line-1.2.5	VIOLA	Perchance he is not drowned.—What think you,
line-1.2.6	VIOLA	sailors?
line-1.2.7	CAPTAIN	It is perchance that you yourself were saved.
line-1.2.8	VIOLA	O, my poor brother! And so perchance may he be.
line-1.2.9	CAPTAIN	True, madam. And to comfort you with chance,
line-1.2.10	CAPTAIN	Assure yourself, after our ship did split,
line-1.2.11	CAPTAIN	When you and those poor number saved with you
line-1.2.12	CAPTAIN	Hung on our driving boat, I saw your brother,
line-1.2.13	CAPTAIN	Most provident in peril, bind himself
line-1.2.14	CAPTAIN	(Courage and hope both teaching him the practice)
line-1.2.15	CAPTAIN	To a strong mast that lived upon the sea,
line-1.2.16	CAPTAIN	Where, like Arion
line-1.2.16	CAPTAIN	 on the dolphin’s back,
line-1.2.17	CAPTAIN	I saw him hold acquaintance with the waves
line-1.2.18	CAPTAIN	So long as I could see.
line-SD 1.2.19	VIOLA	, giving
line-SD 1.2.19	VIOLA	 him money
line-1.2.19	VIOLA	For saying so, there’s gold.
line-1.2.20	VIOLA	Mine own escape unfoldeth to my hope,
line-1.2.21	VIOLA	Whereto thy speech serves for authority,
line-1.2.22	VIOLA	The like of him. Know’st thou this country?
line-1.2.23	CAPTAIN	Ay, madam, well, for I was bred and born
line-1.2.24	CAPTAIN	Not three hours’ travel from this very place.
line-1.2.25	VIOLA	Who governs here?
line-1.2.26	CAPTAIN	A noble duke, in nature as in name.
line-1.2.27	VIOLA	What is his name?
line-1.2.28	CAPTAIN	Orsino.
line-1.2.29	VIOLA	Orsino. I have heard my father name him.
line-1.2.30	VIOLA	He was a bachelor then.
line-1.2.31	CAPTAIN	And so is now, or was so very late;
line-1.2.32	CAPTAIN	For but a month ago I went from hence,
line-1.2.33	CAPTAIN	And then ’twas fresh in murmur (as, you know,
line-1.2.34	CAPTAIN	What great ones do the less will prattle of)
line-1.2.35	CAPTAIN	That he did seek the love of fair Olivia.
line-1.2.36	VIOLA	What’s she?
line-1.2.37	CAPTAIN	A virtuous maid, the daughter of a count
line-1.2.38	CAPTAIN	That died some twelvemonth since, then leaving her
line-1.2.39	CAPTAIN	In the protection of his son, her brother,
line-1.2.40	CAPTAIN	Who shortly also died, for whose dear love,
line-1.2.41	CAPTAIN	They say, she hath abjured the sight
line-1.2.42	CAPTAIN	And company of men.
line-1.2.43	VIOLA	O, that I served that lady,
line-1.2.44	VIOLA	And might not be delivered to the world
line-1.2.45	VIOLA	Till I had made mine own occasion mellow,
line-1.2.46	VIOLA	What my estate is.
line-1.2.47	CAPTAIN	That were hard to compass
line-1.2.48	CAPTAIN	Because she will admit no kind of suit,
`
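
To make the three elements concrete, here is a small sketch that takes one line in the sample format apart again. It assumes the fields are separated by tab characters and that stage directions are marked by the `line-SD ` prefix, as in the sample; the helper name `parse_play_line` is made up for illustration.

```python
def parse_play_line(raw_line):
  """Hypothetical helper: split one tab-separated output line into its parts."""
  code, speaker, text = raw_line.split('\t', 2)
  is_stage_direction = code.startswith('line-SD ')
  prefix = 'line-SD ' if is_stage_direction else 'line-'
  act, scene, *line_parts = code[len(prefix):].split('.')
  return {
    'act': int(act),
    'scene': int(scene),
    'line': '.'.join(line_parts),  # kept as text because of codes like 24.1
    'is_stage_direction': is_stage_direction,
    'speaker': speaker,
    'text': text,
  }

spoken = parse_play_line('line-1.1.1\tORSINO\tIf music be the food of love, play on.')
direction = parse_play_line('line-SD 1.1.24.1\tORSINO\tEnter Valentine.')
print(spoken)
print(direction)
```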

Request and parse the HTML, and give it a shot!

In [20]:
more_shakespeare_data = requests.get(
  'http://floatingmedia.com/columbia/FolgerShakes/TN.html').content
soup = BeautifulSoup(more_shakespeare_data, 'html.parser')


In [21]:

soup.find_all('span')

[<span class="italic">Michael Witmore</span>,
 <span class="italic">Hamlet</span>,
 <span class="italic">King Lear</span>,
 <span class="italic">Henry V</span>,
 <span class="italic">Romeo and Juliet</span>,
 <span class="italic">The Tempest</span>,
 <span class="italic">Othello</span>,
 <span class="italic">Henry V</span>,
 <span style="white-space:nowrap"><img alt="half-square bracket" class="imgTextX" src="fdt-emend-l.png"/>blood<img alt="half-square bracket" class="imgTextX" src="fdt-emend-r.png"/></span>,
 <span class="italic">Hamlet</span>,
 <span style="white-space:nowrap"><img alt="angle bracket" class="imgTextX" src="fdt-texta-l.png"/>soldier.<img alt="angle bracket" class="imgTextX" src="fdt-texta-r.png"/></span>,
 <span class="italic">Twelfth Night</span>,
 <span class="castName">Viola</span>,
 <span>, a lady of Messaline shipwrecked on the coast of Illyria<br/>
 <span class="alignment indent"> </span>(later disguised as <span class="castName">Cesario</span>)</span>,
 <span 

In [22]:
current_line = ''
current_line_number = ''
line_counter = 0
last_span_with_speaker = None

# Every span with a title attribute carries its line number in that title
for span in soup.find_all('span', title=True):
  line_number = span.attrs['title']

  if line_number == current_line_number:
    # Still on the same line, so append this span's text
    for text in span.stripped_strings:
      current_line += ' ' + text

  else:
    # A new line number means the previous line is complete, so print it
    # with the speaker found when that line started (the very first pass
    # prints an empty placeholder line)
    print('{}\t{}\t{}'.format(current_line_number,
                              last_span_with_speaker.string.strip() if last_span_with_speaker else '',
                              current_line))
    line_counter += 1
    if line_counter > 99:
      break

    # Start collecting the new line: look backwards for the closest speaker
    last_span_with_speaker = span.find_previous_sibling(
      class_='speaker')
    current_line_number = line_number
    current_line = ''
    for text in span.stripped_strings:
      current_line += text


		
SD 1.1.0		Enter Orsino, Duke of Illyria, Curio, and other Lords, with Musicians playing.
1.1.1	ORSINO	If music be the food of love, play on.
1.1.2	ORSINO	Give me excess of it, that, surfeiting,
1.1.3	ORSINO	The appetite may sicken and so die.
1.1.4	ORSINO	That strain again! It had a dying fall.
1.1.5	ORSINO	O, it came o’er my ear like the sweet sound
1.1.6	ORSINO	That breathes upon a bank of violets,
1.1.7	ORSINO	Stealing and giving odor. Enough; no more.
1.1.8	ORSINO	’Tis not so sweet now as it was before.
1.1.9	ORSINO	O spirit of love, how quick and fresh art thou,
1.1.10	ORSINO	That, notwithstanding thy capacity
1.1.11	ORSINO	Receiveth as the sea, naught enters there,
1.1.12	ORSINO	Of what validity and pitch soe’er,
1.1.13	ORSINO	But falls into abatement and low price
1.1.14	ORSINO	Even in a minute. So full of shapes is fancy
1.1.15	ORSINO	That it alone is high fantastical.
1.1.16	CURIO	Will you go hunt, my lord?
1.1.17	ORSINO	What, Curio?
1.1.18	CURIO	The hart.
1.1.19	ORSINO	Why