# Level 3 - While loops

## `while` loops!

Python's `while` loops can be very useful in web scraping.
They allow us to continue down paths, following links, until a certain condition is met.
Once that condition is met, we can stop web scraping and move on to the next step.

The `condition` can really be anything you want it to be.
Generally, you will continue until an element can't be found.

Essentially: `while` element `x` exists, scrape the page and then move on to the next one.
Once element `x` ceases to exist, exit out of the loop!
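
As a rough illustration of that pattern, the sketch below walks a tiny in-memory "site". The `PAGES` dictionary, its `"next"` keys, and the `scrape_all` function are hypothetical stand-ins for real pages and links; they are not part of this lesson's `utils` module.

```python
# Hypothetical in-memory "website": each page has some text and a link to the next page.
PAGES = {
    "/page/1": {"text": "first", "next": "/page/2"},
    "/page/2": {"text": "second", "next": "/page/3"},
    "/page/3": {"text": "third", "next": None},  # no "next" link: this is the last page
}


def scrape_all(start_url):
    """Follow "next" links from start_url until a page has no next link."""
    collected = []
    url = start_url
    # The condition: keep going while there is still a page to visit.
    while url is not None:
        page = PAGES[url]
        collected.append(page["text"])
        url = page["next"]  # becomes None on the last page, which ends the loop
    return collected


results = scrape_all("/page/1")
print(results)
```

On a real site, the dictionary lookup would be replaced by fetching and parsing the page, but the shape of the loop stays the same.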

To help practice this concept, we can use the function `generate_sequence` to simulate elements existing on a website!
`generate_sequence` is a generator that will yield `True` a random number of times, and then yield `False` indefinitely.
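
The code for `generate_sequence` lives in `utils` and is not shown here. To make the description concrete, here is a minimal sketch of how such a generator could be written. The name `generate_sequence_sketch` and the `max_true` upper bound are assumptions made for illustration only.

```python
import random


def generate_sequence_sketch(max_true=10):
    """Yield True a random number of times (0 to max_true), then False forever."""
    # max_true is an assumed limit; the real function may pick its count differently.
    for _ in range(random.randint(0, max_true)):
        yield True
    while True:
        yield False


sequence = generate_sequence_sketch()
first_values = [next(sequence) for _ in range(14)]
print(first_values)
```

Because the generator never runs out of `False` values, calling `next` on it never raises `StopIteration`.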

In [1]:
from utils import generate_sequence

In [2]:
sequence = generate_sequence()

print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))
print(next(sequence))

True
True
True
True
True
True
True
True
False
False
False
False
False
False


What if we **never** want to print any values that are `False`?
Maybe we want to print `True` for every `True` element in the sequence, and then stop printing altogether by breaking out of the loop.

In [3]:
sequence = generate_sequence()

while next(sequence):
    print(True)

True
True
True
True
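
One detail worth noticing: the loop condition calls `next(sequence)` itself, so the `False` that ends the loop is consumed as well. The sketch below uses a fixed list as a stand-in for `generate_sequence()` (an assumption made so the result is predictable) and counts how many times the loop body runs.

```python
# Fixed stand-in for generate_sequence(): three True values, then False values.
sequence = iter([True, True, True, False, False])

count = 0
# next(sequence) is evaluated before every iteration; the loop ends on the first False.
while next(sequence):
    count += 1

print(count)  # 3

# The False that ended the loop was consumed by the condition check.
remaining = list(sequence)
print(remaining)  # [False]
```

The real `generate_sequence` yields `False` indefinitely, so the loop always finds a stopping point; the fixed list here is only long enough for the demonstration.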


Okay, that is great!
But how can we apply this to what we know about websites?

Well, to practice this, we can use the function `generate_html` to simulate repeatedly fetching HTML!
`generate_html` is another generator that will yield a paragraph with text a random number of times, and then yield a paragraph _without text_ indefinitely.

In [4]:
from utils import generate_html

In [5]:
html_elements = generate_html()

print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))
print(next(html_elements))

<p>This contains text!</p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>


Okay, now let's try to do the same thing as we did with `generate_sequence`!

While the next HTML element's paragraph contains text, print it!
Once a paragraph has no text, break out of the loop!

In [6]:
from bs4 import BeautifulSoup

In [7]:
soup = BeautifulSoup('<p>Test</p>',"html.parser")

In [8]:
soup.text

'Test'
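
It may help to see what `.text` returns for an empty paragraph, since that is what a stopping condition can be built on. A small sketch, where the variable names `with_text` and `without_text` are made up for this example:

```python
from bs4 import BeautifulSoup

with_text = BeautifulSoup("<p>Test</p>", "html.parser").find("p")
without_text = BeautifulSoup("<p></p>", "html.parser").find("p")

print(repr(with_text.text))     # 'Test'
print(repr(without_text.text))  # ''

# Empty strings are falsy in Python, so .text can act directly as a condition.
print(bool(with_text.text))     # True
print(bool(without_text.text))  # False
```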

In [11]:
# code here
def paragraph_contains_text(paragraph):
    # An empty string is falsy, so this is True only when the element has text.
    return bool(paragraph.text)


html_elements = generate_html()

while True:
    html_element = next(html_elements)
    soup = BeautifulSoup(html_element, "html.parser")
    # Stop as soon as we reach a paragraph without text.
    if not paragraph_contains_text(soup.find("p")):
        break
    print(soup)

<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>
<p>This contains text!</p>


In [12]:
paragraph_contains_text(soup)

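False

`soup` still holds the last element parsed in the loop: the empty `<p></p>` that triggered the `break`. That is why the check returns `False`.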
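
The `while True` / `break` form is not the only way to write this. On Python 3.8 or newer, an assignment expression (`:=`) lets the check sit in the loop condition itself, closer to the `while next(sequence)` example. This is a sketch under assumptions: `fake_html` and `next_paragraph` are hypothetical helpers written for this example, with `fake_html` standing in for `generate_html`.

```python
from bs4 import BeautifulSoup


def fake_html():
    """Hypothetical stand-in for generate_html: two paragraphs with text, then empty ones."""
    yield "<p>This contains text!</p>"
    yield "<p>This contains text!</p>"
    while True:
        yield "<p></p>"


def next_paragraph(elements):
    """Fetch the next HTML string and return its parsed <p> element."""
    return BeautifulSoup(next(elements), "html.parser").find("p")


html_elements = fake_html()
printed = []

# Assign the parsed paragraph and test its text in one step.
while (paragraph := next_paragraph(html_elements)).text:
    printed.append(str(paragraph))
    print(paragraph)
```

Both versions stop at the first empty paragraph; which one reads better is largely a matter of taste.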