# The Python Mega Course: Build 10 Real World Applications
---

This notebook is a summary of [The Python Mega Course: Build 10 Real World Applications](https://www.udemy.com/the-python-mega-course), a comprehensive online Python course taught by Ardit Sulce. Each lecture name is clickable and takes you to the video lecture in the course.

# Section 19: Application 7: Scrape Real Estate Property Data from the Web
***

**Lecture:** [Program Demonstration](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

This video lecture shows the finished version of the website running in a browser.

**Lecture:** [Loading the Webpage in Python](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

This code loads the webpage's source code into Python so that information can be extracted from it.

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup=BeautifulSoup(c, "html.parser")
print(soup.prettify())

**Lecture:** [Extracting "div" Tags](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

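We begin extracting HTML elements, starting with the `div` tags. Each property listing sits in a `div` with the class `propertyRow`, and its price is in an `h4` with the class `propPrice`.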

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("div", {"class":"propertyRow"})
all[0].find("h4", {"class":"propPrice"}).text.replace("\n", "").replace(" ", "")
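
The chained `replace` calls remove the newlines and spaces that surround the price text. Here is a minimal sketch of what they do, using a made-up string in place of the real `.text` value (the sample value is an assumption, not taken from the page):

```python
# Hypothetical raw text, standing in for the .text of an h4.propPrice element.
raw_price = "\n      $725,000\n    "

# The approach used above: delete every newline and every space.
cleaned = raw_price.replace("\n", "").replace(" ", "")

# An alternative that gives the same result for this input: split on any
# whitespace and rejoin the pieces (this would also remove tabs).
cleaned_alt = "".join(raw_price.split())

print(cleaned)
```

Both variants also delete spaces inside the text, which is fine for a price but would merge the words of a multi-word value.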

**Lecture:** [Extracting Addresses and Property Details](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

Most of the data is stored inside `span` tags, so this code extracts it from them. Not every listing has every field, which is why each lookup is wrapped in `try`/`except`.

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("div", {"class":"propertyRow"})
all[0].find("h4", {"class":"propPrice"}).text.replace("\n", "").replace(" ", "")

In [None]:
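for item in all:
    # Price, with the surrounding newlines and spaces removed
    print(item.find("h4", {"class": "propPrice"}).text.replace("\n", "").replace(" ", ""))
    # The first address span holds the street address, the second the locality
    print(item.find_all("span", {"class": "propAddressCollapse"})[0].text)
    print(item.find_all("span", {"class": "propAddressCollapse"})[1].text)

    # find() returns None when a listing lacks the span, so the chained
    # .find("b") raises AttributeError; print None in that case
    try:
        print(item.find("span", {"class": "infoBed"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoSqFt"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoValueFullBath"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoValueHalfBath"}).find("b").text)
    except AttributeError:
        print(None)

    print(" ")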

**Lecture:** [Extracting Elements without Unique Identifiers](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

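Some elements, such as the lot size, have no class of their own. Here we find them by their label instead: each `featureGroup` span holds a label and the matching `featureName` span holds its value.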

In [None]:
for item in all:
    print(item.find("h4", {"class": "propPrice"}).text.replace("\n", "").replace(" ", ""))
    print(item.find_all("span", {"class": "propAddressCollapse"})[0].text)
    print(item.find_all("span", {"class": "propAddressCollapse"})[1].text)

    # A missing span makes find() return None, which raises AttributeError
    try:
        print(item.find("span", {"class": "infoBed"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoSqFt"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoValueFullBath"}).find("b").text)
    except AttributeError:
        print(None)

    try:
        print(item.find("span", {"class": "infoValueHalfBath"}).find("b").text)
    except AttributeError:
        print(None)

    # The lot size has no class of its own: pair each featureGroup label with
    # its featureName value by position and keep the value labelled "Lot Size"
    for column_group in item.find_all("div", {"class": "columnGroup"}):
        for feature_group, feature_name in zip(column_group.find_all("span", {"class": "featureGroup"}), column_group.find_all("span", {"class": "featureName"})):
            if "Lot Size" in feature_group.text:
                print(feature_name.text)

    print(" ")
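
The label/value pairing above relies on `zip` matching items by position. The following sketch shows the same idea with plain strings; the labels and values are invented for illustration and do not come from the real page:

```python
# Hypothetical label and value texts, standing in for the .text of the
# featureGroup and featureName spans inside one columnGroup div.
feature_groups = ["Age: ", "Lot Size: ", "Subdivision: "]
feature_names = ["Built in 2005", "0.21 Acres", "Example Estates"]

lot_size = None
# zip pairs the first label with the first value, the second with the second,
# and so on, stopping at the end of the shorter list.
for group_text, name_text in zip(feature_groups, feature_names):
    if "Lot Size" in group_text:
        lot_size = name_text

print(lot_size)
```

Because `zip` pairs purely by position, this approach assumes that every label has exactly one value and that both appear in the same order.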

**Lecture:** [Saving the Extracted Data in CSV Files](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

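Finally, we save the extracted data into a CSV file. Each listing is stored as a dictionary, the dictionaries are collected in a list, and the list is turned into a pandas DataFrame that is written to disk.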

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("div",{"class":"propertyRow"})

all[0].find("h4", {"class":"propPrice"}).text.replace("\n", "").replace(" ", "")

In [None]:
l = []  # one dictionary per listing
for item in all:
    d = {}
    d["Address"] = item.find_all("span", {"class": "propAddressCollapse"})[0].text
    d["Locality"] = item.find_all("span", {"class": "propAddressCollapse"})[1].text
    d["Price"] = item.find("h4", {"class": "propPrice"}).text.replace("\n", "").replace(" ", "")

    # A missing span makes find() return None, which raises AttributeError
    try:
        d["Beds"] = item.find("span", {"class": "infoBed"}).find("b").text
    except AttributeError:
        d["Beds"] = None

    try:
        d["Area"] = item.find("span", {"class": "infoSqFt"}).find("b").text
    except AttributeError:
        d["Area"] = None

    try:
        d["Full Baths"] = item.find("span", {"class": "infoValueFullBath"}).find("b").text
    except AttributeError:
        d["Full Baths"] = None

    try:
        d["Half Baths"] = item.find("span", {"class": "infoValueHalfBath"}).find("b").text
    except AttributeError:
        d["Half Baths"] = None

    for column_group in item.find_all("div", {"class": "columnGroup"}):
        for feature_group, feature_name in zip(column_group.find_all("span", {"class": "featureGroup"}), column_group.find_all("span", {"class": "featureName"})):
            if "Lot Size" in feature_group.text:
                print(feature_name.text)
                d["Lot Size"] = feature_name.text
    l.append(d)

In [None]:
import pandas
df = pandas.DataFrame(l)
df

In [None]:
df.to_csv("Output.csv")
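
Listings without a lot size produce dictionaries that lack the `"Lot Size"` key, and pandas fills those cells with `NaN` when it builds the DataFrame. As a sketch of the same idea without pandas, the standard library's `csv.DictWriter` can also write a list of dictionaries with differing keys. This is a swapped-in alternative, not the course's method, and the rows and column list below are made up:

```python
import csv
import io

# Hypothetical rows shaped like the dictionaries collected in the list above.
rows = [
    {"Address": "1 Example St", "Price": "$100,000", "Lot Size": "0.21 Acres"},
    {"Address": "2 Sample Ave", "Price": "$200,000"},  # no "Lot Size" key
]
fieldnames = ["Address", "Price", "Lot Size"]

# Write to an in-memory buffer here; pass an open file object to write to disk.
buffer = io.StringIO()
# restval is the value written for any key a row does not have.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The prices contain commas, so the writer wraps them in quotes to keep each one in a single column.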

**Lecture:** [Crawling Through Webpages](https://www.udemy.com/the-python-mega-course/learn/v4/t/lecture/9439078?start=0)
---

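To extract data from multiple pages, read the number of pages from the last pagination link, then request each page in a loop. On this site the page URLs end in an offset that grows in steps of 10.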

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("div",{"class":"propertyRow"})

all[0].find("h4", {"class":"propPrice"}).text.replace("\n", "").replace(" ", "")

page_nr = soup.find_all("a",{"class":"Page"})[-1].text
print(page_nr, "number of pages were found")

l = []
base_url = "http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s="
for page in range(0, int(page_nr)*10, 10):
    print( )
    r = requests.get(base_url + str(page) + ".html")
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    all = soup.find_all("div", {"class":"propertyRow"})
    for item in all:
        d = {}
        d["Address"] = item.find_all("span", {"class": "propAddressCollapse"})[0].text

        # Some listings have only one address span, so index 1 may not exist
        try:
            d["Locality"] = item.find_all("span", {"class": "propAddressCollapse"})[1].text
        except IndexError:
            d["Locality"] = None
        d["Price"] = item.find("h4", {"class": "propPrice"}).text.replace("\n", "").replace(" ", "")

        # A missing span makes find() return None, which raises AttributeError
        try:
            d["Beds"] = item.find("span", {"class": "infoBed"}).find("b").text
        except AttributeError:
            d["Beds"] = None

        try:
            d["Area"] = item.find("span", {"class": "infoSqFt"}).find("b").text
        except AttributeError:
            d["Area"] = None

        try:
            d["Full Baths"] = item.find("span", {"class": "infoValueFullBath"}).find("b").text
        except AttributeError:
            d["Full Baths"] = None

        try:
            d["Half Baths"] = item.find("span", {"class": "infoValueHalfBath"}).find("b").text
        except AttributeError:
            d["Half Baths"] = None

        for column_group in item.find_all("div", {"class": "columnGroup"}):
            for feature_group, feature_name in zip(column_group.find_all("span", {"class": "featureGroup"}), column_group.find_all("span", {"class": "featureName"})):
                if "Lot Size" in feature_group.text:
                    print(feature_name.text)
                    d["Lot Size"] = feature_name.text
        l.append(d)
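
The page loop above builds one URL per results page from an offset that grows in steps of 10. A minimal sketch of just that URL construction, assuming a hypothetical page count of 3 (in the code above the count is read from the last `Page` link):

```python
# Hypothetical value; in the code above it comes from the last "Page" link.
page_nr = "3"
base_url = "http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s="

# range(0, 30, 10) yields the offsets 0, 10 and 20, one per results page.
page_urls = [base_url + str(offset) + ".html" for offset in range(0, int(page_nr) * 10, 10)]

for url in page_urls:
    print(url)
```

`page_nr` arrives as text, so it has to be converted with `int` before it can be used in `range`.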

**Lecture:** [Final Code of Application 7]()
---

This is the final code. It accesses a webpage, extracts data from it, and saves that data in a CSV file.

**Note**: You need an internet connection for the code to work.

In [None]:
import requests, re
from bs4 import BeautifulSoup

r = requests.get("http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/")
c = r.content

soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("div",{"class":"propertyRow"})

all[0].find("h4", {"class":"propPrice"}).text.replace("\n", "").replace(" ", "")

page_nr = soup.find_all("a",{"class":"Page"})[-1].text
print(page_nr, "number of pages were found")

l = []
base_url = "http://www.pythonhow.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/t=0&s="
for page in range(0, int(page_nr)*10, 10):
    print( )
    r = requests.get(base_url + str(page) + ".html")
    c = r.content
    soup = BeautifulSoup(c, "html.parser")
    all = soup.find_all("div", {"class":"propertyRow"})
    for item in all:
        d = {}
        d["Address"] = item.find_all("span", {"class": "propAddressCollapse"})[0].text

        # Some listings have only one address span, so index 1 may not exist
        try:
            d["Locality"] = item.find_all("span", {"class": "propAddressCollapse"})[1].text
        except IndexError:
            d["Locality"] = None
        d["Price"] = item.find("h4", {"class": "propPrice"}).text.replace("\n", "").replace(" ", "")

        # A missing span makes find() return None, which raises AttributeError
        try:
            d["Beds"] = item.find("span", {"class": "infoBed"}).find("b").text
        except AttributeError:
            d["Beds"] = None

        try:
            d["Area"] = item.find("span", {"class": "infoSqFt"}).find("b").text
        except AttributeError:
            d["Area"] = None

        try:
            d["Full Baths"] = item.find("span", {"class": "infoValueFullBath"}).find("b").text
        except AttributeError:
            d["Full Baths"] = None

        try:
            d["Half Baths"] = item.find("span", {"class": "infoValueHalfBath"}).find("b").text
        except AttributeError:
            d["Half Baths"] = None

        for column_group in item.find_all("div", {"class": "columnGroup"}):
            for feature_group, feature_name in zip(column_group.find_all("span", {"class": "featureGroup"}), column_group.find_all("span", {"class": "featureName"})):
                if "Lot Size" in feature_group.text:
                    print(feature_name.text)
                    d["Lot Size"] = feature_name.text
        l.append(d)