# Web Scraping with BeautifulSoup

Objective: Scrape weather data from the National Weather Service  
Website: http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168

## Request HTML Page

In [1]:
import requests
page = requests.get("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")
print(page.status_code)
page.content

200
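
The request above has no timeout and only prints the status code. A minimal sketch of a more defensive fetch follows. The helper name `fetch_html`, the 10-second timeout, the `get` parameter, and the User-Agent string are all illustrative assumptions, not part of the original notebook.

```python
import requests

# Hypothetical helper (not from the original notebook).
def fetch_html(url, timeout=10, get=requests.get):
    """Return the response body as bytes, raising on HTTP error statuses."""
    # Assumed User-Agent value; replace it with something that identifies your script.
    headers = {"User-Agent": "weather-scraper-example"}
    response = get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.content
```

Under these assumptions, `fetch_html("http://forecast.weather.gov/MapClick.php?lat=37.7772&lon=-122.4168")` would return the same bytes as `page.content`, or raise if the server answers with an error status. The `get` parameter exists only so the function can be exercised with a stub instead of a live request.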




## Parsing the Page with BeautifulSoup

In [2]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')

# Find the div with id "seven-day-forecast" and assign it to seven_day
seven_day = soup.find(id="seven-day-forecast")

# Inside seven_day, find each individual forecast item
forecast_items = seven_day.find_all(class_="tombstone-container")

# Extract and print the first forecast item
tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
 <p class="period-name">
  Tonight
  <br/>
  <br/>
 </p>
 <p>
  <img alt="Tonight: Partly cloudy, with a low around 45. North northwest wind 9 to 16 mph, with gusts as high as 21 mph. " class="forecast-icon" src="newimages/medium/nsct.png" title="Tonight: Partly cloudy, with a low around 45. North northwest wind 9 to 16 mph, with gusts as high as 21 mph. "/>
 </p>
 <p class="short-desc">
  Partly Cloudy
 </p>
 <p class="temp temp-low">
  Low: 45 °F
 </p>
</div>
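
The parsing cell assumes that the `seven-day-forecast` container is always present. `find()` returns `None` when nothing matches, so chaining `.find_all()` onto it would raise an `AttributeError` if the page layout changed. The sketch below uses a small stand-in HTML string (an assumption for illustration, not the real page) to show one way to guard against that.

```python
from bs4 import BeautifulSoup

# Stand-in document; it deliberately lacks the forecast container.
html = "<html><body><div id='other'>No forecast here</div></body></html>"
soup = BeautifulSoup(html, "html.parser")

seven_day = soup.find(id="seven-day-forecast")
if seven_day is None:
    # find() returned None, so fall back to an empty list instead of chaining calls.
    forecast_items = []
else:
    forecast_items = seven_day.find_all(class_="tombstone-container")

print(len(forecast_items))
```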


## Extracting a Single Forecast Item

In [3]:
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()

print(period)
print(short_desc)
print(temp)

Tonight
Partly Cloudy
Low: 45 °F


In [4]:
img = tonight.find("img")
desc = img['title']

print(desc)

Tonight: Partly cloudy, with a low around 45. North northwest wind 9 to 16 mph, with gusts as high as 21 mph. 
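
Indexing a tag with `img['title']` raises a `KeyError` if the attribute is missing. A hedged alternative is `Tag.get()`, which returns a default instead. The markup below is a made-up stand-in, not taken from the forecast page.

```python
from bs4 import BeautifulSoup

# Stand-in markup: the first image has a title, the second does not.
html = '<p><img title="Tonight: Partly cloudy" src="a.png"/><img src="b.png"/></p>'
soup = BeautifulSoup(html, "html.parser")

# Tag.get() returns the given default when the attribute is absent.
titles = [img.get("title", "") for img in soup.find_all("img")]
print(titles)
```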


## Extracting All Forecast Items

In [6]:
periods = [pt.get_text() for pt in seven_day.select(".tombstone-container .period-name")]
short_descs = [sd.get_text() for sd in seven_day.select(".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(".tombstone-container img")]

print(periods)
print(short_descs)
print(temps)
print(descs)

['Tonight', 'Thursday', 'ThursdayNight', 'Friday', 'FridayNight', 'Saturday', 'SaturdayNight', 'Sunday', 'SundayNight']
['Partly Cloudy', 'Sunny andBreezy', 'Clear', 'Sunny', 'Mostly Clear', 'Partly Sunny', 'Slight ChanceRain', 'Chance Rain', 'Chance Rain']
['Low: 45 °F', 'High: 56 °F', 'Low: 43 °F', 'High: 55 °F', 'Low: 45 °F', 'High: 56 °F', 'Low: 48 °F', 'High: 57 °F', 'Low: 51 °F']
['Tonight: Partly cloudy, with a low around 45. North northwest wind 9 to 16 mph, with gusts as high as 21 mph. ', 'Thursday: Sunny, with a high near 56. Breezy, with a north wind 16 to 22 mph, with gusts as high as 28 mph. ', 'Thursday Night: Clear, with a low around 43. North wind 18 to 20 mph, with gusts as high as 25 mph. ', 'Friday: Sunny, with a high near 55. East northeast wind 5 to 11 mph becoming west in the afternoon. ', 'Friday Night: Mostly clear, with a low around 45. West southwest wind 3 to 7 mph. ', 'Saturday: Partly sunny, with a high near 56.', 'Saturday Night: A 20 percent chance of ra
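
Several values in the output above run words together, for example `'ThursdayNight'` and `'Slight ChanceRain'`. Assuming the missing spaces come from `<br/>` tags between the words (the `Tonight` markup printed earlier contains such tags), passing a separator to `get_text()` is one way to restore them. The HTML below is a stand-in written for this sketch, and `clean_text` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in markup modeled on the tombstone structure; the <br/> placement is assumed.
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Thursday<br/>Night</p>
    <p class="short-desc">Slight Chance<br/>Rain</p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def clean_text(tag):
    # Join the text fragments with a single space and strip surrounding whitespace.
    return tag.get_text(separator=" ", strip=True)

periods = [clean_text(p) for p in soup.select(".tombstone-container .period-name")]
short_descs = [clean_text(s) for s in soup.select(".tombstone-container .short-desc")]
print(periods, short_descs)
```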

## Combining into a pandas DataFrame

In [7]:
import pandas as pd
weather = pd.DataFrame({
        "period": periods, 
        "short_desc": short_descs, 
        "temp": temps, 
        "desc":descs
    })
weather

Unnamed: 0,period,short_desc,temp,desc
0,Tonight,Partly Cloudy,Low: 45 °F,"Tonight: Partly cloudy, with a low around 45. ..."
1,Thursday,Sunny andBreezy,High: 56 °F,"Thursday: Sunny, with a high near 56. Breezy, ..."
2,ThursdayNight,Clear,Low: 43 °F,"Thursday Night: Clear, with a low around 43. N..."
3,Friday,Sunny,High: 55 °F,"Friday: Sunny, with a high near 55. East north..."
4,FridayNight,Mostly Clear,Low: 45 °F,"Friday Night: Mostly clear, with a low around ..."
5,Saturday,Partly Sunny,High: 56 °F,"Saturday: Partly sunny, with a high near 56."
6,SaturdayNight,Slight ChanceRain,Low: 48 °F,Saturday Night: A 20 percent chance of rain. ...
7,Sunday,Chance Rain,High: 57 °F,"Sunday: A chance of rain. Partly sunny, with ..."
8,SundayNight,Chance Rain,Low: 51 °F,Sunday Night: A chance of rain. Mostly cloudy...
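
Building the DataFrame from four separately collected lists works only while every forecast item contains every field. If one item lacked a temperature, the lists would differ in length and `pd.DataFrame` would raise a `ValueError`. A minimal sketch of an alternative is to build one row per forecast item. The HTML is a stand-in and `text_or_none` is a hypothetical helper.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in markup; the second item deliberately has no temperature.
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p>
    <p class="short-desc">Partly Cloudy</p>
    <p class="temp temp-low">Low: 45 °F</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday</p>
    <p class="short-desc">Sunny</p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def text_or_none(item, selector):
    # select_one() returns None when the selector matches nothing.
    tag = item.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None

# One dict per forecast item keeps each row's fields together.
rows = [
    {
        "period": text_or_none(item, ".period-name"),
        "short_desc": text_or_none(item, ".short-desc"),
        "temp": text_or_none(item, ".temp"),
    }
    for item in soup.select(".tombstone-container")
]
weather = pd.DataFrame(rows)
print(weather)
```

Because each row is assembled from a single container, a missing field shows up as a missing value in that row rather than shifting later values up by one position.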


## Analysis (Temperature)

In [8]:
temp_nums = weather["temp"].str.extract(r"(?P<temp_num>\d+)", expand=False)
weather["temp_num"] = temp_nums.astype('int')
temp_nums

0    45
1    56
2    43
3    55
4    45
5    56
6    48
7    57
8    51
Name: temp_num, dtype: object
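
The output above reports `dtype: object` because `str.extract` returns the matched digits as strings, and only the `astype('int')` copy stored in `weather["temp_num"]` is numeric. As a hedged alternative, `pd.to_numeric` performs the conversion and can tolerate rows where the pattern did not match. The three sample strings are copied from the scraped output.

```python
import pandas as pd

temps = pd.Series(["Low: 45 °F", "High: 56 °F", "Low: 43 °F"], name="temp")

# str.extract returns the matched digits as strings, not integers.
as_text = temps.str.extract(r"(?P<temp_num>\d+)", expand=False)

# Convert to numbers; errors="coerce" turns unparseable values into NaN instead of raising.
as_numbers = pd.to_numeric(as_text, errors="coerce")

print(as_numbers.tolist())
```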

In [16]:
round(weather["temp_num"].mean(),4)

50.6667

## Analysis (Temp Low = Night)

In [12]:
is_night = weather["temp"].str.contains("Low")
weather["is_night"] = is_night
weather[is_night]

Unnamed: 0,period,short_desc,temp,desc,temp_num,is_night
0,Tonight,Partly Cloudy,Low: 45 °F,"Tonight: Partly cloudy, with a low around 45. ...",45,True
2,ThursdayNight,Clear,Low: 43 °F,"Thursday Night: Clear, with a low around 43. N...",43,True
4,FridayNight,Mostly Clear,Low: 45 °F,"Friday Night: Mostly clear, with a low around ...",45,True
6,SaturdayNight,Slight ChanceRain,Low: 48 °F,Saturday Night: A 20 percent chance of rain. ...,48,True
8,SundayNight,Chance Rain,Low: 51 °F,Sunday Night: A chance of rain. Mostly cloudy...,51,True


## Analysis (Temp High = Day)

In [14]:
is_day = weather["temp"].str.contains("High")
weather["is_day"] = is_day
weather[is_day]

Unnamed: 0,period,short_desc,temp,desc,temp_num,is_night,is_day
1,Thursday,Sunny andBreezy,High: 56 °F,"Thursday: Sunny, with a high near 56. Breezy, ...",56,False,True
3,Friday,Sunny,High: 55 °F,"Friday: Sunny, with a high near 55. East north...",55,False,True
5,Saturday,Partly Sunny,High: 56 °F,"Saturday: Partly sunny, with a high near 56.",56,False,True
7,Sunday,Chance Rain,High: 57 °F,"Sunday: A chance of rain. Partly sunny, with ...",57,False,True
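
The day and night filters above can also be combined into a single grouped summary. The sketch below rebuilds the `temp` column from the values shown in the output and compares the mean temperature of the two groups. The variable name `mean_by_night` is an assumption introduced for illustration.

```python
import pandas as pd

# Temperature strings copied from the scraped output above.
weather = pd.DataFrame({
    "temp": ["Low: 45 °F", "High: 56 °F", "Low: 43 °F", "High: 55 °F", "Low: 45 °F",
             "High: 56 °F", "Low: 48 °F", "High: 57 °F", "Low: 51 °F"],
})
weather["temp_num"] = weather["temp"].str.extract(r"(\d+)", expand=False).astype(int)
weather["is_night"] = weather["temp"].str.contains("Low")

# Mean temperature for daytime (False) and nighttime (True) periods.
mean_by_night = weather.groupby("is_night")["temp_num"].mean()
print(mean_by_night)
```

For these nine periods the grouped means are 56.0 °F for the daytime highs and 46.4 °F for the nighttime lows, consistent with the overall mean of 50.6667 computed earlier.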
