# Mission to Mars

### Unit 12 Web Scraping Homework
Author: Vivianti Santosa <br>
Date: 2020-08-08 <br>

In this assignment, you will build a web application that scrapes various websites related to the Mission to Mars, saves the data in MongoDB, and displays the information in a single HTML page.

In [33]:
# dependencies
import pandas as pd
import requests
from splinter import Browser
from bs4 import BeautifulSoup as bs
from pprint import pprint
import time
import re

In [2]:
executable_path = {'executable_path': 'chromedriver.exe'}

## NASA Mars News

https://mars.nasa.gov/news/

Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text.<br> Assign the text to variables that can be referenced later.

In [3]:
# set browser with url of NASA Mars program
browser = Browser('chrome', **executable_path, headless=False)
url1 = 'https://mars.nasa.gov/news/'
browser.visit(url1)

# set up parser
html = browser.html
soup = bs(html, 'lxml')

In [4]:
# get latest news articles
news_title = soup.find_all('div', class_='content_title')[1].text
news_article = soup.find('div', class_='article_teaser_body').text
date = soup.find('div', class_='list_date').text
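Calling `.text` on the result of `find` raises an `AttributeError` when no element matches, because `find` returns `None` in that case. A minimal sketch of a more defensive lookup follows. The sample HTML, the `li.slide` container, and the `text_or_none` helper are assumptions made up for illustration, not the real page markup.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page HTML; the real markup may differ.
SAMPLE_HTML = """
<div class="content_title">Site heading</div>
<ul>
  <li class="slide">
    <div class="list_date">August 6, 2020</div>
    <div class="content_title">Example headline</div>
    <div class="article_teaser_body">Example teaser text.</div>
  </li>
</ul>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Scope the search to the first article container (hypothetical selector)
# instead of relying on a positional index into every matching div.
slide = soup.find("li", class_="slide")

latest = {
    "NewsTitle": text_or_none(slide.find("div", class_="content_title")),
    "NewsArticle": text_or_none(slide.find("div", class_="article_teaser_body")),
    "Date": text_or_none(slide.find("div", class_="list_date")),
}
print(latest)
```

Scoping the lookup to one container keeps the title, teaser, and date from the same article together, provided the page really groups them that way.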

In [5]:
# close browser
browser.quit()

In [6]:
# create Mars_News dictionary
Mars_News_Dict = {"NewsTitle": news_title,
                  "NewsArticle":news_article,
                  "Date":date}
pprint(Mars_News_Dict)

{'Date': 'August  6, 2020',
 'NewsArticle': 'Vast areas of the Martian night sky pulse in ultraviolet '
                'light, according to images from NASA’s MAVEN spacecraft. The '
                'results are being used to illuminate complex circulation '
                'patterns in the Martian atmosphere.',
 'NewsTitle': "NASA's MAVEN Observes Martian Night Sky Pulsing in Ultraviolet "
              'Light'}
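
The scraped date is a display string with a padded day (`'August  6, 2020'`). If a real date value is more useful downstream, it could be parsed with the standard library, as in this sketch. The helper name `parse_news_date` is hypothetical, and the format string assumes English month names.

```python
from datetime import datetime


def parse_news_date(raw):
    """Parse a date such as 'August  6, 2020' into a datetime.date."""
    # Collapse runs of whitespace so the padded day ('  6') parses cleanly.
    normalized = ' '.join(raw.split())
    return datetime.strptime(normalized, '%B %d, %Y').date()


news_date = parse_news_date('August  6, 2020')
print(news_date.isoformat())
```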


## JPL Mars Space Images - Featured Image

https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars

Use splinter to navigate the site.<br>
Get the url string for the current Featured Mars Image and assign it to a variable called featured_image_url.<br>
Save the complete url string for this image.

In [7]:
# set browser with url of JPL Mars space images
browser = Browser('chrome', **executable_path, headless=False)
url2 = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url2)

# click FULLIMAGE button
FullImageButton = browser.find_by_id('full_image')
FullImageButton.click()

# click the "more info" button
#browser.is_element_present_by_text('more info', wait_time=1)
moreInfoButton = browser.find_link_by_partial_text('more info')
moreInfoButton.click()



In [8]:
# set up parser
html = browser.html
soup = bs(html, 'lxml')

In [9]:
soup.find('img', class_="main_image")['src']

'/spaceimages/images/largesize/PIA17841_hires.jpg'

In [10]:
soup.find('img', class_="main_image")['title']

"NASA's NuSTAR has, for the first time, imaged the radioactive 'guts' of a supernova remnant, the leftover remains of a star that exploded. The NuSTAR data are blue, and show high-energy X-rays."

In [11]:
# get url and title of the featured image
featured_image_url = soup.select_one('figure.lede a img').get("src")
title = soup.select_one('figure.lede a img').get("title")

In [12]:
# close browser
browser.quit()

In [13]:
# create dictionary of the featured image url and title
featured_image_url_dict = {"Feature Url": featured_image_url,
                      "Title":title}
pprint(featured_image_url_dict) 

{'Feature Url': '/spaceimages/images/largesize/PIA17841_hires.jpg',
 'Title': "NASA's NuSTAR has, for the first time, imaged the radioactive "
          "'guts' of a supernova remnant, the leftover remains of a star that "
          'exploded. The NuSTAR data are blue, and show high-energy X-rays.'}
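
The `src` value scraped above is site-relative, while the task asks for the complete URL string. One way to build it is `urllib.parse.urljoin` from the standard library, sketched here with the page address and path taken from the cells above. The variable names are illustrative.

```python
from urllib.parse import urljoin

# Page the image was scraped from (same address as url2 above)
base_url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'

# Site-relative path as returned by the scrape
relative_src = '/spaceimages/images/largesize/PIA17841_hires.jpg'

# urljoin resolves the path against the base and drops the base's query string
full_image_url = urljoin(base_url, relative_src)
print(full_image_url)
```

Because the path starts with `/`, it replaces the whole path of the base URL, so only the scheme and host are kept from `base_url`.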


In [14]:
# https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA14400_hires.jpg

## Mars Weather

https://twitter.com/marswxreport?lang=en

Scrape the latest Mars weather tweet to obtain the Mars weather report.
Save the tweet text for the weather report as a variable called mars_weather.

In [25]:
# set browser with url of twitter Mars weather
browser = Browser('chrome', **executable_path, headless=False)

url3 = 'https://twitter.com/marswxreport?lang=en'
browser.visit(url3)
time.sleep(5)

# set up parser
html = browser.html
soup = bs(html, 'html.parser')
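
A fixed `time.sleep(5)` either waits longer than needed or not long enough. A possible alternative is to poll for a condition with a timeout. The helper below is a generic sketch, not part of splinter; the name `wait_until` and its parameters are assumptions.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate repeatedly until it returns a truthy value.

    Returns that value, or None if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Demonstration with a stand-in condition that becomes true on the third check
checks = []
print(wait_until(lambda: checks.append(1) or len(checks) >= 3, interval=0.01))
```

With a live browser, the predicate could be something like `lambda: 'sol' in browser.html`, so that parsing starts as soon as the tweet text has loaded.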

In [34]:
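# find the first span whose text contains 'sol' and save it as mars_weather
pattern = re.compile(r'sol')
mars_weather = soup.find("span", text=pattern).text
mars_weather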

'InSight sol 605 (2020-08-09) low -92.7ºC (-134.8ºF) high -18.4ºC (-1.1ºF)\nwinds from the WNW at 8.8 m/s (19.7 mph) gusting to 22.5 m/s (50.4 mph)\npressure at 7.90 hPa'
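
The bare pattern `sol` matches any span containing those three letters (for example "solar"), and the tweet is kept as one string. A stricter regular expression can both narrow the match and pull out individual readings. The sketch below runs against the report text shown above; the field names and the `parse_weather` helper are illustrative assumptions.

```python
import re

report = ('InSight sol 605 (2020-08-09) low -92.7ºC (-134.8ºF) high -18.4ºC (-1.1ºF)\n'
          'winds from the WNW at 8.8 m/s (19.7 mph) gusting to 22.5 m/s (50.4 mph)\n'
          'pressure at 7.90 hPa')

# '.' stands in for the degree-like character before 'C';
# DOTALL lets '.*?' skip across the line breaks in the tweet.
REPORT_PATTERN = re.compile(
    r'sol (?P<sol>\d+) \((?P<date>\d{4}-\d{2}-\d{2})\)'
    r' low (?P<low_c>-?\d+(?:\.\d+)?).C'
    r'.*? high (?P<high_c>-?\d+(?:\.\d+)?).C'
    r'.*?pressure at (?P<pressure_hpa>\d+(?:\.\d+)?) hPa',
    re.DOTALL,
)


def parse_weather(text):
    """Return the readings as a dict, or None if the text does not match."""
    match = REPORT_PATTERN.search(text)
    if match is None:
        return None
    return {
        'sol': int(match['sol']),
        'date': match['date'],
        'low_c': float(match['low_c']),
        'high_c': float(match['high_c']),
        'pressure_hpa': float(match['pressure_hpa']),
    }


weather = parse_weather(report)
print(weather)
```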

## Mars Facts

https://space-facts.com/mars/

Use pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.<br>
Convert the data to an HTML table string.

In [None]:
# get url of the table to be scraped
url4 = 'https://space-facts.com/mars/'

In [None]:
# use pandas to read the tables
tables = pd.read_html(url4)
tables

In [None]:
df = tables[0]
df.columns = ['Features','Values']
df

In [None]:
df.to_html('Mars_Facts.html')
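
Passing a file name to `to_html` writes the table to disk. When the goal is an HTML table string, `to_html` can be called without a path, in which case it returns the markup. The sketch below uses two hand-typed rows in place of the scraped table; the name `facts_df` is illustrative.

```python
import pandas as pd

# Two hand-typed rows standing in for the scraped facts table
facts_df = pd.DataFrame(
    [['Equatorial Diameter', '6,792 km'], ['Moons', '2']],
    columns=['Features', 'Values'],
)

# With no path or buffer argument, to_html returns the markup as a str
html_table = facts_df.to_html(index=False)
print(html_table)
```

Setting `index=False` leaves out the numeric row index, which is usually not wanted on a web page.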

## Mars Hemispheres

https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars

Obtain high-resolution images for each of Mars's hemispheres from the USGS Astrogeology site.<br>
Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.<br>

In [None]:
# set browser with url of Mars Hemispheres
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)
url5 = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url5)

In [None]:
# check list of all links to Mars hemisphere pages
links = browser.find_by_css("a.product-item h3")

for link in links:
    print(link.text)

In [None]:
# create a list of dictionaries, one per Mars hemisphere image
HemisphereImg_Dict = []

for i in range(0, len(links)):
    # navigate to the specific page; the links are looked up again on each
    # pass because earlier element references go stale after navigation
    link = browser.find_by_css("a.product-item h3")[i]
    print(link.text)
    link.click()

    # scrape
    soup = bs(browser.html, 'lxml')   # set up parser
    image_url = soup.find('img', class_="wide-image")['src']   # get image's url
    title = soup.find('h2').text                               # get title

    # put in dictionary
    image_dict = {"Image_URL": image_url,
                  "Title": title}
    HemisphereImg_Dict.append(image_dict)       # append dict to list

    # navigate to previous page
    browser.back()

In [None]:
# close browser
browser.quit()

In [None]:
HemisphereImg_Dict
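
The assignment goes on to store the results in MongoDB, which is easiest when every scrape result sits in one dictionary. The sketch below shows one possible shape for that document. The function name `build_mars_document` and the key names are assumptions, the hemisphere list is left empty because its output is not shown above, and the database insert itself is not covered here.

```python
import json


def build_mars_document(news, featured_image, weather, facts_html, hemispheres):
    """Bundle the individual scrape results into one dictionary."""
    return {
        'news': news,
        'featured_image': featured_image,
        'weather': weather,
        'facts_html': facts_html,
        'hemispheres': hemispheres,
    }


mars_document = build_mars_document(
    news={'NewsTitle': "NASA's MAVEN Observes Martian Night Sky Pulsing in Ultraviolet Light",
          'Date': 'August  6, 2020'},
    featured_image={'Feature Url': '/spaceimages/images/largesize/PIA17841_hires.jpg'},
    weather='InSight sol 605 (2020-08-09) low -92.7ºC (-134.8ºF) high -18.4ºC (-1.1ºF)',
    facts_html='<table></table>',   # placeholder for the Mars facts table markup
    hemispheres=[],                 # placeholder; filled by the hemisphere loop
)

# Serializing to JSON is a quick check that the document holds only plain types
print(json.dumps(mars_document, indent=2, ensure_ascii=False))
```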