# Mission to Mars

### Unit 12 Web Scraping Homework
Author: Vivianti Santosa <br>
Date: 2020-08-08 <br>

In this assignment, you will build a web application that scrapes various websites for data related to the Mission to Mars, saves the data in MongoDB, and displays the information on a single HTML page.

In [1]:
# dependencies
import pandas as pd
import requests
from splinter import Browser
from bs4 import BeautifulSoup as bs
from pprint import pprint
import time
import re

## NASA Mars News

https://mars.nasa.gov/news/

Scrape the NASA Mars News Site and collect the latest News Title and Paragraph Text.<br> Assign the text to variables that can be referenced later.

In [2]:
# prepare chromedriver
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

# set browser with url of NASA Mars program
url1 = 'https://mars.nasa.gov/news/'
browser.visit(url1)
time.sleep(5)

# set up parser
html = browser.html
soup = bs(html, 'lxml')
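
The fixed `time.sleep(5)` above waits the full five seconds even when the page is ready sooner, and may still be too short on a slow connection. A minimal sketch of a polling alternative follows; `wait_until` is a hypothetical helper, not part of splinter, and the demo condition only stands in for a real page check.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires.

    Returns the last value that condition() produced.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)


# Stand-in condition for the demo: becomes true on the third call.
attempts = []


def ready_on_third_call():
    attempts.append(1)
    return len(attempts) >= 3


page_ready = wait_until(ready_on_third_call, timeout=2.0, interval=0.01)
print(page_ready, len(attempts))
```

In the notebook, the condition could be a check such as `lambda: browser.is_element_present_by_css('div.content_title')`, which returns as soon as the news list has rendered.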

In [3]:
# get latest news articles
news_title = soup.find_all('div', class_='content_title')[1].text
news_article = soup.find('div', class_='article_teaser_body').text
date = soup.find('div', class_='list_date').text

In [4]:
# create Mars_News dictionary
Mars_News_Dict = {"NewsTitle": news_title,
                  "NewsArticle":news_article,
                  "Date":date}
pprint(Mars_News_Dict)

{'Date': 'August  6, 2020',
 'NewsArticle': 'Vast areas of the Martian night sky pulse in ultraviolet '
                'light, according to images from NASA’s MAVEN spacecraft. The '
                'results are being used to illuminate complex circulation '
                'patterns in the Martian atmosphere.',
 'NewsTitle': "NASA's MAVEN Observes Martian Night Sky Pulsing in Ultraviolet "
              'Light'}
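
The scraped date is a display string with irregular spacing (`'August  6, 2020'`). If the record is later sorted or stored by date, it may help to normalize it first. The sketch below assumes the site keeps the "Month day, year" layout and English month names; the variable names are illustrative.

```python
from datetime import datetime

# Date string copied from the output above (note the double space).
raw_date = "August  6, 2020"

# Collapse runs of whitespace, then parse the "Month day, year" layout.
clean_date = " ".join(raw_date.split())
news_date_iso = datetime.strptime(clean_date, "%B %d, %Y").date().isoformat()

print(news_date_iso)
```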


In [5]:
# close browser
browser.quit()

## JPL Mars Space Images - Featured Image

https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars

Use splinter to navigate the site.<br>
Get the url string for the current Featured Mars Image and assign it to a variable called featured_image_url.<br>
Save the complete url string for this image.

In [45]:
# prepare chromedriver
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

# run browser with url of NASA Mars program
url2 = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url2)

# click FULLIMAGE button
FullImageButton = browser.find_by_id('full_image')
FullImageButton.click()

# click the "more info" button
#browser.is_element_present_by_text('more info', wait_time=1)
moreInfoButton = browser.find_link_by_partial_text('more info')
moreInfoButton.click()
time.sleep(5)

In [46]:
# set up parser
html = browser.html
soup = bs(html, "html.parser")

In [47]:
soup.find('img', class_="main_image")['src']

'/spaceimages/images/largesize/PIA17832_hires.jpg'

In [48]:
soup.find('img', class_="main_image")['title']

"The six red dots in this composite picture indicate the location of the first new near-Earth asteroid, called 2013 YP139, as seen by NASA's NEOWISE."

In [49]:
# get URL of the featured image
featured_image = soup.select_one('figure.lede a img').get("src")
featured_image_url = f'https://www.jpl.nasa.gov{featured_image}'
title = soup.select_one('figure.lede a img').get("title")

In [50]:
# create dictionary of the featured image URL and title
featured_image_url_dict = {"Feature Url": featured_image_url,
                      "Title":title}
pprint(featured_image_url_dict) 

{'Feature Url': 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA17832_hires.jpg',
 'Title': 'The six red dots in this composite picture indicate the location of '
          'the first new near-Earth asteroid, called 2013 YP139, as seen by '
          "NASA's NEOWISE."}
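
The f-string used above works as long as `src` is a site-relative path that starts with `/`. As a hedged alternative, `urllib.parse.urljoin` from the standard library also copes with a `src` that is already absolute. The base URL and the relative path are taken from the cells above; the `example.com` address is only a placeholder.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.jpl.nasa.gov"

# Relative path as returned by the 'src' attribute above.
relative_src = "/spaceimages/images/largesize/PIA17832_hires.jpg"
featured_image_url = urljoin(BASE_URL, relative_src)
print(featured_image_url)

# An already absolute URL passes through unchanged (placeholder address).
absolute_src = "https://example.com/image.jpg"
print(urljoin(BASE_URL, absolute_src))
```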


In [51]:
# close browser
browser.quit()

In [13]:
# f'https://www.jpl.nasa.gov{imgRelativeUrl}'
# https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA14400_hires.jpg

## Mars Weather

https://twitter.com/marswxreport?lang=en

Scrape the latest Mars weather tweet to obtain the Mars weather report.
Save the tweet text for the weather report as a variable called mars_weather.

In [14]:
# set browser with url of Twitter
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

url3 = 'https://twitter.com/marswxreport?lang=en'
browser.visit(url3)
time.sleep(5)

# set up parser
html = browser.html
soup = bs(html, 'html.parser')

In [15]:
# find the first span whose text contains "sol" (the weather report tweet)
pattern = re.compile(r'sol')
Weather_Tweet = soup.find("span", text=pattern).text
Weather_Tweet

'InSight sol 607 (2020-08-11) low -93.1ºC (-135.6ºF) high -18.9ºC (-2.1ºF)\nwinds from the WNW at 8.2 m/s (18.4 mph) gusting to 21.4 m/s (47.8 mph)\npressure at 7.90 hPa'
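
The tweet is stored as one free-text string. If individual readings are needed later, a regular expression can pull them out. This is a sketch that assumes the tweet keeps the layout shown in the output above; `parse_weather` and the field names are hypothetical.

```python
import re

# Tweet text copied from the output above.
tweet = (
    "InSight sol 607 (2020-08-11) low -93.1ºC (-135.6ºF) high -18.9ºC (-2.1ºF)\n"
    "winds from the WNW at 8.2 m/s (18.4 mph) gusting to 21.4 m/s (47.8 mph)\n"
    "pressure at 7.90 hPa"
)

WEATHER_RE = re.compile(
    r"sol (?P<sol>\d+) \((?P<date>\d{4}-\d{2}-\d{2})\) "
    r"low (?P<low_c>-?\d+(?:\.\d+)?)[º°]C.*? "
    r"high (?P<high_c>-?\d+(?:\.\d+)?)[º°]C"
    r".*?pressure at (?P<pressure_hpa>\d+(?:\.\d+)?) hPa",
    re.DOTALL,  # let '.' cross the line breaks inside the tweet
)


def parse_weather(text):
    """Return the readings as a dict, or None if the layout does not match."""
    match = WEATHER_RE.search(text)
    if match is None:
        return None
    return {
        "sol": int(match["sol"]),
        "date": match["date"],
        "low_c": float(match["low_c"]),
        "high_c": float(match["high_c"]),
        "pressure_hpa": float(match["pressure_hpa"]),
    }


report = parse_weather(tweet)
print(report)
```

Returning `None` for an unexpected layout lets the caller fall back to the raw tweet text instead of failing.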

In [16]:
# close browser
browser.quit()

## Mars Facts

https://space-facts.com/mars/

Use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.<br>
Convert the data to an HTML table string.

In [17]:
# get url of the table to be scraped
url4 = 'https://space-facts.com/mars/'

In [18]:
# use pandas to read the tables
tables = pd.read_html(url4)
tables

[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.38 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                   -87 to -5 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers,
   Mars - Earth Comparison             Mars            Earth
 0               Diameter:         6,779 km        12,742 km
 1                   Mass:  6.39 × 10^23 kg  5.97 × 10^24 kg
 2                  Moons:                2                1
 3      Distance from Sun:   227,943,824 km   149,598,262 km
 4         Length of Year:   687 Earth days      365.24 days
 5            Temperature:    -153 to 20 °C      -88 to 58°C,
           

In [19]:
df = tables[0]
df.columns = ['Features', 'Values']
df = df.set_index('Features')   # set_index returns a new DataFrame, so keep the result
df

| Features | Values |
|---|---|
| Equatorial Diameter: | 6,792 km |
| Polar Diameter: | 6,752 km |
| Mass: | 6.39 × 10^23 kg (0.11 Earths) |
| Moons: | 2 (Phobos & Deimos) |
| Orbit Distance: | 227,943,824 km (1.38 AU) |
| Orbit Period: | 687 days (1.9 years) |
| Surface Temperature: | -87 to -5 °C |
| First Record: | 2nd millennium BC |
| Recorded By: | Egyptian astronomers |


In [20]:
df.to_html('Mars_Facts.html')
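
The cell above writes the table to a file, while the task asks for an HTML table string. Calling `to_html()` without a path returns the markup as a string. The sketch below uses two rows copied from the table above rather than fetching the live page; `facts` and `facts_html` are illustrative names.

```python
import pandas as pd

# Two sample rows copied from the scraped table; the live page is not fetched here.
facts = pd.DataFrame(
    {
        "Features": ["Equatorial Diameter:", "Polar Diameter:"],
        "Values": ["6,792 km", "6,752 km"],
    }
).set_index("Features")

# With no path argument, to_html() returns the markup as a string.
facts_html = facts.to_html()
print(facts_html)
```

A string like this can be stored alongside the other scraped values and passed to the page template directly.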

## Mars Hemispheres

https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars

Obtain high-resolution images for each of Mars's hemispheres from the USGS Astrogeology site.<br>
Save both the image URL string for the full-resolution hemisphere image and the hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys img_url and title.<br>

In [2]:
# set browser with url of Mars Hemispheres
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)
url5 = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
browser.visit(url5)

In [3]:
# list all links to the Mars hemisphere pages
links = browser.find_by_css("a.product-item h3")

for link in links:
    print(link.text)

Cerberus Hemisphere Enhanced
Schiaparelli Hemisphere Enhanced
Syrtis Major Hemisphere Enhanced
Valles Marineris Hemisphere Enhanced


In [4]:
# create list of dictionaries of Mars hemisphere images
HemisphereImg_Dict = []

for i in range(len(links)):
    # navigate to the specific hemisphere page
    # (re-query the links each time, since earlier elements go stale after navigation)
    link = browser.find_by_css("a.product-item h3")[i]
    print(link.text)
    link.click()

    # scrape
    soup = bs(browser.html, 'lxml')                                  # set up parser
    image_url = browser.find_link_by_text('Sample').first['href']   # get image's URL
    title = soup.find('h2').text                                     # get title

    # put in dictionary
    image_dict = {"Image_URL": image_url,
                  "Title": title}
    HemisphereImg_Dict.append(image_dict)       # append dict to list

    # navigate to previous page
    browser.back()

Cerberus Hemisphere Enhanced
Schiaparelli Hemisphere Enhanced
Syrtis Major Hemisphere Enhanced
Valles Marineris Hemisphere Enhanced


In [5]:
# close browser
browser.quit()

In [6]:
HemisphereImg_Dict

[{'Image_URL': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'Title': 'Cerberus Hemisphere Enhanced'},
 {'Image_URL': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'Title': 'Schiaparelli Hemisphere Enhanced'},
 {'Image_URL': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'Title': 'Syrtis Major Hemisphere Enhanced'},
 {'Image_URL': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'Title': 'Valles Marineris Hemisphere Enhanced'}]
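
The task description asks for the keys `img_url` and `title`. A short sketch of building the list in that shape from (title, URL) pairs follows; the two sample pairs are copied from the output above, and `scraped_pairs` and `hemisphere_image_urls` are illustrative names.

```python
# (title, url) pairs copied from the scraped output above.
scraped_pairs = [
    ("Cerberus Hemisphere Enhanced",
     "https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg"),
    ("Schiaparelli Hemisphere Enhanced",
     "https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg"),
]

# One dictionary per hemisphere, using the key names the task specifies.
hemisphere_image_urls = [
    {"title": title, "img_url": url} for title, url in scraped_pairs
]

for entry in hemisphere_image_urls:
    print(entry["title"], "->", entry["img_url"])
```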

In [None]:
#[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
#'title': 'Cerberus Hemisphere Enhanced'},