## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.

* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

* Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

* Use Bootstrap to structure your HTML template.

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

```python
# Example:
news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."
```

In [1]:
# Dependencies
from bs4 import BeautifulSoup
import requests
# Dependencies for browser automation with Splinter
from splinter import Browser
from splinter.exceptions import ElementDoesNotExist
# Dependencies for pandas table scraping
import pandas as pd


In [2]:
# URL of page to be scraped
url = 'https://mars.nasa.gov/news/'

In [3]:
# Retrieve page with the requests module
response = requests.get(url)
response

<Response [200]>

In [4]:
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(response.text, 'html.parser')

In [5]:
# Examine the results, then determine element that contains sought info
print(soup.prettify())

<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <!-- Always force latest IE rendering engine or request Chrome Frame -->
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <script type="text/javascript">
   window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"5e33925808","applicationID":"59562082","transactionName":"JVcPR0MLWApSRU1eAQVVEhxSC1oSUlkWbBMHXwRAHhdcCUA=","queueTime":0,"applicationTime":272,"agent":""}
  </script>
  <script type="text/javascript">
   (window.NREUM||(NREUM={})).loader_config={xpid:"VQcPUlZTDxAFXVRUBQEPVA=="};window.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o||n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0

In [6]:
# We need the news title and the paragraph text
results = soup.find_all(class_= "slide")
results

[<div class="slide">
 <div class="image_and_description_container">
 <a href="/news/8426/nasa-garners-7-webby-award-nominations/">
 <div class="rollover_description">
 <div class="rollover_description_inner">
 Nominees include four JPL projects: the solar system and climate websites, InSight social media, and a 360-degree Earth video. Public voting closes April 18, 2019.
 </div>
 <div class="overlay_arrow">
 <img alt="More" src="/assets/overlay-arrow.png"/>
 </div>
 </div>
 <img alt="NASA Garners 7 Webby Award Nominations" class="img-lazy" data-lazy="/system/news_items/list_view_images/8426_Webby2019-320x240.jpg" src="/assets/loading_320x240.png"/>
 </a>
 </div>
 <div class="content_title">
 <a href="/news/8426/nasa-garners-7-webby-award-nominations/">
 NASA Garners 7 Webby Award Nominations
 </a>
 </div>
 </div>, <div class="slide">
 <div class="image_and_description_container">
 <a href="/news/8413/nasas-opportunity-rover-mission-on-mars-comes-to-end/">
 <div class="rollover_descript

In [7]:
# Loop through returned results and grab title and description
for result in results:
    # Error handling
    try:
        # Obtain the article title and teaser paragraph
        news_title = result.find('div', class_="content_title").text
        news_p = result.find('div', class_="rollover_description_inner").text

#         # Obtain the article description
#         news_pg = result.find('div', class_ = "article_teaser_body").text

        # Print results only if both the title and the paragraph are available
        if news_title and news_p:
            print('-------------')
            print(news_title)
            print(news_p)
#             print(news_pg)
    except AttributeError as e:
        print(e)

-------------


NASA Garners 7 Webby Award Nominations



Nominees include four JPL projects: the solar system and climate websites, InSight social media, and a 360-degree Earth video. Public voting closes April 18, 2019.

-------------


NASA's Opportunity Rover Mission on Mars Comes to End



NASA's Opportunity Mars rover mission is complete after 15 years on Mars. Opportunity's record-breaking exploration laid the groundwork for future missions to the Red Planet.

-------------


NASA's InSight Places First Instrument on Mars



In deploying its first instrument onto the surface of Mars, the lander completes a major mission milestone.

-------------


NASA Announces Landing Site for Mars 2020 Rover



After a five-year search, NASA has chosen Jezero Crater as the landing site for its upcoming Mars 2020 rover mission.

-------------


Opportunity Hunkers Down During Dust Storm



It's the beginning of the end for the planet-encircling dust storm on Mars. But it could still be weeks, 
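
Because the loop above reassigns `news_title` and `news_p` on every pass, both variables end up holding the last item on the page rather than the latest one. A minimal sketch of keeping only the first slide and trimming the surrounding whitespace follows. The inline `sample_html` is a shortened, hypothetical stand-in for the page, and the sketch assumes the newest article is listed first.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the markup returned by the news page
sample_html = """
<div class="slide">
  <div class="rollover_description_inner">
    Nominees include four JPL projects.
  </div>
  <div class="content_title">
    <a href="/news/8426/">NASA Garners 7 Webby Award Nominations</a>
  </div>
</div>
<div class="slide">
  <div class="rollover_description_inner">An older teaser.</div>
  <div class="content_title"><a href="/news/8413/">An older title</a></div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first match, assumed here to be the newest article
latest = soup.find("div", class_="slide")

# get_text(strip=True) drops the leading and trailing whitespace seen in the output above
news_title = latest.find("div", class_="content_title").get_text(strip=True)
news_p = latest.find("div", class_="rollover_description_inner").get_text(strip=True)

print(news_title)
print(news_p)
```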

### JPL Mars Space Images - Featured Image

* Visit the URL for the JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use Splinter to navigate the site, find the image URL for the current Featured Mars Image, and assign the URL string to a variable called `featured_image_url`.

* Make sure to find the image URL of the full-size `.jpg` image.

* Make sure to save a complete URL string for this image.

```python
# Example:
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
```
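
Image links scraped from a page are often relative paths, so one way to meet the "complete URL" requirement is to join the path with the site's base URL. The sketch below uses the standard library's `urljoin`; `relative_path` is a hypothetical value standing in for whatever attribute the scraper returns.

```python
from urllib.parse import urljoin

# Base of the JPL site, taken from the URL in the instructions above
base_url = "https://www.jpl.nasa.gov"

# Hypothetical relative path, standing in for a scraped href or src attribute
relative_path = "/spaceimages/images/largesize/PIA16225_hires.jpg"

# urljoin resolves the relative path against the base URL
featured_image_url = urljoin(base_url, relative_path)
print(featured_image_url)
```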

In [8]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
!which chromedriver

/usr/local/bin/chromedriver


In [9]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

In [10]:
url = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
browser.visit(url)

In [11]:
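html = browser.html
soup = BeautifulSoup(html, 'html.parser')
# Collect the src attribute of every <img> element on the page
imagebar = browser.find_by_tag('img')
src = []
for image in imagebar:
    src.append(image['src'])
    # Print the list as it grows to see the index of each image
    print(src)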

['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png']
['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png']
['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png']
['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23095-640x350.jpg']
['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23095-640x350.jpg', 'https://www.jpl.nasa.gov/ass

['https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/logo_nasa_trio_black@2x.png', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23095-640x350.jpg', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23135-640x350.jpg', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23134-640x350.jpg', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23133-640x350.jpg', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23094-640x350.jpg', 'https://www.jpl.nasa.gov/assets/images/overlay-arrow.png', 'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23106-640x350.jpg', 'https://www.jpl.nasa.gov/asset

In [12]:
# Get the most recent featured image URL (index found by inspecting the list above)
featured_image_url = src[21]
featured_image_url

'https://www.jpl.nasa.gov/spaceimages/images/wallpaper/PIA23093-640x350.jpg'

### Mars Weather

* Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

```python
# Example:
mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
```



In [13]:
# URL of page to be scraped
url = 'https://twitter.com/marswxreport?lang=en'

In [14]:
# Retrieve page with the requests module
response = requests.get(url)
response

<Response [200]>

In [15]:
# Create BeautifulSoup object; parse with 'html.parser'
soup = BeautifulSoup(response.text, 'html.parser')

In [16]:
# Examine the results, then determine element that contains sought info
print(soup.prettify())

<!DOCTYPE html>
<html data-scribe-reduced-action-queue="true" lang="en">
 <head>
  <meta charset="utf-8"/>
  <script nonce="/J5BllAM45SEb5ZawDA/dQ==">
   !function(){window.initErrorstack||(window.initErrorstack=[]),window.onerror=function(r,i,n,o,t){r.indexOf("Script error.")>-1||window.initErrorstack.push({errorMsg:r,url:i,lineNumber:n,column:o,errorObj:t})}}();
  </script>
  <script id="bouncer_terminate_iframe" nonce="/J5BllAM45SEb5ZawDA/dQ==">
   if (window.top != window) {
  window.top.postMessage({'bouncer': true, 'event': 'complete'}, '*');
}
  </script>
  <script id="swift_action_queue" nonce="/J5BllAM45SEb5ZawDA/dQ==">
   !function(){function e(e){if(e||(e=window.event),!e)return!1;if(e.timestamp=(new Date).getTime(),!e.target&&e.srcElement&&(e.target=e.srcElement),document.documentElement.getAttribute("data-scribe-reduced-action-queue"))for(var t=e.target;t&&t!=document.body;){if("A"==t.tagName)return;t=t.parentNode}return i("all",o(e)),a(e)?(document.addEventListener||(e=o(

In [17]:
# results are returned as an iterable list (get the latest tweet from the page)
twitter_results = soup.find_all('div', class_="content")
twitter_results

[<div class="content">
 <div class="stream-item-header">
 <a class="account-group js-account-group js-action-profile js-user-profile-link js-nav" data-user-id="786939553" href="/MarsWxReport">
 <img alt="" class="avatar js-action-profile-avatar" src="https://pbs.twimg.com/profile_images/2552209293/220px-Mars_atmosphere_bigger.jpg"/>
 <span class="FullNameGroup">
 <strong class="fullname show-popup-with-id u-textTruncate " data-aria-label-part="">Mars Weather</strong><span>‏</span><span class="UserBadges"></span><span class="UserNameBreak"> </span></span><span class="username u-dir u-textTruncate" data-aria-label-part="" dir="ltr">@<b>MarsWxReport</b></span></a>
 <small class="time">
 <a class="tweet-timestamp js-permalink js-nav js-tooltip" data-conversation-id="1114714732492218369" href="/MarsWxReport/status/1114714732492218369" title="7:21 PM - 6 Apr 2019"><span aria-hidden="true" class="_timestamp js-short-timestamp js-relative-timestamp" data-long-form="true" data-time="1554603677"

In [18]:
# Loop through returned results and grab the text for all tweets in the content section
mars_weather_data = []
for result in twitter_results:
    # Error handling
    try:
        # Obtain the tweet text
        tweet = result.find('p', class_="TweetTextSize TweetTextSize--normal js-tweet-text tweet-text").text
        mars_weather_data.append(tweet)

        # Print the tweet only if it has text
        if tweet:
            print('-------------')
            print(tweet)
    except AttributeError as e:
        print(e)

-------------
InSight sol 127 (2019-04-05) low -96.6ºC (-141.9ºF) high -16.8ºC (1.8ºF)
winds from the SW at 4.2 m/s (9.3 mph) gusting to 11.2 m/s (25.0 mph)
pressure at 7.30 hPapic.twitter.com/wky4Uf2fyY
-------------
It’s time!

#ExploreJPL tickets are AVAILABLE NOW. Tickets are free, but limited, for our annual public event, this year on May 18-19: https://explore.jpl.nasa.gov/ pic.twitter.com/sDqzFKqOc7
-------------
InSight sol 126 (2019-04-04) low -97.0ºC (-142.7ºF) high -17.0ºC (1.3ºF)
winds from the SW at 4.0 m/s (8.8 mph) gusting to 10.7 m/s (23.9 mph)
pressure at 7.30 hPapic.twitter.com/yIkUgwaoIc
-------------
InSight sol 125 (2019-04-03) low -97.2ºC (-143.0ºF) high -16.8ºC (1.7ºF)
winds from the SW at 4.0 m/s (8.9 mph) gusting to 11.7 m/s (26.2 mph)
pressure at 7.30 hPapic.twitter.com/ht1lEraC6M
-------------
The @MarsCuriosity captured the Martian moon Phobos as it transits the Sun. 

https://www.nasa.gov/feature/jpl/curiosity-captured-two-solar-eclipses-on-mars/ …pic.twitt

In [19]:
# Get the most recent weather update
mars_weather = mars_weather_data[0]
mars_weather

'InSight sol 127 (2019-04-05) low -96.6ºC (-141.9ºF) high -16.8ºC (1.8ºF)\nwinds from the SW at 4.2 m/s (9.3 mph) gusting to 11.2 m/s (25.0 mph)\npressure at 7.30 hPapic.twitter.com/wky4Uf2fyY'
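
The scraped text ends with a `pic.twitter.com/...` media link fused to the last word, and the newest tweet on the page is not guaranteed to be a weather report. A minimal sketch of one way to handle both follows. The sample tweets are adapted from the output above (shortened, and reordered so that a non-weather tweet comes first). The helper name `latest_weather` and the `"InSight sol"` prefix check are assumptions based on the tweets shown, not part of the assignment.

```python
import re

# Sample tweets adapted from the notebook output above
tweets = [
    "It’s time!\n\n#ExploreJPL tickets are AVAILABLE NOW.",
    "InSight sol 127 (2019-04-05) low -96.6ºC (-141.9ºF) high -16.8ºC (1.8ºF)\n"
    "winds from the SW at 4.2 m/s (9.3 mph) gusting to 11.2 m/s (25.0 mph)\n"
    "pressure at 7.30 hPapic.twitter.com/wky4Uf2fyY",
]


def latest_weather(tweet_texts):
    """Return the first tweet that looks like a weather report, without its media link."""
    for text in tweet_texts:
        # Assumed convention: weather reports start with "InSight sol"
        if text.startswith("InSight sol"):
            # Remove a trailing pic.twitter.com/<id> link, if present
            return re.sub(r"\s*pic\.twitter\.com/\S+$", "", text)
    return None


mars_weather = latest_weather(tweets)
print(mars_weather)
```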

### Mars Facts

* Visit the Mars Facts webpage [here](https://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet, including Diameter, Mass, etc.

* Use Pandas to convert the data to an HTML table string.

In [20]:
url = 'https://space-facts.com/mars/'

In [21]:
tables = pd.read_html(url)
tables

[                      0                              1
 0  Equatorial Diameter:                       6,792 km
 1       Polar Diameter:                       6,752 km
 2                 Mass:  6.42 x 10^23 kg (10.7% Earth)
 3                Moons:            2 (Phobos & Deimos)
 4       Orbit Distance:       227,943,824 km (1.52 AU)
 5         Orbit Period:           687 days (1.9 years)
 6  Surface Temperature:                  -153 to 20 °C
 7         First Record:              2nd millennium BC
 8          Recorded By:           Egyptian astronomers]
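
The cells above stop at the list of tables returned by `pd.read_html`. A sketch of the remaining step, converting the table to an HTML string, is shown below. To keep it runnable offline, it rebuilds a few rows of the table by hand; the column names `description` and `value` are assumptions chosen for illustration.

```python
import pandas as pd

# A few rows of the facts table, copied from the output above
facts = pd.DataFrame(
    [
        ["Equatorial Diameter:", "6,792 km"],
        ["Polar Diameter:", "6,752 km"],
        ["Moons:", "2 (Phobos & Deimos)"],
    ],
    columns=["description", "value"],  # hypothetical column names
)

# to_html returns the table as a string; index=False omits the row numbers
facts_html = facts.to_html(index=False)
print(facts_html)
```

With the scraped data, the same call would apply to `tables[0]`.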

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high-resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image URL of the full-resolution image.

* Save both the image URL string for the full-resolution hemisphere image and the hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image URL string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

```python
# Example:
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```

- - -

In [22]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
!which chromedriver

/usr/local/bin/chromedriver


### This step will prepare a list of URLs for the four hemispheres

In [33]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

# URL of page to be scraped
url = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'
# Opens url in browser
browser.visit(url)

html = browser.html
soup = BeautifulSoup(html, 'html.parser')
xpath = '//*[@id="product-section"]/div[2]'
high_res_locator = []
elems = browser.find_by_xpath(xpath).find_by_tag("a")
for e in elems:
    if e["href"] not in high_res_locator:
        high_res_locator.append(e["href"])

print(high_res_locator)

['https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced', 'https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced', 'https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced', 'https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced']


### Prototype code to get the URL for the high-resolution picture

In [34]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

# Use the first link in the high-resolution URL list
url = high_res_locator[0]

# Opens url in browser
browser.visit(url)
html = browser.html
soup = BeautifulSoup(html, 'html.parser')
xpath = '//*[@id="wide-image"]/div/ul/li[2]/a'

# Initialize an empty list for high-resolution URLs (hrefs)
high_def_urls = []

# Find the download link matched by the XPath
elems = browser.find_by_xpath(xpath)
for e in elems:
    if e["href"] not in high_def_urls:
        high_def_urls.append(e["href"])
    print(e["href"])
    print(high_def_urls)


http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif']


### Confirm that the for loop visits each URL in the browser

In [37]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

# Iterate through all the links in the high-resolution URL list
for i in high_res_locator:
    browser.visit(i)
    print(i)

https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced


In [38]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=False)

# Initialize an empty list for high-resolution URLs (hrefs)
high_def_urls = []

# Iterate through all the links in the high-resolution URL list
for i in high_res_locator:
    browser.visit(i)
    html = browser.html
    soup = BeautifulSoup(html, 'html.parser')
    xpath = '//*[@id="wide-image"]/div/ul/li[2]/a'
    elems = browser.find_by_xpath(xpath)
    for e in elems:
        if e["href"] not in high_def_urls:
            high_def_urls.append(e["href"])
        print(e["href"])

http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif']
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif', 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif']
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif', 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif', 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif']
http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif
['http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif', 'http://astropedia.astrogeology.us

In [44]:
# enumerate supplies the running index alongside each URL
for count, e in enumerate(high_def_urls):
    print(f"link number: {count}, url: {e}")

link number: 0, url: http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif
link number: 1, url: http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif
link number: 2, url: http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif
link number: 3, url: http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif


In [45]:
cer_hem = high_def_urls[0]
sch_hem = high_def_urls[1]
syr_m_hem = high_def_urls[2]
valles_m_hem = high_def_urls[3]

In [48]:
# Store each hemisphere title (key: title) with its image URL (key: img_url)
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": valles_m_hem},
    {"title": "Cerberus Hemisphere", "img_url": cer_hem},
    {"title": "Schiaparelli Hemisphere", "img_url": sch_hem},
    {"title": "Syrtis Major Hemisphere", "img_url": syr_m_hem},
]


[{'title': 'Valles Marineris Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif'},
 {'title': 'Cerberus Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif'},
 {'title': 'Schiaparelli Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif'},
 {'title': 'Syrtis Major Hemisphere',
  'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif'}]
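
The cells above pair each title with its URL by hand, using fixed list indexes. As an alternative sketch, the title can be derived from the file name in each download URL so the pairing does not depend on list order. The helper `title_from_url` is hypothetical, and it assumes every file name follows the `<name>_enhanced.tif` pattern seen in the output above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Two of the download URLs from the output above
high_def_urls = [
    "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif",
    "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif",
]


def title_from_url(url):
    """Turn '.../valles_marineris_enhanced.tif' into 'Valles Marineris Hemisphere'."""
    # File name without its extension, e.g. 'valles_marineris_enhanced'
    stem = PurePosixPath(urlparse(url).path).stem
    suffix = "_enhanced"
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]
    # Replace underscores with spaces and capitalize each word
    return stem.replace("_", " ").title() + " Hemisphere"


# One dictionary per hemisphere, using the keys named in the instructions
hemisphere_image_urls = [
    {"title": title_from_url(url), "img_url": url} for url in high_def_urls
]
print(hemisphere_image_urls)
```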

## Step 2 - MongoDB and Flask Application

Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

  * Store the return value in Mongo as a Python dictionary.

* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

![final_app_part1.png](Images/final_app_part1.png)
![final_app_part2.png](Images/final_app_part2.png)
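
The shape of the dictionary returned by `scrape` might look like the sketch below. The keys `news_title`, `news_p`, `featured_image_url`, `mars_weather`, and `hemisphere_image_urls` reuse the variable names from Step 1; `mars_facts` is an assumed key name, and every value is a placeholder standing in for the real scraping code.

```python
def scrape():
    """Return all Step 1 results in one dictionary (placeholder values only)."""
    # In scrape_mars.py, each value would be produced by the Step 1 scraping code
    mars_data = {
        "news_title": "NASA Garners 7 Webby Award Nominations",
        "news_p": "Nominees include four JPL projects.",
        "featured_image_url": "https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg",
        "mars_weather": "InSight sol 127 (2019-04-05) low -96.6ºC (-141.9ºF) high -16.8ºC (1.8ºF)",
        "mars_facts": "<table>...</table>",  # assumed key; HTML table string from pandas
        "hemisphere_image_urls": [
            {"title": "Cerberus Hemisphere", "img_url": "..."},
        ],
    }
    return mars_data


mars_data = scrape()
print(sorted(mars_data))
```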

* Use PyMongo for CRUD operations on your database. For this homework, you can simply overwrite the existing document each time the `/scrape` URL is visited and new data is obtained.