In [1]:
# mission_to_mars.ipynb

In [2]:
from splinter import Browser
from bs4 import BeautifulSoup
import pandas as pd
import datetime as dt
import time

#### * Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

In [3]:
# https://splinter.readthedocs.io/en/latest/drivers/chrome.html
!which chromedriver

/usr/local/bin/chromedriver


In [4]:
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
# For development, set headless=False, but for project change headless=True
browser = Browser('chrome', **executable_path, headless=False)

In [5]:
# First Url to visit 
url = 'http://mars.nasa.gov/news/'
browser.visit(url)

In [6]:
# Convert current page into html object
html = browser.html
# Parse with BeautifulSoup into object: news_soup
news_soup = BeautifulSoup(html, "html.parser")

In [7]:
# Shows whole HTML page - everything... need to fine tune
# news_soup


#### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.


In [8]:
# Find div with content_title
news_soup.find('div', class_="content_title")

<div class="content_title"><a href="/news/8360/six-things-about-opportunitys-recovery-efforts/" target="_self">Six Things About Opportunity's Recovery Efforts</a></div>

In [9]:
# Get text from div: two options 
# news_soup.find('div',class_="content_title").text
# or
news_soup.find('div',class_="content_title").get_text()

"Six Things About Opportunity's Recovery Efforts"

In [10]:
# Assign variable from website inspect showing article title under <div class = "content_title">
news_title = news_soup.find('div',class_="content_title").get_text()
time.sleep(2)

In [11]:
print(news_title)

Six Things About Opportunity's Recovery Efforts


In [12]:
# Get paragraph from this:
# <div class="article_teaser_body">The global dust storm on Mars could soon let in enough sunlight for the Opportunity rover to recharge.</div>
news_p = news_soup.find('div',class_="article_teaser_body").text
time.sleep(2)

In [13]:
print(news_p)

The global dust storm on Mars could soon let in enough sunlight for the Opportunity rover to recharge.
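
`find` returns `None` when no element matches, so chaining `.get_text()` or `.text` raises `AttributeError` if the page has not finished rendering or its markup changes. A minimal sketch of a guard, assuming a hypothetical helper named `first_text` and a small inline HTML sample in place of the live page:

```python
from bs4 import BeautifulSoup


def first_text(soup, tag, css_class):
    """Return the stripped text of the first matching element, or None.

    `first_text` is a hypothetical helper, not part of BeautifulSoup.
    """
    element = soup.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else None


# Inline sample standing in for browser.html
sample_html = """
<div class="content_title"><a href="/news/8360/">Six Things About Opportunity's Recovery Efforts</a></div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

title_text = first_text(sample_soup, "div", "content_title")
teaser_text = first_text(sample_soup, "div", "article_teaser_body")  # absent in the sample
```

Here `teaser_text` comes back as `None` instead of raising, which lets the caller decide whether to retry or skip.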


### JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

```python
# Example:
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
```

In [14]:
# Image - 1st step main page
url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
browser.visit(url)

In [15]:
# Grabbed this from website for FULL IMAGE  ( click FULL IMAGE button, )
# <a class="button fancybox" data-description="Dione hangs in front of Saturn and its icy rings in this view, captured during Cassini's final close flyby of the icy moon. North on Dione is up." data-fancybox-group="images" data-fancybox-href="/spaceimages/images/mediumsize/PIA17200_ip.jpg" data-link="/spaceimages/details.php?id=PIA17200" data-title="Dione with Rings and Shadows" id="full_image">
# 					FULL IMAGE
# 				  </a>

In [16]:
# Image - 2nd step to click 'FULL IMAGE' button to advance to next page
browser.find_by_id("full_image").click()
time.sleep(2) # wait 2 seconds to avoid being rejected

In [17]:
#Grabbed this... to use 'MORE INFO' button
# <a class="button" href="/spaceimages/details.php?id=PIA17200 " target="_top">more info     </a>

In [18]:
# Image - 3rd step to click 'MORE INFO' button
browser.find_link_by_partial_text("more info").click()
time.sleep(2)

In [19]:
# Grabbed this element for the link to full image in .jpg
# <img alt="Dione hangs in front of Saturn and its icy rings in this view, captured during Cassini's final close flyby of the icy moon. North on Dione is up." title="Dione hangs in front of Saturn and its icy rings in this view, captured during Cassini's final close flyby of the icy moon. North on Dione is up." class="main_image" 
# src="/spaceimages/images/largesize/PIA17200_hires.jpg">

In [20]:
# Browser is now on the desired page for the image:
html = browser.html
featured_image_soup = BeautifulSoup(html,"html.parser")
time.sleep(2)

In [21]:
# Get image with the figure tag and class="lede"
featured_image = featured_image_soup.find("figure", class_="lede")

In [22]:
# returns image endpoint 
# use either option to obtain endpoint
# featured_image.find("a")["href"]
featured_image.a["href"]

'/spaceimages/images/largesize/PIA19041_hires.jpg'

In [23]:
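# Combine the site's base URL with the relative path for the complete image URL
# (the search-page url above carries a query string, so it cannot be used as the prefix)
base_url = "https://www.jpl.nasa.gov"
featured_image_url = base_url + featured_image.a["href"]

print(featured_image_url)

https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA19041_hires.jpg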
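
The scraped `href` is a path relative to the site root, so it has to be resolved against `https://www.jpl.nasa.gov` and not appended to the search-page URL with its query string. As one possible approach, the standard library's `urljoin` performs that resolution; the variable names below are illustrative:

```python
from urllib.parse import urljoin

# Page the browser visited, and the relative path scraped from it
page_url = "https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars"
relative_path = "/spaceimages/images/largesize/PIA19041_hires.jpg"

# A path starting with "/" replaces the page's own path and query string
full_image_url = urljoin(page_url, relative_path)
print(full_image_url)
# https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA19041_hires.jpg
```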


#### Mars Weather

* Visit the Mars Weather Twitter account https://twitter.com/marswxreport?lang=en and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

```python
# Example:
mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
```

In [24]:
url = "https://twitter.com/marswxreport?lang=en"
browser.visit(url)
time.sleep(2)

In [25]:
html = browser.html
weather_soup = BeautifulSoup(html, "html.parser")
time.sleep(2)
# weather_soup

In [26]:
weather_soup.find("p", class_="TweetTextSize").text

'Sol 2142 (2018-08-15), high -10C/14F, low -71C/-95F, pressure at 8.65 hPa, daylight 05:28-17:41'

In [27]:
mars_weather = weather_soup.find("p", class_="TweetTextSize").text

In [28]:
mars_weather

'Sol 2142 (2018-08-15), high -10C/14F, low -71C/-95F, pressure at 8.65 hPa, daylight 05:28-17:41'
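
The first tweet on the page may not always be a weather report. A minimal sketch of a filter, assuming weather reports start with `Sol ` as both examples above do; `latest_weather` is a hypothetical helper and the list stands in for the text of the scraped `<p>` elements:

```python
def latest_weather(tweet_texts):
    """Return the first tweet that looks like a weather report, else None."""
    return next((text for text in tweet_texts if text.startswith("Sol ")), None)


# Stand-in for [p.text for p in weather_soup.find_all("p", class_="TweetTextSize")]
tweet_texts = [
    "Example of an unrelated tweet",  # made-up text for illustration
    "Sol 2142 (2018-08-15), high -10C/14F, low -71C/-95F, pressure at 8.65 hPa, daylight 05:28-17:41",
]

weather_report = latest_weather(tweet_texts)
```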

#### Mars Facts

* Visit the Mars Facts webpage http://space-facts.com/mars/ and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

* Use Pandas to convert the data to a HTML table string.


In [29]:
# Use Pandas to scrape the table containing facts about the planet 
url = "http://space-facts.com/mars/"
df_tables = pd.read_html(url)

In [30]:
# Table 0 has planet facts
df = df_tables[0]

In [31]:
# Change column names
df.columns = ['Attribute','Value']
df.set_index("Attribute", inplace=True)

In [32]:
df

In [33]:
# Use Pandas to convert the data to a HTML table string.
mars_facts_table = df.to_html(classes="table table-striped")

In [34]:
# Check HTML
mars_facts_table

'<table border="1" class="dataframe table table-striped">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Value</th>\n    </tr>\n    <tr>\n      <th>Attribute</th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>Equatorial Diameter:</th>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>Polar Diameter:</th>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>Mass:</th>\n      <td>6.42 x 10^23 kg (10.7% Earth)</td>\n    </tr>\n    <tr>\n      <th>Moons:</th>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>Orbit Distance:</th>\n      <td>227,943,824 km (1.52 AU)</td>\n    </tr>\n    <tr>\n      <th>Orbit Period:</th>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>Surface Temperature:</th>\n      <td>-153 to 20 °C</td>\n    </tr>\n    <tr>\n      <th>First Record:</th>\n      <td>2nd millennium BC</td>\n    </tr>\n    <tr>\n      <th>Recorded By:</th>\n      <td>Egyptian astronome

#### Mars Hemispheres

* Visit the USGS Astrogeology site https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars to obtain high resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

```python
# Example:
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```

- - -


In [35]:
url = "https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars"
browser.visit(url)

In [36]:
# Parse the results page so the hemisphere links can be scraped
html = browser.html
hemisphere_soup = BeautifulSoup(html, "html.parser")


In [37]:
# grabbed this from inspect
# <a href="/search/map/Mars/Viking/schiaparelli_enhanced" 
# class="itemLink product-item"><img class="thumb" src="/cache/images/7677c0a006b83871b5a2f66985ab5857_schiaparelli_enhanced.tif_thumb.png" alt="Schiaparelli Hemisphere Enhanced thumbnail"></a>
# hemisphere_soup.find_all("a", class_= "itemLink product-item")

In [38]:
# Get links 
hemisphere_links = hemisphere_soup.find_all("a", class_= "itemLink product-item")
for link in hemisphere_links:
    print(link["href"])

/search/map/Mars/Viking/cerberus_enhanced
/search/map/Mars/Viking/cerberus_enhanced
/search/map/Mars/Viking/schiaparelli_enhanced
/search/map/Mars/Viking/schiaparelli_enhanced
/search/map/Mars/Viking/syrtis_major_enhanced
/search/map/Mars/Viking/syrtis_major_enhanced
/search/map/Mars/Viking/valles_marineris_enhanced
/search/map/Mars/Viking/valles_marineris_enhanced


In [39]:
# Each link appears twice - take every other one, build the full url & test
hemisphere_links = hemisphere_soup.find_all("a", class_= "itemLink product-item")
for link in hemisphere_links[::2]:
    print("https://astrogeology.usgs.gov" + link["href"])

https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
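
Taking every other link works only while the page lists each hemisphere exactly twice and in adjacent positions. A sketch of an order-preserving alternative that does not depend on that layout, using the `href` values printed earlier:

```python
# href values as scraped, with each hemisphere listed twice
hrefs = [
    "/search/map/Mars/Viking/cerberus_enhanced",
    "/search/map/Mars/Viking/cerberus_enhanced",
    "/search/map/Mars/Viking/schiaparelli_enhanced",
    "/search/map/Mars/Viking/schiaparelli_enhanced",
    "/search/map/Mars/Viking/syrtis_major_enhanced",
    "/search/map/Mars/Viking/syrtis_major_enhanced",
    "/search/map/Mars/Viking/valles_marineris_enhanced",
    "/search/map/Mars/Viking/valles_marineris_enhanced",
]

# dict keys keep insertion order (Python 3.7+), so repeats drop out without reordering
unique_hrefs = list(dict.fromkeys(hrefs))
full_urls = ["https://astrogeology.usgs.gov" + href for href in unique_hrefs]
```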


In [40]:
# The results page lists each link twice, so visit every other one
hemisphere_links = hemisphere_soup.find_all("a", class_= "itemLink product-item")
# Collect one {'title': ..., 'img_url': ...} dictionary per hemisphere
hemisphere_image_urls = []

for link in hemisphere_links[::2]:
    # Visit each hemisphere url
    browser.visit("https://astrogeology.usgs.gov" + link['href'])
    time.sleep(2)
    
    # convert page to html to parse title
    html = browser.html
    hemisphere_soup = BeautifulSoup(html, "html.parser")
    
    # extract title from h2 tag
    hemisphere_title = hemisphere_soup.find("h2", class_="title").text
     
    # Look for the 'Sample' link that points at the full-size image
    sample_button = browser.find_link_by_text("Sample").first
    
    # save title and image link to our list
    hemisphere_dictionary = {'title': hemisphere_title, 'img_url': sample_button['href']}
    hemisphere_image_urls.append(hemisphere_dictionary)


In [41]:
hemisphere_image_urls

[{'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg',
  'title': 'Cerberus Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg',
  'title': 'Schiaparelli Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/syrtis_major_enhanced.tif/full.jpg',
  'title': 'Syrtis Major Hemisphere Enhanced'},
 {'img_url': 'http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/valles_marineris_enhanced.tif/full.jpg',
  'title': 'Valles Marineris Hemisphere Enhanced'}]
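
The scraped titles end in `Enhanced`, while the assignment's example uses the bare hemisphere name. If the shorter form is wanted, one option is to trim the suffix after scraping; `strip_suffix` is a hypothetical helper and the list below is a shortened copy of the output above:

```python
def strip_suffix(title, suffix=" Enhanced"):
    """Remove a trailing suffix from a title if it is present."""
    if suffix and title.endswith(suffix):
        return title[: -len(suffix)]
    return title


scraped = [
    {"title": "Cerberus Hemisphere Enhanced",
     "img_url": "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg"},
    {"title": "Schiaparelli Hemisphere Enhanced",
     "img_url": "http://astropedia.astrogeology.usgs.gov/download/Mars/Viking/schiaparelli_enhanced.tif/full.jpg"},
]

# Copy each dictionary, replacing only the title
cleaned = [{**item, "title": strip_suffix(item["title"])} for item in scraped]
```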

## Step 1 - Scraping

Complete your initial scraping using Jupyter Notebook, BeautifulSoup, Pandas, and Requests/Splinter.
* Create a Jupyter Notebook file called `mission_to_mars.ipynb` and use this to complete all of your scraping and analysis tasks. The following outlines what you need to scrape.

### NASA Mars News

* Scrape the [NASA Mars News Site](https://mars.nasa.gov/news/) and collect the latest News Title and Paragraph Text. Assign the text to variables that you can reference later.

```python
# Example:
news_title = "NASA's Next Mars Mission to Investigate Interior of Red Planet"

news_p = "Preparation of NASA's next spacecraft to Mars, InSight, has ramped up this summer, on course for launch next May from Vandenberg Air Force Base in central California -- the first interplanetary launch in history from America's West Coast."
```

### JPL Mars Space Images - Featured Image

* Visit the url for JPL Featured Space Image [here](https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars).

* Use splinter to navigate the site and find the image url for the current Featured Mars Image and assign the url string to a variable called `featured_image_url`.

* Make sure to find the image url to the full size `.jpg` image.

* Make sure to save a complete url string for this image.

```python
# Example:
featured_image_url = 'https://www.jpl.nasa.gov/spaceimages/images/largesize/PIA16225_hires.jpg'
```

### Mars Weather

* Visit the Mars Weather Twitter account [here](https://twitter.com/marswxreport?lang=en) and scrape the latest Mars weather tweet from the page. Save the tweet text for the weather report as a variable called `mars_weather`.

```python
# Example:
mars_weather = 'Sol 1801 (Aug 30, 2017), Sunny, high -21C/-5F, low -80C/-112F, pressure at 8.82 hPa, daylight 06:09-17:55'
```

### Mars Facts

* Visit the Mars Facts webpage [here](http://space-facts.com/mars/) and use Pandas to scrape the table containing facts about the planet including Diameter, Mass, etc.

* Use Pandas to convert the data to a HTML table string.

### Mars Hemispheres

* Visit the USGS Astrogeology site [here](https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars) to obtain high resolution images for each of Mars's hemispheres.

* You will need to click each of the links to the hemispheres in order to find the image url to the full resolution image.

* Save both the image url string for the full resolution hemisphere image, and the Hemisphere title containing the hemisphere name. Use a Python dictionary to store the data using the keys `img_url` and `title`.

* Append the dictionary with the image url string and the hemisphere title to a list. This list will contain one dictionary for each hemisphere.

```python
# Example:
hemisphere_image_urls = [
    {"title": "Valles Marineris Hemisphere", "img_url": "..."},
    {"title": "Cerberus Hemisphere", "img_url": "..."},
    {"title": "Schiaparelli Hemisphere", "img_url": "..."},
    {"title": "Syrtis Major Hemisphere", "img_url": "..."},
]
```

- - -

## Step 2 - MongoDB and Flask Application

Use MongoDB with Flask templating to create a new HTML page that displays all of the information that was scraped from the URLs above.

* Start by converting your Jupyter notebook into a Python script called `scrape_mars.py` with a function called `scrape` that will execute all of your scraping code from above and return one Python dictionary containing all of the scraped data.

* Next, create a route called `/scrape` that will import your `scrape_mars.py` script and call your `scrape` function.

  * Store the return value in Mongo as a Python dictionary.

* Create a root route `/` that will query your Mongo database and pass the mars data into an HTML template to display the data.

* Create a template HTML file called `index.html` that will take the mars data dictionary and display all of the data in the appropriate HTML elements. Use the following as a guide for what the final product should look like, but feel free to create your own design.

![final_app_part1.png](Images/final_app_part1.png)
![final_app_part2.png](Images/final_app_part2.png)
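
As a rough sketch of the shape `scrape_mars.py` could take, the function below merges the results of several scraping steps into one dictionary. The step functions and their placeholder return values are hypothetical stand-ins for the notebook code, and passing the steps in as an argument is an assumption made to keep the example runnable without a browser:

```python
def scrape(steps):
    """Run each scraping step and merge the results into one dictionary.

    Each step is a zero-argument callable returning a dict of scraped values.
    """
    mars_data = {}
    for step in steps:
        mars_data.update(step())
    return mars_data


# Hypothetical steps with placeholder values; real ones would drive the browser
def scrape_news():
    return {"news_title": "placeholder title", "news_p": "placeholder paragraph"}


def scrape_weather():
    return {"mars_weather": "placeholder weather report"}


mars_data = scrape([scrape_news, scrape_weather])
```

In the actual script, `scrape` would more likely take no arguments and call the notebook's scraping code directly; the single returned dictionary is the part that matters for the `/scrape` route.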

- - -

## Hints

* Use Splinter to navigate the sites when needed and BeautifulSoup to help find and parse out the necessary data.

* Use Pymongo for CRUD applications for your database. For this homework, you can simply overwrite the existing document each time the `/scrape` url is visited and new data is obtained.

* Use Bootstrap to structure your HTML template.
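
The overwrite described in the Pymongo hint can be sketched with a single upsert. This assumes `collection` is a `pymongo` collection object; the helper name `store_mars_data` is hypothetical:

```python
def store_mars_data(collection, mars_data):
    """Overwrite the single stored document, inserting it on the first run."""
    # An empty filter matches any existing document; upsert=True inserts one
    # when the collection is still empty.
    collection.replace_one({}, mars_data, upsert=True)
```

With a local MongoDB server this might be called as `store_mars_data(MongoClient("mongodb://localhost:27017").mars_db.mars, scrape())`, where the database and collection names are made up for illustration.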

## Copyright

Trilogy Education Services © 2017. All Rights Reserved.