# [Deliverable #1] Mars News - Scrape Titles and Preview Text
---
## Step #0 - Import Dependencies and Setup
---

In [1]:
# Import Splinter and BeautifulSoup Libraries (For Automated Web Browsing and Scraping)
# Import json and os libraries (For Dumping Scraped Data to a JSON file in a specific directory)
from splinter import Browser
from bs4 import BeautifulSoup
import json
import os

In [2]:
# Set up Splinter (Google Chrome Browser)
browser = Browser('chrome')

## Step #1 - Visit Website (Splinter Automated Browsing)
---

In [3]:
# Visit the Mars NASA news site
url = 'https://static.bc-edx.com/data/web/mars_news/index.html'

browser.visit(url)

## Step #2 - Scrape Website Using BeautifulSoup (Extract All Text Elements)
---

In [4]:
# Parse the website
html = browser.html
soup = BeautifulSoup(html, 'html.parser')

In [5]:
# For every text node in the page, strip the surrounding whitespace and store the text in a list
# (`string = True` replaces the deprecated `text = True` argument of find_all)
all_text = [element.get_text(strip = True) for element in soup.find_all(string = True)]

# Print all text scraped from the website
for text in all_text:
    print(text)







News - Mars Exploration Program








MARS Planet Science

Exploration Program








The Red Planet



The Program



News & Events



Multimedia



Missions



More











News









Latest





All Categories





















November 9, 2022

NASA's MAVEN Observes Martian Light Show Caused by Major Solar Storm

For the first time in its eight years orbiting Mars, NASA’s MAVEN mission witnessed two different types of ultraviolet aurorae simultaneously, the result of solar storms that began on Aug. 27.















November 1, 2022

NASA Prepares to Say 'Farewell' to InSight Spacecraft

A closer look at what goes into wrapping up the mission as the spacecraft’s power supply continues to dwindle.















October 28, 2022

NASA and ESA Agree on Next Steps to Return Mars Samples to Earth

The agency’s Perseverance rover will establish the first sample depot on Mars.















October 27, 2022

NASA's InSight Lander Detects Stunning Meteoroid Impact on M

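Most of the blank lines in the output above come from whitespace-only text nodes, which become empty strings once stripped. A minimal sketch of one way to skip them uses BeautifulSoup's `stripped_strings` generator. The HTML snippet below is made up for illustration and is not the Mars news page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML; the real page is much larger
sample_html = """
<html>
  <head><title>News - Mars Exploration Program</title></head>
  <body>
    <div class="content_title">  Example Title  </div>

    <div class="article_teaser_body">Example preview text.</div>
  </body>
</html>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that are empty after stripping
non_empty_text = list(sample_soup.stripped_strings)

for text in non_empty_text:
    print(text)
```

Applied to the `soup` object from the notebook, the same call would likely print the page text without the long runs of empty lines.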


## Step #3 - Scrape Website Using BeautifulSoup (Extract `title` and `preview` Text)
---

### Store All Scraped `title` Texts in a List (`title_texts`)

In [6]:
# Inspecting the page shows that each title sits in a div tag with the class 'content_title'
title_elements = soup.find_all('div', class_= "content_title")

# For every title element in the list, extract the text and store in a list
title_texts = [element.get_text(strip = True) for element in title_elements]

# Display the list of all extracted title texts
title_texts

["NASA's MAVEN Observes Martian Light Show Caused by Major Solar Storm",
 "NASA Prepares to Say 'Farewell' to InSight Spacecraft",
 'NASA and ESA Agree on Next Steps to Return Mars Samples to Earth',
 "NASA's InSight Lander Detects Stunning Meteoroid Impact on Mars",
 'NASA To Host Briefing on InSight, Mars Reconnaissance Orbiter Findings',
 'Why NASA Is Trying To Crash Land on Mars',
 'Curiosity Mars Rover Reaches Long-Awaited Salty Region',
 'Mars Mission Shields Up for Tests',
 "NASA's InSight Waits Out Dust Storm",
 "NASA's InSight 'Hears' Its First Meteoroid Impacts on Mars",
 "NASA's Perseverance Rover Investigates Geologically Rich Mars Terrain",
 'NASA to Host Briefing on Perseverance Mars Rover Mission Operations',
 "NASA's Perseverance Makes New Discoveries in Mars' Jezero Crater",
 "10 Years Since Landing, NASA's Curiosity Mars Rover Still Has Drive",
 "SAM's Top 5 Discoveries Aboard NASA's Curiosity Rover at Mars"]

### Store All Scraped `preview` Texts in a List (`preview_texts`)

In [7]:
# Inspecting the page shows that each article preview sits in a div tag with the class 'article_teaser_body'
preview_elements = soup.find_all('div', class_= "article_teaser_body")

# For every article preview element in the list, extract the text and store in a list
preview_texts = [element.get_text(strip = True) for element in preview_elements]

# Display the list of all extracted article preview texts
preview_texts

['For the first time in its eight years orbiting Mars, NASA’s MAVEN mission witnessed two different types of ultraviolet aurorae simultaneously, the result of solar storms that began on Aug. 27.',
 'A closer look at what goes into wrapping up the mission as the spacecraft’s power supply continues to dwindle.',
 'The agency’s Perseverance rover will establish the first sample depot on Mars.',
 'The agency’s lander felt the ground shake during the impact while cameras aboard the Mars Reconnaissance Orbiter spotted the yawning new crater from space.',
 'Scientists from two Mars missions will discuss how they combined images and data for a major finding on the Red Planet.',
 'Like a car’s crumple zone, the experimental SHIELD lander is designed to absorb a hard impact.',
 'After years of climbing, the Mars rover has arrived at a special region believed to have formed as Mars’ climate was drying.',
 'Protecting Mars Sample Return spacecraft from micrometeorites requires high-caliber work.',
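Because titles and previews are collected in two separate passes, the two lists line up only if every article has exactly one of each. A sketch of an alternative is to walk each article container and read both fields from it, so a missing field shows up as `None` instead of shifting later entries. The container class `list_text` and the sample HTML are assumptions for illustration; check the real class name by inspecting the page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in HTML; the container class 'list_text' is an assumption
sample_html = """
<div class="list_text">
  <div class="content_title">First Title</div>
  <div class="article_teaser_body">First preview.</div>
</div>
<div class="list_text">
  <div class="content_title">Second Title</div>
</div>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

paired_articles = []
for article in sample_soup.find_all('div', class_='list_text'):
    # Look up both fields inside the same container
    title_element = article.find('div', class_='content_title')
    preview_element = article.find('div', class_='article_teaser_body')

    # find() returns None when the tag is absent, so guard before get_text()
    paired_articles.append({
        'title': title_element.get_text(strip=True) if title_element else None,
        'preview': preview_element.get_text(strip=True) if preview_element else None,
    })

print(paired_articles)
```

Here the second article has no preview, and its entry keeps the title with `'preview': None` rather than borrowing text from another article.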

### Store Each `title` & `preview` Pair in a Python Dictionary, Then Store All Dictionaries in a List (`final_result`)

In [8]:
# Pair each title with the preview at the same position in the two lists (Using the Zip Function)...
# Store each pair in a Python dictionary where the title is the value of the 'title' key...
# And the preview is the value of the 'preview' key
final_result = [{'title': title, 'preview': preview} for title, preview in zip(title_texts, preview_texts)]

# Display the list of Python Dictionaries where each dict contains a title-and-preview pair
final_result

[{'title': "NASA's MAVEN Observes Martian Light Show Caused by Major Solar Storm",
  'preview': 'For the first time in its eight years orbiting Mars, NASA’s MAVEN mission witnessed two different types of ultraviolet aurorae simultaneously, the result of solar storms that began on Aug. 27.'},
 {'title': "NASA Prepares to Say 'Farewell' to InSight Spacecraft",
  'preview': 'A closer look at what goes into wrapping up the mission as the spacecraft’s power supply continues to dwindle.'},
 {'title': 'NASA and ESA Agree on Next Steps to Return Mars Samples to Earth',
  'preview': 'The agency’s Perseverance rover will establish the first sample depot on Mars.'},
 {'title': "NASA's InSight Lander Detects Stunning Meteoroid Impact on Mars",
  'preview': 'The agency’s lander felt the ground shake during the impact while cameras aboard the Mars Reconnaissance Orbiter spotted the yawning new crater from space.'},
 {'title': 'NASA To Host Briefing on InSight, Mars Reconnaissance Orbiter Findings',
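`zip` stops at the shorter of its inputs, so if one preview were missing, the last titles would be dropped without any error. A minimal sketch of a length check before pairing follows. The helper name `pair_titles_and_previews` and the sample lists are hypothetical and not part of the notebook:

```python
def pair_titles_and_previews(titles, previews):
    """Pair titles and previews by position, refusing lists of unequal length."""
    if len(titles) != len(previews):
        raise ValueError(
            f'Expected equal lengths, got {len(titles)} titles and {len(previews)} previews'
        )
    return [{'title': title, 'preview': preview} for title, preview in zip(titles, previews)]


# Hypothetical sample data; the second call is deliberately one preview short
sample_titles = ['Title A', 'Title B']
print(pair_titles_and_previews(sample_titles, ['Preview A', 'Preview B']))

try:
    pair_titles_and_previews(sample_titles, ['Preview A'])
except ValueError as error:
    print(f'Length mismatch detected: {error}')
```

On Python 3.10 and later, `zip(titles, previews, strict=True)` raises a `ValueError` on a length mismatch and could replace the explicit check.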


## Step #4 - Export `final_result` Scraped Data to JSON File (`mars_news.json`)
---

In [9]:
# Join path components to build the path of the JSON file inside the 'Data' folder
output_json = os.path.join('Data', 'mars_news.json')

# Create the 'Data' folder if it does not already exist
os.makedirs('Data', exist_ok = True)

# Use the open() function to create the JSON file in the 'Data' folder
# Write the scraped data results to the JSON file
# Indent nested levels by 2 spaces in the JSON for better readability
# The with statement closes the file automatically when the block ends
with open(output_json, 'w') as json_file:
    json.dump(final_result, json_file, indent = 2)

# Inform the user that the JSON file has been created and stored
print("******List of title-and-preview Dictionaries Results Saved to JSON File (Data/mars_news.json)******")

******List of title-and-preview Dictionaries Results Saved to JSON File (Data/mars_news.json)******
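One way to confirm the export worked is to read the file back and compare it with the data that was written. The sketch below does this in a temporary directory with made-up sample data (not the notebook's `final_result`), and creates the `Data` folder first because `open()` does not create missing directories:

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the scraped results
sample_result = [
    {'title': 'Example Title', 'preview': 'Example preview mentioning NASA’s rover.'}
]

# A temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as temp_dir:
    data_dir = os.path.join(temp_dir, 'Data')
    os.makedirs(data_dir, exist_ok=True)
    sample_path = os.path.join(data_dir, 'mars_news.json')

    # Write the sample data, then read it back from the same path
    with open(sample_path, 'w', encoding='utf-8') as json_file:
        json.dump(sample_result, json_file, indent=2)

    with open(sample_path, 'r', encoding='utf-8') as json_file:
        loaded_result = json.load(json_file)

# The round trip should return data equal to what was written
print(loaded_result == sample_result)
```

By default `json.dump` escapes non-ASCII characters such as the curly apostrophe in "NASA’s" as `\u2019`. Passing `ensure_ascii=False` together with `encoding='utf-8'` keeps them readable in the file.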


### Terminate Browsing Session

In [10]:
browser.quit()