# Data Acquisition via API Calls (NewsAPI)

---

## Phase 1: Setup and Security

### Install Dependencies

In [1]:
# Install the necessary library for loading environment variables (like API keys)
!pip install python-dotenv

print("Dependencies installed successfully.")

Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.1.1

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Dependencies installed successfully.


### Import Modules
Import all necessary Python libraries. Note that `requests` is the primary tool for submitting HTTP requests.

In [2]:
import numpy as np
import pandas as pd
import os
import requests
import json
import dotenv
from datetime import datetime

print("Modules imported.")

Modules imported.


### Load Environment Variables
`dotenv.load_dotenv()` reads the key-value pairs in your `.env` file into environment variables, so the NewsAPI key never has to be typed into the notebook itself.

In [3]:
# Load the .env file to access environment variables (e.g., your API Key)
dotenv.load_dotenv()

print("Environment variables loaded.")

Environment variables loaded.


### Retrieve API Key
We access the stored API key using `os.getenv()`.

In [4]:
# Get the News API key from the environment variables
newskey = os.getenv('newskey')

# We won't print the key; we only confirm that a non-empty value was loaded
if newskey:
    print("News API key successfully retrieved.")
else:
    print("ERROR: News API key not found. Check your .env file.")

News API key successfully retrieved.
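
The check above prints a message but lets the notebook continue without a key. A minimal sketch of a stricter variant is shown here; `require_env` and the `DEMO_NEWSKEY` variable are hypothetical names used only for illustration, and the value is a placeholder rather than a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set. Check your .env file.")
    return value


# Demo with a placeholder value under a separate name, so a real key is never overwritten
os.environ["DEMO_NEWSKEY"] = "placeholder-key"
demo_key = require_env("DEMO_NEWSKEY")
print(f"Key loaded ({len(demo_key)} characters).")
```

In the notebook itself this would presumably be called as `require_env('newskey')`, so a missing key stops execution before any request is sent.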


---

## Phase 2: Building the API Request Components
The request in this notebook is assembled from four components: the URL root, the endpoint, the headers (security and client metadata), and the parameters (the query itself).

### Build User Agent Header
Many APIs require a `User-agent` header to identify the client application. httpbin.org echoes back the `User-agent` that the client sent, so we use it to look up the default string that `requests` sends.

In [5]:
# 1. Get a standard User-agent string
r = requests.get('https://httpbin.org/user-agent')
useragent = json.loads(r.text)['user-agent']

# 2. Build the full headers dictionary
# The API key is sent via the 'X-Api-Key' header, a common secure method.
headers = {'User-agent': useragent,
           'X-Api-Key': newskey}

print(f"User-agent: {useragent}")
print("Headers dictionary created, containing the API key.")

User-agent: python-requests/2.32.4
Headers dictionary created, containing the API key.
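
Because the headers dictionary now holds the API key, printing it directly would expose the key in the notebook output. The sketch below masks secret values before printing; `redact_headers` is a hypothetical helper, not part of `requests`, and the header values are placeholders.

```python
def redact_headers(headers: dict, secret_keys=("X-Api-Key",)) -> dict:
    """Return a copy of headers with secret values masked for safe printing."""
    secret = {key.lower() for key in secret_keys}
    return {
        key: ("***" if key.lower() in secret else value)
        for key, value in headers.items()
    }


# Placeholder values for illustration only
demo_headers = {"User-agent": "python-requests/2.32.4", "X-Api-Key": "placeholder-key"}
print(redact_headers(demo_headers))
# {'User-agent': 'python-requests/2.32.4', 'X-Api-Key': '***'}
```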


### Define URL Root and Endpoint
This defines the fixed address for the NewsAPI service we want to use.

In [7]:
# Define the fixed URL parts
root = 'https://newsapi.org'
endpoint = '/v2/everything' # This endpoint searches all articles

print(f"API Base URL: {root + endpoint}")

API Base URL: https://newsapi.org/v2/everything


### Define Query Parameters
Parameters are used to customize the search (e.g., topic, language, date).

In [8]:
# Define the search parameters as a Python dictionary
params = {'q': '"tallest mountain"',  # Topic to search for (using quotes for exact phrase)
         'searchIn': 'content',      # Search within the article content
         'language': 'en',           # Restrict to English articles
         'pageSize': 100}            # Request up to 100 articles

print("Query parameters defined:")
print(params)

Query parameters defined:
{'q': '"tallest mountain"', 'searchIn': 'content', 'language': 'en', 'pageSize': 100}
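
When the dictionary is passed as `params`, it ends up as a query string appended to the URL. The standard-library sketch below shows roughly what that encoding looks like for the parameters above; it only approximates what `requests` does internally, and `demo_params` is a copy made for illustration.

```python
from urllib.parse import urlencode

# Same values as the params dictionary defined above
demo_params = {
    "q": '"tallest mountain"',
    "searchIn": "content",
    "language": "en",
    "pageSize": 100,
}

# Quotes become %22 and the space becomes +
query_string = urlencode(demo_params)
print(query_string)
# q=%22tallest+mountain%22&searchIn=content&language=en&pageSize=100

print("https://newsapi.org/v2/everything?" + query_string)
```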


---

## Phase 3: API Execution and Parsing

### Submit the GET Request
This cell sends the request and checks the response status. A `<Response [200]>` indicates success.

In [9]:
# Combine the components and submit the GET request
r = requests.get(root + endpoint,
                headers = headers,
                params = params)

# Display the response object (it should show <Response [200]>)
print(f"Request submitted. Status: {r}")

Request submitted. Status: <Response [200]>
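
Printing the response object works for a quick visual check, but a script usually needs to stop when the request fails. The sketch below is one way to do that. It assumes that a failed request returns a JSON body with `status`, `code`, and `message` fields; only the `status` key is confirmed by this notebook, so check the other two against the NewsAPI documentation. `check_newsapi_response` is a hypothetical helper.

```python
def check_newsapi_response(status_code: int, payload: dict) -> None:
    """Raise RuntimeError if the HTTP status or the JSON payload signals failure."""
    if status_code != 200 or payload.get("status") != "ok":
        # 'code' and 'message' are assumed field names for error payloads
        code = payload.get("code", "unknown")
        message = payload.get("message", "no message provided")
        raise RuntimeError(f"Request failed (HTTP {status_code}, code={code}): {message}")


# A successful payload passes silently
check_newsapi_response(200, {"status": "ok", "totalResults": 0, "articles": []})

# A made-up error payload raises
try:
    check_newsapi_response(401, {"status": "error", "code": "exampleCode", "message": "example message"})
except RuntimeError as err:
    print(err)
```

With the response from the cell above, the call would look like `check_newsapi_response(r.status_code, r.json())`.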


### Parse the JSON Response
The response (`r.text`) is a single string containing the JSON data. We use `json.loads()` to convert this string into a usable Python dictionary.

In [10]:
# Convert the JSON response string into a Python dictionary
myjson = json.loads(r.text)

print(f"JSON response converted to a Python dictionary (Type: {type(myjson)})")
print("Top-level keys in the response:")
print(list(myjson.keys()))

JSON response converted to a Python dictionary (Type: <class 'dict'>)
Top-level keys in the response:
['status', 'totalResults', 'articles']
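
To see what `json.loads()` does without calling the API, the sketch below parses a small hand-written string with the same three top-level keys. The sample values are made up for illustration.

```python
import json

# A tiny, hand-written stand-in for r.text
sample_text = '{"status": "ok", "totalResults": 2, "articles": [{"title": "A"}, {"title": "B"}]}'

sample = json.loads(sample_text)

# JSON objects become dicts and JSON arrays become lists
print(type(sample).__name__, list(sample.keys()))
# dict ['status', 'totalResults', 'articles']
print(type(sample["articles"]).__name__, len(sample["articles"]))
# list 2
```

The `requests` response object also offers `r.json()`, which performs the same parsing in one step.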


### View Raw JSON Data Structure (Optional)
This cell is often left commented out to avoid printing a massive wall of text but serves as a way to inspect the data structure.

In [11]:
# Uncomment this line to inspect the full structure of the JSON dictionary
# print(json.dumps(myjson, indent=4))

### Normalize JSON to DataFrame
The key data is nested under the `articles` key. Pandas’ `json_normalize()` function flattens this nested data into a clean, tabular DataFrame.

In [12]:
# Use json_normalize to extract the list of article dictionaries ('articles')
news_df = pd.json_normalize(myjson, record_path = ['articles'])

print(f"DataFrame created with {len(news_df)} articles.")
print("First 5 rows of the DataFrame:")
display(news_df.head())

DataFrame created with 21 articles.
First 5 rows of the DataFrame:


| | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Lydia Mansel | This Historic Train Climbs the Tallest Mountai... | Riding the Mount Washington Cog Railway is one... | https://www.travelandleisure.com/mount-washing... | https://s.yimg.com/ny/api/res/1.2/JmSXRiaGiWkv... | 2025-09-20T16:00:00Z | `Key Points\r\n<ul><li>The Mount Washington Cog...` | | Travel+Leisure |
| 1 | Express Web Desk | Polish skier becomes first to climb, ski down ... | In 2018, the Polish climber was the first pers... | https://indianexpress.com/article/world/polish... | https://images.indianexpress.com/2025/09/polan... | 2025-09-28T04:50:59Z | Polish skier Andrzej Bargiel made history this... | | The Indian Express |
| 2 | Georgie English | CLEAREST EVER SIGNS OF LIFE ON MARS... | NASA has revealed the clearest signs of life o... | https://www.the-sun.com/tech/15159152/nasa-mar... | https://www.the-sun.com/wp-content/uploads/sit... | 2025-09-10T17:34:37Z | NASA has revealed the clearest signs of life o... | | The-sun.com |
| 3 | ABC News | Sudan landslide claims 1,000 lives, village 'c... | The Sudan Liberation Movement/Army is appealin... | https://www.abc.net.au/news/2025-09-02/sudan-l... | https://live-production.wcms.abc-cdn.net.au/73... | 2025-09-02T07:52:51Z | At least 1,000 people were killed in a landsli... | abc-news-au | ABC News (AU) |
| 4 | sascha.pare@futurenet.com (Sascha Pare) , Sasc... | The geology that holds up the Himalayas is not... | A 100-year-old theory explaining how Asia can ... | https://www.livescience.com/planet-earth/geolo... | https://cdn.mos.cms.futurecdn.net/KMwyEqed8eMT... | 2025-08-30T15:50:00Z | Scientists may have just toppled a 100-year-ol... | | Live Science |
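
The `source.id` and `source.name` columns come from the nested `source` object inside each article. The sketch below is a simplified illustration of that flattening idea using plain dictionaries. It is not how pandas implements `json_normalize()`, and `flatten` is a hypothetical helper.

```python
def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into a single level with dotted key names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# One article shaped like the API response, with made-up values
article = {"source": {"id": None, "name": "Live Science"}, "title": "Example title"}
print(flatten(article))
# {'source.id': None, 'source.name': 'Live Science', 'title': 'Example title'}
```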


---

## Phase 4: Data Analysis and Export

### Clean and Prepare for Export
Before saving, convert `publishedAt` from text to a datetime and keep only the columns needed in the CSV.

In [13]:
# Clean up the publishedAt column (convert to datetime)
news_df['publishedAt'] = pd.to_datetime(news_df['publishedAt'])

# Select a final set of columns for the CSV
final_df = news_df[['publishedAt', 'title', 'description', 'url', 'source.name', 'author']].copy()

print("DataFrame prepared for export.")

DataFrame prepared for export.
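
The `publishedAt` strings in the output above end in `Z`, which marks them as UTC. As a rough illustration of what the conversion does to a single value, the standard-library sketch below parses one such string into a timezone-aware datetime. `parse_published_at` is a hypothetical helper, and it assumes every timestamp follows the exact pattern seen in the table.

```python
from datetime import datetime, timezone


def parse_published_at(value: str) -> datetime:
    """Parse a timestamp such as '2025-09-20T16:00:00Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


ts = parse_published_at("2025-09-20T16:00:00Z")
print(ts.isoformat())
# 2025-09-20T16:00:00+00:00
print(ts.year, ts.month, ts.day)
# 2025 9 20
```

Once the column holds real datetimes rather than text, sorting and filtering by date behave as expected.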


### Export to CSV
Writing the DataFrame to disk is the Load (L) step of an extract-transform-load (ETL) workflow and the last step of this data acquisition process.

In [14]:
# Define a filename based on the current date and time for organization
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_filename = f'news_articles_{timestamp}.csv'

# Export the final clean DataFrame to a CSV file
final_df.to_csv(output_filename, index=False)

print(f"Successfully exported {len(final_df)} articles to {output_filename}")

Successfully exported 21 articles to news_articles_20250929_235342.csv
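
The timestamp format keeps files from different runs apart and makes them sort chronologically by name. The sketch below wraps the naming step in a function that accepts a fixed time, which makes the pattern easy to verify; `make_output_filename` is a hypothetical helper.

```python
from datetime import datetime


def make_output_filename(now=None, prefix="news_articles"):
    """Build a CSV filename with a YYYYMMDD_HHMMSS timestamp."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


# Using the date and time from the run shown above
print(make_output_filename(datetime(2025, 9, 29, 23, 53, 42)))
# news_articles_20250929_235342.csv
```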


---

## Function Definition (Optional Consolidation)
This final cell consolidates all of the steps into a single, reusable function, showing how a script would run the entire API call process.

In [15]:
def grab_latest_articles():
    """
    Consolidates all steps: builds request, calls API, parses JSON, and returns a DataFrame.
    """
    
    # Prompt the user
    topic = input("Please enter your topic of interest: ")
    
    # Build our Headers
    newskey = os.getenv('newskey')
    r = requests.get('https://httpbin.org/user-agent')
    useragent = json.loads(r.text)['user-agent']
    headers = {'User-agent': useragent,
               'X-Api-Key': newskey}

    # Build our URL and Parameters
    root = 'https://newsapi.org'
    endpoint = '/v2/everything'
    params = {'q': topic,
              'searchIn': 'content',
              'language': 'en',
              'pageSize': 100}
    
    # Submit our Request
    r = requests.get(root + endpoint,
                headers = headers,
                params = params)
    
    # Stop on an HTTP error instead of failing later on a missing 'articles' key
    r.raise_for_status()

    # Create and return the pandas dataframe
    myjson = json.loads(r.text)
    news_df = pd.json_normalize(myjson, record_path = ['articles'])
    
    return news_df

In [16]:
grab_latest_articles()

| | author | title | description | url | urlToImage | publishedAt | content | source.id | source.name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A.R.E. Taylor | Inside the Nuclear Bunkers, Mines, and Mountai... | Companies are going to great lengths to protec... | https://www.wired.com/story/inside-the-nuclear... | https://media.wired.com/photos/68d67a6e8c02238... | 2025-09-27T12:00:00Z | Data centers are responsible for running many ... | wired | Wired |
| 1 | Sophie Hurwitz | Big Tech Dreams of Putting Data Centers in Space | A sci-fi idea is gaining supporters, from bill... | https://www.wired.com/story/data-centers-gobbl... | https://media.wired.com/photos/68cd33040864a1e... | 2025-09-20T11:00:00Z | This story originally appeared on Grist and is... | wired | Wired |
| 2 | Zoë Schiffer, Will Knight, Lauren Goode | OpenAI Teams Up With Oracle and SoftBank to Bu... | The new sites will boost Stargate’s planned ca... | https://www.wired.com/story/openai-oracle-soft... | https://media.wired.com/photos/68d188016a137b6... | 2025-09-23T20:59:50Z | OpenAI is planning to build five new data cent... | wired | Wired |
| 3 | AJ Dellinger | Spotify Would Prefer You Didn’t Sell Your Own ... | Stick to listening, would you? | https://gizmodo.com/spotify-would-prefer-you-d... | https://gizmodo.com/app/uploads/2023/07/fe9aee... | 2025-09-13T18:15:54Z | Spotify has never been shy about the fact that... | | Gizmodo.com |
| 4 | Thomas Ricker | Apple Watch hypertension feature cleared by FDA | Starting next week, Apple’s new hypertension n... | https://www.theverge.com/news/776942/apple-wat... | https://platform.theverge.com/wp-content/uploa... | 2025-09-12T06:10:41Z | Starting next week, Apple’s new hypertension n... | the-verge | The Verge |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Katie Drummond | Matthew Prince Wants AI Companies to Pay for T... | The Cloudflare CEO joined ‘The Big Interview’ ... | https://www.wired.com/story/big-interview-podc... | https://media.wired.com/photos/68bf6103e1944be... | 2025-09-16T10:00:00Z | My evidence that we're onto something is we've... | wired | Wired |
| 96 | Zeyi Yang | China Turns Legacy Chips Into a Trade Weapon | As Washington pushes for a TikTok deal, Beijin... | https://www.wired.com/story/china-probe-us-chi... | https://media.wired.com/photos/68cb4d9a4eff22d... | 2025-09-18T15:00:00Z | While the Trump administration was trying to m... | wired | Wired |
| 97 | | UK and US unveil nuclear energy deal promising... | Key among the plans is a proposal to build up ... | https://www.bbc.com/news/articles/ckgzevzwxwro | https://ichef.bbci.co.uk/news/1024/branded_new... | 2025-09-14T23:23:54Z | `Charlotte EdwardsBusiness reporter, BBC News\r...` | | BBC News |
| 98 | | Murdochs likely to be involved in US TikTok de... | Oracle chairman Larry Ellison and Dell founder... | https://www.bbc.com/news/articles/crkjjv28ykjo | https://ichef.bbci.co.uk/news/1024/branded_new... | 2025-09-21T21:35:04Z | Rupert Murdoch and his son Lachlan are expecte... | | BBC News |
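
Because the function prompts with `input()`, it can only run interactively. One possible variation, sketched below, separates building the request from sending it so that the topic can be passed in by a script. `build_everything_request` is a hypothetical helper, the default `user_agent` value is a placeholder, and the key shown is not real.

```python
def build_everything_request(topic: str, api_key: str, user_agent: str = "python-requests") -> dict:
    """Return keyword arguments for requests.get() without performing any network I/O."""
    if not topic.strip():
        raise ValueError("topic must not be empty")
    return {
        "url": "https://newsapi.org/v2/everything",
        "headers": {"User-agent": user_agent, "X-Api-Key": api_key},
        "params": {"q": topic, "searchIn": "content", "language": "en", "pageSize": 100},
    }


# Placeholder key for illustration only
kwargs = build_everything_request('"tallest mountain"', "placeholder-key")
print(kwargs["url"])
print(kwargs["params"])
```

The result could then be sent with `requests.get(**kwargs, timeout=10)` and parsed exactly as in the function above.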
