This project provides a CLI for collecting news data. Given a timeframe and a set of domains, the tool looks for corresponding news articles in the Wayback Machine, GDELT, and Media Cloud.
## Setup

```
git clone https://github.com/NHagar/archive_check.git
cd archive_check
pip install -r requirements.txt
```
## Usage

### `python get_urls.py`

Retrieves URLs for the designated domains and date range by querying the Wayback Machine, GDELT, and Media Cloud, then saves the results to local databases.
Parameters:

- `--sites`: comma-separated list of domains to query, in the format `nytimes.com,latimes.com,vox.com`
- `--start`: desired start date, in `YYYY-MM-DD` format
- `--end`: desired end date, in `YYYY-MM-DD` format
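A typical invocation combines all three flags; the sites and dates below are illustrative only:

```
python get_urls.py --sites nytimes.com,latimes.com,vox.com --start 2020-01-01 --end 2020-01-31
```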
### `scripts/`

Contains example one-off scripts for collecting data directly from sites.
### `python get_fulltext.py`

Attempts to scrape full text from all URLs in the database. `PATTERNS` can be modified to apply site-specific URL filtering heuristics before scraping.
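As a rough illustration of that filtering step, `PATTERNS` could map each domain to regular expressions that candidate URLs must match before they are scraped. This is a hypothetical sketch; the actual structure in `get_fulltext.py` may differ:

```python
import re

# Hypothetical shape for PATTERNS: domain -> list of regexes a URL must
# match to be scraped. The real structure in get_fulltext.py may differ.
PATTERNS = {
    # NYT article URLs embed a publication date, e.g. /2020/01/15/...
    "nytimes.com": [r"/\d{4}/\d{2}/\d{2}/"],
    # Vox article URLs typically include a year segment
    "vox.com": [r"/\d{4}/"],
}

def keep_url(domain: str, url: str) -> bool:
    """Keep a URL if it matches at least one pattern for its domain."""
    patterns = PATTERNS.get(domain)
    if not patterns:
        return True  # no filter defined for this domain; keep everything
    return any(re.search(p, url) for p in patterns)
```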
### `python compare_results.py`

Runs a set of analyses (link count, LDA topic modeling, headline regression) on each database table and outputs comparisons of the results across services. Links that returned 404s during the full-text collection step are not counted here.
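For intuition, the link-count comparison amounts to counting stored URLs per service. This sketch assumes a SQLite database file named `urls.db` with one table per service; both names are placeholders and may not match the repository's actual schema:

```python
import sqlite3

# Assumed database file and table names; the repository's schema may differ.
conn = sqlite3.connect("urls.db")
for table in ("wayback", "gdelt", "mediacloud"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} URLs")
conn.close()
```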
## Requirements

- Media Cloud API key, stored in a `.env` file at the project root as `API_KEY_MC`
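The `.env` file is a single line; the value shown is a placeholder:

```
API_KEY_MC=your_media_cloud_api_key
```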