<a href="https://colab.research.google.com/github/zbovaird/OSINT/blob/main/Copy_of_OSINT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OSINT Tools

A categorized list of powerful, free (or freemium) OSINT tools with brief
notes on programmatic API connectivity and whether they offer an MCP-server
implementation (Model Context Protocol). "API" indicates a documented
programmatic interface (REST/GraphQL/official SDK). "MCP server" is marked
"Yes" only if the project natively implements MCP (rare).

## Intelligence Platforms & Frameworks
- MISP: API: Yes (comprehensive REST API). MCP server: No.
- OpenCTI: API: Yes (REST + GraphQL). MCP server: No.
- SpiderFoot (Community): API: Yes (web/server mode exposes HTTP endpoints). MCP server: No.
- Maltego CE (Community Edition): API: Limited (transforms/TRX SDK; server features commercial). MCP server: No.
- Recon-ng: API: No (module framework; scriptable/automatable via Python). MCP server: No.

## Internet-Wide Search / Scanners
- Shodan: API: Yes (free limited tier with API key). MCP server: No.
- Censys: API: Yes (free limited tier). MCP server: No.
- ZoomEye: API: Yes (limited free access). MCP server: No.
- BinaryEdge: API: Yes (free tester tier; mostly paid). MCP server: No.
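
As an illustration of what "API: Yes (free limited tier with API key)" means in practice, here is a minimal sketch of a Shodan host lookup using only the standard library. The helper names (`shodan_host_url`, `summarize_host`, `fetch_host`) are hypothetical, and the response fields used (`ip_str`, `org`, `ports`) should be checked against the current Shodan REST documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

SHODAN_BASE = "https://api.shodan.io"


def shodan_host_url(ip: str, api_key: str) -> str:
    """Build the REST URL for Shodan's host-lookup endpoint."""
    query = urllib.parse.urlencode({"key": api_key})
    return f"{SHODAN_BASE}/shodan/host/{urllib.parse.quote(ip)}?{query}"


def summarize_host(host: dict) -> dict:
    """Reduce a host record to a few fields that are handy for triage."""
    return {
        "ip": host.get("ip_str"),
        "org": host.get("org"),
        "ports": sorted(host.get("ports", [])),
    }


def fetch_host(ip: str, api_key: str, timeout: float = 20.0) -> dict:
    """Perform the lookup (network call; counts against the key's quota)."""
    with urllib.request.urlopen(shodan_host_url(ip, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be `summarize_host(fetch_host("8.8.8.8", key))`, with `key` read from an environment variable rather than hard-coded. Censys, ZoomEye, and BinaryEdge follow the same key-plus-quota pattern but use different endpoints and authentication headers.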

## Domain / DNS / WHOIS / Passive DNS
- amass: API: No (CLI/Go library; programmatic usage via library). MCP server: No.
- PassiveTotal / RiskIQ: API: Yes (primarily paid; limited free trials). MCP server: No.
- Farsight DNSDB: API: Yes (commercial; limited community access). MCP server: No.
- SecurityTrails: API: Yes (free tier/limited). MCP server: No.
- `whois` (command-line): API: No (many registrars provide APIs separately). MCP server: No.

## Subdomain & Asset Discovery
- Subfinder: API: No (CLI; libraries/wrappers exist). MCP server: No.
- Sublist3r: API: No (CLI). MCP server: No.
- Assetfinder: API: No (CLI). MCP server: No.
- Amass: API: No (see Domain / DNS above; widely used for passive/active discovery). MCP server: No.
- theHarvester: API: No (CLI; scriptable). MCP server: No.

## Breach / Credentials / Dark Web
- Have I Been Pwned (HIBP): API: Yes (account-level breach search requires a paid API key; the breach catalogue and the Pwned Passwords range endpoint are free). MCP server: No.
- DeHashed: API: Yes (limited/paid tiers). MCP server: No.
- Ahmia (Tor search): API: Limited/No (public web index; some indexers provide APIs). MCP server: No.
- OnionScan: API: No (tooling for Tor/hidden-service analysis; scriptable). MCP server: No.
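
One HIBP interface that needs no key is the Pwned Passwords range endpoint, which uses a k-anonymity scheme: only the first five hex characters of the password's SHA-1 hash leave the machine, and the match is done locally. The sketch below shows the hashing and response parsing; the function names are hypothetical, and the response format (`SUFFIX:COUNT` per line) is assumed from the public documentation.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def range_url(prefix: str) -> str:
    """URL of the range query; only the hash prefix is ever sent."""
    return f"https://api.pwnedpasswords.com/range/{prefix}"


def count_from_range(body: str, suffix: str) -> int:
    """Look up our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix absent: not found in the corpus
```

To use it, fetch `range_url(prefix)` with any HTTP client and pass the response text to `count_from_range` together with the suffix from `split_sha1`.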

## Social Media & People Search
- snscrape: API: No (library/scraper; programmatic usage). MCP server: No.
- Social media official APIs (X/Twitter, Facebook/Graph, Instagram, TikTok): API: Yes (access and free tiers vary, often require keys). MCP server: No.
- Twint (historic): API: No (scraper; maintenance varies). MCP server: No.

## Image, Metadata & Media Analysis
- ExifTool: API: No (CLI/library). MCP server: No.
- SauceNAO: API: Yes (free limited API keys). MCP server: No.
- TinEye: API: Yes (commercial; limited free trial). MCP server: No.
- FotoForensics / ELA tools: API: Limited/No. MCP server: No.

## Mapping & Geolocation
- OpenStreetMap / Overpass API: API: Yes (free). MCP server: No.
- Mapillary: API: Yes (community/free tiers). MCP server: No.
- OpenAerialMap: API: Yes (varies). MCP server: No.
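
For the Overpass API, a query is a short Overpass QL string sent to an interpreter endpoint. The following sketch builds a radius search and the matching POST request. The helper names are hypothetical, and `https://overpass-api.de/api/interpreter` is one public instance chosen here as an assumption; other instances exist and each has its own usage policy.

```python
import urllib.parse
import urllib.request

OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"  # assumed public instance


def around_query(lat: float, lon: float, radius_m: int,
                 key: str, value: str, timeout_s: int = 25) -> str:
    """Overpass QL: nodes tagged key=value within radius_m metres of (lat, lon)."""
    return (
        f"[out:json][timeout:{timeout_s}];"
        f'node["{key}"="{value}"](around:{radius_m},{lat},{lon});'
        "out body;"
    )


def overpass_request(query: str, endpoint: str = OVERPASS_ENDPOINT) -> urllib.request.Request:
    """Wrap the query in a form-encoded POST request (not sent here)."""
    data = urllib.parse.urlencode({"data": query}).encode("ascii")
    return urllib.request.Request(endpoint, data=data, method="POST")
```

Passing the request to `urllib.request.urlopen` returns JSON whose `elements` list holds the matching nodes with their coordinates and tags.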

## Network Recon & Scanning (OSINT-adjacent)
- Nmap: API: No (CLI; libraries/wrappers exist). MCP server: No.
- masscan: API: No (CLI). MCP server: No.
- ZMap: API: No (CLI). MCP server: No.

## Aggregation, Automation & Recon Suites
- SpiderFoot HX (commercial cloud) / SpiderFoot Community: API: Yes (community/server exposes API). MCP server: No.
- theHarvester: API: No (see above). MCP server: No.
- OSINT Framework (website index): API: No (catalog resource). MCP server: No.

## Browser Extensions & Utilities
- Hunchly (paid): API: Limited (commercial). MCP server: No.
- Hunter.io: API: Yes (free limited tier). MCP server: No.

## Notes & Summary
- API availability: Many powerful OSINT projects expose programmatic APIs (Shodan, Censys, VirusTotal, MISP, OpenCTI, Overpass, etc.), but free access is typically limited by quotas or requires registration and an API key.
- MCP server: No mainstream OSINT tool in this list natively implements the Model Context Protocol (MCP). Most are self-hostable services or CLI tools; those that expose REST/GraphQL endpoints can be integrated into automation or placed behind an MCP-compatible wrapper.


Generated: 2026-02-16

## Integration Index (MCP servers / API)
This quick index groups the tools above by whether they provide an MCP-server, a programmatic API, or both.

- Tools with MCP servers:
	- None (no mainstream OSINT projects in this list natively implement MCP semantics). Many can be wrapped or proxied behind an MCP server if desired.

- Tools with API connections (documented programmatic interfaces / REST/GraphQL/official SDKs):
	- MISP
	- OpenCTI
	- SpiderFoot (Community / HX)
	- Maltego CE (limited transform/TRX integration)
	- Shodan
	- Censys
	- ZoomEye
	- BinaryEdge
	- PassiveTotal / RiskIQ
	- Farsight DNSDB
	- SecurityTrails
	- Have I Been Pwned (HIBP)
	- DeHashed
	- Social media official APIs (X/Twitter, Facebook/Graph, Instagram, TikTok)
	- SauceNAO
	- TinEye
	- OpenStreetMap / Overpass API
	- Mapillary
	- OpenAerialMap
	- Hunter.io

	## Political donations / Campaign finance (US)
	This section lists public-data and OSINT-focused tools and APIs useful for tracing political donations, PACs, and campaign finance in the United States.

	- Federal Election Commission (FEC) / OpenFEC: API: Yes (official FEC REST API / OpenFEC endpoints with campaign finance filings, receipts, disbursements). MCP server: No.
	- OpenSecrets (Center for Responsive Politics): API: Yes (OpenSecrets API; requires API key, free limited access). MCP server: No.
	- FollowTheMoney (National Institute on Money in Politics / FollowTheMoney.org): API: Yes (data access and API/bulk downloads; registration may be required). MCP server: No.
	- LittleSis: API: Yes (API for persons/organizations/relationships; useful to correlate donors and recipients). MCP server: No.
	- MapLight: API: Limited (historically provided programmatic access; availability varies). MCP server: No.
	- Data.gov / Campaign finance datasets: API: Yes (various datasets and endpoints aggregated by Data.gov). MCP server: No.
	- ProPublica / Investigative datasets: API: Limited (ProPublica offers APIs for several datasets; campaign finance analysis often integrates FEC data). MCP server: No.

	Notes: The FEC and OpenSecrets APIs are the primary programmatic sources for raw campaign finance data. Many investigative datasets combine these sources with corporate registries and donor-lookup services. None of the above natively implement MCP semantics, but all with APIs can be wrapped behind an MCP server or ingested into an MCP-capable pipeline.
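
As a rough sketch of pulling raw receipts from the OpenFEC API mentioned in the note above, the code below builds a Schedule A (itemized receipts) query and totals the results per committee. The helper names are hypothetical. The query parameters (`contributor_name`, `two_year_transaction_period`, `per_page`) and result fields (`committee.name`, `contribution_receipt_amount`) are assumptions based on the public OpenFEC documentation and should be verified there; `DEMO_KEY` is the heavily rate-limited shared key, so a registered key is preferable for real use.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

OPENFEC_BASE = "https://api.open.fec.gov/v1"


def schedule_a_url(contributor_name: str, cycle: int,
                   api_key: str = "DEMO_KEY", per_page: int = 20) -> str:
    """Build a Schedule A query URL for one contributor name and election cycle."""
    params = {
        "api_key": api_key,
        "contributor_name": contributor_name,
        "two_year_transaction_period": cycle,
        "per_page": per_page,
    }
    return f"{OPENFEC_BASE}/schedules/schedule_a/?{urllib.parse.urlencode(params)}"


def total_by_committee(results: list) -> dict:
    """Sum receipt amounts per recipient committee name."""
    totals = defaultdict(float)
    for row in results:
        committee = (row.get("committee") or {}).get("name", "unknown")
        totals[committee] += float(row.get("contribution_receipt_amount") or 0)
    return dict(totals)


def fetch_receipts(url: str, timeout: float = 20.0) -> list:
    """Fetch one page of results (network call; pagination is not handled here)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("results", [])
```

Name searches are fuzzy and donors share names, so totals produced this way are leads to verify against the underlying filings, not conclusions.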

	## Additional OSINT Categories
	Below are additional useful OSINT categories that were not previously listed, with representative tools and notes about API availability.

	- Public Records & Government Data
		- Examples: PACER / RECAP (court records), Data.gov datasets, state business registries, local property records. API: Varies (Data.gov: Yes; PACER: paid; RECAP: limited). MCP server: No.

	- Financial Filings & Corporate Intelligence
		- Examples: SEC EDGAR (API/bulk), OpenCorporates (API), Companies House (UK API). API: Yes (EDGAR bulk, OpenCorporates API). MCP server: No.

	- Academic & Publication Search
		- Examples: Semantic Scholar, CrossRef, Google Scholar (no official public API), PubMed. API: Semantic Scholar/CrossRef/PubMed: Yes (limited); Google Scholar: No official API. MCP server: No.

	- Code Repositories & Package Search
		- Examples: GitHub, GitLab, npm, PyPI, public repo search. API: Yes (GitHub/GitLab APIs; registry APIs). MCP server: No.

	- Mobile App Stores & APK Analysis
		- Examples: APKMirror, Google Play scraping tools, Mobile App security scanners. API: Limited (most stores restrict programmatic scraping; some services provide APIs). MCP server: No.

	- Phone Number, SMS & VOIP Lookup
		- Examples: Numverify, Twilio lookup, OpenCNAM, carrier lookup services. API: Yes (typically paid/free-tier). MCP server: No.

	- FOIA, Court & Legal Datasets
		- Examples: RECAP (court dockets), state court portals, FOIA request aggregators. API: Limited/varies. MCP server: No.

	- Satellite & High-Resolution Imagery
		- Examples: Sentinel Hub, USGS EarthExplorer, Planet (commercial), Google Earth Engine. API: Yes (Sentinel/USGS/GEE APIs; Planet commercial). MCP server: No.

	- Business & People Data Aggregators
		- Examples: Pipl, Clearbit, FullContact, Whitepages Pro. API: Yes (mostly commercial with limited free tiers). MCP server: No.

	- Forum / Paste / Leak Search
		- Examples: Pastebin scrapers, LeakForums trackers, public paste monitoring services. API: Varies (Pastebin: limited API). MCP server: No.

	- Darknet / Tor Monitoring Tools
		- Examples: OnionScan, Tor/I2P indices, specialized darknet crawlers. API: Limited/varies. MCP server: No.

	- Document / PDF Analysis & Metadata Extraction
		- Examples: ExifTool (metadata), pdfgrep, Tika. API: Limited (libraries exist). MCP server: No.


	## One-line Descriptions (what each tool is geared toward)
	- MISP: sharing, storing, and collaborating on threat intelligence (IOCs, events, attributes).
	- OpenCTI: building and querying threat intelligence knowledge graphs (actors, indicators, relationships).
	- SpiderFoot (Community / HX): automated reconnaissance across domains, IPs, emails, and OSINT sources.
	- Maltego CE: visual link analysis and relationship mapping between entities (people, domains, infrastructure).
	- Recon-ng: modular reconnaissance framework for automated data collection and enrichment.
	- Shodan: discovery and metadata on internet-connected devices, exposed services, and banners.
	- Censys: internet-wide host and certificate discovery with searchable metadata and fingerprints.
	- ZoomEye: searchable index of internet-facing assets and services (IPs, banners, hosts).
	- BinaryEdge: internet asset scanning and threat intelligence focused on exposed services and vulnerabilities.
	- amass: passive and active subdomain enumeration and attack-surface mapping.
	- PassiveTotal / RiskIQ: domain and passive DNS intelligence, historical records, and asset context.
	- Farsight DNSDB: historical passive DNS resolution data for domain/IP correlation.
	- SecurityTrails: domain history, DNS records, WHOIS, and hosting/ownership metadata.
	- whois (CLI/registrar APIs): domain registration and registrar/owner metadata.
	- Subfinder: fast passive subdomain discovery using many public sources.
	- Sublist3r: subdomain enumeration via search engines and public sources.
	- Assetfinder: finding related domains and assets for an organization or domain.
	- theHarvester: harvesting emails, subdomains, hosts, and related data from public sources.
	- Have I Been Pwned (HIBP): checking whether emails or domains appear in public data breaches.
	- DeHashed: searchable breach and credential datasets for exposed accounts and leaks.
	- Ahmia: search index for Tor hidden services and .onion sites.
	- OnionScan: discovery and analysis of Tor hidden services and metadata leakage.
	- snscrape: scraping public social media posts and profile metadata without official APIs.
	- Social media official APIs (X/Twitter, Facebook/Graph, Instagram, TikTok): programmatic access to posts, users, and engagement metadata.
	- Twint: Twitter scraping for timelines, searches, and user metadata (community-maintenance varies).
	- ExifTool: extracting embedded metadata from images, documents, and media files (EXIF, XMP).
	- SauceNAO: reverse-image search for image sources and visual matches.
	- TinEye: reverse-image search and matching to find image usage and origins.
	- FotoForensics / ELA tools: basic image integrity and Error Level Analysis to detect manipulation.
	- OpenStreetMap / Overpass API: querying crowd-sourced map data and geospatial features.
	- Mapillary: street-level imagery and location metadata from user-contributed photos.
	- OpenAerialMap: aggregated aerial and drone imagery for geolocation and analysis.
	- Nmap: port and service scanning, OS detection, and network reconnaissance.
	- masscan: very high-speed port scanner for scanning large IP ranges quickly.
	- ZMap: internet-scale network scanning framework for wide-area scans.
	- SpiderFoot HX (commercial) / SpiderFoot Community: centralized automated OSINT aggregation and reporting.
	- OSINT Framework (website): curated index of OSINT tools and categorized resources.
	- Hunchly: capturing and preserving web evidence with contextual notes and indexing.
	- Hunter.io: discovering and verifying professional email addresses associated with domains.
	- FEC / OpenFEC: official campaign finance filings, contributions, expenditures, and committee data.
	- OpenSecrets: lobbying, donation, and influence data linking donors, PACs, and candidates.
	- FollowTheMoney: state-level campaign finance data and donor-tracking across jurisdictions.
	- LittleSis: relationship mapping between people, organizations, and funding/influence networks.
	- MapLight: analysis of money-in-politics and policy influence (data access varies).
	- Data.gov (campaign datasets): federated government datasets including mapped campaign finance files.
	- ProPublica: investigative datasets and reporting often derived from public campaign finance sources.
	- PACER / RECAP: court dockets, filings, and public legal records (PACER is paid; RECAP archives some public data).
	- SEC EDGAR: corporate filings, financial disclosures, and officer/director relationships.
	- OpenCorporates: global company registry data and corporate metadata.
	- Companies House (UK): official company registry data and filings for UK entities.
	- Semantic Scholar / CrossRef / PubMed: academic paper metadata, citations, and publication records.
	- GitHub / GitLab / npm / PyPI: code repository and package metadata, commits, contributors, and dependency data.
	- APKMirror / Google Play scraping tools: mobile app metadata, APK artifacts, and historical versions (APIs limited).
	- Numverify / Twilio Lookup / OpenCNAM: phone-number validation, carrier lookup, and basic owner metadata.
	- RECAP / FOIA portals: aggregated court records and public-record FOIA-requested documents.
	- Sentinel Hub / USGS EarthExplorer / Planet / Google Earth Engine: satellite & aerial imagery datasets and geospatial analysis APIs.
	- Pipl / Clearbit / FullContact / Whitepages Pro: person and company enrichment data (commercial aggregation).
	- Pastebin scrapers / leak trackers: monitoring paste sites and public leak postings for exposed data.
	- LeakForums trackers / darknet monitors: indexes and alerts for content posted on darknet forums.
	- Tika / pdfgrep / document parsers: extracting text and metadata from PDFs and documents for content analysis.

	## Real-time monitoring (events, incidents, and live signals)
	This section lists tools and data sources suitable for tracking domestic events and incidents in near real time.

	- Social listening / micro-post streams:
		- X/Twitter API: real-time tweets, search and filtered streams (requires API key and rate limits).
		- CrowdTangle (Meta): public page/group monitoring; Meta shut the service down in August 2024 and directs researchers to the Meta Content Library instead.
		- snscrape: programmatic scraping of public social posts (not an official API).

	- Messaging & community channels:
		- Telegram API / Telethon: monitor public channels and posts programmatically.
		- Reddit API / Pushshift: monitor live subreddit activity and threads.
		- Discord bots/APIs: monitor public servers where permitted.

	- Live maps & incident aggregators:
		- LiveUAMap and similar crowd-sourced incident maps: visual tracking of civil unrest and conflict.
		- Local community-run incident maps and feeds (varies by region).

	- First-responder & radio feeds:
		- Broadcastify: live police/fire/EMS radio streams in many jurisdictions.
		- Local scanner / 911 audio streams and feeds (availability varies; check local terms).

	- News, alerts & media monitoring:
		- Google News / GDELT / MediaCloud: near real-time media ingestion and analytics (APIs available).
		- RSS feeds and local news live-blogs for immediate situation updates.

	- Traffic, transit & camera feeds:
		- State DOT traffic cameras and incident feeds (many provide public camera URLs and APIs).
		- Waze for Cities / INRIX: traffic incident streams (partnership/commercial access).

	- Aviation & maritime telemetry:
		- ADS-B / Flight tracking (FlightRadar24, ADSBExchange): live flight positions and metadata.
		- AIS / MarineTraffic: vessel movements and port activity streams.

	- Emergency / hazard sensors:
		- USGS earthquake feeds (real-time API).
		- NWS / NOAA alerts and weather watches (APIs and RSS feeds).
		- NASA FIRMS / VIIRS: near-real-time thermal hotspot (fire) detections.

	- Satellite / imagery (frequent updates):
		- Sentinel Hub / USGS / Planet / Google Earth Engine: repeated imagery and analytics (APIs; some commercial).

	- Public safety & government feeds:
		- Local government incident dashboards and open data portals (varies by city/county).
		- Data.gov aggregated datasets and emergency feeds.

	- Specialized monitoring:
		- Pastebin/Dump monitors and darknet monitoring services: watch for leaked data or operational chatter (legal precautions required).

	Operational notes:
	- Trade-offs: social feeds are fastest but noisy; official feeds are authoritative but slower. Combine both for signal/verification.
	- Legal/ethical: respect platform ToS, PII rules, and local laws. Prefer official APIs and rate-limit handling; consider on-prem LLMs for sensitive analysis.
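
The rate-limit handling recommended above can be sketched as a small retry helper: honour a numeric `Retry-After` value when the server sends one, otherwise back off exponentially up to a cap. Everything here is a hypothetical illustration; `fetch` stands in for any function that returns `(status_code, retry_after_header, payload)`, and jitter is left out to keep the behaviour deterministic.

```python
import time
from typing import Callable, Optional, Tuple


def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form of Retry-After is not parsed in this sketch
    return min(cap, base * (2 ** attempt))


def call_with_backoff(fetch: Callable[[], Tuple[int, Optional[str], object]],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep):
    """Call fetch() until it stops returning 429/503 or attempts run out."""
    status, payload = None, None
    for attempt in range(max_attempts):
        status, retry_after, payload = fetch()
        if status not in (429, 503):
            break
        if attempt + 1 < max_attempts:
            sleep(retry_delay(attempt, retry_after))
    return status, payload
```

Injecting `sleep` makes the helper testable without real waiting; in production, adding random jitter to the delay helps avoid synchronized retries across several collectors.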

	## Real-time monitoring — links & quick setup notes
	- X / Twitter API: https://developer.twitter.com/en/docs/twitter-api
		- Quick setup: register developer account, create project/app, get bearer/token keys; use filtered stream endpoints for live tweets; implement rate-limit handling and replay storage.

	- CrowdTangle (Meta): https://www.crowdtangle.com/
		- Quick setup: CrowdTangle was shut down in August 2024, so new access is not available; Meta's successor for researchers is the Meta Content Library, which requires an application for access.

	- snscrape: https://github.com/JustAnotherArchivist/snscrape
		- Quick setup: Python package install `pip install snscrape`; use cron or streaming loops to collect recent posts; respect ToS and scraping limits.

	- Telegram API / Telethon: https://core.telegram.org/api, https://docs.telethon.dev/
		- Quick setup: create a Telegram app to get API id/hash, use Telethon or pyrogram to subscribe to public channels, and persist messages via webhooks or DB.

	- Reddit API / Pushshift: https://www.reddit.com/dev/api/, https://pushshift.io/
		- Quick setup: register Reddit app for OAuth credentials; use Reddit streaming libraries for live data. Pushshift access has been restricted to approved Reddit moderators since 2023, so do not count on it for general historical collection.

	- Discord (bots / API): https://discord.com/developers/docs/intro
		- Quick setup: create a bot, add to servers with permission, use gateway events or webhooks to capture messages (respect server rules and privacy).

	- LiveUAMap: https://liveuamap.com/
		- Quick setup: use the site for visual monitoring; programmatic access limited — consider scraping feeds carefully and respecting terms.

	- Broadcastify (scanner audio): https://www.broadcastify.com/
		- Quick setup: browse public streams and embed or capture via provided stream URLs; check local legality and terms before recording.

	- GDELT (media monitoring): https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/
		- Quick setup: use GDELT event and media APIs for near real-time media mentions and tone analysis; ingest via scheduled pulls.

	- MediaCloud: https://mediacloud.org/
		- Quick setup: request API key for research; query stories and social amplification metrics programmatically.

	- State DOT traffic cameras / feeds: (varies by state) — common pattern: DOT websites list camera URLs and incident feeds
		- Quick setup: identify your state/city DOT portal; many offer JSON feeds or camera endpoints you can poll or proxy.

	- Waze for Cities / INRIX: https://www.waze.com/ccp, https://inrix.com/
		- Quick setup: apply for partnership or data access; commercial access gives streaming of incidents and jam data.

	- ADS-B / Flight tracking (ADSBExchange): https://www.adsbexchange.com/
		- Quick setup: use community feeds or APIs to stream flight positions; FlightRadar24 offers commercial APIs.

	- MarineTraffic / AIS: https://www.marinetraffic.com/
		- Quick setup: commercial APIs available for vessel positions and port calls; community AIS feeders provide raw streams.

	- USGS earthquake feeds: https://earthquake.usgs.gov/fdsnws/event/1/
		- Quick setup: use the FDSN event API or GeoJSON feeds for near-real-time earthquake notifications.

	- NWS / NOAA alerts API: https://www.weather.gov/documentation/services-web-api
		- Quick setup: poll alerts or subscribe to CAP feeds for watches/warnings; integrate geofencing for affected areas.

	- NASA FIRMS / VIIRS thermal hotspots: https://firms.modaps.eosdis.nasa.gov/
		- Quick setup: download near-real-time CSV/GeoJSON or use APIs to ingest hotspot detections (fires, thermal anomalies).

	- Sentinel Hub / USGS / Planet imagery: https://www.sentinel-hub.com/, https://earthexplorer.usgs.gov/, https://www.planet.com/
		- Quick setup: register for API keys (Sentinel/USGS free tiers available); schedule frequent tile pulls or subscribe to change-detection services.

	- Data.gov emergency & city datasets: https://www.data.gov/
		- Quick setup: search for local incident datasets and subscribe to publisher endpoints or RSS; ingest via ETL jobs.

	- Pastebin API: https://pastebin.com/api
		- Quick setup: register for API key; poll new pastes or use monitored lists; treat leaked data carefully and follow legal constraints.

	- Darknet / Tor monitoring tools (OnionScan, darknet monitors): https://github.com/s-rah/onionscan
		- Quick setup: run crawlers responsibly, store metadata (not raw illegal content), and rely on commercial darknet monitoring providers for broad coverage.
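
To make the NWS quick-setup note above more concrete, here is a minimal sketch that prepares a request for active alerts in one state and filters the returned GeoJSON by severity. The helper names are hypothetical. The `area` query parameter and the `severity`, `event`, `areaDesc`, and `headline` properties are taken from the public api.weather.gov documentation as assumptions to verify. Filtering by state is only a coarse stand-in for real geofencing.

```python
import urllib.parse
import urllib.request

NWS_ALERTS = "https://api.weather.gov/alerts/active"
SEVERITY_RANK = {"Unknown": 0, "Minor": 1, "Moderate": 2, "Severe": 3, "Extreme": 4}


def active_alerts_request(area: str, user_agent: str) -> urllib.request.Request:
    """Request for active alerts in one state/territory code, e.g. 'KS'.

    The NWS API asks clients to identify themselves via User-Agent.
    """
    url = f"{NWS_ALERTS}?{urllib.parse.urlencode({'area': area})}"
    headers = {"User-Agent": user_agent, "Accept": "application/geo+json"}
    return urllib.request.Request(url, headers=headers)


def filter_alerts(feed: dict, min_severity: str = "Severe") -> list:
    """Keep alerts at or above min_severity, reduced to a few key fields."""
    threshold = SEVERITY_RANK[min_severity]
    selected = []
    for feature in feed.get("features", []):
        props = feature.get("properties", {})
        if SEVERITY_RANK.get(props.get("severity"), 0) >= threshold:
            selected.append({
                "event": props.get("event"),
                "severity": props.get("severity"),
                "area": props.get("areaDesc"),
                "headline": props.get("headline"),
            })
    return selected
```

Opening the request with `urllib.request.urlopen`, decoding the JSON body, and passing it to `filter_alerts` yields a short list suitable for the same JSONL-plus-webhook pattern used elsewhere in this notebook.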


- Tools with both MCP server + API:
	- None identified (APIs are common; native MCP implementations were not found).

Notes: "API connections" above includes tools marked "Yes" or "Limited" in the main list.


In [None]:
"""
Telegram channel listener example (Telethon)

Requirements:
- pip install telethon (plus `requests` if WEBHOOK_URL is set)

Usage:
- Obtain `api_id` and `api_hash` from https://my.telegram.org
- Set `CHANNEL` to the public channel username or ID to monitor.
- Optional: set `WEBHOOK_URL` to forward collected messages as POST JSON.

This script listens for new messages in a channel and appends them to a local JSONL file.
"""
import asyncio
import json
import os
from datetime import datetime
from telethon import TelegramClient, events

# === Configure ===
API_ID = int(os.getenv('TG_API_ID', '0'))  # replace or set env TG_API_ID
API_HASH = os.getenv('TG_API_HASH', '')   # replace or set env TG_API_HASH
CHANNEL = os.getenv('TG_CHANNEL', 'examplechannel')  # e.g. 'cnn'
OUTPUT_FILE = os.getenv('TG_OUTPUT', 'telegram_messages.jsonl')
WEBHOOK_URL = os.getenv('TG_WEBHOOK', '')  # optional: POST new messages
PERSIST_EVERY = 1
# =================

if API_ID == 0 or API_HASH == '':
    raise SystemExit('Set the TG_API_ID and TG_API_HASH environment variables (see script header).')

async def handler(event):
    """Append each new channel message to the JSONL file and optionally forward it."""
    msg = event.message
    data = {
        'id': msg.id,
        'date': msg.date.isoformat() if msg.date else None,
        'sender_id': msg.sender_id,  # may be None, e.g. for anonymous posts
        'text': msg.message,
        'raw': str(msg.to_dict()),
    }

    # append to JSONL file
    with open(OUTPUT_FILE, 'a', encoding='utf-8') as fh:
        fh.write(json.dumps(data, ensure_ascii=False) + '\n')

    print(f"Saved message {data['id']} from {CHANNEL} at {data['date']}")

    if WEBHOOK_URL:
        # optional: send to webhook (best-effort). requests is blocking, so run it
        # in a worker thread to keep the event loop free (asyncio.to_thread, Python 3.9+).
        import requests
        try:
            await asyncio.to_thread(requests.post, WEBHOOK_URL, json=data, timeout=5)
        except Exception as e:
            print('Webhook POST failed:', e)

async def main():
    # Create the client inside the running loop so it is bound to the loop
    # that asyncio.run() starts, then register the handler explicitly.
    client = TelegramClient('osint_session', API_ID, API_HASH)
    client.add_event_handler(handler, events.NewMessage(chats=CHANNEL))
    await client.start()  # prompts for login on first run
    print(f'Listening to {CHANNEL} -- output -> {OUTPUT_FILE}')
    await client.run_until_disconnected()

if __name__ == '__main__':
    # In a notebook, where an event loop is already running, use `await main()` instead.
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print('Stopped by user')


In [None]:
"""
USGS earthquake feed poller example

Requirements:
- pip install requests

Usage:
- This script polls the USGS "all_hour" GeoJSON feed and prints/saves new events.
- Configure `POLL_INTERVAL` for how often to poll (seconds).
- Optionally set `WEBHOOK_URL` to POST new events.
"""
import json
import time
import requests
import os

FEED_URL = os.getenv('USGS_FEED', 'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.geojson')
POLL_INTERVAL = int(os.getenv('USGS_POLL_INTERVAL', '60'))
OUTPUT_FILE = os.getenv('USGS_OUTPUT', 'usgs_new_events.jsonl')
WEBHOOK_URL = os.getenv('USGS_WEBHOOK', '')
SEEN_FILE = os.getenv('USGS_SEEN', '.usgs_seen.json')

seen = set()
if os.path.exists(SEEN_FILE):
    try:
        with open(SEEN_FILE, 'r', encoding='utf-8') as fh:
            seen = set(json.load(fh))
    except Exception:
        seen = set()

print('Polling USGS feed:', FEED_URL)

while True:
    try:
        r = requests.get(FEED_URL, timeout=20)
        r.raise_for_status()
        feed = r.json()
        features = feed.get('features', [])

        new_events = []
        for f in features:
            eid = f.get('id')
            if not eid or eid in seen:
                continue
            seen.add(eid)
            props = f.get('properties', {})
            geom = f.get('geometry', {})
            evt = {
                'id': eid,
                'time': props.get('time'),
                'place': props.get('place'),
                'mag': props.get('mag'),
                'url': props.get('url'),
                'geometry': geom,
                'properties': props,
            }
            new_events.append(evt)

            # append to file
            with open(OUTPUT_FILE, 'a', encoding='utf-8') as fh:
                fh.write(json.dumps(evt) + '\n')

            print('New event:', eid, evt['place'], 'mag', evt['mag'])

            if WEBHOOK_URL:
                try:
                    requests.post(WEBHOOK_URL, json=evt, timeout=5)
                except Exception as e:
                    print('Webhook post failed:', e)

        # persist seen ids
        with open(SEEN_FILE, 'w', encoding='utf-8') as fh:
            json.dump(list(seen), fh)

    except Exception as e:
        print('Error polling USGS feed:', e)

    time.sleep(POLL_INTERVAL)
