vanvt91/daily-weather-python

AccuWeather Daily Forecast Scraper (Python)

Playwright (Python) port of the original TypeScript scraper. It scrapes the AccuWeather daily forecast for a configured city, validates F⇄C temperature conversions, enriches humidity from the hourly forecast page, and writes JSON and HTML reports.

What This Project Does

  1. Open https://www.accuweather.com
  2. Navigate to the configured city's daily forecast page
  3. Scrape every visible day card (typically up to 15 on the free tier)
  4. For each day, open the detail page and extract Day/Night periods
  5. Enrich humidity from the hourly forecast page (only the first ~4 days have hourly data on the AccuWeather free tier; days 5-15 fall back to N/A)
  6. Validate Fahrenheit ⇄ Celsius conversion
  7. Save the report as JSON + HTML
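
The conversion check in step 6 can be sketched as follows. This is a minimal illustration, not the code in src/utils/temperature_converter.py: the function names and the 1-degree tolerance (to absorb the site's rounding) are assumptions.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def conversion_is_consistent(temp_f: float, temp_c: float, tolerance: float = 1.0) -> bool:
    """Return True when a scraped Celsius value agrees with the scraped
    Fahrenheit value after conversion.

    The tolerance is an assumed allowance for the site rounding each
    unit independently; it is not taken from the project.
    """
    return abs(fahrenheit_to_celsius(temp_f) - temp_c) <= tolerance
```

A day card showing 77°F and 25°C would pass this check, while 77°F paired with 30°C would be flagged.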

Project Structure

daily-weather-python/
├── pyproject.toml
├── requirements.txt
├── conftest.py            # pytest-playwright fixtures + AppContext wiring
├── .env.example
├── src/
│   ├── config/
│   │   └── config.py
│   ├── context/
│   │   ├── app_context.py
│   │   └── test_store.py
│   ├── pages/
│   │   ├── base_page.py
│   │   ├── home_page.py
│   │   ├── daily_forecast_page.py
│   │   ├── day_detail_page.py
│   │   └── hourly_forecast_page.py
│   ├── types/
│   │   └── weather.py
│   └── utils/
│       ├── temperature_converter.py
│       ├── report_helpers.py
│       ├── file_reporter.py
│       └── logger.py
├── tests/
│   └── weather/
│       └── test_daily_weather.py
├── reports/
└── scripts/
    ├── run-hourly.sh
    └── com.weather.scraper.plist

Test Design

Two sequential pytest tests in tests/weather/test_daily_weather.py:

  • test_scrape_all_days_and_validate_weather_data
  • test_save_reports_in_json_and_html_formats

The two tests share a session-scoped TestStore (see src/context/test_store.py), mirroring the worker-scoped fixture in the TypeScript original. Pytest collects in declaration order, so the save test runs after the scrape test.
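
A shared store of this kind might look like the sketch below. The field names and the has_data helper are hypothetical; the real class lives in src/context/test_store.py and may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TestStore:
    """Hypothetical shared state: filled by the scrape test, read by the save test."""

    # The class name starts with "Test", so tell pytest not to collect it.
    __test__ = False

    days: list[dict[str, Any]] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    def has_data(self) -> bool:
        """Return True once the scrape test has stored at least one day."""
        return bool(self.days)


# In conftest.py the store could be exposed with a session-scoped fixture,
# so both tests receive the same instance:
#
#   @pytest.fixture(scope="session")
#   def test_store() -> TestStore:
#       return TestStore()
```

Because the fixture is created once per session, the save test sees whatever the scrape test appended, which is why the declaration order of the two tests matters.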

Selector Strategy

Page objects use layered selector fallback in priority order:

  1. data-qa selectors (most stable)
  2. semantic class/structure selectors
  3. text-based selectors (last fallback)

This reduces flaky scraping when AccuWeather changes markup.
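
The fallback order can be illustrated with a small helper that tries each selector in turn. The selector strings and the helper name are hypothetical, and the match counter is passed in as a callable so the sketch stays independent of Playwright.

```python
from typing import Callable, Optional, Sequence

# Hypothetical selectors, listed from most to least stable.
DAY_CARD_SELECTORS = [
    '[data-qa="dailyCard"]',   # 1. data-qa attribute
    "a.daily-forecast-card",   # 2. semantic class/structure
    "text=Today",              # 3. text-based last resort
]


def first_matching_selector(
    selectors: Sequence[str],
    count_matches: Callable[[str], int],
) -> Optional[str]:
    """Return the first selector that matches at least one element, or None."""
    for selector in selectors:
        if count_matches(selector) > 0:
            return selector
    return None
```

With Playwright's sync API the counter could be `lambda s: page.locator(s).count()`, assuming a `page` object is in scope.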

Configuration

Copy .env.example to .env and tweak:

  • CITY_NAME
  • CITY_LOCATION_ID
  • CITY_COUNTRY
  • CITY_SLUG
  • HEADLESS
  • TIMEZONE
  • REPORT_OUTPUT_DIR
  • SCRAPE_DELAY_MS
  • LOG_LEVEL (DEBUG / INFO / WARNING / ERROR / CRITICAL — stdlib logging levels)
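
Reading these variables might look like the sketch below. It assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader); the Settings class, the load_settings function, and every default value are illustrative and not taken from src/config/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    city_name: str
    city_location_id: str
    city_country: str
    city_slug: str
    headless: bool
    timezone: str
    report_output_dir: str
    scrape_delay_ms: int
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables; all defaults are assumptions."""
    return Settings(
        city_name=env.get("CITY_NAME", ""),
        city_location_id=env.get("CITY_LOCATION_ID", ""),
        city_country=env.get("CITY_COUNTRY", ""),
        city_slug=env.get("CITY_SLUG", ""),
        # Treat "1", "true" and "yes" (any case) as enabled.
        headless=env.get("HEADLESS", "true").strip().lower() in {"1", "true", "yes"},
        timezone=env.get("TIMEZONE", "UTC"),
        report_output_dir=env.get("REPORT_OUTPUT_DIR", "reports"),
        scrape_delay_ms=int(env.get("SCRAPE_DELAY_MS", "0")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the mapping as a parameter keeps the loader easy to test with a plain dict instead of the real environment.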

Install

Requires Python 3.14+.

python -m venv .venv
# Linux / macOS
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

pip install -r requirements.txt
playwright install chromium

The runtime uses Playwright's chromium engine with two WAF bypass tricks:

  1. channel="chrome": launches the system-installed Google Chrome instead of bundled Chromium (real TLS fingerprint).
  2. playwright.devices["Desktop Chrome"] unpacked into the browser context arguments: overrides the User-Agent with a real Chrome string. Without this, headless Chrome's UA contains HeadlessChrome, which AccuWeather's WAF rejects with ERR_HTTP2_PROTOCOL_ERROR.

Both are wired up in conftest.py. Make sure Google Chrome is installed; headless runs work out of the box.
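
The two adjustments amount to merging extra keys into the launch and context arguments. The helpers below are a hypothetical sketch of that merge, not the contents of conftest.py.

```python
from typing import Any, Mapping


def chrome_launch_args(base: Mapping[str, Any]) -> dict[str, Any]:
    """Return launch arguments that select the system Chrome channel."""
    return {**base, "channel": "chrome"}


def desktop_chrome_context_args(
    base: Mapping[str, Any],
    device_descriptor: Mapping[str, Any],
) -> dict[str, Any]:
    """Return context arguments with the device descriptor applied last,
    so its User-Agent replaces any value already present in base."""
    return {**base, **device_descriptor}
```

In a conftest.py these helpers could back overrides of pytest-playwright's browser_type_launch_args and browser_context_args fixtures, with playwright.devices["Desktop Chrome"] supplied as the descriptor.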

Run

# Run the full scrape + report generation
pytest tests/weather/test_daily_weather.py

# Show browser (pytest-playwright CLI flag)
pytest --headed tests/weather/test_daily_weather.py

# Slow operations down to watch what's happening
pytest --headed --slowmo 500 tests/weather/test_daily_weather.py

# Record traces / videos / screenshots only on failure
pytest --tracing retain-on-failure --video retain-on-failure --screenshot only-on-failure

# Print stdout (logger output)
pytest -s tests/weather/test_daily_weather.py

Useful pytest-playwright CLI flags (from the official docs):

Flag                                      Purpose
--browser (chromium / firefox / webkit)   Engine to use; can be passed multiple times
--browser-channel                         Use a system browser channel (e.g. chrome)
--headed                                  Show the browser window
--slowmo <ms>                             Add delay between operations
--device <name>                           Emulate a device profile
--tracing on / off / retain-on-failure    Record Playwright traces
--video on / off / retain-on-failure      Record videos
--screenshot on / off / only-on-failure   Capture screenshots
--output <dir>                            Where artifacts go (default test-results/)

Defaults are pinned under [tool.pytest.ini_options] in pyproject.toml: --browser chromium --browser-channel chrome (system Chrome via the chromium engine).

Hourly Scheduling

# Run 24 times, once per hour
bash scripts/run-hourly.sh

# Single run (suitable for cron / launchd)
bash scripts/run-hourly.sh 1

scripts/com.weather.scraper.plist is a macOS LaunchAgent example.
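
For readers who prefer to stay in Python, the repeat-every-hour behaviour can be approximated as below. This is a sketch of the idea only, not a translation of scripts/run-hourly.sh; the function name and parameters are assumptions.

```python
import time
from typing import Callable


def run_repeatedly(
    job: Callable[[], object],
    runs: int = 24,
    interval_seconds: float = 3600.0,
    sleep: Callable[[float], object] = time.sleep,
) -> int:
    """Call job `runs` times, sleeping between runs but not after the last one.

    Returns the number of completed runs.
    """
    completed = 0
    for index in range(runs):
        job()
        completed += 1
        if index < runs - 1:
            sleep(interval_seconds)
    return completed
```

The job could be a function that invokes pytest through subprocess.run; with runs=1 the helper performs a single run and returns immediately, which suits cron or launchd.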

Output

Each run writes files to reports/:

  • weather_*.json — full report payload
  • weather_report_*.html — readable report page
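
Writing the two files could be sketched as follows. The timestamp used for the wildcard part of each filename, the function name, and the bare-bones HTML table are all assumptions; the project's real logic lives in src/utils/file_reporter.py and src/utils/report_helpers.py.

```python
import html
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def write_reports(
    report: dict[str, Any],
    output_dir: Path,
    stamp: Optional[str] = None,
) -> tuple[Path, Path]:
    """Write the report as JSON and as a minimal HTML table; return both paths."""
    # Assumed naming scheme: a timestamp fills the wildcard part of the filename.
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    output_dir.mkdir(parents=True, exist_ok=True)

    json_path = output_dir / f"weather_{stamp}.json"
    html_path = output_dir / f"weather_report_{stamp}.html"

    json_path.write_text(
        json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # Escape keys and values so scraped text cannot break the markup.
    rows = "".join(
        f"<tr><td>{html.escape(str(key))}</td><td>{html.escape(str(value))}</td></tr>"
        for key, value in report.items()
    )
    html_path.write_text(
        f"<!DOCTYPE html><html><body><table>{rows}</table></body></html>",
        encoding="utf-8",
    )
    return json_path, html_path
```

Passing an explicit stamp makes the output names predictable, which is convenient in tests.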
