Playwright (Python) port of the original TypeScript scraper. It scrapes the AccuWeather daily forecast for a configured city, validates F⇄C temperature conversions, enriches humidity from the hourly forecast page, and writes JSON + HTML reports.
- Open https://www.accuweather.com
- Navigate to the configured city's daily forecast page
- Scrape every visible day card (typically up to 15 on the free tier)
- For each day, open the detail page and extract Day/Night periods
- Enrich humidity from the hourly forecast page (only the first ~4 days have hourly data on the AccuWeather free tier; days 5-15 fall back to N/A)
- Validate Fahrenheit ⇄ Celsius conversion
- Save the report as JSON + HTML
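
The validation and humidity-fallback steps can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the code in `src/utils/temperature_converter.py` or `src/types/weather.py`. The following are all hypothetical:

- the names `f_to_c`, `conversion_is_valid`, `DayRecord`, and `enrich_humidity`
- the date-keyed hourly lookup
- the 1 °C tolerance

```python
from dataclasses import dataclass
from typing import Optional

# Assumed tolerance: both scraped values are whole degrees, so the converted
# Fahrenheit value will rarely equal the displayed Celsius value exactly.
TOLERANCE_C = 1.0


def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


def conversion_is_valid(temp_f: float, temp_c: float, tolerance: float = TOLERANCE_C) -> bool:
    """True if the scraped Celsius value agrees with the converted Fahrenheit value."""
    return abs(f_to_c(temp_f) - temp_c) <= tolerance


@dataclass
class DayRecord:
    """Hypothetical per-day record (the real types live in src/types/weather.py)."""

    date: str
    temp_f: float
    temp_c: float
    humidity: Optional[str] = None


def enrich_humidity(days: list[DayRecord], hourly_humidity: dict[str, str]) -> None:
    """Copy humidity from hourly data keyed by date; days without hourly data get 'N/A'."""
    for day in days:
        day.humidity = hourly_humidity.get(day.date, "N/A")
```

The check uses a tolerance rather than exact equality because both scraped values are already rounded. A strict comparison would flag correct pages.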
```
daily-weather-python/
├── pyproject.toml
├── requirements.txt
├── conftest.py              # pytest-playwright fixtures + AppContext wiring
├── .env.example
├── src/
│   ├── config/
│   │   └── config.py
│   ├── context/
│   │   ├── app_context.py
│   │   └── test_store.py
│   ├── pages/
│   │   ├── base_page.py
│   │   ├── home_page.py
│   │   ├── daily_forecast_page.py
│   │   ├── day_detail_page.py
│   │   └── hourly_forecast_page.py
│   ├── types/
│   │   └── weather.py
│   └── utils/
│       ├── temperature_converter.py
│       ├── report_helpers.py
│       ├── file_reporter.py
│       └── logger.py
├── tests/
│   └── weather/
│       └── test_daily_weather.py
├── reports/
└── scripts/
    ├── run-hourly.sh
    └── com.weather.scraper.plist
```
Two sequential pytest tests live in `tests/weather/test_daily_weather.py`:

- `test_scrape_all_days_and_validate_weather_data`
- `test_save_reports_in_json_and_html_formats`

The two tests share a session-scoped `TestStore` (see `src/context/test_store.py`), mirroring the worker-scoped fixture in the TypeScript original. Pytest runs tests in the order they are declared in the file, so the save test runs after the scrape test (as long as no plugin reorders or parallelizes them).
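
A minimal sketch of what such a shared store might look like, assuming a plain dataclass. The `TestStore` fields and methods below are hypothetical; the real class is in `src/context/test_store.py`. In `conftest.py`, a fixture declared with `@pytest.fixture(scope="session")` would return one instance, so both tests receive the same object.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TestStore:
    """Hypothetical shared state: the scrape test writes, the save test reads."""

    __test__ = False  # tell pytest not to collect this class as a test class

    days: list[dict[str, Any]] = field(default_factory=list)

    def add_day(self, day: dict[str, Any]) -> None:
        self.days.append(day)

    def require_days(self) -> list[dict[str, Any]]:
        # Fail loudly if the save test runs before (or without) the scrape test.
        if not self.days:
            raise RuntimeError("TestStore is empty; run the scrape test first")
        return self.days
```

`__test__ = False` is a standard pytest attribute. It stops a class whose name starts with `Test` from being collected as a test class.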
Page objects use layered selector fallback, tried in priority order:

- `data-qa` selectors (most stable)
- semantic class/structure selectors
- text-based selectors (last fallback)

This reduces flaky scraping when AccuWeather changes its markup.
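
The fallback chain can be sketched as a loop that returns the first selector matching at least one element. The selector strings and the `first_matching_selector` helper are hypothetical. With Playwright's sync API, `count_matches` could be `lambda s: page.locator(s).count()`.

```python
from collections.abc import Callable, Sequence
from typing import Optional

# Hypothetical selectors for a day card, highest priority first.
DAY_CARD_SELECTORS: list[str] = [
    '[data-qa="daily-card"]',                   # 1. data-qa attribute
    "div.daily-wrapper a.daily-forecast-card",  # 2. class/structure
    "text=Day",                                 # 3. visible text
]


def first_matching_selector(
    selectors: Sequence[str],
    count_matches: Callable[[str], int],
) -> Optional[str]:
    """Return the first selector that matches at least one element, or None."""
    for selector in selectors:
        if count_matches(selector) > 0:
            return selector
    return None
```

Trying selectors one at a time keeps the stable `data-qa` hooks authoritative whenever they exist. The loop falls through to brittler text matching only when they do not.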
Copy `.env.example` to `.env` and adjust these variables:

- `CITY_NAME`
- `CITY_LOCATION_ID`
- `CITY_COUNTRY`
- `CITY_SLUG`
- `HEADLESS`
- `TIMEZONE`
- `REPORT_OUTPUT_DIR`
- `SCRAPE_DELAY_MS`
- `LOG_LEVEL` (`DEBUG` / `INFO` / `WARNING` / `ERROR` / `CRITICAL`, the standard-library `logging` levels)

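The sketch below reads a few of these variables into a typed, immutable config. It assumes `.env` has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()` (whether this project uses that library is an assumption). The `Config` fields, defaults, and boolean parsing rule are hypothetical and cover only a subset of the variables above.

```python
import logging
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Names accepted for LOG_LEVEL: the standard-library logging levels.
_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


@dataclass(frozen=True)
class Config:
    """Hypothetical typed view of a subset of the .env variables."""

    city_name: str
    headless: bool
    scrape_delay_ms: int
    report_output_dir: str
    log_level: int  # numeric level, e.g. logging.INFO == 20


def _parse_bool(value: str) -> bool:
    # Hypothetical rule: "1", "true", "yes" (any case) mean True; anything else False.
    return value.strip().lower() in ("1", "true", "yes")


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment variables, falling back to assumed defaults."""
    level_name = env.get("LOG_LEVEL", "INFO").strip().upper()
    if level_name not in _LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {_LOG_LEVELS}, got {level_name!r}")
    return Config(
        city_name=env.get("CITY_NAME", ""),
        headless=_parse_bool(env.get("HEADLESS", "true")),
        scrape_delay_ms=int(env.get("SCRAPE_DELAY_MS", "0")),
        report_output_dir=env.get("REPORT_OUTPUT_DIR", "reports"),
        log_level=getattr(logging, level_name),
    )
```

Validating `LOG_LEVEL` up front turns a typo into an immediate error instead of a silently wrong log level.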
Requires Python 3.14+.

```shell
python -m venv .venv

# Linux / macOS
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

pip install -r requirements.txt
playwright install chromium
```

The runtime uses Playwright's chromium engine with two WAF bypass tricks:

- `channel="chrome"`: launches the system-installed Google Chrome instead of the bundled Chromium (real TLS fingerprint).
- `**playwright.devices["Desktop Chrome"]` spread into the context: overrides the User-Agent with a real-Chrome string. Without this, headless Chrome's UA contains `HeadlessChrome`, which AccuWeather's WAF rejects with `ERR_HTTP2_PROTOCOL_ERROR`.

Both are wired up in `conftest.py`. Make sure Google Chrome is installed; headless runs then work out of the box.

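The User-Agent override amounts to a dictionary merge in which the device descriptor's keys win. The hypothetical sketch below shows that merge. In pytest-playwright it would typically live in an override of the `browser_context_args` fixture, with the descriptor taken from `playwright.devices["Desktop Chrome"]`. The `build_context_args` helper, the placeholder descriptor, and the `HeadlessChrome` guard are illustrative and are not taken from `conftest.py`.

```python
from typing import Any

# Placeholder descriptor for illustration only; the real one comes from
# playwright.devices["Desktop Chrome"] and has more keys.
FAKE_DESKTOP_CHROME: dict[str, Any] = {
    "user_agent": "Mozilla/5.0 (placeholder) Chrome/0.0 Safari/537.36",
    "viewport": {"width": 1280, "height": 720},
}


def build_context_args(
    base_args: dict[str, Any], device: dict[str, Any]
) -> dict[str, Any]:
    """Merge a device descriptor over default context args; device keys win."""
    merged = {**base_args, **device}
    # Hypothetical guard: fail fast if the UA would still identify as headless,
    # since that is the string the WAF rejects.
    if "HeadlessChrome" in merged.get("user_agent", ""):
        raise ValueError("user_agent still contains 'HeadlessChrome'")
    return merged
```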
```shell
# Run the full scrape + report generation
pytest tests/weather/test_daily_weather.py

# Show browser (pytest-playwright CLI flag)
pytest --headed tests/weather/test_daily_weather.py

# Slow operations down to watch what's happening
pytest --headed --slowmo 500 tests/weather/test_daily_weather.py

# Record traces / videos / screenshots only on failure
pytest --tracing retain-on-failure --video retain-on-failure --screenshot only-on-failure

# Print stdout (logger output)
pytest -s tests/weather/test_daily_weather.py
```

Useful pytest-playwright CLI flags (from the official docs):

| Flag | Purpose |
|---|---|
| `--browser` (chromium / firefox / webkit) | Engine to use; can be passed multiple times |
| `--browser-channel` | Use a system browser channel (e.g. `chrome`) |
| `--headed` | Show the browser window |
| `--slowmo <ms>` | Add a delay between operations |
| `--device <name>` | Emulate a device profile |
| `--tracing on / off / retain-on-failure` | Record Playwright traces |
| `--video on / off / retain-on-failure` | Record videos |
| `--screenshot on / off / only-on-failure` | Capture screenshots |
| `--output <dir>` | Where artifacts go (default `test-results/`) |

Defaults are pinned in `pyproject.toml` under `[tool.pytest.ini_options]`: `--browser chromium --browser-channel chrome` (system Chrome via the chromium engine).

```shell
# Run 24 times, once per hour
bash scripts/run-hourly.sh

# Single run (suitable for cron / launchd)
bash scripts/run-hourly.sh 1
```

`scripts/com.weather.scraper.plist` is a macOS LaunchAgent example.

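For illustration, the looping behavior can be sketched in Python. The real `scripts/run-hourly.sh` is Bash, and its exact details are not reproduced here. The sketch makes three assumptions:

- a fixed one-hour sleep between runs
- no sleep after the last run
- a failed run does not stop the loop

`run_once` is a hypothetical callable, for example one wrapping `subprocess.run(["pytest", "tests/weather/test_daily_weather.py"]).returncode`.

```python
import time
from collections.abc import Callable


def run_repeatedly(
    run_once: Callable[[], int],
    runs: int = 24,
    interval_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Call run_once `runs` times, sleeping interval_s between calls.

    Returns the exit code of every run; a failing run does not stop the loop.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    codes: list[int] = []
    for i in range(runs):
        codes.append(run_once())
        if i < runs - 1:  # no sleep after the final run
            sleep(interval_s)
    return codes
```

Injecting `sleep` lets the loop be exercised without waiting an hour between iterations.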
Each run writes files to `reports/`:

- `weather_*.json`: full report payload
- `weather_report_*.html`: readable report page

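The sketch below writes both files from one report dictionary. The following are all assumptions:

- the `write_reports` helper
- the `%Y%m%d_%H%M%S` timestamp filling the `*` in the filenames
- the report shape (`days` entries with `date` and `humidity`)

The project's own version presumably lives in `src/utils/file_reporter.py`.

```python
import html
import json
from datetime import datetime
from pathlib import Path
from typing import Any


def write_reports(report: dict[str, Any], out_dir: Path, now: datetime) -> tuple[Path, Path]:
    """Write weather_<stamp>.json and weather_report_<stamp>.html into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = now.strftime("%Y%m%d_%H%M%S")  # hypothetical timestamp format
    json_path = out_dir / f"weather_{stamp}.json"
    html_path = out_dir / f"weather_report_{stamp}.html"

    # Full payload, kept human-readable and UTF-8 (e.g. for the ⇄ or ° characters).
    json_path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")

    # Escape every scraped value so text from the site cannot inject markup.
    rows = "\n".join(
        f"<tr><td>{html.escape(str(day.get('date', '')))}</td>"
        f"<td>{html.escape(str(day.get('humidity', 'N/A')))}</td></tr>"
        for day in report.get("days", [])
    )
    page = (
        "<!doctype html><html><body><table>\n"
        "<tr><th>Date</th><th>Humidity</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>"
    )
    html_path.write_text(page, encoding="utf-8")
    return json_path, html_path
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the helper, keeps the filenames deterministic in tests.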