Grab all Wikipedia abstracts, in all languages
For every dump listed at:
http://dumps.wikimedia.org/backup-index.html
find the abstract.xml file and download it with wget.
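The two-level crawl described above can be sketched as follows. This is not the original script. It is a minimal sketch under several assumptions. The index page links to one page per dump, and each dump page links directly to its files. An abstract file is any link whose filename ends in "abstract.xml". The standard-library html.parser stands in for BeautifulSoup so that the sketch has no third-party dependency. Downloads go through wget's -x (--force-directories) flag, which recreates the host/path layout on disk.

```python
import subprocess
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Index page named in this README; the layout of the pages it links to is assumed.
INDEX_URL = "http://dumps.wikimedia.org/backup-index.html"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL.

    Stand-in for BeautifulSoup (the original script's parser).
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> links in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def is_abstract_file(url):
    """True if the URL's filename ends in 'abstract.xml' (assumed naming)."""
    return urlparse(url).path.endswith("abstract.xml")


def fetch(url):
    """Download a page; return its text and the final URL after redirects."""
    with urllib.request.urlopen(url, timeout=60) as response:
        html = response.read().decode("utf-8", errors="replace")
        return html, response.geturl()


def main():
    index_html, index_url = fetch(INDEX_URL)
    host = urlparse(index_url).netloc
    for dump_url in extract_links(index_html, index_url):
        if urlparse(dump_url).netloc != host:
            continue  # skip links that leave the dump server
        try:
            dump_html, dump_url = fetch(dump_url)
        except OSError as err:  # URLError and timeouts are OSError subclasses
            print(f"skipping {dump_url}: {err}")
            continue
        for file_url in extract_links(dump_html, dump_url):
            if is_abstract_file(file_url):
                # wget -x recreates host/path locally, e.g. <host>/enwiki/...
                subprocess.run(["wget", "-x", file_url], check=False)
```

To run the sketch as a script, add the usual `if __name__ == "__main__": main()` guard at the bottom. Each dump page is resolved against its post-redirect URL so that relative file links point to the right directory.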
USAGE:
./grab-wikipedia-abstracts.py
This will create a directory download.wikimedia.org/ with the abstract.xml files.
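After a run, the downloaded files can be located with a short helper like the one below. This is a hypothetical sketch, not part of the original script. It assumes the layout described above: a top-level directory named after the download host, with each dump's abstract.xml nested beneath it. The default root name is taken from this README. If files are now served from a different host, the directory name will differ.

```python
from pathlib import Path


def find_abstract_files(root="download.wikimedia.org"):
    """Return every file under root whose name ends in 'abstract.xml', sorted."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []  # nothing downloaded yet (or a different host directory)
    return sorted(p for p in root_path.rglob("*abstract.xml") if p.is_file())


# Print each downloaded abstract file with its size.
for path in find_abstract_files():
    print(path, path.stat().st_size, "bytes")
```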
REQUIREMENTS:
* BeautifulSoup
* wget
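The two requirements can be checked before starting a long download. The sketch below is a hypothetical helper, not part of the original script. It assumes BeautifulSoup is importable as the modern `bs4` package or as the legacy `BeautifulSoup` module, and that wget must be on the PATH.

```python
import importlib.util
import shutil


def missing_requirements():
    """Return the names of required tools that are not available."""
    missing = []
    # Assumed module names: "bs4" (BeautifulSoup 4) or legacy "BeautifulSoup".
    if (importlib.util.find_spec("bs4") is None
            and importlib.util.find_spec("BeautifulSoup") is None):
        missing.append("BeautifulSoup")
    # wget is invoked as an external program, so it must be on the PATH.
    if shutil.which("wget") is None:
        missing.append("wget")
    return missing


missing = missing_requirements()
if missing:
    print("Missing requirements:", ", ".join(missing))
else:
    print("All requirements found.")
```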