Grab all Wikipedia abstracts, in all languages

For every dump listed at:
    http://dumps.wikimedia.org/backup-index.html
find the abstract.xml file and download it with wget.
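
The link-finding step can be sketched roughly as follows. This is a minimal illustration, not the code in grab-wikipedia-abstracts.py: it uses Python's standard-library html.parser in place of BeautifulSoup so that it runs without extra packages, and the names LinkCollector, extract_links and abstract_links are hypothetical. It also assumes that the abstract files are linked with an href ending in "abstract.xml".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return all links in the page as absolute URLs."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def abstract_links(html, base_url):
    """Keep only links whose URL ends in abstract.xml.

    Assumption: abstract files are named like <wiki>-<date>-abstract.xml.
    """
    return [url for url in extract_links(html, base_url)
            if url.endswith("abstract.xml")]
```

The same two functions would be applied twice: once to the backup index to find each dump page, and once to each dump page to find its abstract.xml link.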

USAGE:
    ./grab-wikipedia-abstracts.py

This will create a directory download.wikimedia.org/ with the abstract.xml files.
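
A host-named output directory is what wget produces when it is asked to mirror the URL's directory structure locally. The sketch below shows one way to get that layout, assuming the --force-directories and --continue options; whether the script passes these exact options is an assumption, and the helper names wget_command and local_path are hypothetical.

```python
from urllib.parse import urlparse


def wget_command(url):
    """Build a wget command line for one abstract.xml URL.

    Assumption: --force-directories gives the host/path layout and
    --continue resumes partial downloads; the real script may differ.
    """
    return ["wget", "--force-directories", "--continue", url]


def local_path(url):
    """Path where wget --force-directories would save the file."""
    parts = urlparse(url)
    return parts.netloc + parts.path
```

The command list can be handed to subprocess.call to run the download; local_path is only for checking afterwards whether a given file is already on disk.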

REQUIREMENTS:
    * BeautifulSoup
    * wget