A tool for harvesting media files from Open Access articles for upload into Wikimedia Commons


The aim of this project is to write a tool that would:
* regularly spider PubMed Central to locate audio and video files published in the supplementary materials of CC BY-licensed articles in the Open Access subset
* convert these files to OGG
* upload them to Wikimedia Commons, along with the respective metadata
* provide for easy extension to other CC BY sources, beyond PubMed Central
* (possibly) suggest Wikipedia articles for which the media files might be relevant

Wiki page:

    oa-get [download-metadata|download-media] [dummy|pmc|pmc_doi]
    oa-cache [browse-database|clear-database|clear-media|convert-media|find-media|list-articles|stats] [dummy|pmc|pmc_doi]
    oa-put upload-media [dummy|pmc|pmc_doi]
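
The subcommands above can be chained into one full run for a single source. The following is only a minimal sketch and not part of the project: the helper names `build_pipeline` and `run_pipeline` are hypothetical, the step order (metadata, find media, download, convert, upload) is an assumption inferred from the project goals, and the scripts are assumed to be on `PATH` (use `./oa-get` etc. when running from the repository root).

```python
import subprocess

# Sources accepted by oa-get, oa-cache and oa-put, per the synopsis above.
SOURCES = ("dummy", "pmc", "pmc_doi")

# Assumed step order; the README itself does not prescribe one.
STEPS = (
    ("oa-get", "download-metadata"),
    ("oa-cache", "find-media"),
    ("oa-get", "download-media"),
    ("oa-cache", "convert-media"),
    ("oa-put", "upload-media"),
)


def build_pipeline(source):
    """Return the argv lists for a full run against one source."""
    if source not in SOURCES:
        raise ValueError("unknown source: %r" % (source,))
    return [[tool, action, source] for tool, action in STEPS]


def run_pipeline(source, dry_run=False):
    """Run each step in order, stopping at the first failing command."""
    for argv in build_pipeline(source):
        print(" ".join(argv))
        if not dry_run:
            # check_call raises CalledProcessError on a non-zero exit status.
            subprocess.check_call(argv)
```

Calling `run_pipeline("dummy", dry_run=True)` only prints the commands, which is a cheap way to check the sequence before pointing it at `pmc`.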

    python-dateutil
    python-elixir
    python-gst0.10
    python-magic
    python-mutagen
    python-progressbar
    python-xdg
    python-werkzeug
    python-wikitools (python-wikitools was imported into our tree and patched to ease deployment)

    sqlitebrowser

To use the upload feature of oa-put, copy the userconfig.example file to

A screencast showing usage can be played back with “ttyplay screencast”.

To plot mimetypes occurring in sources, install python-matplotlib and pipe the output of “oa-cache stats [source]” to the included plot-helper script.
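
As a rough illustration of that pipe, the same thing can be driven from Python. This is a sketch under assumptions: `pipe_commands` and `plot_mimetypes` are hypothetical helpers, and both `oa-cache` and `plot-helper` are assumed to be executable and on `PATH` (adjust to `./oa-cache` and `./plot-helper` otherwise).

```python
import subprocess


def pipe_commands(producer, consumer):
    """Run `producer | consumer` and return the consumer's exit status."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout)
    # Close our copy of the pipe so the producer is signalled
    # if the consumer exits before reading everything.
    first.stdout.close()
    status = second.wait()
    first.wait()
    return status


def plot_mimetypes(source):
    """Equivalent of the shell pipe: oa-cache stats SOURCE | plot-helper"""
    return pipe_commands(["oa-cache", "stats", source], ["plot-helper"])
```

For example, `plot_mimetypes("pmc")` would feed the statistics for the `pmc` source to the plotting script; python-matplotlib still needs to be installed for the plot itself.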

The Open Access Media Importer is free software: 
you can redistribute it and/or modify it 
under the terms of the GNU General Public License 
as published by the Free Software Foundation, 
either version 3 of the License, 
or (at your option) any later version.