Ten Thousand German News Articles Dataset for Topic Classification

Ten Thousand German News Articles Dataset

For more information, visit the detailed project page.

  1. Install the required Python packages: `pip install -r requirements.txt`.
  2. Download the `corpus.sqlite3` file into the project root, either from here (compressed) or directly from here.
  3. Run `python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv` to extract the articles.
  4. Run `python code/split_articles_into_train_test.py` to split the dataset into training and test sets.
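
To illustrate what the splitting step does conceptually, here is a minimal sketch of a per-class (stratified) train/test split using only the standard library. It is not the repository's script: the function name `stratified_split`, the `(label, text)` row layout, the 10% test fraction, and the seed are all assumptions made for illustration, and `code/split_articles_into_train_test.py` may work differently.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train and test lists, class by class.

    Hypothetical helper: the row layout and the default fraction are
    assumptions, not taken from the repository's own script.
    """
    # Group the rows by their topic label.
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Take the same fraction of every class for the test set.
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    # Toy rows standing in for the extracted articles; labels and texts are made up.
    rows = [("Sport", f"sport article {i}") for i in range(20)]
    rows += [("Web", f"web article {i}") for i in range(10)]
    train, test = stratified_split(rows)
    print(len(train), len(test))
```

Splitting within each class keeps the topic distribution of the test set close to that of the full dataset, which matters when some topics have far fewer articles than others.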

License

All code in this repository is licensed under the MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
