Ten Thousand German News Articles Dataset

For more information, visit the detailed project page.

  1. Install the required Python packages: pip install -r requirements.txt
  2. Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
  3. Extract the articles: python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv
  4. Split the dataset into train and test sets: python code/split_articles_into_train_test.py
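
To illustrate what step 4 does conceptually, here is a minimal sketch of a train/test split that preserves label proportions. It is not the repository's split_articles_into_train_test.py. The (label, text) row shape, the 10% test fraction, the fixed seed, and the function name stratified_split are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train and test lists.

    Hypothetical helper: rows are grouped by label and each group is
    split separately, so both parts keep roughly the same label mix.
    """
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    # Toy data standing in for rows read from articles.csv (assumed shape).
    rows = [("Sport", f"article {i}") for i in range(20)]
    rows += [("Kultur", f"article {i}") for i in range(10)]
    train, test = stratified_split(rows)
    print(len(train), len(test))
```

Splitting per label rather than over the whole file keeps small categories represented in the test set, and the fixed seed makes repeated runs produce the same split.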

License

All code in this repository is licensed under the MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.