xssChauhan/datasets_saver

datasets_saver

Export Hugging Face datasets to persistable file formats from the command line.

Installation

  1. Clone the repo.
  2. Install with pip: pip install .
  3. The datasets command is now available from the command line.

CLI Interface

datasets download --help prints the CLI help.

Example: datasets download imdb

This downloads the imdb dataset and saves it in CSV format. The default output location is ~/saved_datasets/. A dataset can be saved as CSV, JSON, or Parquet files. Every split/file of a dataset is downloaded and stored separately. The directory ~/saved_datasets is populated as follows:

$ tree ~/saved_datasets/
.
└── imdb
    ├── test.csv
    ├── train.csv
    └── unsupervised.csv

2 directories, 3 files

Similarly, the dataset can be saved as JSON or Parquet files by using the --format option:

JSON: $ datasets download imdb --format json
Parquet: $ datasets download imdb --format parquet
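
For readers who want the same result from Python, here is a minimal sketch of the export described above. It is not the tool's actual implementation, which this README does not show. It assumes the Hugging Face `datasets` library is installed, and it mirrors the `~/saved_datasets/<dataset>/<split>.<format>` layout from the example. The names `split_output_path`, `export_splits`, and `download_and_export` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path

# Formats named in this README; the tuple itself is an illustrative assumption.
SUPPORTED_FORMATS = ("csv", "json", "parquet")


def split_output_path(dataset_name, split, fmt="csv",
                      root=Path.home() / "saved_datasets"):
    """Return <root>/<dataset_name>/<split>.<fmt>, as in the tree listing above."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return Path(root) / dataset_name / f"{split}.{fmt}"


def export_splits(dataset_name, splits, fmt="csv",
                  root=Path.home() / "saved_datasets"):
    """Write every split to its own file and return the written paths.

    `splits` maps split names to objects exposing to_csv / to_json /
    to_parquet, which datasets.Dataset does.
    """
    written = []
    for split, data in splits.items():
        path = split_output_path(dataset_name, split, fmt, root)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Dispatch to to_csv, to_json, or to_parquet based on the format.
        getattr(data, f"to_{fmt}")(str(path))
        written.append(path)
    return written


def download_and_export(dataset_name, fmt="csv"):
    """Hypothetical stand-in for `datasets download <name> --format <fmt>`."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    # Without a `split` argument, load_dataset returns a mapping of splits.
    return export_splits(dataset_name, load_dataset(dataset_name), fmt)
```

Calling `download_and_export("imdb", "parquet")` would then write one Parquet file per split under `~/saved_datasets/imdb/`, assuming the dataset is reachable on the Hugging Face Hub.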
