Kaidan Nihonbunka: A Journey Through Hyakumonogatari's Ghostly Tales

Welcome to the Kaidan Nihonbunka Dataset

About Name

kaidan Nihonbunka translates to 怪談日本文化 in Japanese:

怪談 (Kwaidan): Ghost story or supernatural tale.
日本文化 (Nihonbunka): Japanese culture.

So, the translated name would be 怪談日本文化.

Overview

The kaidan Nihonbunka Dataset is a collection of Japanese folklore of ghost stories, also known as "kaidan", associated with the traditional Japanese ritual of Hyakumonogatari. This dataset contains approximately 8000 rows of ghost stories, including their old names, new names generated by GPT-4, the text content of the stories, and URLs for additional information or sources.

You find code of this dataset in my Huggingface account v3xlrm1nOwo1.

Data Format

The dataset is provided in two formats `Parquet` and `Pickle`:

These formats and fields provide flexibility for different use cases, allowing researchers and data scientists to work with the dataset using their preferred tools and programming languages.

Parquet File: Contains structured data in a columnar format, suitable for data analysis and processing with tools like Apache Spark.
Pickle File: Contains a serialized Python object, allowing for easy loading and manipulation of the dataset in Python environments.

Dataset Fields

Each entry in the dataset is represented by a row with the following fields:

Field	Description
`Old Name`	The old name or previous designation of the ghost story.
`New Name`	Generated by GPT-4, this column contains the new name or a modernized version of the ghost story's title.
`Kaidan`	The text or content of the ghost story.
`URL`	Contains URLs related to the ghost story, such as links to additional information or sources.

Usage

Researchers, data scientists, and enthusiasts interested in Japanese folklore, ghost stories, or cultural rituals like Hyakumonogatari can utilize this dataset for various purposes, including:

Analyzing themes and patterns in ghost stories.
Building machine learning models for story generation or classification.
Exploring connections between traditional rituals and storytelling.

import datasets

# Load the dataset
dataset = datasets.load_dataset('v3xlrm1nOwo1/KaidanNihonbunka')

print(dataset)

DatasetDict({
    train: Dataset({
        features: ['old name', 'new name', 'kaidan', 'url'],
        num_rows: 8559
      })
  })

Acknowledgments

We would like to acknowledge the creators of the original ghost stories and the individuals or sources that contributed to compiling this dataset. Without their efforts, this collection would not be possible.

License

This dataset is distributed under the Apache License 2.0, allowing for flexible usage and modification while ensuring proper attribution and adherence to copyright laws.

NOTE: To contribute to the project, please contribute directly. I am happy to do so, and if you have any comments, advice, job opportunities, or want me to contribute to a project, please contact me I am happy to do so v3xlrm1nOwo1@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
am2ji-shorthorror		am2ji-shorthorror
assets		assets
data		data
hukumusume		hukumusume
kikikaikai		kikikaikai
恐怖の泉		恐怖の泉
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
auto_restart.sh		auto_restart.sh
clean_data.py		clean_data.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Kaidan Nihonbunka: A Journey Through Hyakumonogatari's Ghostly Tales

About Name

Overview

Data Format

The dataset is provided in two formats `Parquet` and `Pickle`:

Dataset Fields

Usage

Acknowledgments

License

About

Releases

Packages

Languages

License

v3xlrm1nOwo1/KaidanNihonbunka

Folders and files

Latest commit

History

Repository files navigation

Kaidan Nihonbunka: A Journey Through Hyakumonogatari's Ghostly Tales

About Name

Overview

Data Format

The dataset is provided in two formats Parquet and Pickle:

Dataset Fields

Usage

Acknowledgments

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

The dataset is provided in two formats `Parquet` and `Pickle`:

Packages