zombbie/entity-synthetic-dataset

Entity Synthetic Dataset

The Entity Synthetic Dataset is a multi-speaker, multi-locale (en-*) text-to-speech (TTS) synthetic dataset of entities collected from NELL [1] and YAGO [2]. It was built for the paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems".

Please use the dataset only for research or non-commercial purposes.

Get the data

The dataset is available on both OneDrive and BaiduCloud, with scripts provided as txt files and synthetic audio as zip files. Please choose whichever source is more convenient for you.
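This README does not describe the internal layout of the downloaded files. The following is a minimal sketch only, resting on two assumptions that are not confirmed by this repository: each script txt file holds one tab-separated utterance ID and transcript per line, and each zip archive stores `.wav` files named after those utterance IDs. The function names `load_script`, `list_audio`, and `pair_script_with_audio` are hypothetical helpers for illustration; adjust the parsing once you have inspected the actual files.

```python
import zipfile
from pathlib import Path


def load_script(txt_path):
    """Parse a script file into {utterance_id: text}.

    ASSUMPTION: each non-empty line is '<utterance_id>\\t<text>'.
    """
    entries = {}
    with open(txt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            utt_id, _, text = line.partition("\t")
            entries[utt_id] = text
    return entries


def list_audio(zip_path, suffix=".wav"):
    """Map utterance IDs (file stems) to member paths inside the zip.

    ASSUMPTION: audio files are named '<utterance_id>.wav'.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return {
            Path(name).stem: name
            for name in zf.namelist()
            if name.lower().endswith(suffix)
        }


def pair_script_with_audio(script, audio):
    """Return (utterance_id, audio_member, text) for IDs present in both."""
    return [(uid, audio[uid], text) for uid, text in script.items() if uid in audio]
```

Keeping the audio inside the archive and resolving members by name avoids extracting everything up front; `zipfile.ZipFile.open(member)` can then stream a single file when it is needed.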

Changes

August 2022: updated the entity synthetic dataset and examples.

References

[1] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling, “Never-ending learning,” in Proc. AAAI, 2015.

[2] T. P. Tanon, G. Weikum, and F. Suchanek, “YAGO 4: A reason-able knowledge base,” in Proc. Extended Semantic Web Conference (ESWC), 2020.

About

Documentation on how to access and use the entity TTS synthetic dataset.
