Entity Synthetic Dataset

The Entity Synthetic Dataset is a multi-speaker, multi-locale (en-*) synthetic TTS dataset of entities collected from NELL [1] and Yago [2]. It was built for the paper "Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems".

Please use the dataset for research or non-commercial purposes.

Get the data

The dataset is available on both OneDrive and BaiduCloud, with scripts in txt files and synthetic audio in zip files. Please choose whichever source is more convenient for you.
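
Once the files are downloaded, unpacking them could look like the minimal sketch below. It relies only on what is stated above (audio in zip archives, scripts in txt files). The helper names `extract_audio` and `read_script_lines`, and the file names in the usage comments, are hypothetical. The internal layout of the script files is not described here, so the sketch simply returns their non-empty lines without parsing them.

```python
import zipfile
from pathlib import Path


def extract_audio(zip_path, out_dir):
    """Unpack one downloaded audio archive and return the extracted file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        # Directory entries end with "/"; keep only real files.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return sorted(out_dir / n for n in names)


def read_script_lines(txt_path):
    """Return the non-empty lines of a script file, stripped of whitespace."""
    with open(txt_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Example usage (file names are placeholders, not the real archive names):
# audio_files = extract_audio("audio_part.zip", "data/audio")
# script_lines = read_script_lines("scripts.txt")
```

How script lines map to audio files depends on the actual file format, so check the downloaded files (and the examples folder) before pairing them.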

Changes

August 2022: Updated the entity synthetic dataset and examples.

References

[1] T. Mitchell, W. Cohen, E. Hruschka, P. Talukdar, J. Betteridge, A. Carlson, B. Dalvi, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis, T. Mohamed, N. Nakashole, E. Platanios, A. Ritter, M. Samadi, B. Settles, R. Wang, D. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling, “Never-ending learning,” in Proc. AAAI, 2015.

[2] T. P. Tanon, G. Weikum, and F. Suchanek, “YAGO 4: A reason-able knowledge base,” in Extended Semantic Web Conference, 2020.
