VinAIResearch/PhoDisfluency

PhoDisfluency: Disfluency Detection for Vietnamese

PhoDisfluency is a dataset for the Vietnamese disfluency detection task, consisting of about 6K Vietnamese utterances with nearly 14K manual annotations over two disfluency types (Reparandum and Interregnum). The construction of PhoDisfluency and the experimental results are described in our WNUT 2022 paper:

@inproceedings{PhoDisfluency,
    title     = {{Disfluency Detection for Vietnamese}},
    author    = {Mai Hoang Dao and Thinh Hung Truong and Dat Quoc Nguyen},
    booktitle = {Proceedings of the 8th Workshop on Noisy User-generated Text (WNUT)},
    year      = {2022}
}

By downloading the PhoDisfluency dataset, USER agrees:

  • to use PhoDisfluency for research or educational purposes only.
  • to not distribute PhoDisfluency or part of PhoDisfluency in any original or modified form.
  • and to cite our WNUT 2022 paper above whenever PhoDisfluency is employed to help produce published results.

Copyright (c) 2022 VinAI Research

THE DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE DATA OR THE USE OR OTHER DEALINGS IN THE
DATA.
