This repository was archived by the owner on Jul 7, 2023. It is now read-only.

How to train using my own dataset? #516

@ndvbd


I would like to train an en->fr Transformer model on one GPU, using 32k-vocab wordpieces (sentencepiece), as in GNMT. I have my own dataset of English sentences and corresponding French sentences.

What's the right way to do this with T2T? Is there a way to pass the train/dev/test source and target files on the command line, or must I register my own Problem by adding a class to the data_generators folder?

Will the 32k vocab be generated automatically from the source and target corpora I supply?
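For context, my understanding of the command-line flow is that datagen builds the subword vocabulary from the supplied corpora when the problem uses a subword vocab type. A hedged sketch, assuming the custom Problem is registered in a user module picked up via `--t2t_usr_dir` (the problem name and all paths here are placeholders):

```shell
# Generate TFRecords and the ~32k subword vocab (placeholder paths/problem).
t2t-datagen \
  --t2t_usr_dir=$HOME/t2t_usr \
  --problem=translate_enfr_mydata \
  --data_dir=$HOME/t2t_data \
  --tmp_dir=/tmp/t2t_tmp

# Train the Transformer on one GPU against the generated data.
t2t-trainer \
  --t2t_usr_dir=$HOME/t2t_usr \
  --problem=translate_enfr_mydata \
  --data_dir=$HOME/t2t_data \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=$HOME/t2t_train/enfr
```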
