Tokenization and data loading #24

@matanzuckerman

Hi @tomerm @semion1956,

As far as I can tell, I currently have to run the tokenization step on the raw data and then load its output for the models.
The problem is that we are going to run many tests, and creating a new folder with the new files (or deleting the old ones) each time can get confusing. Could we implement this so that, if the data loader runs before the tokenization step, preprocessing is applied to the files that were already loaded? The output would not need to be saved unless that is requested explicitly (via another parameter).
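
To make the request concrete, here is a rough sketch of the behavior I have in mind. The names `tokenize`, `preprocess`, and `save_dir` are hypothetical and are not taken from the current code; the whitespace tokenizer is only a placeholder for the real tokenization step.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def tokenize(text: str) -> List[str]:
    # Placeholder tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def preprocess(docs: Iterable[str], save_dir: Optional[str] = None) -> List[List[str]]:
    """Tokenize documents that are already loaded in memory.

    Nothing is written to disk unless `save_dir` (hypothetical parameter)
    is passed explicitly.
    """
    tokenized = [tokenize(doc) for doc in docs]
    if save_dir is not None:
        out = Path(save_dir)
        out.mkdir(parents=True, exist_ok=True)
        # One output file per input document, tokens joined by spaces.
        for i, tokens in enumerate(tokenized):
            (out / f"doc_{i}.txt").write_text(" ".join(tokens), encoding="utf-8")
    return tokenized
```

With something like this, a test run would call `preprocess(loaded_docs)` and get the tokens back directly, and would pass `save_dir="some/folder"` only when the files are actually needed on disk.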

Thanks
