Skip to content
This repository was archived by the owner on Jul 7, 2023. It is now read-only.
This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Is it a good idea to apply NER for translation #1106

@lkluo

Description

@lkluo

This may not directly related to tensor2tensor, but I am curious about to what extend NER could improve translation quality in general. Here are some examples where NER could apply.

  1. numbers, especially long digits e.g., 100,000,000. To my understand, it would be split into a serious of tokens with each element of being one digits. If it could be replaced by a special token such as _number, the length will be reduced to one.
  2. dates, for example 2 October 2018 will also be one token if converted properly.

The benefit of doing this is to shorten sentences and thus yielding a more simplified sentence structure. On the other hand, there are also bad sides. For example, it relies on a good NER system. It may also cause trouble when post-processing those NEs after translated into another language, one of which could be to retain the orders of NEs as of in source sentences.

Any comments, suggestions?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions