
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, date/time, etc. in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI, NL. Partial support for JA, KO, AR, SV). Packages available at: https://www.nuget.org/profiles/Recognizers.Text, https://www.npmjs.com/~recognizers.text


xuanhua/Recognizers-Text

 
 


Extended Chinese date-time recognition for Python

This project is based on Microsoft Recognizers Text. We added some extensions to meet our internal needs and are glad to share them with the community.

Building and installation

Clone this repo

git clone https://github.com/xuanhua/Recognizers-Text.git 
cd Recognizers-Text

Uninstall the official version of recognizers-text if it is installed

If you installed recognizers-text via pip, uninstall it with the following command:

pip uninstall recognizers-text

If any sub-package of recognizers-text remains installed, you may run into confusing errors later.
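
One way to check for leftovers is to list every installed distribution whose name starts with recognizers-text. The sketch below uses only the Python standard library. The helper name leftover_recognizers_packages is hypothetical, and matching on the "recognizers-text" prefix is an assumption based on the package name used above.

```python
from importlib.metadata import distributions


def leftover_recognizers_packages():
    """Return sorted names of installed distributions starting with 'recognizers-text'."""
    names = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:
            # Skip distributions with broken or missing metadata.
            continue
        # Normalize separators so 'recognizers_text' and 'recognizers-text' both match.
        normalized = name.lower().replace("_", "-").replace(".", "-")
        if normalized.startswith("recognizers-text"):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    for package in leftover_recognizers_packages():
        print(package)
```

Any name it prints can be passed to pip uninstall before building. An empty result suggests nothing from the official distribution is left.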

Build and install

Run the build.sh script to install requirements, generate resources, install the local packages, and run all tests.

  cd Python
  ./build.sh
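
After the build finishes, a quick smoke test can confirm that the locally installed package is importable and returns results for Chinese input. This is a minimal sketch: it assumes the Python package exposes recognize_datetime in recognizers_date_time and Culture in recognizers_text, as we understand the upstream Python API. The helper name smoke_test is hypothetical. It returns None rather than failing when the package is not installed.

```python
import importlib.util


def smoke_test(text="下周一"):
    """Run the Chinese date-time recognizer on `text`.

    Returns a list of (text, type_name, resolution) tuples,
    or None if recognizers_date_time is not installed.
    """
    if importlib.util.find_spec("recognizers_date_time") is None:
        return None

    # Imported lazily so the function is safe to call before installation.
    from recognizers_date_time import recognize_datetime
    from recognizers_text import Culture

    results = recognize_datetime(text, Culture.Chinese)
    return [(r.text, r.type_name, r.resolution) for r in results]


if __name__ == "__main__":
    print(smoke_test())
```

The default input "下周一" ("next Monday") is an ordinary expression. The same function can be called with any other Chinese date phrase once the build has succeeded.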

New features beyond the official recognizers-text

  • Support for "Monday of the week after next" expressions (Chinese only), e.g. "下下周一"
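
To make the meaning of "下下周一" concrete, the sketch below computes the date it refers to using plain date arithmetic. It assumes weeks start on Monday and that the phrase means the Monday two calendar weeks after the current week. It illustrates the expression's meaning and is not the fork's actual resolution code. The function name monday_after_next_week is hypothetical.

```python
from datetime import date, timedelta


def monday_after_next_week(reference: date) -> date:
    """Return the Monday of the week after next, relative to `reference`.

    Assumes weeks start on Monday (date.weekday() == 0).
    """
    # Step back to the Monday of the week that contains the reference date.
    this_monday = reference - timedelta(days=reference.weekday())
    # "下周一" would be one week later; "下下周一" is two weeks later.
    return this_monday + timedelta(weeks=2)


if __name__ == "__main__":
    # 2024-05-15 is a Wednesday; the Monday of the week after next is 2024-05-27.
    print(monday_after_next_week(date(2024, 5, 15)))
```

Under these assumptions, any reference date within the same Monday-to-Sunday week maps to the same target Monday.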

Below is the original README content

Microsoft Recognizers Text Overview


Microsoft.Recognizers.Text provides robust recognition and resolution of entities like numbers, units, and date/time; expressed in multiple languages. Full support for Chinese, English, French, Spanish, Portuguese, German, Italian, Turkish, Hindi, and Dutch. Partial support for Japanese, Korean, Arabic, and Swedish. More on the way.

Utilizing the Project

Microsoft.Recognizers.Text powers pre-built entities in LUIS: Language Understanding Intelligent Service, Power Virtual Agents, and Microsoft Bot Framework; base entity types in Text Analytics Cognitive Service; and it is also available as standalone packages (for the base classes and the different entity recognizers).

The Microsoft.Recognizers.Text packages currently target four platforms:

  • C#/.NET
  • JavaScript/TypeScript
  • Python
  • Java

Contributions are very welcome, both fixes and extensions for the currently supported languages and expansion to new ones, especially Japanese, Korean, Arabic, Swedish, and others. More info below.

.NET is the primary package version, and contributions propagate to the other platforms over time.

Citing the Recognizers-Text project

If you utilize the recognizers in academic works, please cite it as below (you can omit the version number or update it to a specific version if relevant):

@software{soft:recognizers-text,
  author    = {Wenhao Huang and Zijia Lin and Chris McConnell and B{\"{o}}rje F. Karlsson},
  title     = {{Recognizers-Text}: {R}ecognition and resolution of numbers, units, and date/time entities expressed across multiple languages},
  month     = jul,
  year      = 2017,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.6860598},
  url       = {https://doi.org/10.5281/zenodo.6860598}
}

Feel free to change "@software" to "@misc" if it better fits your templates.

Help

If you have any questions, please go ahead and open an issue, even if it's not an actual bug. Issues are an acceptable discussion forum as well.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Good starting points for contribution are:

  • the list of open issues (especially those marked as help wanted);
  • the json spec cases temporarily marked as NotSupported (Specs); and
  • translating json test spec cases that work in English, but don't yet exist in a target language.

The links below describe the project structure and provide both an overview and tips on how to contribute (although some steps may have become a little out-of-date). Thank you!

Supported Entities across Cultures

The table below summarizes the currently supported entities. Support for English is usually more complete than others. The primary platform is .NET (shown in table) and support should propagate to the others.

Entity Type EN ZH-CN NL FR DE IT JA KO PT ES
Number (cardinal)
Ordinal
Percentage
Number Range PA/EO
Unit - Age PA/EO
Unit - Currency PA/EO
Unit - Dimensions PA/EO
Unit - Temperature
Choice - Boolean SO
Seq. - E-mail G G* G G G G G* G* G G
Seq. - GUID G G G G G G G G G G
Seq. - Social G G G G G G G G G G
Seq. - IP Address G G G G G G G G G G
Seq. - Phone Number G G G G G G G G G G
Seq. - URL G G* G G G G G* G* G G
DateTime (+subtypes) SO

Entity Type SV BG TR HI AR
Number (cardinal) PA/EO
Ordinal PA/EO
Percentage PA/EO
Number Range PA/EO
Unit - Age
Unit - Currency
Unit - Dimensions
Unit - Temperature
Choice - Boolean
Seq. - E-mail G G G G G
Seq. - GUID G G G G G
Seq. - Social G G G G G
Seq. - IP Address G G G G G
Seq. - Phone Number
Seq. - URL G G G G* G*
DateTime (+subtypes) SP SO

  • G: Generic entity, not language-specific (* Unicode TLDs not supported);
  • EO: Extraction-only (parsing/resolution/normalization pending);
  • PA: Partial support (type not fully supported);
  • SO: Specs-only (test specs coverage OK, but support pending);
  • SP: Partial specs;
  • SI: Very initial specs (typically the start of support for a new language).
