# HanLP: Han Language Processing

The documentation and code examples below are taken from https://hanlp.hankcs.com/.

HanLP is a multilingual NLP library for researchers and companies. It is built on PyTorch and TensorFlow 2.x to advance state-of-the-art deep learning techniques in both academia and industry. HanLP was designed from day one to be efficient, user-friendly and extensible. It comes with pretrained models for many human languages, including English, Chinese, Japanese and others.

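HanLP offers an out-of-the-box RESTful API and a native Python API. The two share very similar interfaces but target different scenarios: the RESTful client is a lightweight package that sends text to a HanLP server, while the native API downloads pretrained models and runs them locally.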

## RESTful API

```bash
pip install hanlp_restful
```

```python
from hanlp_restful import HanLPClient

# Fill in your auth key (leave auth=None for anonymous access); set language='zh' to use Chinese models
HanLP = HanLPClient('https://hanlp.hankcs.com/api', auth=None, language='mul')
doc = HanLP('In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments. '
            '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。'
            '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。')
# print(doc)
```
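The call above returns `doc`. Assume it behaves like a dict keyed by task name, with one entry per sentence, matching the JSON printed in the multi-task example further down this page. Under that assumption, a minimal sketch can pair each named entity with the tokens it covers. The helper `collect_entities` is hypothetical, not part of `hanlp_restful`. The sample data is copied from that printed output rather than fetched from the API.

```python
from typing import Any, Dict, List, Tuple

# (sentence index, NER label, tokens covered by the span)
Entity = Tuple[int, str, List[str]]


def collect_entities(doc: Dict[str, Any]) -> List[Entity]:
    """Pair every NER span with the tokens it covers, sentence by sentence.

    Assumes doc["tok"][i] is the token list of sentence i and doc["ner"][i]
    holds [text, label, start, end] spans whose end offset is exclusive.
    """
    entities: List[Entity] = []
    for i, (tokens, spans) in enumerate(zip(doc["tok"], doc["ner"])):
        for _text, label, start, end in spans:
            entities.append((i, label, tokens[start:end]))
    return entities


# Stand-in for a real response: tok/ner values copied from the multi-task
# output printed later on this page, not fetched from the RESTful API.
sample_doc = {
    "tok": [
        ["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art",
         "multilingual", "NLP", "techniques", "to", "production",
         "environments", "."],
        ["2021年", "、", "HanLPv2.1", "は", "次世代", "の", "最", "先端",
         "多言語", "NLP", "技術", "を本番環境", "に", "導入", "し", "ます", "。"],
        ["2021", "年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代",
         "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ],
    "ner": [
        [["2021", "DATE", 1, 2], ["HanLPv2.1", "WORK_OF_ART", 3, 4]],
        [["2021年", "DATE", 0, 1]],
        [["2021 年", "DATE", 0, 2]],
    ],
}

for sentence, label, tokens in collect_entities(sample_doc):
    print(sentence, label, tokens)
```

Slicing `tokens[start:end]` reproduces each span, for example `["2021", "年"]` for the Chinese date. This is consistent with the offsets being token indices with an exclusive end.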

### Visualization

```python
doc.pretty_print()
```
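`pretty_print` visualizes the annotations for you. For a rough plain-text alternative, the `[text, label, start, end]` NER spans can be turned into one BIO tag per token. This is not a description of how `pretty_print` works. The sketch assumes that `end` is exclusive, as the printed offsets suggest, and that spans do not overlap. `spans_to_bio` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

# One NER span as printed by HanLP: [text, label, start, end), end exclusive (assumed)
Span = Tuple[str, str, int, int]


def spans_to_bio(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Convert non-overlapping [start, end) entity spans into BIO tags."""
    tags = ["O"] * n_tokens
    for _text, label, start, end in spans:
        tags[start] = "B-" + label          # first token of the entity
        for k in range(start + 1, end):     # remaining tokens, if any
            tags[k] = "I-" + label
    return tags


# Tokens and NER span of the Chinese sentence from the multi-task output below
tokens = ["2021", "年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代",
          "最", "先进", "的", "多", "语种", "NLP", "技术", "。"]
spans = [("2021 年", "DATE", 0, 2)]

for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(f"{token}\t{tag}")
```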

## Native API

```bash
pip install hanlp
```

### Multi-Task Learning

```python
import hanlp

# Load a multilingual multi-task model; on first use it is downloaded to ~/.hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE)
print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments.',
             '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',
             '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
```

```
{
  "tok": [
    ["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art", "multilingual", "NLP", "techniques", "to", "production", "environments", "."],
    ["2021年", "、", "HanLPv2.1", "は", "次世代", "の", "最", "先端", "多言語", "NLP", "技術", "を本番環境", "に", "導入", "し", "ます", "。"],
    ["2021", "年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"]
  ],
  "ner": [
    [["2021", "DATE", 1, 2], ["HanLPv2.1", "WORK_OF_ART", 3, 4]],
    [["2021年", "DATE", 0, 1]],
    [["2021 年", "DATE", 0, 2]]
  ],
  "srl": [
    [[["In 2021", "ARGM-TMP", 0, 2], ["HanLPv2.1", "ARG0", 3, 4], ["delivers", "PRED", 4, 5], ["to production environments", "ARG2", 9, 12]]],
    [],
    [[["2021 年", "ARGM-TMP", 0, 2], ["为 生产 环境", "ARG2", 3, 6], ["带来", "PRED", 6, 7], ["次世代 最 先进 的 多 语种 NLP 技术", "ARG1", 7, 15]], [["最", "ARGM-ADV", 8, 9], ["先进", "PRED", 9, 10]]]
  ],
  "sdp/dm": [
    [[], [[1, "ARG2"]], [], [[5, "ARG1"]], [[1, "ARG1"]], [], [], [], [[5, "ARG2"], [6, "ARG1"], [7, "ARG1"
```
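In the `srl` output, each sentence holds a list of predicate-argument frames. Each argument is a `[text, role, start, end]` span, and the Japanese sentence has no frames in this run. Assuming that layout, a minimal sketch can group one frame by semantic role. `group_frame` is a hypothetical helper, and the frames are copied from the Chinese sentence above. Roles map to lists because nothing in the printed format guarantees that a role appears only once per frame.

```python
from typing import Dict, List, Sequence

# Frames copied from the "srl" entry of the Chinese sentence above:
# one list per predicate, each argument as [text, role, start, end].
srl_zh = [
    [["2021 年", "ARGM-TMP", 0, 2], ["为 生产 环境", "ARG2", 3, 6],
     ["带来", "PRED", 6, 7], ["次世代 最 先进 的 多 语种 NLP 技术", "ARG1", 7, 15]],
    [["最", "ARGM-ADV", 8, 9], ["先进", "PRED", 9, 10]],
]


def group_frame(frame: Sequence[Sequence]) -> Dict[str, List[str]]:
    """Map each semantic role in one frame to the argument texts filling it."""
    roles: Dict[str, List[str]] = {}
    for text, role, _start, _end in frame:
        roles.setdefault(role, []).append(text)
    return roles


for frame in srl_zh:
    grouped = group_frame(frame)       # a fresh dict, so srl_zh is untouched
    predicate = grouped.pop("PRED")    # separate the predicate from its arguments
    print(predicate, grouped)
```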

### Single-Task Learning

HanLP also provides a full range of single-task learning models for core NLP tasks such as tagging and parsing. Please refer to the documentation of [pretrained models](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/index.html) for details.